[v23.1.x] storage: make compaction_backlog more efficient for large segment counts #11718

Merged

Conversation

vbotbuildovich
Collaborator

Backport of PR #11681
Fixes #11717

jcsp added 3 commits June 27, 2023 12:28
It was unnecessary to accumulate all segments in the topic,
since each term is calculated separately.  Do the per-term
calculation inline.

This still has a memory footprint that scales linearly with
the number of segments in a term, but not the total number
of segments in a partition.

(cherry picked from commit 73619eb)
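
A minimal sketch of the per-term approach this commit message describes, assuming segments are visited in term order. The names `segment_info`, `term_backlog`, and `backlog_by_term` are hypothetical, and the per-term cost is a placeholder, not Redpanda's actual formula. The point is only that the buffer holds one term at a time, so peak memory follows the largest term rather than the whole partition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a segment: only the fields this sketch needs.
struct segment_info {
    std::int64_t term;
    std::uint64_t size_bytes;
};

// Placeholder per-term cost (an assumption for illustration): the bytes in the
// term, or zero when the term has a single segment and nothing to merge with.
inline std::uint64_t term_backlog(const std::vector<std::uint64_t>& sizes) {
    if (sizes.size() < 2) {
        return 0;
    }
    std::uint64_t total = 0;
    for (auto s : sizes) {
        total += s;
    }
    return total;
}

struct backlog_result {
    std::uint64_t backlog;
    std::size_t peak_buffered; // most segments held in memory at once
};

// Scores each term as soon as the term changes, then reuses the buffer.
inline backlog_result backlog_by_term(const std::vector<segment_info>& segs) {
    backlog_result r{0, 0};
    std::vector<std::uint64_t> current; // sizes for the term being scanned
    std::int64_t current_term = 0;
    for (const auto& s : segs) {
        if (!current.empty() && s.term != current_term) {
            assert(s.term > current_term); // input must be ordered by term
            r.backlog += term_backlog(current);
            current.clear();
        }
        current_term = s.term;
        current.push_back(s.size_bytes);
        if (current.size() > r.peak_buffered) {
            r.peak_buffered = current.size();
        }
    }
    r.backlog += term_backlog(current); // score the final term
    return r;
}
```

For six segments spread over three terms as 2, 1, and 3 segments, `peak_buffered` is 3 rather than 6.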
This was an O(N^2) loop, which is gratuitously expensive for
something that is only used to adjust I/O priorities.

Clamp the number of segments that each segment will be
compared to.  This will have identical results in the typical
case where the number of segments in a compacted topic term
is low (because compaction itself keeps the number of segments
low), while preventing O(N^2) runtime in the worst case
of a user creating many segments in a single term.

A realistic scenario would be for a user to write many thousands of
segments to a topic while importing data, and then enable
compaction at the end.  Previously, this would have resulted in
~64M pow() calls for a partition with a terabyte of data in 128MiB
segments.

(cherry picked from commit 4f0197b)
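
For scale: 1 TiB / 128 MiB = 8192 segments, and 8192² = 67,108,864, which matches the ~64M figure above. The sketch below illustrates the clamping idea under stated assumptions: `pairwise_score`, `max_lookahead`, and the `decay` weight are hypothetical and not taken from the Redpanda source. It compares each segment only with later ones, so its unclamped cost is N(N-1)/2 rather than N², but the effect of the clamp is the same: the inner loop is bounded by a constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct pairwise_result {
    double score;
    std::size_t pow_calls; // how many times the inner loop body ran
};

// Each segment is compared with at most max_lookahead later segments, so the
// total work is O(N * max_lookahead) instead of O(N^2).
inline pairwise_result pairwise_score(const std::vector<double>& sizes,
                                      std::size_t max_lookahead) {
    assert(max_lookahead > 0);
    constexpr double decay = 0.5; // assumed weight, for illustration only
    pairwise_result r{0.0, 0};
    const std::size_t n = sizes.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Clamp the comparison window to the segments that actually remain.
        const std::size_t window = std::min(n - i - 1, max_lookahead);
        for (std::size_t d = 1; d <= window; ++d) {
            r.score += sizes[i + d] * std::pow(decay, static_cast<double>(d));
            ++r.pow_calls;
        }
    }
    return r;
}
```

With 8192 segments and a window of 16, this sketch makes 130,936 `pow()` calls. When a term holds no more segments than the window, the clamp never takes effect and the result equals the unclamped one, which is the "identical results in the typical case" property described above.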
The previous commit limited the computational complexity
of the per-term calculation.

This commit limits the maximum allocation size while building
the list of segments in a term.

As with the previous commit, this makes no difference in typical
compaction use cases, but protects us from pathological cases.

(cherry picked from commit 730c8b4)
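
One way to bound the allocation is to cap the per-term buffer and score it whenever it fills. This is a sketch under assumptions, not the change itself: the commit message does not say how an oversized term is handled, and `seg_meta`, `chunk_backlog`, `backlog_bounded`, and `max_buffered` are hypothetical names with a placeholder cost function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a segment: only the fields this sketch needs.
struct seg_meta {
    std::int64_t term;
    std::uint64_t size_bytes;
};

// Placeholder cost (an assumption): bytes in the chunk, or zero for a lone segment.
inline std::uint64_t chunk_backlog(const std::vector<std::uint64_t>& sizes) {
    if (sizes.size() < 2) {
        return 0;
    }
    std::uint64_t total = 0;
    for (auto s : sizes) {
        total += s;
    }
    return total;
}

struct bounded_result {
    std::uint64_t backlog;
    std::size_t peak_buffered; // never exceeds max_buffered
};

// Flushes the buffer on a term change or when it reaches max_buffered, so an
// unusually long term is scored in chunks instead of one large allocation.
inline bounded_result backlog_bounded(const std::vector<seg_meta>& segs,
                                      std::size_t max_buffered) {
    assert(max_buffered >= 2);
    bounded_result r{0, 0};
    std::vector<std::uint64_t> chunk;
    std::int64_t current_term = 0;
    for (const auto& s : segs) {
        const bool term_changed = !chunk.empty() && s.term != current_term;
        if (term_changed || chunk.size() == max_buffered) {
            r.backlog += chunk_backlog(chunk);
            chunk.clear();
        }
        current_term = s.term;
        chunk.push_back(s.size_bytes);
        r.peak_buffered = std::max(r.peak_buffered, chunk.size());
    }
    r.backlog += chunk_backlog(chunk); // score whatever remains
    return r;
}
```

Terms that fit within the cap are scored exactly as before. Only a term longer than `max_buffered` is split, which trades a small approximation for a fixed upper bound on memory.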
@vbotbuildovich added this to the v23.1.x-next milestone Jun 27, 2023
@vbotbuildovich added the kind/backport label (PRs targeting a stable branch) Jun 27, 2023
Member

@BenPope BenPope left a comment

LGTM

@jcsp jcsp marked this pull request as ready for review July 7, 2023 13:06
@jcsp jcsp merged commit 82256d9 into redpanda-data:v23.1.x Jul 7, 2023
21 checks passed
@BenPope BenPope modified the milestones: v23.1.x-next, v23.1.14 Aug 7, 2023
Labels
area/redpanda, kind/backport (PRs targeting a stable branch)

3 participants