[v23.1.x] storage: make compaction_backlog more efficient for large segment counts #11718

Merged

Commits on Jun 27, 2023

  1. storage: reduce memory footprint of compaction_backlog

    It was unnecessary to accumulate all segments in the topic,
    since each term is calculated separately.  Do the per-term
    calculation inline.
    
    This still has a memory footprint that scales linearly with
    the number of segments in a term, but not the total number
    of segments in a partition.
    
    (cherry picked from commit 73619eb)
    jcsp authored and vbotbuildovich committed Jun 27, 2023
    ee5c290
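As a rough illustration of the first commit's idea, the sketch below walks segments in log order and buffers only the current term, computing that term's contribution as soon as the term changes. Everything here is hypothetical: `segment_info`, `term_backlog`, `backlog_inline` and the weighting formula are placeholders, since the commit message does not state the real per-term formula.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a segment: only the fields this sketch needs.
struct segment_info {
    int64_t term;
    uint64_t size_bytes;
};

// Placeholder per-term cost (an assumption, not the real formula):
// each segment is weighted by ratio^(number of later segments in the term).
inline double term_backlog(const std::vector<uint64_t>& sizes, double ratio) {
    double total = 0.0;
    for (size_t i = 0; i < sizes.size(); ++i) {
        const double later = static_cast<double>(sizes.size() - 1 - i);
        total += static_cast<double>(sizes[i]) * std::pow(ratio, later);
    }
    return total;
}

// Walk segments in log order and buffer only the current term.
// Peak memory follows the largest term, not the whole partition.
inline double backlog_inline(
  const std::vector<segment_info>& segs,
  double ratio,
  size_t* peak_buffered = nullptr) {
    double total = 0.0;
    size_t peak = 0;
    std::vector<uint64_t> current;
    for (size_t i = 0; i < segs.size(); ++i) {
        if (i > 0 && segs[i].term != segs[i - 1].term) {
            // Term boundary: fold in the finished term and reuse the buffer.
            total += term_backlog(current, ratio);
            current.clear();
        }
        current.push_back(segs[i].size_bytes);
        peak = std::max(peak, current.size());
    }
    total += term_backlog(current, ratio);
    if (peak_buffered != nullptr) {
        *peak_buffered = peak;
    }
    return total;
}
```

The optional `peak_buffered` output exists only to make the memory behaviour observable: for a log with two segments in one term and three in the next, it reports 3 rather than 5.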
  2. storage: limit complexity of compaction backlog calculation

    This was an O(N^2) loop, which is gratuitously expensive for
    something that is only used to adjust I/O priorities.
    
    Clamp the number of segments that each segment will be
    compared to.  This will have identical results in the typical
    case where the number of segments in a compacted topic term
    is low (because compaction itself keeps the number of segments
    low), while preventing O(N^2) runtime in the worst case
    of a user creating many segments in a single term.
    
    A realistic scenario would be for a user to write many thousands of
    segments to a topic while importing data, and then enable
    compaction at the end.  Previously, this would have resulted in
    ~64M pow() calls for a partition with a terabyte of data in 128MiB
    segments.
    
    (cherry picked from commit 4f0197b)
    jcsp authored and vbotbuildovich committed Jun 27, 2023
    a21e2c2
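The figure quoted in the second commit is consistent with 1 TiB / 128 MiB = 8192 segments and 8192^2 = 67,108,864, or about 64M. The sketch below shows one way a clamp on the inner loop could bound the work. It is a minimal illustration under assumptions: `clamped_term_backlog`, `backlog_result`, the `max_compare` parameter and the weighting formula are hypothetical and not taken from the patch. Note that this sketch only visits pairs with j > i, so its unclamped count is roughly half of N^2, which is still quadratic.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of the sketch: the estimate plus a counter for pow() calls,
// kept only so the cost of the loop can be observed.
struct backlog_result {
    double backlog;
    uint64_t pow_calls;
};

// Hypothetical pairwise cost within one term. Each segment i is compared
// with at most max_compare later segments instead of all of them.
inline backlog_result clamped_term_backlog(
  const std::vector<uint64_t>& sizes, double ratio, size_t max_compare) {
    backlog_result r{0.0, 0};
    const size_t n = sizes.size();
    for (size_t i = 0; i < n; ++i) {
        // Number of later segments to look at, clamped to max_compare.
        const size_t span = std::min(max_compare, n - i - 1);
        for (size_t j = i + 1; j <= i + span; ++j) {
            r.backlog += static_cast<double>(sizes[i])
                         * std::pow(ratio, static_cast<double>(j - i));
            ++r.pow_calls;
        }
    }
    return r;
}
```

With 100 segments and `max_compare` set to 8, this loop makes 764 `pow()` calls instead of 4950. For a term with no more than `max_compare + 1` segments the clamp never takes effect, so the result is identical to the unclamped one, which mirrors the "identical results in the typical case" claim.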
  3. storage: limit segments per term in compaction backlog calculation

    The previous commit limited the computational complexity
    of the per-term calculation.
    
    This commit limits the max allocation size done while building
    the list of segments in a term.
    
    As with the previous commit, this makes no difference in typical
    compaction use cases, but protects us from pathological cases.
    
    (cherry picked from commit 730c8b4)
    jcsp authored and vbotbuildovich committed Jun 27, 2023
    f3933b2
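A bound on the per-term buffer, as described in the third commit, might look like the sketch below. This is an assumption about the general shape, not the patch itself: `collect_term`, `segment_info` and `max_per_term` are hypothetical names, and the choice to skip segments beyond the limit (rather than, say, starting a new batch) is a guess made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a segment: only the fields this sketch needs.
struct segment_info {
    int64_t term;
    uint64_t size_bytes;
};

// Collect the sizes of the run of same-term segments starting at `begin`,
// keeping at most max_per_term of them so the buffer cannot grow without
// bound. Returns the index of the first segment of the next term.
inline size_t collect_term(
  const std::vector<segment_info>& segs,
  size_t begin,
  size_t max_per_term,
  std::vector<uint64_t>& out) {
    out.clear();
    if (begin >= segs.size()) {
        return segs.size();
    }
    const int64_t term = segs[begin].term;
    size_t i = begin;
    for (; i < segs.size() && segs[i].term == term; ++i) {
        // Assumption: segments past the limit are skipped in the estimate.
        if (out.size() < max_per_term) {
            out.push_back(segs[i].size_bytes);
        }
    }
    return i;
}
```

A caller would invoke `collect_term` repeatedly, passing the returned index as the next `begin`, and feed each filled buffer to the per-term calculation. Terms smaller than the limit are collected in full, so typical compacted topics would see no change.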