storage: limit complexity of compaction backlog calculation
This was an O(N^2) loop, which is gratuitously expensive for
something that is only used to adjust I/O priorities.

Clamp the number of segments that each segment will be
compared to.  This will have identical results in the typical
case where the number of segments in a compacted topic term
is low (because compaction itself keeps the number of segments
low), while preventing O(N^2) runtime in the worst case
of a user creating many segments in a single term.

A realistic scenario would be for a user to write many thousands of
segments to a topic while importing data, and then enable
compaction at the end.  Previously, this would have resulted in
~64M pow() calls for a partition with a terabyte of data in 128MiB
segments.
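
That figure checks out with quick arithmetic (a back-of-envelope sketch, assuming "terabyte" here means 1 TiB and that the old loop did on the order of segment_count^2 weighted comparisons):

    // Hypothetical compile-time check of the numbers above (C++14).
    static_assert((1ULL << 40) / (128ULL << 20) == 8192,
                  "1 TiB of data in 128 MiB segments");
    static_assert(8192ULL * 8192ULL == 67'108'864,
                  "~64M pow() calls with the old O(N^2) loop");
    static_assert(8192ULL * 8 == 65'536,
                  "upper bound after clamping to limit_lookahead = 8");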
jcsp committed Jun 26, 2023
1 parent 73619eb · commit 4f0197b
Showing 1 changed file with 6 additions and 1 deletion.
src/v/storage/disk_log_impl.cc:

@@ -1688,6 +1688,11 @@ int64_t compaction_backlog_term(
   std::vector<ss::lw_shared_ptr<segment>> segs, double cf) {
     int64_t backlog = 0;

+    // Only compare each segment to a limited number of other segments, to
+    // avoid the loop below blowing up in runtime when there are many segments
+    // in the same term.
+    static constexpr size_t limit_lookahead = 8;
+
     auto segment_count = segs.size();
     if (segment_count <= 1) {
         return 0;
@@ -1697,7 +1702,7 @@ int64_t compaction_backlog_term(
         auto& s = segs[n - 1];
         auto sz = s->finished_self_compaction() ? s->size_bytes()
                                                 : s->size_bytes() * cf;
-        for (size_t k = 0; k <= segment_count - n; ++k) {
+        for (size_t k = 0; k <= segment_count - n && k < limit_lookahead; ++k) {
            if (k == segment_count - 1) {
                 continue;
             }
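
For readers outside the Redpanda tree, here is a minimal standalone sketch of the clamped loop described in the commit message. The segment_model type, the outer loop bounds, and the pow(cf, k) weighting are illustrative assumptions (the real function operates on ss::lw_shared_ptr<segment>, and its per-pair weighting is not visible in this diff); only limit_lookahead and the inner loop's clamp mirror the actual change:

    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Simplified stand-in for storage::segment (hypothetical; the real code
    // uses ss::lw_shared_ptr<segment> with finished_self_compaction() and
    // size_bytes()).
    struct segment_model {
        uint64_t size_bytes;
        bool self_compacted;
    };

    // Sketch of compaction_backlog_term() after the fix.
    int64_t compaction_backlog_term_sketch(
      const std::vector<segment_model>& segs, double cf) {
        int64_t backlog = 0;

        // Only compare each segment to a limited number of other segments, to
        // avoid the loop below blowing up in runtime when there are many
        // segments in the same term.
        static constexpr size_t limit_lookahead = 8;

        const size_t segment_count = segs.size();
        if (segment_count <= 1) {
            return 0;
        }

        for (size_t n = 1; n <= segment_count; ++n) {
            const auto& s = segs[n - 1];
            // Segments that already self-compacted count at full size; others
            // are discounted by the compaction factor cf.
            double sz = s.self_compacted ? double(s.size_bytes)
                                         : double(s.size_bytes) * cf;
            // Previously the bound was only k <= segment_count - n, i.e.
            // O(N^2) overall; the clamp caps the inner loop at
            // limit_lookahead iterations, making the whole pass O(N).
            for (size_t k = 0; k <= segment_count - n && k < limit_lookahead;
                 ++k) {
                if (k == segment_count - 1) {
                    continue;
                }
                backlog += int64_t(std::pow(cf, double(k)) * sz);
            }
        }
        return backlog;
    }

    int main() {
        // 8192 segments of 128 MiB (a ~1 TiB partition), none self-compacted:
        // at most 8192 * 8 = 65,536 inner iterations instead of ~64M.
        std::vector<segment_model> segs(8192, {128ULL << 20, false});
        std::printf(
          "backlog=%lld\n",
          static_cast<long long>(compaction_backlog_term_sketch(segs, 0.5)));
        return 0;
    }

The tradeoff follows the commit message: the backlog is only a heuristic for adjusting I/O priorities, so truncating each segment's lookahead to 8 neighbours gives identical results in the typical low-segment-count case while guaranteeing a hard linear bound in the pathological one.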
