High CPU when consuming from 1 partition out of many in topic #4410
Comments
I like your idea, but a few random thoughts: @travisdowns I bet this is commonplace.
I wonder if we could have a core-local cache of ntp -> uint64 (xxhash64) which checks against a (compressed/roaring-style) bitmap or something similar, so querying even larger topics is still fast. Maybe that's the right model instead of elevating every query to the .... I think your fix makes sense, btw, but I also think we may have this as a structural problem in multiple places. cc: @mmaslankaprv
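For illustration only, here is a minimal C++ sketch of that kind of per-core existence cache. The `ntp` struct and `ntp_existence_cache` type are hypothetical stand-ins (not Redpanda's actual `model::ntp`), `std::hash` stands in for xxhash64, and a plain hash set stands in for a compressed/roaring bitmap:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the (namespace, topic, partition) triple.
struct ntp {
    std::string ns;
    std::string topic;
    int32_t partition;
};

// Per-core existence cache: each ntp is reduced to a 64-bit fingerprint
// (xxhash64 in the idea above; std::hash here for brevity) and membership
// is answered from a hash set instead of scanning the partition list.
class ntp_existence_cache {
public:
    void add(const ntp& n) { _fingerprints.insert(fingerprint(n)); }
    void remove(const ntp& n) { _fingerprints.erase(fingerprint(n)); }

    // Expected O(1). A compressed/roaring bitmap could replace the set to
    // shrink memory at very high partition counts.
    bool maybe_contains(const ntp& n) const {
        return _fingerprints.count(fingerprint(n)) > 0;
    }

private:
    static uint64_t fingerprint(const ntp& n) {
        // Mix the three fields; a real implementation would hash the
        // serialized ntp with xxhash64 instead.
        auto mix = [](uint64_t h, uint64_t v) {
            return h ^ (v + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2));
        };
        uint64_t h = std::hash<std::string>{}(n.ns);
        h = mix(h, std::hash<std::string>{}(n.topic));
        h = mix(h, std::hash<int32_t>{}(n.partition));
        return h;
    }

    std::unordered_set<uint64_t> _fingerprints;
};
```

Because each ntp is reduced to a 64-bit fingerprint, collisions mean a hit only says "probably present"; a miss is definitive, which is what a fast-reject path needs.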
metadata_cache.contains() is a slow call with many partitions because it scans the entire partition list looking for the specified one. On the fetch planning path we can avoid this cost simply by moving the check inside a shard_for check we do immediately after the metadata check: in a stable system the shard_for check will fail any time the metadata call would fail. Issue redpanda-data#4410
@senior7515 - you read our minds, haha. I discussed this earlier with @mmaslankaprv and we agree it is a structural problem (also applying to produce and other paths), and a better fix is to have an index on these lists so that when we need to query them we don't need to do an exhaustive iteration. Initially I think it can just be something very simple like a hash map, and we can optimize from there to reach higher partition counts. The suggestion I had at the end was just a very quick patch to unblock certain use cases where 20k partitions are being used: if we go with that PR it won't close this issue. BTW, the time spent in contains is > 20x higher than anywhere else:
nice :)
yeah, figured some simple index makes sense, or what structure were you all thinking?
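As a rough sketch of that "simple index" direction (type and member names below are hypothetical, not taken from the Redpanda codebase), the existing partition list could be paired with a hash map keyed by partition id, so existence checks and lookups become amortized O(1) instead of a scan:

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical per-topic metadata keeping both the ordered partition list
// and a hash-map index for O(1) lookups by partition id.
struct topic_metadata {
    struct partition_info {
        int32_t id;
        int32_t leader_node;
    };

    std::vector<partition_info> partitions;    // the existing list
    std::unordered_map<int32_t, size_t> index; // partition id -> list offset

    void add_partition(partition_info p) {
        index.emplace(p.id, partitions.size());
        partitions.push_back(p);
    }

    // Previously an O(n) scan over all partitions in the topic; with the
    // index it is a single hash lookup.
    bool contains(int32_t partition_id) const {
        return index.find(partition_id) != index.end();
    }

    std::optional<partition_info> find(int32_t partition_id) const {
        auto it = index.find(partition_id);
        if (it == index.end()) {
            return std::nullopt;
        }
        return partitions[it->second];
    }
};
```

The trade-off is keeping the index consistent whenever partitions are added, moved, or removed, which is presumably where most of the real work in such a change lies.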
🤯 - awesome find. Interesting to see the next bottleneck on this.
@senior7515 well, I had my very weak "hide it behind another check" fix ready to go, but @mmaslankaprv has a draft fix which is much better and fixes this same problem all over the place, along the lines of what you suggested:
Interestingly, we find that
Version & Environment
Redpanda version: dev fb0529d
What went wrong?
When consuming from 1 (or a few) partitions in a topic with 20k partitions, high CPU use is observed in simple_fetch_planner::create_plan.
What should have happened instead?
Less CPU use, or at least CPU use should not have a strong correlation with the number of partitions in the topic (it is reasonable to expect that it will vary depending on the number of partitions actually mentioned in the fetch, however).
How to reproduce the issue?
Consume from 1 (or a few) partitions in a topic with 20k partitions and profile the broker with `perf record`.
Additional information
The CPU time is largely being spent in this check that the partition/topic exists at all. It is slow because `contains()` must iterate over the partitions in the topic (20k, here) to check whether the specified partition exists. For a partition at the end of the list, this means examining the entire list (note that this also means unbalanced behavior: nodes handling partitions near the start of the list will suffer significantly less CPU use here than those handling partitions later in the list).

As a quick fix, we could move this check down inside the `shard_for` check immediately above, since if the partition does not exist it also will not appear in the shard mapping (modulo races, which this code already doesn't avoid).
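To make the quick fix concrete, here is a hedged sketch of the reordering (all types and names below are illustrative stand-ins, not the actual simple_fetch_planner code): consult `shard_for` first, and only fall back to the expensive `contains()` scan on the cold path where the shard mapping is missing, purely to decide which error to report.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-ins; Redpanda's real metadata_cache and shard_table
// look different. Only the ordering of the two checks matters here.
struct ntp {
    std::string topic;
    int32_t partition;
    bool operator<(const ntp& o) const {
        return std::tie(topic, partition) < std::tie(o.topic, o.partition);
    }
};

struct metadata_cache {
    std::set<ntp> all_partitions; // stand-in for the full partition list
    bool contains(const ntp& p) const { return all_partitions.count(p) > 0; }
};

struct shard_table {
    std::set<ntp> local_partitions; // partitions hosted on this node
    std::optional<unsigned> shard_for(const ntp& p) const {
        if (local_partitions.count(p) > 0) {
            return 0u; // single-shard stand-in
        }
        return std::nullopt;
    }
};

enum class plan_result { planned, unknown_topic_or_partition, not_leader };

// Quick-fix ordering: consult shard_for() first. The expensive contains()
// scan is only paid on the cold path where the shard mapping is missing,
// purely to pick the right error, so the hot path no longer scales with
// the topic's total partition count.
plan_result plan_partition(
  const ntp& p, const shard_table& shards, const metadata_cache& md) {
    if (!shards.shard_for(p)) {
        return md.contains(p) ? plan_result::not_leader
                              : plan_result::unknown_topic_or_partition;
    }
    return plan_result::planned; // build the fetch plan entry here
}
```

As noted above, this only patches the fetch planning path and does not fix the underlying O(n) metadata lookup, so the structural index fix is still needed.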