
High CPU when consuming from 1 partition out of many in topic #4410

Closed
travisdowns opened this issue Apr 25, 2022 · 4 comments · Fixed by #4443
Assignees
Labels
area/redpanda kind/bug Something isn't working

Comments

@travisdowns
Member

Version & Environment

Redpanda version: dev fb0529d

What went wrong?

When consuming from 1 (or a few) partitions in a topic with 20k partitions, high CPU use is observed in simple_fetch_planner::create_plan.

What should have happened instead?

Less CPU use; at a minimum, CPU use should not be strongly correlated with the total number of partitions in the topic (it is reasonable to expect it to vary with the number of partitions actually mentioned in the fetch, however).

How to reproduce the issue?

  1. Create a 20k partition topic.
  2. Use a test app to consume slowly from this empty topic.
  3. Profile redpanda with perf record.

Additional information

The CPU time is largely being spent in this check that the partition/topic exists at all. It is slow because contains() must iterate over the partitions in the topic (20k, here) in order to check whether the specified partition exists. For a partition at the end of the list this means examining the entire list (note that this also means unbalanced behavior: nodes handling partitions near the start of the list will incur significantly less CPU use here than those handling partitions later in the list).

As a quick fix, we could move this check down inside the shard_for check immediately above, since if the partition does not exist it also will not appear in the shard mapping (modulo races, which this code already doesn't avoid).
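For illustration, a minimal self-contained sketch of that reordering (placeholder types, not Redpanda's actual metadata_cache or shard table): the cheap hash-based shard lookup runs first, and the O(n) scan only runs on the failure path to pick the right error code.

```cpp
// Sketch only: stand-in types that mimic the shapes involved, not real Redpanda code.
#include <algorithm>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct ntp { std::string topic; int partition; };

// Stand-in for metadata_cache: a linear scan over all partitions of a topic.
struct metadata_cache {
    std::unordered_map<std::string, std::vector<int>> topics;
    bool contains(const ntp& n) const {
        auto it = topics.find(n.topic);
        if (it == topics.end()) return false;
        const auto& parts = it->second; // 20k entries in the reported case
        return std::find(parts.begin(), parts.end(), n.partition) != parts.end();
    }
};

// Stand-in for the shard table: a hash lookup keyed by topic/partition.
struct shard_table {
    std::unordered_map<std::string, unsigned> shards;
    std::optional<unsigned> shard_for(const ntp& n) const {
        auto it = shards.find(n.topic + "/" + std::to_string(n.partition));
        if (it == shards.end()) return std::nullopt;
        return it->second;
    }
};

enum class errc { none, not_leader, unknown_topic_or_partition };

// Reordered check: the O(n) contains() only runs when the shard lookup fails,
// purely to distinguish "unknown partition" from "not leader here".
errc plan_partition(const metadata_cache& md, const shard_table& st, const ntp& n) {
    auto shard = st.shard_for(n);
    if (!shard) {
        return md.contains(n) ? errc::not_leader
                              : errc::unknown_topic_or_partition;
    }
    return errc::none; // proceed with planning the fetch on *shard
}
```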

@travisdowns travisdowns added the kind/bug Something isn't working label Apr 25, 2022
@travisdowns travisdowns self-assigned this Apr 25, 2022
@emaxerrno
Contributor

emaxerrno commented Apr 25, 2022

I like your idea, but random thoughts:

@travisdowns i bet this is commonplace

The CPU time is largely being spent in this check that the partition/topic exists at all. It is slow because contains()

I wonder if we could have a core-local cache of ntp -> uint64 (xxhash64) that checks against a (compressed/roaring) bitmap or something, so querying even larger topics stays fast.

maybe the right model, instead of elevating every query to the ntp, is to say: for this ns && this topic, evaluate all of these partitions

.... i think your fix makes sense btw, but i also think we may have this as a structural problem in multiple places. cc: @mmaslankaprv
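As a rough illustration of the core-local cache idea floated above (a hypothetical type; std::hash stands in for xxhash64 and a plain hash set stands in for a compressed/roaring bitmap):

```cpp
// Hypothetical sketch: a per-core existence cache keyed by a 64-bit hash of the
// ntp (namespace/topic/partition). With a 64-bit key, hash collisions are rare
// but possible, so a real implementation would need to account for them.
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

class ntp_existence_cache {
public:
    static uint64_t key(const std::string& ns, const std::string& topic, int32_t p) {
        // xxhash64 over the serialized ntp in a real version; std::hash here.
        return std::hash<std::string>{}(ns + "/" + topic + "/" + std::to_string(p));
    }
    void add(const std::string& ns, const std::string& topic, int32_t p) {
        _known.insert(key(ns, topic, p));
    }
    void remove(const std::string& ns, const std::string& topic, int32_t p) {
        _known.erase(key(ns, topic, p));
    }
    // O(1) expected, independent of how many partitions the topic has.
    bool contains(const std::string& ns, const std::string& topic, int32_t p) const {
        return _known.count(key(ns, topic, p)) > 0;
    }

private:
    std::unordered_set<uint64_t> _known; // per-core, updated on metadata changes
};
```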

travisdowns added a commit to travisdowns/redpanda that referenced this issue Apr 25, 2022
metadata_cache.contains() is a slow call with many partitions because
it scans the entire partition list looking for the specified one.

On the fetch planning path we can avoid this cost simply by
moving the check inside a shard_for check we do immediately
after the metadata check: in a stable system the shard_for check will
fail any time the metadata call will fail.

Issue redpanda-data#4410
@travisdowns
Member Author

travisdowns commented Apr 26, 2022

@senior7515 - you read our minds, haha. I discussed this earlier with @mmaslankaprv and we agree it is a structural problem (also applying to produce and other paths), and a better fix is to have an index on these lists so that when we need to query them we don't need to do an exhaustive iteration. Initially I think it can be something very simple like a hash map, and we can optimize from there to reach higher partition counts.
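For illustration, a minimal sketch of what such an index could look like, assuming a per-topic partition vector like today's (placeholder types, not the real metadata_cache internals):

```cpp
// Placeholder types: the point is keeping a partition-id -> position index
// alongside the existing vector so lookups stop being O(#partitions).
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct partition_metadata { int32_t id; /* leader, replicas, ... */ };

class topic_metadata {
public:
    void add_partition(partition_metadata pm) {
        _index.emplace(pm.id, _partitions.size());
        _partitions.push_back(std::move(pm));
    }
    // O(1) expected instead of scanning the whole vector.
    const partition_metadata* find(int32_t pid) const {
        auto it = _index.find(pid);
        return it == _index.end() ? nullptr : &_partitions[it->second];
    }
    bool contains(int32_t pid) const { return _index.count(pid) > 0; }

private:
    std::vector<partition_metadata> _partitions;  // existing list
    std::unordered_map<int32_t, size_t> _index;   // new index into it
};
```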

The suggestion I had at the end was just a very quick patch to unblock certain use cases where 20k partitions are being used: if we go with that PR it won't close this issue.

BTW, the time spent in contains is > 20x higher than anywhere else:

[perf profile screenshot: time spent in contains()]

travisdowns added a commit to travisdowns/redpanda that referenced this issue Apr 26, 2022
@emaxerrno
Contributor

@senior7515 - you read our minds, haha. I discussed this earlier with @mmaslankaprv and we agree it is a structural

nice :)

problem (also applying to produce and other paths), and a better fix is to have an index on these lists so that when we need to query them we don't need to do an exhaustive iteration. Initially I think it can be something very simple like a hash map, and we can optimize from there to reach higher partition counts.

yeah, figured some simple index makes sense, like a

node_hash<uint64_t, std::vector<uint32_t>> sorta thing, for a query of the partitions of this namespace::topic that belong to this core.

Or what structure were you all thinking?
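A rough sketch of that structure, assuming node_hash refers to something like absl::node_hash_map and using std::hash as a stand-in for the real 64-bit topic hash:

```cpp
// Hypothetical core-local index: hashed namespace::topic -> partition ids
// hosted on this core, so a fetch for one topic never walks the global list.
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

#include <absl/container/node_hash_map.h>

class core_topic_index {
public:
    static uint64_t topic_key(const std::string& ns, const std::string& topic) {
        return std::hash<std::string>{}(ns + "::" + topic);
    }
    void add(const std::string& ns, const std::string& topic, uint32_t partition) {
        _by_topic[topic_key(ns, topic)].push_back(partition);
    }
    // All partitions of ns::topic that live on this core, or nullptr if none.
    const std::vector<uint32_t>* partitions_on_core(const std::string& ns,
                                                    const std::string& topic) const {
        auto it = _by_topic.find(topic_key(ns, topic));
        return it == _by_topic.end() ? nullptr : &it->second;
    }

private:
    absl::node_hash_map<uint64_t, std::vector<uint32_t>> _by_topic;
};
```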

The suggestion I had at the end was just a very quick patch to unblock certain use cases where 20k partitions are being used: if we go with that PR it won't close this issue.

BTW, the time spent in contains is > 20x higher than anywhere else:

🤯 - awesome find. interesting to see the next bottleneck on this.

@travisdowns
Member Author

travisdowns commented Apr 26, 2022

@senior7515 well I had my very weak "hide it behind another check" fix ready to go, but @mmaslankaprv has a draft fix which is much better and fixes this same problem all over the place, along the lines of what you suggested:

#4443

interesting to see the next bottleneck on this.

Interestingly, we find that librdkafka actually has some scalability bottlenecks with many partitions in a similar vein: even when a consumer consumes from only a single partition, there are some operations that iterate over all partitions in the topic, which show up as a big hot spot with 20k partitions.
