
Uniform message distribution among partitions in tests #4920

Conversation

abhijat (Contributor) commented May 25, 2022

Cover letter

Provides an option in KafkaCliTools to enforce round-robin distribution of records among partitions. Kafka producers in newer versions default to sticky partitioning, where records are sent to one partition until the batch size or linger.ms is reached. This can cause an uneven distribution of records among partitions, as seen in the #4886 logs:

 $ awk '/handling produce request/ {tot+=$19; lc+=1} END{print tot " in " lc " requests";}' docker-rp-17/redpanda.log 
9364926 in 121 requests

 $ awk '/handling produce request/ {tot+=$19; lc+=1} END{print tot " in " lc " requests";}' docker-rp-14/redpanda.log 
981500 in 121 requests

The data sent to partition 0 differs from the data sent to partition 1 by a factor of roughly 9. As a result, SI is never triggered for partition 0 because its segment is never closed.

In tests where we want a roughly equal amount of data sent to each partition (e.g. to trigger SI uploads), we can use the round-robin partitioner, which ensures that each partition receives a roughly equal number of messages, at the cost of producer performance.
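To make the skew mechanism concrete, here is a toy Python model contrasting sticky batching with strict round-robin assignment. This is an illustrative simulation, not the Kafka client's implementation; the function names and the batch model are assumptions:

```python
import random
from collections import Counter

def sticky_counts(num_records, num_partitions, batch_size, seed=0):
    """Toy model of sticky partitioning: all records go to one
    partition until a batch fills, then a new partition is chosen
    at random. Over a small number of batches the per-partition
    totals can end up badly skewed."""
    rng = random.Random(seed)
    counts = Counter()
    part = rng.randrange(num_partitions)
    filled = 0
    for _ in range(num_records):
        counts[part] += 1
        filled += 1
        if filled == batch_size:
            part = rng.randrange(num_partitions)
            filled = 0
    return counts

def round_robin_counts(num_records, num_partitions):
    """Strict round-robin: record i goes to partition i % n,
    so partition counts differ by at most one."""
    counts = Counter()
    for i in range(num_records):
        counts[i % num_partitions] += 1
    return counts
```

Running both with the same record count shows round-robin producing near-identical counts while the sticky model's counts depend on which partitions happened to be chosen per batch.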

When using kafka-producer-perf-test.sh via the KafkaCliTools utility,
provide an option to force round-robin distribution of records. In
modern versions of Kafka the producer by default uses sticky assignment
of records, where the producer sticks to a partition until the batch
size is reached or linger.ms expires; the next partition only receives
messages once the batch is full. This provides better performance in
real-world use, but some test cases rely on messages being distributed
equally among partitions for assertions, e.g. when triggering shadow
indexing uploads.

The round_robin_partition option enables the older partitioner, which
is potentially slower but ensures that each partition gets a roughly
equal number of messages.
abhijat (Contributor, Author) commented May 25, 2022

The CI errors are related to the changes in this PR.

jcsp (Contributor) commented May 25, 2022

Maybe it's better to adjust the test to use RpkProducer instead -- that uses random keys to achieve a good distribution across partitions while still using the default partitioning scheme.

abhijat (Contributor, Author) commented May 25, 2022

> Maybe it's better to adjust the test to use RpkProducer instead -- that uses random keys, to achieve a good distribution across partitions while still using the default partitioning scheme.

Yes, unfortunately the round-robin partitioner seems to behave in the exact opposite way to what is expected and fails to distribute messages at all when two partitions are involved. I found some details on this here, but effectively the sticky partitioner is much better at distributing messages than the round-robin partitioner.

When I used the round robin partitioner with two partitions, all records go to a single partition every single time.

I will close this PR and change to RpkProducer as you suggested.
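To illustrate why random keys (the RpkProducer approach) give a good spread, here is a small simulation. Python's `hash()` stands in for Kafka's actual murmur2-based key hashing, so this is an approximation of the behavior, not the client's algorithm:

```python
import random
import string

def partition_for_key(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's default key-hash partitioning;
    # the real client uses murmur2, not Python's hash().
    return hash(key) % num_partitions

def simulate(num_records=10_000, num_partitions=2, key_len=16, seed=0):
    """Produce num_records with random keys and count how many
    land on each partition; with uniform hashing the counts
    concentrate close to num_records / num_partitions."""
    rng = random.Random(seed)
    counts = [0] * num_partitions
    for _ in range(num_records):
        key = "".join(rng.choices(string.ascii_letters, k=key_len))
        counts[partition_for_key(key, num_partitions)] += 1
    return counts
```

Because each random key hashes independently, the per-partition counts stay within a few percent of each other, which is enough for tests that only need a roughly even spread.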

abhijat closed this May 25, 2022
abhijat (Contributor, Author) commented May 25, 2022

Closed this PR as the round-robin partitioner implementation does not work as expected; will replace it with RpkProducer.
