
Uniform message distribution among partitions in tests #4920

Conversation

abhijat (Contributor) commented May 25, 2022

Cover letter

Provides an option in KafkaCliTools to enforce round-robin distribution of records among partitions. Kafka producers in newer versions default to sticky partitioning, where records are sent to one partition until the batch size or linger.ms is reached. This can cause an uneven distribution of records among partitions, as seen in the #4886 logs:

 $ awk '/handling produce request/ {tot+=$19; lc+=1} END{print tot " in " lc " requests";}' docker-rp-17/redpanda.log 
9364926 in 121 requests

 $ awk '/handling produce request/ {tot+=$19; lc+=1} END{print tot " in " lc " requests";}' docker-rp-14/redpanda.log 
981500 in 121 requests

The data sent to partition 0 differs from the data sent to partition 1 by a factor of roughly 9. As a result, SI is never triggered for partition 0 because its segment is never closed.

In tests where we want a roughly equal amount of data sent to each partition (e.g. to trigger SI uploads), we can use the round-robin partitioner, which ensures that each partition receives a roughly equal number of messages, at the cost of producer performance.
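To make the skew mechanism concrete, here is a toy Python model contrasting sticky batching with strict round-robin assignment. This is an illustrative simulation, not the Kafka client's implementation; the function names and the batch model are assumptions:

```python
import random
from collections import Counter

def sticky_counts(num_records, num_partitions, batch_size, seed=0):
    """Toy model of sticky partitioning: all records go to one
    partition until a batch fills, then a new partition is chosen
    at random. Over a small number of batches the per-partition
    totals can end up badly skewed."""
    rng = random.Random(seed)
    counts = Counter()
    part = rng.randrange(num_partitions)
    filled = 0
    for _ in range(num_records):
        counts[part] += 1
        filled += 1
        if filled == batch_size:
            part = rng.randrange(num_partitions)
            filled = 0
    return counts

def round_robin_counts(num_records, num_partitions):
    """Strict round-robin: record i goes to partition i % n,
    so partition counts differ by at most one."""
    counts = Counter()
    for i in range(num_records):
        counts[i % num_partitions] += 1
    return counts
```

Running both with the same record count shows round-robin producing near-identical counts while the sticky model's counts depend on which partitions happened to be chosen per batch.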

When using kafka-producer-perf-test.sh via the KafkaCliTools utility,
provide an option to force round-robin distribution of records. In
modern versions of Kafka the producer by default uses sticky assignment
of records, where the producer sticks to a partition until the batch
size is reached or linger.ms expires; the next partition only receives
messages once the batch is full. This provides better performance in
real-world use, but some test cases rely on messages being distributed
equally among partitions for assertions, e.g. when triggering shadow
indexing uploads.

The round_robin_partition option enables the older partitioner, which
is potentially slower but ensures that each partition gets a roughly
equal number of messages.
abhijat (Contributor, Author) commented May 25, 2022

The CI errors are related to the changes in this PR.

jcsp (Contributor) commented May 25, 2022

Maybe it's better to adjust the test to use RpkProducer instead -- that uses random keys to achieve a good distribution across partitions while still using the default partitioning scheme.

abhijat (Contributor, Author) commented May 25, 2022

> Maybe it's better to adjust the test to use RpkProducer instead -- that uses random keys, to achieve a good distribution across partitions while still using the default partitioning scheme.

Yes, unfortunately the round-robin partitioner seems to behave in the exact opposite way to what is expected and fails to distribute messages at all when two partitions are involved. I found some details on this here, but effectively the sticky partitioner is much better at distributing messages than the round-robin partitioner.

When I used the round robin partitioner with two partitions, all records go to a single partition every single time.

I will close this PR and change to RpkProducer as you suggested.
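To illustrate why random keys (the RpkProducer approach) give a good spread, here is a small simulation. Python's `hash()` stands in for Kafka's actual murmur2-based key hashing, so this is an approximation of the behavior, not the client's algorithm:

```python
import random
import string

def partition_for_key(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's default key-hash partitioning;
    # the real client uses murmur2, not Python's hash().
    return hash(key) % num_partitions

def simulate(num_records=10_000, num_partitions=2, key_len=16, seed=0):
    """Produce num_records with random keys and count how many
    land on each partition; with uniform hashing the counts
    concentrate close to num_records / num_partitions."""
    rng = random.Random(seed)
    counts = [0] * num_partitions
    for _ in range(num_records):
        key = "".join(rng.choices(string.ascii_letters, k=key_len))
        counts[partition_for_key(key, num_partitions)] += 1
    return counts
```

Because each random key hashes independently, the per-partition counts stay within a few percent of each other, which is enough for tests that only need a roughly even spread.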

abhijat closed this May 25, 2022
abhijat (Contributor, Author) commented May 25, 2022

Closed this PR as the round-robin partitioner implementation does not work as expected; will replace it with RpkProducer.
