tests: update ManyPartitionsTest #5816
This was outputting the number; it should have been outputting the message.
The Go dependencies are generally the fastest to build and should not get held up behind other things:
- Move OMB (Java build) further up.
- Split the `kaf` install from unrelated non-Go steps.
- Move the client-swarm build before the Go test utils.
So that it shows the node name properly.
This is for the benefit of scale tests, which would like to reduce their per-partition outputs to reflect how a user would configure the system, and to reduce any overhead from emitting millions of lines.
This wraps the new `kgo-repeater` traffic generator for scalable load generation.
It is helpful to print the error right at the point of failure, rather than after the (potentially long running) backtrace decode & log search jobs. It'll get printed again later as well, but this way I can search from the start of the file for the exception name, and jump straight to the timestamp of the failure.
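A minimal sketch of this "log first, diagnose later" ordering follows. It assumes a hypothetical wrapper, `run_with_early_error_report`, and a list of slow diagnostic jobs standing in for the backtrace decode and log search. The names and signature are illustrative, not the actual test-harness code.

```python
import logging
from typing import Callable, Iterable


def run_with_early_error_report(test_fn: Callable[[], None],
                                slow_diagnostics: Iterable[Callable[[], None]],
                                logger: logging.Logger) -> None:
    """Run test_fn; if it raises, log the error before any slow diagnostics run.

    Hypothetical helper: the name and signature are illustrative only.
    """
    try:
        test_fn()
    except Exception as e:
        # Log right away, so the exception name and its timestamp appear
        # at the point of failure, not after long-running post-processing.
        logger.error("Test failed: %r", e)
        for job in slow_diagnostics:
            # Stand-ins for backtrace decoding or log searching (assumed jobs).
            job()
        # Re-raise so the framework still records, and later reprints, the failure.
        raise
```

Because the exception is re-raised, the framework's normal end-of-test reporting is unchanged. The early log line is purely additive.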
This is a nasty failure mode where we deploy fresh packages and accidentally wipe out our /var/lib/redpanda symlink, resulting in tests running on very slow drives.
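A pre-test guard along these lines could catch the problem before a slow run starts. This is a hypothetical sketch: `check_data_dir_symlink` is an illustrative name, and it assumes the fast drive is mounted elsewhere and linked into place at /var/lib/redpanda.

```python
import os


def check_data_dir_symlink(path: str = "/var/lib/redpanda") -> None:
    """Raise if `path` is not a symlink to an existing directory.

    Hypothetical pre-test check; assumes the fast drive is mounted elsewhere
    and symlinked to `path`.
    """
    if not os.path.islink(path):
        raise RuntimeError(f"{path} is not a symlink: tests may run on slow drives")
    if not os.path.isdir(path):
        # os.path.isdir follows the link, so this catches a dangling symlink.
        raise RuntimeError(f"{path} points at a missing directory")
```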
This is an efficiency/quality of life improvement for working with tests that start larger numbers of nodes. Leave the default as serial startup, because it makes logs easier to read.
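The serial-by-default, optionally parallel startup can be sketched with a thread pool. The `start_nodes` function and its `start_one` callback are hypothetical stand-ins, not the real test-service API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def start_nodes(nodes: Sequence[str], start_one: Callable[[str], None],
                parallel: bool = False) -> None:
    """Start each node, serially by default or concurrently when parallel=True."""
    if not parallel or len(nodes) < 2:
        # Serial startup keeps each node's startup log lines together.
        for node in nodes:
            start_one(node)
        return
    with ThreadPoolExecutor(max_workers=len(nodes)) as executor:
        # Consuming the map() results re-raises any exception from a start_one call.
        list(executor.map(start_one, nodes))
```

Keeping `parallel=False` as the default preserves readable logs, while large-node-count tests can opt in.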
This is useful if a test is running longer than you expected and you'd like to know how far through it is without doing your own calculation of message counts.
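The progress calculation itself is simple arithmetic. The helper below is a hypothetical sketch of a progress line a test could log periodically; the name and message format are illustrative.

```python
def progress_message(received: int, expected: int) -> str:
    """Format consumed/expected message counts as a percentage (hypothetical helper)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    pct = 100.0 * received / expected
    return f"Consumed {received}/{expected} messages ({pct:.1f}%)"
```

For example, `progress_message(250, 1000)` returns `"Consumed 250/1000 messages (25.0%)"`.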
When using this function to query leadership for partitions, it is not necessary to exclude partitions just because they failed to get some metadata from the leader (e.g. NOT_LEADER errors for offsets during a transient leadership change). Add a `tolerant` flag that permits returning partially populated RpkPartition results that just show the leader of a partition.
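The effect of a `tolerant` flag can be sketched as follows. `PartitionInfo` is a simplified, hypothetical stand-in for `RpkPartition`, and the input rows are assumed to be already-parsed dicts with an optional `error` field. This is not the real rpk output format or parser.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionInfo:
    """Hypothetical, simplified stand-in for RpkPartition."""
    id: int
    leader: int
    high_watermark: Optional[int] = None  # None when offsets could not be fetched


def parse_partitions(rows: List[dict], tolerant: bool = False) -> List[PartitionInfo]:
    result = []
    for row in rows:
        if row.get("error"):
            if not tolerant:
                # Strict mode: drop partitions whose metadata came back with an error.
                continue
            # Tolerant mode: keep the partition, reporting only its leader.
            result.append(PartitionInfo(id=row["id"], leader=row["leader"]))
        else:
            result.append(PartitionInfo(id=row["id"], leader=row["leader"],
                                        high_watermark=row["high_watermark"]))
    return result
```

A leadership-only caller passes `tolerant=True` and reads only `leader`, so transient NOT_LEADER errors no longer hide partitions from it.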
The default mode is rather expensive for high partition counts, and it complicates handling systems in transient states, when one or more partitions is likely to be undergoing leadership movement and therefore returning NOT_LEADER errors etc. in the default per-partition output. When all we want to know is the group's state, this mode gives us just that.
This enables:
- Running on different instance types without hacking the test.
- Running on local docker while developing the test itself.
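One way to avoid hard-coding scale is to derive the partition count from the cluster's hardware. This is a hypothetical sketch. `target_partition_count`, `partitions_per_core` (interpreted here as replicas per core), and the optional `cap` for small local docker runs are illustrative assumptions, not the test's actual parameters.

```python
from typing import Optional


def target_partition_count(cores_per_node: int, node_count: int,
                           partitions_per_core: int,
                           replication_factor: int = 3,
                           cap: Optional[int] = None) -> int:
    """Scale the partition count to the cluster's hardware instead of hard-coding it.

    Hypothetical helper: partitions_per_core is treated as replicas per core.
    """
    total_cores = cores_per_node * node_count
    # Each partition places replication_factor replicas across all cores.
    count = (partitions_per_core * total_cores) // replication_factor
    if cap is not None:
        # e.g. a small cap when developing the test on local docker
        count = min(count, cap)
    return count
```

For example, three 4-core nodes with a budget of 1000 replicas per core give `(1000 * 12) // 3 = 4000` partitions.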
I think this is a bug in the workload generator (or, less likely, a problem with franz-go). Usually only a few consumers disappear from the group, so it doesn't hurt the validity of the overall scale test, and we can hunt it down separately.
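Tolerating a few missing consumers can be expressed as a bounded check that still fails if many disappear. This is a hypothetical sketch; `check_group_membership` and the `max_missing` threshold are illustrative, not the test's actual code.

```python
def check_group_membership(expected_members: int, actual_members: int,
                           max_missing: int) -> int:
    """Return how many consumers are missing; raise only if too many are."""
    missing = expected_members - actual_members
    if missing > max_missing:
        raise AssertionError(
            f"{missing} of {expected_members} consumers missing from group "
            f"(tolerance {max_missing})")
    return max(missing, 0)
```

Returning the missing count lets the test log it, so the underlying bug can still be investigated separately.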
I made it through! Looks like a nice scale test is shaping up. I had a few miscellaneous questions and suggestions, nothing major.
LGTM to merge in this state. Apart from some idle questions which don't need to be addressed in the code, the remaining things were all minor nits or style fixes that can be considered optional.
Thanks for making it through a lengthy set of commits. All the comments I've silently marked resolved are addressed in #5970.
The CI failures were apparently a transient issue: https://redpandadata.slack.com/archives/C02LZGSS66M/p1659967842254759
This is a follow-up to PR redpanda-data#5816.
/backport v22.2.x
This is a follow-up to PR redpanda-data#5816 (cherry picked from commit 1689a72)
Cover letter
Updates to ManyPartitionsTest:
Other improvements:
Fixes #5389
Backport Required
UX changes
None
Release notes