CI Failure in PartitionBalancerTest.test_fuzz_admin_ops #5950
Comments
This one looks very similar:
https://buildkite.com/redpanda/redpanda/builds/13918#01828650-7bd3-4c4e-a58c-0d79399d2070
The stack trace is different, but it looks like it might be the same underlying cause.
https://buildkite.com/redpanda/redpanda/builds/13959#01828b64-8232-4ed7-86a9-3e7b577fa83d
The problem (once again) was that the controller raft group held an election while we were waiting for the unavailability timeout to expire, and the election reset the timeout. This is expected, and the test code should be more robust to it. I'll think about how to fix this.
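One way test code could tolerate a timer that gets reset mid-wait is to poll for the observable end state instead of sleeping for a fixed interval. The following is only a minimal sketch of that idea, not the fix that was actually applied; `wait_until_quiescent`, `get_status`, and `is_quiescent` are hypothetical names, and the caller is assumed to supply a function that reads the balancer status.

```python
import time


def wait_until_quiescent(get_status, is_quiescent, timeout_sec,
                         required_consecutive=3, backoff_sec=1.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll until the status looks quiescent several times in a row.

    All names here are hypothetical. `get_status` returns the current
    status and `is_quiescent` decides whether it counts as settled.
    Requiring consecutive successes means a controller election that
    resets a server-side timer only delays the wait; it does not make
    the test fail on a stale fixed sleep.
    """
    deadline = clock() + timeout_sec
    streak = 0
    while clock() < deadline:
        status = get_status()
        if is_quiescent(status):
            streak += 1
            if streak >= required_consecutive:
                return status
        else:
            # Any non-quiescent observation restarts the streak.
            streak = 0
        sleep(backoff_sec)
    raise TimeoutError(
        f"status did not stay quiescent within {timeout_sec} seconds")
```

The `clock` and `sleep` parameters exist only so the helper can be exercised with a fake clock; in a real test they would be left at their defaults.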
Seen an alternative failure mode (TimeoutError after 25 minutes) here:
That is on a PR branch that changes redpanda startup to happen in parallel rather than serially, so it's possible that has something to do with it, but all other tests passed. Given that the test usually takes about 5 minutes when it passes, it seems we have an excessive timeout in here somewhere. ducktape times out internally after 30 minutes of running a test, so timeouts need to be well within that.
This can drop out earlier when there is an error, rather than waiting in vain for the execution count to reach a target that it never will. Related: redpanda-data#5950
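The early-exit behaviour described in that commit message can be illustrated roughly as follows. This is a sketch under assumptions, not the code from the commit: `wait_for_count`, `get_count`, `get_error`, and `WaitAborted` are hypothetical names, with `get_error` assumed to return `None` while everything is healthy.

```python
import time


class WaitAborted(Exception):
    """Raised when an error is seen before the target count is reached."""


def wait_for_count(get_count, get_error, target, timeout_sec,
                   backoff_sec=0.5, clock=time.monotonic, sleep=time.sleep):
    """Wait for get_count() to reach target, but give up on the first error.

    Hypothetical helper: `get_error` returns None while healthy, or a
    description of the failure otherwise. Checking it on every iteration
    avoids waiting out the full timeout for a count that can no longer
    be reached.
    """
    deadline = clock() + timeout_sec
    while True:
        err = get_error()
        if err is not None:
            raise WaitAborted(f"stopped waiting early: {err}")
        count = get_count()
        if count >= target:
            return count
        if clock() >= deadline:
            raise TimeoutError(
                f"count {count} did not reach {target} in {timeout_sec} seconds")
        sleep(backoff_sec)
```

With a check like this, a failed run surfaces within one polling interval rather than after the whole timeout, which also keeps the test well inside ducktape's own limit.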
https://buildkite.com/redpanda/redpanda/builds/14049#01829076-c4f5-4770-8d90-33099a3f8078
https://buildkite.com/redpanda/redpanda/builds/14050#0182907f-ebac-4341-a9fb-e8577b90f1d5
https://buildkite.com/redpanda/redpanda/builds/14018#01828dfb-4a16-42dd-8d8c-5e1e67a37de7
This can drop out earlier when there is an error, rather than waiting in vain for the execution count to reach a target that it never will. Related: redpanda-data#5950 (cherry picked from commit e0ff6b9)
More robust waiting for the quiescent state in partition balancer tests
Fixed by #6007.
Version & Environment
Redpanda version: dev
https://buildkite.com/redpanda/redpanda/builds/13918#01828650-7bd0-4ba1-af35-0b9b4ce7f959
What went wrong?
CI Failure
What should have happened instead?
CI Success
How to reproduce the issue?
???
Additional information