Failure in PartitionBalancerTest.test_unavailable_nodes #5471
Reason for failure: To fix it, we can add retries to the status request.
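A minimal sketch of "add retries to the status request", assuming the status call can be wrapped in a zero-argument callable. The `retry` helper and its parameters are hypothetical, not part of the Redpanda test suite. It retries only the listed exception types and uses exponential backoff between attempts.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 5, backoff_s: float = 0.5,
          retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Hypothetical helper: call fn until it succeeds or attempts run out.

    Sleeps backoff_s * 2**i between attempts (exponential backoff) and
    re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc = None
    for i in range(attempts):
        try:
            return fn()
        except retry_on as exc:  # only the listed exception types are retried
            last_exc = exc
            if i < attempts - 1:  # no sleep after the final attempt
                time.sleep(backoff_s * (2 ** i))
    raise last_exc
```

In a test this would wrap whatever call fetches the balancer status, e.g. `retry(lambda: get_status(node), retry_on=(ConnectionError,))`, where `get_status` is a placeholder name. Keeping `retry_on` narrow avoids masking genuine assertion failures.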
@ZeDRoman: Thanks, but why is there a crash in RP?
The last crash in the log happens because we don't start the node again if the test fails in the middle of evaluation.
Thanks, @ZeDRoman: so the "crash" here is what we did to bring Redpanda down to test rebalancing. Is that correct?
Yes. We crash nodes ourselves to trigger rebalancing.
As far as I know, every test will now show that log entry if a Redpanda node is down at the end of the test evaluation.
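The log entry appears because a deliberately crashed node stays down when the test fails before the step that starts it again. One way to guarantee the node comes back is to pair the crash with a restart in a `finally` clause. This is a sketch only: `FakeNode` and `node_down` are hypothetical stand-ins, not ducktape or Redpanda test-suite APIs.

```python
from contextlib import contextmanager


class FakeNode:
    """Hypothetical stand-in for a cluster node in a test (not a real API)."""

    def __init__(self, name: str):
        self.name = name
        self.running = True

    def stop(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True


@contextmanager
def node_down(node):
    """Take the node down for the duration of the block and always start it
    again on exit, even if an assertion inside the block fails."""
    node.stop()
    try:
        yield node
    finally:
        node.start()


node = FakeNode("node-1")
try:
    with node_down(node):
        assert not node.running
        # Simulated mid-test failure while the node is down.
        raise AssertionError("rebalancing did not finish")
except AssertionError:
    pass
print(node.running)  # True: the node was started again despite the failure
```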
Let's chat with the team about some ideas to make this better. For this ticket: it is a test issue, and you plan to open a PR that retries the status request, yes?
Yes, I will create a PR soon.
As per the discussion with @mmaslankaprv this morning, we need to check for this situation (leader_id == self, but is_leader == false) in
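The comment above is cut off, so it does not say where the check should live. As an illustration only, the condition itself can be expressed as a predicate over a node's view of leadership. `LeadershipView` and `is_inconsistent_leadership` are hypothetical names, not Redpanda code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeadershipView:
    """Hypothetical snapshot of what a node reports about leadership."""
    self_id: int              # this node's id
    leader_id: Optional[int]  # leader the node believes in (None if unknown)
    is_leader: bool           # whether the node considers itself leader


def is_inconsistent_leadership(view: LeadershipView) -> bool:
    """True exactly for the state discussed above:
    leader_id == self, but is_leader == false."""
    return view.leader_id == view.self_id and not view.is_leader
```

A caller that detects this state could treat leadership as not yet settled (for example, answer with a retryable error) rather than acting as if this node were the leader.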
Seen in CI failures today.
There was a problem completing RPC requests around this time. The problem is rather inexplicable: both nodes were online, yet RPC requests timed out in only one direction. So my bet would be on the RPC bug conjectured in #5608 (comment).
BTW, the test will be "fixed" once #5916 merges (the admin server will return 503, which is a retryable error code).
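Why a 503 "fixes" the test: a client that retries on retryable status codes keeps polling instead of failing on the first transient error. A minimal client-side sketch, assuming a hypothetical `send` callable that returns `(status_code, body)`. Treating only 503 as retryable is an assumption drawn from the comment above; real clients often retry other codes too.

```python
import time
from http import HTTPStatus
from typing import Callable, Tuple

# Assumption: only 503 Service Unavailable is treated as retryable here.
RETRYABLE = {int(HTTPStatus.SERVICE_UNAVAILABLE)}


def get_with_retries(send: Callable[[], Tuple[int, str]],
                     attempts: int = 5,
                     backoff_s: float = 0.5) -> Tuple[int, str]:
    """Call send() (hypothetical; returns (status_code, body)).

    Retries while the status is retryable and returns the first
    non-retryable response, or the last response once attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for i in range(attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body  # success or a non-retryable error
        if i < attempts - 1:
            time.sleep(backoff_s * (2 ** i))  # exponential backoff
    return status, body
```

Non-retryable errors such as 500 are returned immediately, so a genuine server bug still surfaces as a test failure rather than being retried away.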
The admin server will now return 503, and the deeper RPC problems are tracked in #6005. Going to close.
Build: https://buildkite.com/redpanda/redpanda/builds/12554#0181fda7-1a9d-47cd-86fb-775767011acb