Failure in NodesDecommisioningTest.test_decommissioning_working_node
#2388
Pending redpanda-data#2388 Signed-off-by: John Spray <jcs@vectorized.io>
This looks like a genuine redpanda bug. On a passing run of the test, the time between "changing node {} membership state to: draining" and "decommissioning finished, removing node {} from cluster" is only 10 seconds. On this failure, the second message never appears, and the test gives up when its 120-second timeout expires. I do notice that the leadership balancer is doing transfers at the same time as the decommissioning is going on.
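For anyone repeating this comparison on other runs, here is a rough sketch of how the gap between the two messages could be measured from a node's log. The two message texts are the ones quoted above. Everything else is an assumption for illustration: the timestamp layout (`YYYY-MM-DD HH:MM:SS,mmm`), the integer node ID filling the `{}` placeholder, and the helper name `decommission_durations`.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp layout; adjust to whatever the real log lines use.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")
# Message texts quoted in this issue; the node ID is assumed to be an integer.
DRAINING_RE = re.compile(r"changing node (\d+) membership state to: draining")
FINISHED_RE = re.compile(
    r"decommissioning finished, removing node (\d+) from cluster")


def _timestamp(line):
    """Return the timestamp found in a log line, or None if there is none."""
    m = TS_RE.search(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(m.group(2)))


def decommission_durations(lines):
    """Map node ID -> seconds from 'draining' to 'finished'.

    A node that started draining but never logged the 'finished'
    message maps to None, which is the pattern seen in this failure.
    """
    started = {}
    durations = {}
    for line in lines:
        ts = _timestamp(line)
        if ts is None:
            continue
        m = DRAINING_RE.search(line)
        if m:
            # Keep the first 'draining' timestamp seen for each node.
            started.setdefault(int(m.group(1)), ts)
            continue
        m = FINISHED_RE.search(line)
        if m:
            node = int(m.group(1))
            if node in started:
                durations[node] = (ts - started[node]).total_seconds()
    for node in started:
        durations.setdefault(node, None)
    return durations
```

Running it over the logs from a passing run and a failing run makes the difference visible at a glance: a number of seconds for the former, `None` for the latter.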
This area was very light on log messages, which makes investigating test failures hard. Related: redpanda-data#2388 Signed-off-by: John Spray <jcs@vectorized.io>
This test is now disabled in
This is hard to reproduce. Has not failed in several days of nightly test-staging runs. The original failure still looks like an authentic issue.
Here's a fresh failure from test-staging:
Created #2478 for what is believed to be the underlying bug.
Should be fixed by #3125
https://buildkite.com/vectorized/vtools/builds/306#3d5fa10a-5652-48fe-8e90-28832108c106
http://ci-artifacts.dev.vectorized.cloud/vtools/3d5fa10a-5652-48fe-8e90-28832108c106/vbuild/ducktape/results/2021-09-22--001/report.html
From the test log, it looks like the partitions were correctly moved away from the node, but the node itself remained in the list of brokers and the test timed out waiting for it to go away.
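The wait that timed out can be pictured roughly as the polling loop below. This is only a sketch, not the test's actual code: `wait_for_broker_removal` and the `list_broker_ids` callable are hypothetical names, and the 2-second backoff is an assumption. The 120-second default is the timeout mentioned in the comments on this issue.

```python
import time


def wait_for_broker_removal(list_broker_ids, node_id, timeout_sec=120.0,
                            backoff_sec=2.0, clock=time.monotonic,
                            sleep=time.sleep):
    """Poll until node_id is no longer in the broker list.

    list_broker_ids is a hypothetical callable returning the IDs of the
    brokers the cluster currently reports. Raises TimeoutError if the
    node is still listed once timeout_sec has elapsed.
    """
    deadline = clock() + timeout_sec
    while True:
        brokers = list(list_broker_ids())
        if node_id not in brokers:
            return
        if clock() >= deadline:
            raise TimeoutError(
                f"node {node_id} still in broker list {brokers} "
                f"after {timeout_sec}s")
        sleep(backoff_sec)
```

In the failure described here, a loop of this kind would keep seeing the decommissioned node in the broker list until the deadline, even though its partitions had already been moved elsewhere.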