Timeout waiting for partition balancer "ready" status in PartitionBalancerTest.test_full_nodes #5980

Closed
VladLazar opened this issue Aug 11, 2022 · 3 comments · Fixed by #6342

VladLazar commented Aug 11, 2022

Looks like the partition balancer never reached the "ready" status before the wait timed out.
There was no crash or backtrace, so this is probably not a blocking issue, but it still needs looking into.

File "/root/tests/rptest/tests/partition_balancer_test.py", line 467, in test_full_nodes
    self.wait_until_status(create_waiter("ready"))
File "/root/tests/rptest/util.py", line 74, in wait_until
    raise TimeoutError(
ducktape.errors.TimeoutError: failed to wait until status condition

https://buildkite.com/redpanda/redpanda/builds/13988#01828d2e-b0ad-431b-8918-d3f07ec35a8e
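
For readers unfamiliar with this kind of failure, the traceback comes from a polling helper that gives up after a deadline. The following is only a minimal sketch of that pattern, not the actual `rptest/util.py` or ducktape code: the signature, the default values, and the use of Python's built-in `TimeoutError` (rather than `ducktape.errors.TimeoutError`) are assumptions made for illustration.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1,
               err_msg="failed to wait until status condition"):
    """Poll `condition` until it returns a truthy value or the deadline passes.

    Hypothetical sketch: the real helper in the test suite may differ.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            # Built-in TimeoutError is used here as a stand-in.
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


# Usage: a status source that becomes "ready" on the third poll.
statuses = iter(["starting", "in_progress", "ready"])
wait_until(lambda: next(statuses) == "ready", timeout_sec=1, backoff_sec=0.01)
```

With a helper shaped like this, a timeout only says that the condition never became true within the window; it does not say why, which is consistent with there being no crash or backtrace from the balancer itself.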

ztlpn commented Aug 16, 2022

@r-vasquez

@piyushredpanda

Requesting @ztlpn to take a look.

ztlpn added a commit to ztlpn/redpanda that referenced this issue Sep 8, 2022
Previously we did not heal network failures when exiting tests that use
this base class (namely, AvailabilityTests). As possible failure types
included netem failures, that meant that all subsequent tests in a test
run used a node with crippled network, leading to flakiness.

Fixes redpanda-data#5980
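
The idea behind the fix described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the commit: `FailureInjectorSketch`, `BaseTestSketch`, `heal_all`, and the node and failure names are all hypothetical stand-ins. The point it shows is that healing runs in a `finally` block, so injected failures are cleared even when the test body raises.

```python
class FailureInjectorSketch:
    """Hypothetical stand-in for a failure injector; not the rptest API."""

    def __init__(self):
        # (node, failure kind) pairs that are currently injected.
        self.active = set()

    def inject(self, node, kind):
        self.active.add((node, kind))

    def heal_all(self):
        # Undo every injected failure, e.g. remove netem rules.
        self.active.clear()


class BaseTestSketch:
    """Hypothetical base class that always heals failures on exit."""

    def __init__(self):
        self.injector = FailureInjectorSketch()

    def run(self, body):
        try:
            body(self)
        finally:
            # Heal even if the test body raised, so that later tests
            # in the same run do not inherit a crippled network.
            self.injector.heal_all()


def failing_body(test):
    test.injector.inject("node-1", "netem")
    raise RuntimeError("test failed mid-way")


test = BaseTestSketch()
try:
    test.run(failing_body)
except RuntimeError:
    pass

print(test.injector.active)  # prints: set()
```

Without the `finally` clause, the `netem` entry would still be active after the failed test, which matches the flakiness the commit message describes for subsequent tests on the same node.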
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Sep 9, 2022
(cherry picked from commit c333fc7)
ballard26 pushed a commit to ballard26/redpanda that referenced this issue Sep 27, 2022