[v22.2.x] More robust waiting for the quiescent state in partition balancer tests #6032

Merged


vbotbuildovich
Collaborator

Backport from pull request: #6007.

Because the unavailability timer resets every time the controller leader
changes, robustly waiting for the timer to elapse is hard. Instead, we
simply wait until the unavailable node appears in the "violations"
status field.

(cherry picked from commit ce4a809)
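
The approach in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names (`wait_until`, `node_in_violations`, `wait_for_unavailable_node`), the `get_status` callable, and the `violations` / `unavailable_nodes` layout of the status payload are all assumptions made for the example.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg="condition not met"):
    """Poll `condition` until it returns True or `timeout_sec` elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


def node_in_violations(status, node_id):
    """Return True if `node_id` is reported as unavailable.

    The payload layout (status["violations"]["unavailable_nodes"]) is an
    assumption for this sketch.
    """
    violations = status.get("violations") or {}
    return node_id in violations.get("unavailable_nodes", [])


def wait_for_unavailable_node(get_status, node_id, timeout_sec=30, backoff_sec=1.0):
    """Wait on the observable status instead of guessing when a timer elapses.

    `get_status` is a hypothetical callable returning the balancer status dict.
    """
    wait_until(
        lambda: node_in_violations(get_status(), node_id),
        timeout_sec=timeout_sec,
        backoff_sec=backoff_sec,
        err_msg=f"node {node_id} never appeared in violations",
    )
```

Polling the reported status avoids depending on when the timer started, so a controller leader change in the middle of the wait only delays the condition rather than invalidating it.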

Previously, when the controller leader node was suspended during the
test, all status requests would fail with a timeout error.
This was true for all nodes, not just the suspended one (because we
proxy the status request to the controller leader), so internal retries
in the admin API wrapper didn't help. We increase the timeout and add
504 to the retriable status codes so that internal retries can handle
this situation.

(cherry picked from commit dc83a7b)
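
A minimal sketch of the retry idea described above, under stated assumptions: `request_with_retries`, the `send` callable, and the contents of `RETRIABLE_STATUS_CODES` other than 504 are hypothetical and do not reflect the actual admin API wrapper in the repository.

```python
import time

# 504 (Gateway Timeout) is the code this change makes retriable.
# 503 is included only as an illustrative assumption about what else
# such a set might contain.
RETRIABLE_STATUS_CODES = frozenset({503, 504})


def request_with_retries(send, retries=5, backoff_sec=1.0):
    """Call `send()` until it returns a non-retriable status or retries run out.

    `send` is a hypothetical callable returning a (status_code, body) tuple.
    The last response is returned even if it is still retriable.
    """
    for attempt in range(retries + 1):
        status, body = send()
        if status not in RETRIABLE_STATUS_CODES:
            return status, body
        if attempt < retries:
            time.sleep(backoff_sec)
    return status, body
```

With 504 in the retriable set, a proxied status request that times out while the controller leader is suspended is retried instead of failing the test immediately; the longer timeout gives a new leader time to be elected between attempts.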
@vbotbuildovich vbotbuildovich added this to the v22.2.x-next milestone Aug 15, 2022
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label Aug 15, 2022
@ztlpn ztlpn marked this pull request as ready for review August 15, 2022 12:56
Contributor

@ztlpn ztlpn left a comment

Clean backport; waiting for CI before merging.

@ztlpn ztlpn merged commit aab41ac into redpanda-data:v22.2.x Aug 15, 2022
Contributor

ztlpn commented Aug 15, 2022

The test failure is #5575. (The failing test was FetchTest.fetch_long_poll_test and the exception was ducktape.errors.TimeoutError: Cluster membership did not stabilize, but the underlying cause was that the seed node crashed.)

@mmedenjak mmedenjak added kind/enhance New feature or request area/tests labels Aug 16, 2022
Labels
area/redpanda area/tests kind/backport PRs targeting a stable branch kind/enhance New feature or request
3 participants