tests/partition_balancer: more robust wait_until_status
Previously, when the controller leader node was suspended during the
test, all status requests would fail with a timeout error.
This was true for all nodes, not just the suspended one (because we
proxy the status request to the controller leader), so internal retries
in the admin API wrapper didn't help. We increase the timeout and add
504 to retriable status codes so that internal retries can handle this
situation.

(cherry picked from commit dc83a7b)
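
The retry behaviour described in the message can be illustrated with a minimal, self-contained sketch. This is not the actual `Admin` wrapper from rptest: `HTTPStatusError`, `call_with_retries`, `max_attempts`, `backoff_sec` and the fake endpoint are all hypothetical names made up for illustration. It only shows why adding 504 to the retriable codes lets a proxied request succeed once the failing node is no longer in the path.

```python
import time


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_retries(request, retry_codes=(503, 504), max_attempts=5,
                      backoff_sec=0.0):
    """Call request() and retry while it fails with a retriable status code.

    Assumption: the real wrapper's retry policy may differ; this is a sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except HTTPStatusError as e:
            # Re-raise on non-retriable codes or once attempts are exhausted.
            if e.status_code not in retry_codes or attempt == max_attempts:
                raise
            time.sleep(backoff_sec)


# Simulated endpoint: two gateway timeouts (as if proxied to a suspended
# leader), then a successful response after a new leader takes over.
responses = iter([504, 504, 200])


def fake_status_request():
    code = next(responses)
    if code != 200:
        raise HTTPStatusError(code)
    return {"status": "ok"}  # placeholder payload, not the real schema


result = call_with_retries(fake_status_request)
print(result)
```

With 504 left out of `retry_codes`, the first simulated response would propagate immediately as an error, which mirrors the failure mode the commit message describes.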
ztlpn authored and vbotbuildovich committed Aug 15, 2022
1 parent 68b54be commit 7ae38b0
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions tests/rptest/tests/partition_balancer_test.py
@@ -77,13 +77,15 @@ def node2partition_count(self):
         return ret

     def wait_until_status(self, predicate, timeout_sec=120):
-        admin = Admin(self.redpanda)
+        # We may get a 504 if we proxy a status request to a suspended node.
+        # It is okay to retry (the controller leader will get re-elected in the meantime).
+        admin = Admin(self.redpanda, retry_codes=[503, 504])
         start = time.time()

         def check():
             req_start = time.time()

-            status = admin.get_partition_balancer_status(timeout=1)
+            status = admin.get_partition_balancer_status(timeout=10)
             self.logger.info(f"partition balancer status: {status}")

             if "seconds_since_last_tick" not in status: