Commit

tests/partition_balancer: more robust wait_until_status
Previously, when the controller leader node was suspended during the
test, all status requests would fail with a timeout error. This was
true for all nodes, not just the suspended one, because the status
request is proxied to the controller leader, so the internal retries
in the admin API wrapper didn't help. We increase the status request
timeout and add 504 to the retriable status codes so that the internal
retries can handle this situation.
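
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual admin API wrapper: `HTTPStatusError`, `call_with_retries`, `make_flaky_request`, and their parameters are hypothetical names, and the fake request only simulates a node that answers 504 while the controller leader is unavailable.

```python
import time


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""
    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_retries(request_fn, retry_codes=(503, 504), max_retries=5,
                      backoff_sec=0.0):
    """Call request_fn, retrying only when it fails with a retriable code."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except HTTPStatusError as e:
            # Non-retriable codes, or running out of attempts, propagate.
            if e.status_code not in retry_codes or attempt == max_retries:
                raise
            time.sleep(backoff_sec)


def make_flaky_request(failures):
    """Hypothetical stand-in for a proxied status request: raises the given
    status codes in order, then succeeds."""
    remaining = list(failures)
    calls = {"count": 0}

    def request():
        calls["count"] += 1
        if remaining:
            raise HTTPStatusError(remaining.pop(0))
        return {"status": "ready"}

    return request, calls


# Two 504 responses (leader suspended), then success after re-election.
request, calls = make_flaky_request([504, 504])
result = call_with_retries(request)
```

With 504 left out of `retry_codes`, the first gateway timeout would propagate to the caller immediately, which is the failure mode the commit message describes.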
ztlpn committed Aug 14, 2022
1 parent a8f56b5 commit dc83a7b
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions tests/rptest/tests/partition_balancer_test.py
@@ -77,13 +77,15 @@ def node2partition_count(self):
         return ret

     def wait_until_status(self, predicate, timeout_sec=120):
-        admin = Admin(self.redpanda)
+        # We may get a 504 if we proxy a status request to a suspended node.
+        # It is okay to retry (the controller leader will get re-elected in the meantime).
+        admin = Admin(self.redpanda, retry_codes=[503, 504])
         start = time.time()

         def check():
             req_start = time.time()

-            status = admin.get_partition_balancer_status(timeout=1)
+            status = admin.get_partition_balancer_status(timeout=10)
             self.logger.info(f"partition balancer status: {status}")

             if "seconds_since_last_tick" not in status:
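
The rest of `wait_until_status` is not shown in this hunk. As a rough, standalone sketch of the general pattern (poll a status source until a predicate holds or an overall deadline passes), one might write something like the following. The free-function form, the `get_status` callable, `backoff_sec`, and the status values are all assumptions made for illustration and are not taken from the test file.

```python
import time


def wait_until_status(get_status, predicate, timeout_sec=120, backoff_sec=1.0):
    """Poll get_status() until predicate(status) is true.

    Returns the first status that satisfies the predicate, or raises
    TimeoutError once timeout_sec has elapsed.
    """
    deadline = time.time() + timeout_sec
    while True:
        status = get_status()
        if predicate(status):
            return status
        if time.time() >= deadline:
            raise TimeoutError(f"status did not converge, last seen: {status}")
        time.sleep(backoff_sec)


# Hypothetical sequence of statuses returned by successive polls.
statuses = iter([
    {"status": "starting"},
    {"status": "in_progress"},
    {"status": "ready"},
])
final = wait_until_status(lambda: next(statuses),
                          lambda s: s["status"] == "ready",
                          timeout_sec=5,
                          backoff_sec=0)
```

In this shape the per-request timeout and the retry codes belong to `get_status`, while `timeout_sec` bounds the whole wait, so a slow or retried request consumes part of the overall budget instead of failing the wait outright.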
