DROP ME! test: account for metadata dissemination delay
After shutting down a node, we asserted that the node was not a leader
for any partition. However, it may take some time for elections and
metadata dissemination to propagate the changes. We move the check below
a sleep to give the system time for all the changes to be propagated. If
the changes still haven't propagated after 1 minute, that would be worth
investigating.
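
As a side note, a fixed sleep is one way to wait out dissemination; a
polling check is another. The sketch below is a hypothetical
alternative, not this commit's change: it assumes ducktape's wait_until
helper and the test's existing _get_leaders_by_node and redpanda.idx
accessors. Note the commit keeps the fixed sleep partly to avoid the
sticky-leadership optimizations mentioned in the code, so polling alone
would address only the dissemination delay.

    from ducktape.utils.util import wait_until

    # Hypothetical alternative to the fixed 60s sleep: poll until the
    # stopped node no longer appears as a leader for any partition.
    def wait_for_no_leadership(test, node, timeout_sec=60):
        node_id = test.redpanda.idx(node)
        wait_until(
            lambda: node_id not in test._get_leaders_by_node(),
            timeout_sec=timeout_sec,
            backoff_sec=2,
            err_msg=f"node {node_id} still held leadership after {timeout_sec}s")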

We can see indirect evidence of this cause of the assertion failure:
back-to-back metadata queries hit two different originating brokers.
One query indicated that we'd reached the desired state in the test,
while the other produced the assertion failure from what is apparently
not-yet-updated metadata.

[DEBUG - 2022-08-19 19:04:18,966 - kafka_cat - _cmd_raw - lineno:54]:
{"originating_broker":{"id":3,"name":"docker-rp-2:9092/3"},"query":{"topic":"*"},"controllerid":1,"brokers":[{"id":3,"name":"docker-rp-2:9092"},{"id":2,"name":"docker-rp-15:9092"},{"id":1,"name":"docker-rp-3:9092"}],"topics":[{"topic":"topic-yhvslmrfme","partitions":[{"partition":0,"leader":2,"replicas":[{"id"

[DEBUG - 2022-08-19 19:04:18,975 - kafka_cat - _cmd_raw - lineno:54]:
{"originating_broker":{"id":2,"name":"docker-rp-15:9092/2"},"query":{"topic":"*"},"controllerid":1,"brokers":[{"id":3,"name":"docker-rp-2:9092"},{"id":2,"name":"docker-rp-15:9092"},{"id":1,"name":"docker-rp-3:9092"}],"topics":[{"topic":"topic-yhvslmrfme","partitions":[{"partition":0,"leader":2,"replicas":[{"id"

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
dotnwat authored and rystsov committed Sep 7, 2022
1 parent 9930cf7 commit a7e822d
1 changed file: tests/rptest/tests/leadership_transfer_test.py (5 additions, 3 deletions)
@@ -156,13 +156,15 @@ def all_partitions_present(num_nodes, per_node=None):
                    backoff_sec=2,
                    err_msg="Leadership did not move to running nodes")
 
-        leaders = self._get_leaders_by_node()
-        assert self.redpanda.idx(node) not in leaders
-
         # sleep for a bit to avoid triggering any of the sticky leadership
         # optimizations
         time.sleep(60)
 
+        # sanity check -- the node we stopped shouldn't be a leader for any
+        # partition after the sleep above, as re-election should have taken place
+        leaders = self._get_leaders_by_node()
+        assert self.redpanda.idx(node) not in leaders
+
         # restart the stopped node and wait for 15 (out of 21) leaders to be
         # rebalanced on to the node. the error minimization done in the leader
         # balancer is a little fuzzy so it is problematic to assert an exact target
