-
Notifications
You must be signed in to change notification settings - Fork 579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: account for metadata dissemination delay #6322
Conversation
after shutting down a node we asserted that that node was not a leader for any partition. however, it may take some time for elections and metadata dissemination to propogate changes. we move the check below a sleep to give the system time for all the changes to be propogated. if after 1 minute the changes haven't been propogated then it would be a worthy thing to investigate. We can see some indirect evidence of this cause for the assertion failure where back-to-back metadata queries hit two different originating brokers where the one query was used to indicate that we'd reached a desired state in the test, while the other resulted in the assertion failure from what is apparent yet-to-be-updated metadata. [DEBUG - 2022-08-19 19:04:18,966 - kafka_cat - _cmd_raw - lineno:54]: {"originating_broker":{"id":3,"name":"docker-rp-2:9092/3"},"query":{"topic":"*"},"controllerid":1,"brokers":[{"id":3,"name":"docker-rp-2:9092"},{"id":2,"name":"docker-rp-15:9092"},{"id":1,"name":"docker-rp-3:9092"}],"topics":[{"topic":"topic-yhvslmrfme","partitions":[{"partition":0,"leader":2,"replicas":[{"id" [DEBUG - 2022-08-19 19:04:18,975 - kafka_cat - _cmd_raw - lineno:54]: {"originating_broker":{"id":2,"name":"docker-rp-15:9092/2"},"query":{"topic":"*"},"controllerid":1,"brokers":[{"id":3,"name":"docker-rp-2:9092"},{"id":2,"name":"docker-rp-15:9092"},{"id":1,"name":"docker-rp-3:9092"}],"topics":[{"topic":"topic-yhvslmrfme","partitions":[{"partition":0,"leader":2,"replicas":[{"id" Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. But I thought we should put all kind of check like this inside wait_until
I've rebased it to #6003 to test the fix, will get data in several hours |
tbh, i think we could have just removed the two lines i moved below the sleep, they seem to only be a defensive assertion. i'd have added it to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go, the tests pass
Cover letter
after shutting down a node we asserted that that node was not a leader
for any partition. however, it may take some time for elections and
metadata dissemination to propogate changes. we move the check below a
sleep to give the system time for all the changes to be propogated. if
after 1 minute the changes haven't been propogated then it would be a
worthy thing to investigate.
We can see some indirect evidence of this cause for the assertion
failure where back-to-back metadata queries hit two different
originating brokers where the one query was used to indicate that we'd
reached a desired state in the test, while the other resulted in the
assertion failure from what is apparent yet-to-be-updated metadata.
Fixes: #6120
Backport Required
Release notes