Failure in RaftAvailabilityTest.test_follower_isolation
#3098
Comments
Two things are strange here:
(grepping out of https://buildkite.com/vectorized/redpanda/builds/4811#57c8a844-486a-4c17-b4a4-13c95a3f89db logs)
Stranger and stranger...
Somehow health_monitor_backend is still trying to load health status from the old controller leader at …
Aside from that ghost in the machine, I think there's also a more general issue with the way we handle timeouts when loading health info from the leader: that timeout should be reset whenever the leader changes.
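To illustrate the "reset the timeout whenever the leader changes" idea, here is a minimal Python sketch. It is not the health_monitor_backend code: `LeaderDeadline`, `observe`, `expired`, and the injected `clock` are hypothetical names chosen only for this example.

```python
class LeaderDeadline:
    """Hypothetical helper: tracks a deadline for loading health info from a leader."""

    def __init__(self, timeout_s, clock):
        self._timeout_s = timeout_s
        self._clock = clock  # callable returning seconds, e.g. time.monotonic
        self._leader = None
        self._deadline = None

    def observe(self, leader):
        # Restart the timeout whenever the observed leader differs from the last one,
        # so time spent waiting on the old leader is not charged to the new one.
        if leader != self._leader:
            self._leader = leader
            self._deadline = self._clock() + self._timeout_s

    def expired(self):
        # True once the deadline for the current leader has passed.
        return self._deadline is not None and self._clock() >= self._deadline
```

Passing the clock in makes the behaviour easy to check with a fake time source; in real use it could be `time.monotonic`.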
We should improve the health monitor updates to not take as long, but in the meantime let's make the test more tolerant. Related: redpanda-data#3098 Signed-off-by: John Spray <jcs@vectorized.io>
(revert timeout change in #3105 when underlying issue is fixed)
There are two problems here:
Going to fix that by aborting health requests pending to the old leader when leadership has changed, and issuing a request to the new leader immediately.
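The abort-and-reissue approach can be sketched with Python's asyncio. This is only an illustration of the pattern, not the actual fix: `HealthRefresher`, `on_leader_change`, and the `fetch` coroutine are hypothetical, and `demo` simulates an isolated old leader (node 1) that never answers.

```python
import asyncio


class HealthRefresher:
    """Hypothetical stand-in for the health refresh logic."""

    def __init__(self, fetch):
        # fetch(leader) is an assumed coroutine that loads health status from a node.
        self._fetch = fetch
        self._pending = None
        self.latest = None

    def on_leader_change(self, new_leader):
        # Abort the request still waiting on the old leader, if any.
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()
        # Issue a request to the new leader immediately.
        self._pending = asyncio.ensure_future(self._refresh(new_leader))
        return self._pending

    async def _refresh(self, leader):
        self.latest = await self._fetch(leader)


async def demo():
    calls = []

    async def fetch(leader):
        calls.append(leader)
        if leader == 1:
            await asyncio.sleep(3600)  # isolated old leader never answers
        return {"source": leader}

    refresher = HealthRefresher(fetch)
    first = refresher.on_leader_change(1)
    await asyncio.sleep(0)  # let the first request start
    second = refresher.on_leader_change(2)
    await second
    return refresher.latest, first.cancelled(), calls
```

Running `asyncio.run(demo())` shows the request to node 1 being cancelled as soon as node 2 becomes leader, so the refresh does not have to wait out the old request's timeout.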
Seen twice:
https://buildkite.com/vectorized/redpanda/builds/4730#4a3e6431-118d-453a-9075-109039c2f6b9
https://buildkite.com/vectorized/redpanda/builds/4811#57c8a844-486a-4c17-b4a4-13c95a3f89db
This is the test trying to verify availability of the cluster after isolation, and timing out while doing so.
The delivery timeout is 4.5s, which is tight enough that this could just be a slow election after the leader fails; I haven't gone over it forensically.