CI Failure (transactions stuck in not_leader_for_partition loop) in CompactionE2EIdempotencyTest.test_basic_compaction
#8486
Comments
Setting |
There are several failures in the linked build; here we look at
Let's check the compacted verifier log (rw.log) to see which operation is failing. We see that the test has three producers:
but at the end of the log only two are active
By using
After looking at test_log.debug we learn that the topic's name is
By grepping
By searching
and then we notice that the issue is happening over and over again (not_leader_for_partition loop)
It looks like we don't clean the cache on re-election so we need to:
Prototype of the fix: #8624
once a new leader is elected we need to remove info about unresolved inflight replication because it's guaranteed that all unresolved replication of the previous leader is resolved and the new leader sees all its written records. the residual info was causing the not_leader_for_partition loop. fix redpanda-data#8486
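To make the idea in the commit message concrete, here is a minimal Python sketch of a per-partition cache of unresolved inflight replication that is wiped when a new leader is elected. It is a toy model, not the Redpanda implementation: the class `InflightTracker`, its methods, and the rule that a producer with a residual entry gets rejected are all hypothetical names and assumptions made for illustration.

```python
class InflightTracker:
    """Toy model of a per-partition cache of unresolved inflight replication.

    Hypothetical: names and rejection rule are assumptions, not Redpanda code.
    """

    def __init__(self):
        self.term = 0
        # producer id -> number of replications that have not resolved yet
        self.inflight = {}

    def begin_replicate(self, producer_id):
        # Record that a replication for this producer is now in flight.
        self.inflight[producer_id] = self.inflight.get(producer_id, 0) + 1

    def finish_replicate(self, producer_id):
        # Drop one unresolved replication; remove the entry once none remain.
        remaining = self.inflight.get(producer_id, 0) - 1
        if remaining > 0:
            self.inflight[producer_id] = remaining
        else:
            self.inflight.pop(producer_id, None)

    def can_accept(self, producer_id):
        # Assumption for this sketch: a residual entry blocks new requests,
        # which is how a stale cache could turn into a retry loop.
        return producer_id not in self.inflight

    def on_leadership_change(self, new_term):
        # On re-election, everything the previous leader had in flight is
        # already resolved, so the residual info is safe to discard.
        if new_term > self.term:
            self.term = new_term
            self.inflight.clear()


if __name__ == "__main__":
    tracker = InflightTracker()
    tracker.begin_replicate("producer-1")   # never resolved by the old leader
    print(tracker.can_accept("producer-1"))
    tracker.on_leadership_change(1)         # new leader elected
    print(tracker.can_accept("producer-1"))
```

In this model, skipping the `clear()` call in `on_leadership_change` leaves the residual entry in place forever, so every retry from that producer is rejected again, which mirrors the loop described above.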
/backport v22.3.x
once a new leader is elected we need to remove info about unresolved inflight replication because it's guaranteed that all unresolved replication of the previous leader is resolved and the new leader sees all its written records. the residual info was causing the not_leader_for_partition loop. fix redpanda-data#8486 (cherry picked from commit 95e809f)
https://buildkite.com/redpanda/redpanda/builds/22020#0185f9a4-0ade-4701-961d-4d098bc63128
It isn't enough to have the same stacktrace in report.txt; you need to look at the compacted verifier & redpanda logs to confirm the same root cause (see comments).