Shutdown hang in ConsumerOffsetsMigrationTest.test_migrating_consume_offsets.failures=True.cpus=3 #5324
Comments
I did some initial log analysis, seems like an issue in clean up of Sequence of events..
It's just one ntp.
It took a lot longer for consensus to
I think the timestamps suggest that the delayed shutdown is in one of the following two (event manager shutdown seems trivial to me):
The following log and its timestamp suggest that things were probably stuck in flush: some sort of interplay between append_entries_buffer and consensus shutdown, though the exact mechanism is not obvious to me.
Finally it realizes it is not the leader and steps down. Something seems to be messed up; I need to take a closer look.
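For anyone repeating this kind of timestamp analysis on another run, here is a rough sketch of a helper that lists the largest gaps between consecutive log lines, which is usually where a shutdown stall shows up. It is not part of the test suite. The function name `largest_gaps`, the assumption that each line starts with a 23-character timestamp, and the default format string are all hypothetical; adjust them to the actual log layout.

```python
from datetime import datetime


def largest_gaps(lines, top=3, fmt="%Y-%m-%d %H:%M:%S,%f", width=23):
    """Return the `top` largest time gaps between consecutive timestamped lines.

    Assumption (hypothetical): every relevant line begins with a timestamp of
    `width` characters matching `fmt`. Lines that do not parse are skipped.
    """
    stamped = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:width], fmt)
        except ValueError:
            continue  # no parsable timestamp at the start of this line
        stamped.append((ts, line.rstrip("\n")))

    # Pair each timestamped line with the next one and measure the delay.
    gaps = [
        ((cur_ts - prev_ts).total_seconds(), prev_line, cur_line)
        for (prev_ts, prev_line), (cur_ts, cur_line) in zip(stamped, stamped[1:])
    ]
    # Largest delay first.
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]
```

Feeding it the lines between the stop signal and process exit should put the slow step at the top of the result, as a tuple of (seconds, line before the gap, line after the gap).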
Seen again here: FAIL test: ConsumerOffsetsMigrationTest.test_migrating_consume_offsets.failures=False.cpus=1 (1/24 runs) Stack trace
Another instance of this: https://ci-artifacts.dev.vectorized.cloud/redpanda/0182cbda-ffdd-4339-90a7-ef336ed22f9c/vbuild/ducktape/results/2022-08-23--001/ConsumerOffsetsMigrationTest/test_migrating_consume_offsets/failures=False.cpus=1/47/
A potential fix for this is ready. Can someone review and merge, please? Thanks.
Looks like this happens on v22.2.x as well: https://buildkite.com/redpanda/redpanda/builds/16044#01839d62-8fd9-420d-ad3c-2e3f7edb4af9
There was a previous issue #4670 about a "failed to stop" in this test that was related to the failure injection, but the logs in this failure look like an authentic shutdown hang in redpanda -- we can see it getting the signal to stop, but remaining alive for some time. There is compaction going on at the time it is signaled to stop.