storage: assertion failure in offset_translator_state.cc
#4466
Comments
One possible way this could happen that I can think of is when shadow indexing data from different topic instances gets mixed together (e.g. when a topic was recreated in a different redpanda instance that reused the same s3 bucket and the revision numbers coincided due to bad luck).
This was reproduced with 22.1.1 as well.
@piyushredpanda do you have a reference to the instance of this occurring in 22.1.1?
Ok I think I found it. The problem is at the intersection of partition movement, offset translation and shadow indexing.
😅 just a few minor subsystems
Nice find. That looks like it took a lot of digging.
Given that we try to keep state partitioned and isolated on each core, does this suggest that something global to the node is involved? Or rather, reading the rest of the bullet points I'm not sure which part is related to the property of the movement remaining on the same node. Do you think a reproducer in ducktape is feasible? Do we have insight yet into what the fix may look like?
This global thing is the log itself. When partition movement is cross-node, we have to download the log via recovery, and the offset translator state gets rebuilt correctly. OTOH, when the movement is cross-core, the log stays in the same place but we have to move the raft kvstore state. Looks like in this case we forgot to move the offset translator bits.
Sure, it should be easily reproducible.
Oh of course. Thanks. That's a really clear explanation.
Previously, due to a typo, raft::details::move_persistent_state was called twice. Fixes redpanda-data#4466
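To make the failure mode discussed in this thread concrete, here is a minimal toy sketch in plain C++. It is not Redpanda code: the `kvstore` alias, the key prefixes, and every function name below are hypothetical stand-ins chosen for illustration. It only models the idea that a cross-core move has to carry two kinds of persisted state, and that calling the same move helper twice leaves the offset translator state behind on the source core.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a per-core key-value store.
// This is NOT Redpanda's kvstore API, just an ordered map for illustration.
using kvstore = std::map<std::string, std::string>;

// Move every key starting with `prefix` from one core's store to another's.
void move_prefix(kvstore& from, kvstore& to, const std::string& prefix) {
    for (auto it = from.begin(); it != from.end();) {
        if (it->first.compare(0, prefix.size(), prefix) == 0) {
            to[it->first] = it->second;
            it = from.erase(it);
        } else {
            ++it;
        }
    }
}

// Hypothetical helpers: one per kind of persisted state (assumed key prefixes).
void move_raft_state(kvstore& from, kvstore& to) {
    move_prefix(from, to, "raft/");
}
void move_offset_translator_state(kvstore& from, kvstore& to) {
    move_prefix(from, to, "offset_translator/");
}

// Models the typo: the raft helper is called twice, so the second call is a
// no-op and the offset translator keys never leave the source core.
void buggy_cross_core_move(kvstore& from, kvstore& to) {
    move_raft_state(from, to);
    move_raft_state(from, to);
}

// Models the intended behaviour: both kinds of state are moved.
void fixed_cross_core_move(kvstore& from, kvstore& to) {
    move_raft_state(from, to);
    move_offset_translator_state(from, to);
}

// True if the store holds at least one offset translator key.
bool has_offset_translator_state(const kvstore& kv) {
    const std::string prefix = "offset_translator/";
    auto it = kv.lower_bound(prefix);
    return it != kv.end() && it->first.compare(0, prefix.size(), prefix) == 0;
}

// Example contents of a source core before the move (made-up keys and values).
kvstore make_source_core() {
    return {
      {"raft/term", "7"},
      {"raft/voted_for", "1"},
      {"offset_translator/highest_known_offset", "1000"},
    };
}
```

In this toy model, running `buggy_cross_core_move` on a store built by `make_source_core()` leaves the destination without any `offset_translator/` keys while the log itself has not moved, which is the kind of mismatch that a later consistency check could trip over. `fixed_cross_core_move` drains both prefixes from the source.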
/backport v22.1.x
/backport v21.11.x
Previously, due to a typo, raft::details::move_persistent_state was called twice. Fixes redpanda-data#4466 (cherry picked from commit 3152509)
Version: v21.11.10
Do we have any more information or context about this failure, @bpraseed?