
storage: assertion failure in offset_translator_state.cc #4466

Closed
dotnwat opened this issue Apr 28, 2022 · 11 comments · Fixed by #4493

@dotnwat (Member) commented Apr 28, 2022

Version: v21.11.10

ERROR 2022-04-26 19:57:36,136 [shard 0] assert - Assert failure: (../../../src/v/storage/offset_translator_state.cc:194) 'base_offset > last_offset && base_offset < offset' ntp {kafka/topic/1}: inconsistent add_absolute_delta (offset 767, delta 20), but last_offset: 761, last_delta: 33

Do we have any more information or context about this failure, @bpraseed?

@dotnwat added the kind/bug label Apr 28, 2022
@ztlpn (Contributor) commented Apr 28, 2022

offset_translator_state::add_absolute_delta is used in shadow indexing when reading from remote segments. For each remote segment, the offset delta is stored in the manifest metadata, so this error is possible when those offset deltas are inconsistent.
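
For reference, a minimal sketch of the invariant that fires here, assuming a simplified state that only tracks the last recorded (offset, delta) pair and using a guessed formula for base_offset (the real offset_translator_state presumably keeps richer state). Plugging in the values from the assertion above (offset 767, delta 20, last_offset 761, last_delta 33) violates the check, since the new delta is smaller than the last one:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical, simplified model of the state behind the assertion; not the
// actual Redpanda implementation.
struct translator_state {
    int64_t last_offset = -1; // highest log offset we already have a delta for
    int64_t last_delta = 0;   // non-data batches accounted for up to last_offset

    // Record that `delta` non-data batches precede log offset `offset`
    // (both values come from the remote segment's manifest metadata).
    void add_absolute_delta(int64_t offset, int64_t delta) {
        // The gap of newly reported non-data batches would have to start at
        // base_offset, strictly between what we already know and `offset`.
        const int64_t base_offset = offset - (delta - last_delta);
        if (!(base_offset > last_offset && base_offset < offset)) {
            // The real code vasserts; throwing keeps the sketch self-contained.
            throw std::logic_error(
              "inconsistent add_absolute_delta (offset " + std::to_string(offset)
              + ", delta " + std::to_string(delta) + "), but last_offset: "
              + std::to_string(last_offset) + ", last_delta: "
              + std::to_string(last_delta));
        }
        last_offset = offset;
        last_delta = delta;
    }
};
```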

One possible way I can think of for this to happen is shadow indexing data from different topic instances getting mixed together (e.g. a topic was recreated in a different Redpanda instance that reused the same S3 bucket, and the revision numbers coincided by bad luck).

@piyushredpanda (Contributor)

This was reproduced with 22.1.1 as well.

@dotnwat (Member, Author) commented Apr 28, 2022

> This was reproduced with 22.1.1 as well.

@piyushredpanda do you have a reference to the instance of this occurring in 22.1.1?

@dotnwat added this to the v22.1.1 milestone Apr 28, 2022
@piyushredpanda (Contributor)

@VadimPlh do you mind adding the details? cc: @dotnwat

@ztlpn (Contributor) commented Apr 28, 2022

OK, I think I found it.

The problem is at the intersection of partition movement, offset translation and shadow indexing.

  1. ntp kafka/topic-swiubxneva/0 got moved from {node_id: 1, shard: 1} to {node_id: 1, shard: 3}. Crucially, the new partition placement is on the same node.
  2. Offset translator state is not found in the kvstore and gets initialized from the configuration manager state: INFO 2022-04-28 12:01:56,259 [shard 3] offset_translator - ntp: {kafka/topic-swiubxneva/0} - offset_translator.cc:131 - offset translation kvstore state not found, loading from provided bootstrap state
  3. Because the offset delta from the configuration manager doesn't take archival metadata batches into account, the offset translator state gets clobbered (see the worked example after this list).
  4. Later, a segment with an incorrect offset delta gets uploaded.
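
To make step 3 concrete, here is a hypothetical worked example (the numbers are chosen to echo the assertion message above, not taken from the incident logs). The translator's delta counts non-data batches, i.e. raft configuration batches plus archival metadata batches, below a given log offset, and a Kafka offset is the log offset minus that delta; a bootstrap that only sees the configuration manager's batches undercounts it:

```cpp
#include <cstdint>
#include <iostream>

int main() {
    int64_t log_offset = 767;            // redpanda (raft) offset
    int64_t config_batches = 20;         // known to the configuration manager
    int64_t archival_meta_batches = 13;  // invisible to the configuration manager

    // Correct translation counts every non-data batch.
    int64_t correct_delta = config_batches + archival_meta_batches; // 33
    // After the cross-core move the state was re-bootstrapped from the
    // configuration manager only, so the archival batches were dropped.
    int64_t clobbered_delta = config_batches; // 20

    std::cout << "correct kafka offset:   " << log_offset - correct_delta << "\n"    // 734
              << "clobbered kafka offset: " << log_offset - clobbered_delta << "\n"; // 747
}
```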

@dotnwat (Member, Author) commented Apr 29, 2022

> partition movement, offset translation and shadow indexing.

😅 just a few minor subsystems

@dotnwat (Member, Author) commented Apr 29, 2022

Nice find. That looks like it took a lot of digging.

> Crucially, the new partition placement is on the same node.

Given that we try to keep state partitioned and isolated on each core, does this suggest that something global to the node is involved? Or rather, reading the rest of the bullet points, I'm not sure which part is tied to the fact that the movement stays on the same node.

Do you think a reproducer in ducktape is feasible? Do we have insight yet into what the fix may look like?

@ztlpn (Contributor) commented Apr 29, 2022

> Given that we try to keep state partitioned and isolated on each core, does this suggest that something global to the node is involved?

This global thing is the log itself. When partition movement is cross-node, we have to download the log via recovery, and the offset translator state gets rebuilt correctly. When the movement is cross-core, on the other hand, the log stays in the same place but we have to move the raft kvstore state. Looks like in this case we forgot to move the offset translator bits.
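
A self-contained sketch of the failure mode being described, with made-up key names and helper functions; the only detail borrowed from the fix (#4493, whose commit message is quoted later in this thread) is that raft::details::move_persistent_state ended up being called twice due to a typo, presumably in place of the call that should have moved the offset translator's kvstore keys:

```cpp
#include <iostream>
#include <map>
#include <string>

// All names and the key layout here are illustrative only.
using kvstore = std::map<std::string, std::string>;

// Move every key with the given prefix from one shard's kvstore to another's.
void move_keys(kvstore& from, kvstore& to, const std::string& prefix) {
    for (auto it = from.begin(); it != from.end();) {
        if (it->first.rfind(prefix, 0) == 0) {
            to.insert(*it);
            it = from.erase(it);
        } else {
            ++it;
        }
    }
}

int main() {
    kvstore source_shard{
      {"raft/voted_for", "node-1"},
      {"raft/snapshot", "..."},
      {"offset_translator/state", "last_offset=761,last_delta=33"},
    };
    kvstore target_shard;

    // Intended cross-core move: both the raft state and the offset
    // translator state should follow the partition to the new shard.
    move_keys(source_shard, target_shard, "raft/");
    // The typo (per the fix commit): the raft move was effectively repeated
    // instead of moving the offset translator keys, so the new shard found
    // no translator state and re-bootstrapped from the configuration manager.
    move_keys(source_shard, target_shard, "raft/"); // should have been "offset_translator/"

    std::cout << "translator state moved: "
              << target_shard.count("offset_translator/state") << "\n"; // prints 0
}
```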

> Do you think a reproducer in ducktape is feasible?

Sure, it should be easily reproducible.

@dotnwat (Member, Author) commented Apr 29, 2022

> When partition movement is cross-node, we have to download the log via recovery and the offset translator state gets rebuilt correctly.

Oh, of course. Thanks. That's a really clear explanation.

ztlpn added a commit to ztlpn/redpanda that referenced this issue Apr 29, 2022
Previously due to a typo raft::details::move_persistent_state was called
twice. Fixes redpanda-data#4466
@ztlpn (Contributor) commented Apr 29, 2022

/backport v22.1.x

@ztlpn (Contributor) commented Apr 29, 2022

/backport v21.11.x

ztlpn added a commit to ztlpn/redpanda that referenced this issue Apr 29, 2022
Previously due to a typo raft::details::move_persistent_state was called
twice. Fixes redpanda-data#4466

(cherry picked from commit 3152509)
ztlpn added a commit to ztlpn/redpanda that referenced this issue Apr 29, 2022
Previously due to a typo raft::details::move_persistent_state was called
twice. Fixes redpanda-data#4466

(cherry picked from commit 3152509)
abhijat pushed a commit to abhijat/redpanda that referenced this issue May 20, 2022
Previously due to a typo raft::details::move_persistent_state was called
twice. Fixes redpanda-data#4466
This issue was closed.