Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloud storage: unique archived segment names #3365

Merged
merged 3 commits into from
Jan 11, 2022

Conversation

ztlpn
Copy link
Contributor

@ztlpn ztlpn commented Dec 28, 2021

Cover letter

Fixes #3272

To ensure that segments in the cloud storage are not overwritten by concurrent archivers running on different nodes, we append archiver term id as a suffix for segment paths. As raft guarantees that in each term there will be only one leader, this ensures segment path uniqueness.

Release notes

Bug fixes

  • Fixed data loss in shadow indexing archived data that could occur after quick partition leadership transfer back and forth between two nodes. Compatibility note: previous redpanda versions won't be able to read shadow indexing data archived by newer versions.

Previously segment paths were used in the manifest for the case when
segments come from a different ntp revision than the current one
(these segments could appear in the manifest for example after
recovery). This lead to a lot of code that had to handle both cases for
manifest keys (short names and full paths) and parsed these paths and
assembled them back. By storing ntp revision id in segment_meta instead,
we can eliminate all this code. Segment paths become opaque, as we don't
have to parse them anywhere - we only generate them when we need to talk
to cloud storage and that's it.
See redpanda-data#3272. To ensure that
segments in the cloud storage are not overwritten by concurrent archivers
running on different nodes, we append archiver term id as a suffix for
segment paths. As raft guarantees that in each term there will be only
one leader, this ensures segment path uniqueness.
@ztlpn ztlpn force-pushed the unique-archived-segment-names branch from b1dc613 to 8cf6c12 Compare December 29, 2021 12:54
@ztlpn
Copy link
Contributor Author

ztlpn commented Dec 29, 2021

changes in force-push:

  • rebase to tip of dev

src/v/cloud_storage/remote_segment.cc Show resolved Hide resolved
src/v/cloud_storage/manifest.cc Show resolved Hide resolved
BOOST_REQUIRE(res.has_value() == false);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good riddance to all the code below

src/v/archival/ntp_archiver_service.cc Show resolved Hide resolved
src/v/archival/ntp_archiver_service.cc Show resolved Hide resolved
src/v/cloud_storage/tests/manifest_test.cc Show resolved Hide resolved
@ztlpn ztlpn requested review from Lazin and dotnwat January 10, 2022 19:22
Copy link
Contributor

@Lazin Lazin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ztlpn ztlpn merged commit 85f159c into redpanda-data:dev Jan 11, 2022
@ztlpn ztlpn mentioned this pull request Jan 11, 2022
ztlpn added a commit that referenced this pull request Jan 11, 2022
@ztlpn ztlpn deleted the unique-archived-segment-names branch November 27, 2023 13:15
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prevent segment overwrite in the cloud storage by archivers running on different nodes
3 participants