cloud_storage: fix race between remote_partition::stop and readers #3813

Merged: 2 commits, Feb 17, 2022

@ztlpn ztlpn commented Feb 16, 2022

Cover letter

Fixes #3507. Previously, remote_partition readers could return segment readers to the partition object after stop() had been called, leading to an assertion failure in the remote_partition destructor. To fix that, we make readers hold a guard for the partition gate and do a last pass over the eviction list once the gate has closed (and thus all readers have finished).
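
For readers unfamiliar with the gate-guard pattern, here is a minimal, standard-library-only sketch of the idea. It is not the code from this PR and does not use the real gate type: `toy_gate`, `toy_partition`, `toy_reader` and the demo functions are hypothetical names made up for illustration, and the model is synchronous, so `stop()` only checks that no guards are outstanding where an asynchronous gate would wait for them. Placing the last eviction pass inside `stop()` is also an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a gate: counts live guards, refuses entry once closed.
class toy_gate {
public:
    class holder {
    public:
        explicit holder(toy_gate& g) : _g(&g) {
            if (g._closed) throw std::runtime_error("gate closed");
            ++g._count;
        }
        holder(holder&& o) noexcept : _g(std::exchange(o._g, nullptr)) {}
        holder(const holder&) = delete;
        holder& operator=(const holder&) = delete;
        holder& operator=(holder&&) = delete;
        ~holder() { if (_g) --_g->_count; }
    private:
        toy_gate* _g;
    };
    void close() { _closed = true; }
    bool is_closed() const { return _closed; }
    int active() const { return _count; }
private:
    bool _closed = false;
    int _count = 0;
};

struct toy_partition {
    toy_gate gate;
    std::vector<int> eviction_list;  // ids of segment readers awaiting disposal
    std::size_t disposed = 0;
    bool stopped = false;

    void evict_all() {
        disposed += eviction_list.size();
        eviction_list.clear();
    }
    void stop() {
        gate.close();
        // Assumption: an async gate would wait here until every guard is released.
        if (gate.active() != 0) throw std::logic_error("readers still active");
        evict_all();  // last pass: nothing can be returned after this point
        stopped = true;
    }
    ~toy_partition() { assert(stopped && eviction_list.empty()); }
};

class toy_reader {
public:
    toy_reader(toy_partition& p, int id) : _p(p), _guard(p.gate), _id(id) {}
    // The destructor body runs before _guard is released, so the segment
    // reader is handed back while the gate is still held open.
    ~toy_reader() { _p.eviction_list.push_back(_id); }
private:
    toy_partition& _p;
    toy_gate::holder _guard;
    int _id;
};

// Returns how many segment readers were disposed of by the time stop() returns.
std::size_t demo_stop_after_readers() {
    toy_partition p;
    { toy_reader r(p, 42); }  // reader finishes before stop()
    p.stop();
    return p.disposed;
}

// Returns true if a reader created after stop() is rejected by the gate.
bool demo_reader_after_stop_is_rejected() {
    toy_partition p;
    p.stop();
    try {
        toy_reader late(p, 7);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

In this model a reader can only hand its segment reader back while it still holds the guard, so by the time the gate has closed and the final pass has run, the eviction list is empty and the destructor check cannot fire.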

Release notes

Bug fixes

  • Fix a rare crash that could happen when doing a shadow indexing fetch concurrently with topic deletion.

@ztlpn ztlpn added this to the Shadow Indexing GA milestone Feb 16, 2022

@ztlpn ztlpn marked this pull request as ready for review February 16, 2022 13:47
ztlpn commented Feb 16, 2022

test_cloud_storage_rpunit failed, investigating

Fixes redpanda-data#3507. Previously
remote_partition readers could return segment readers to the partition
object after stop() was called, leading to an assertion failure in the
remote_partition destructor. To fix that we make readers hold a guard
for the partition gate and do a last pass over the eviction list after
the gate was closed (and thus all readers finished).

Test test_remote_partition_lifetime_issue is removed because it tries to
model a situation (readers trying to read after
remote_partition::stop()) that is now impossible.

We don't need abort_source because readers can use _gate.is_closed()
to check if they need to proceed, and the eviction loop can use the
broken_condition_variable exception as a stop condition.
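
A rough sketch of the stop condition described in the second commit message, written against the standard library only. Everything here is hypothetical (`toy_cv`, `toy_broken_cv`, `toy_evicting_partition`, `poll()`, `demo_eviction_stop()`); it models a cooperative loop with a non-blocking `try_wait()` instead of a real awaited wait, and is not the code from this PR.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the exception a broken condition variable raises.
struct toy_broken_cv : std::runtime_error {
    toy_broken_cv() : std::runtime_error("condition variable broken") {}
};

class toy_cv {
public:
    void signal() { _signalled = true; }
    void broken() { _broken = true; }
    // Stand-in for an awaited wait(): returns true if a signal was consumed,
    // false if the caller would have to suspend, and throws once broken.
    bool try_wait() {
        if (_broken) throw toy_broken_cv{};
        if (!_signalled) return false;
        _signalled = false;
        return true;
    }
private:
    bool _signalled = false;
    bool _broken = false;
};

struct toy_evicting_partition {
    toy_cv cv;
    std::vector<int> eviction_list;
    std::size_t disposed = 0;
    bool gate_closed = false;
    bool loop_exited = false;

    // Reader-side check: no abort_source, just ask whether the gate is closed.
    bool reader_may_proceed() const { return !gate_closed; }

    void schedule_eviction(int id) {
        eviction_list.push_back(id);
        cv.signal();
    }

    // One scheduling turn of the eviction loop.
    void poll() {
        if (loop_exited) return;
        try {
            while (cv.try_wait()) {
                disposed += eviction_list.size();
                eviction_list.clear();
            }
        } catch (const toy_broken_cv&) {
            loop_exited = true;  // the exception itself is the stop condition
        }
    }

    void stop() {
        gate_closed = true;
        cv.broken();
        poll();  // the loop observes the exception and exits
        // Final pass (assumed to live in stop()) picks up anything left over.
        disposed += eviction_list.size();
        eviction_list.clear();
    }
};

// Returns the total number of disposed items after a stop with work pending.
std::size_t demo_eviction_stop() {
    toy_evicting_partition p;
    p.schedule_eviction(1);
    p.schedule_eviction(2);
    p.poll();                // the loop disposes of the first two
    p.schedule_eviction(3);  // still pending when stop() is called
    p.stop();
    if (!p.loop_exited || p.reader_may_proceed()) {
        throw std::logic_error("unexpected state");
    }
    return p.disposed;
}
```

The item scheduled just before `stop()` is never seen by the loop, because the broken condition variable throws first; it is the final pass that disposes of it.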
ztlpn commented Feb 17, 2022

build error is #3714, restarted

@ztlpn ztlpn merged commit 1ba1ab7 into redpanda-data:dev Feb 17, 2022
ztlpn added a commit that referenced this pull request Feb 19, 2022
Backport #3813: cloud_storage: fix race between remote_partition::stop and readers
@ztlpn ztlpn deleted the fix-3507 branch November 27, 2023 13:17
Development

Successfully merging this pull request may close these issues.

cloud_storage: Assertion while deleting topic under load ('_stopped' Destroyed without stopping)