Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix use-after-free during consensus shutdown #5759

Merged
merged 3 commits into from
Aug 1, 2022

Conversation

mmaslankaprv
Copy link
Member

@mmaslankaprv mmaslankaprv commented Aug 1, 2022

Cover letter

When raft stops the last step in stop sequence is closing gate. After
gate is closed all the background futures are guaranteed to be finished.
However it may happen that the fiber is waiting for an _op_lock while
gate is being closed. We need to make sure that when the _op_lock is
acquired the consensus instance is still alive.

Clearing and marking an oplock as broken after dispatching gate
close makes it impossible for incoming fibers to access consensus
after it is deleted.When raft stops the last step in stop sequence is closing gate. After
gate is closed all the background futures are guaranteed to be finished.
However it may happen that the fiber is waiting for an _op_lock while
gate is being closed. We need to make sure that when the _op_lock is
acquired the consensus instance is still alive.

Clearing and marking an oplock as broken after dispatching gate
close makes it impossible for incoming fibers to access consensus
after it is deleted.

Fixes #5754

Backport Required

  • not a bug fix
  • papercut/not impactful enough to backport
  • v22.2.x
  • v22.1.x
  • v21.11.x

UX changes

Describe in plain language how this PR affects an end-user. What topic flags, configuration flags, command line flags, deprecation policies etc are added/changed.

Release notes

return _snapshot_writer->close().then(
[this] { _snapshot_writer.reset(); });
});
co_await _event_manager.stop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so much nicer for my eyeballs

src/v/raft/consensus.cc Outdated Show resolved Hide resolved
src/v/raft/consensus.cc Outdated Show resolved Hide resolved
@mmaslankaprv mmaslankaprv requested a review from jcsp August 1, 2022 10:35
mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this pull request Aug 1, 2022
When raft stops the last step in stop sequence is closing gate. After
gate is closed all the background futures are guaranteed to be finished.
However it may happen that the fiber is waiting for an `_op_lock` while
gate is being closed. We need to make sure that when gate is closed no
other fiber can acquire the `_op_lock`.

Marking `_op_lock` mutex as broken prevents all waiters for acquiring a
mutex and at the same time accessing `consensus` state.

Fixes: redpanda-data#5759

Signed-off-by: Michal Maslanka <[email protected]>
jcsp
jcsp previously approved these changes Aug 1, 2022
Copy link
Contributor

@jcsp jcsp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this -- the checks for broken are verbose but IMO worthwhile.

When raft stops the last step in stop sequence is closing gate. After
gate is closed all the background futures are guaranteed to be finished.
However it may happen that the fiber is waiting for an `_op_lock` while
gate is being closed. We need to make sure that when gate is closed no
other fiber can acquire the `_op_lock`.

Marking `_op_lock` mutex as broken prevents all waiters for acquiring a
mutex and at the same time accessing `consensus` state.

Fixes: redpanda-data#5759

Signed-off-by: Michal Maslanka <[email protected]>
Signed-off-by: Michal Maslanka <[email protected]>
@jcsp jcsp changed the title Fix 5754 Fix use-after-free during consensus shutdown Aug 1, 2022
@mmaslankaprv mmaslankaprv requested a review from jcsp August 1, 2022 11:20
@mmaslankaprv
Copy link
Member Author

Ci failures: #5276, #5725

@mmaslankaprv mmaslankaprv merged commit 72f4359 into redpanda-data:dev Aug 1, 2022
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UAF during shutdown in AutomaticLeadershipBalancingTest.test_automatic_rebalance
3 participants