rpc/transport: release caller units when timeout occurs #6738
dbc4f8e to 3bd85a6
Yeah, this looks correct to me.
It looks like there may be more use-after-free (UAF) issues?
src/v/rpc/transport.cc
@@ -163,6 +164,13 @@ transport::make_response_handler(netbuf& b, const rpc::client_opts& opts) {
_probe.request_timeout();
_correlations.erase(it);
}
/*
Don't we release the units in L239? I added a test for this in the last patch:
// Verify that the resources are released correctly after timeout.
Or is this an edge case (race) where a timeout kicks in before the dispatch fiber is scheduled?
It looks like it's possible that we begin calling send after the call to fail_outstanding_futures(), since that only calls shutdown(), which doesn't close the gate. Maybe our failure-mode calls to fail_outstanding_futures() should actually be calls to stop()?
Also, around the other call to _requests_queue.erase() we explicitly move the resource_units and let them leave scope. Do we have to do that here?
If we call send on an output stream after shutdown, it will always throw, as the underlying fd is closed. The edge case is that the request times out and returns to the caller, and the caller can then continue and be destroyed. The timeout may happen before we dispatch the send. Hence the units will stay in the _requests_queue.
3bd85a6 to 1bcbd36
CI failure: #5575
makes sense.
/ci-repeat 10
When request is completed with timeout we must release request related caller passed semaphore units. Otherwise the units may be held in the requests queue long enough to outlive the caller and cause use after free error when released.
Fixes: redpanda-data#6711
Signed-off-by: Michal Maslanka <michal@redpanda.com>
1bcbd36 to 8967962
/ci-repeat 5
The CI failure is: #6614
/backport v22.2.x
/backport v22.1.x
Cover letter
When a request completes with a timeout, we must release the semaphore units that the caller passed in for that request. Otherwise the units may be held in the requests queue long enough to outlive the caller and cause a use-after-free error when they are finally released.
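A minimal sketch of the failure mode and the fix, written in standard C++ rather than Seastar so it stands alone. Every name here (semaphore, units, queued_request, transport_model, on_timeout, dispatch_send, run_timeout_scenario) is a hypothetical stand-in and not the actual Redpanda or Seastar type; the real change lives in src/v/rpc/transport.cc and may be structured differently. The point it illustrates: units that point back at a caller-owned semaphore should be returned at timeout, while the caller is still alive, rather than left in the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for a counting semaphore owned by the caller.
struct semaphore {
    std::size_t available;
};

// Hypothetical RAII units: they hold a raw pointer to the caller's
// semaphore, so they must not be released after the caller is destroyed.
class units {
public:
    units() = default;
    units(semaphore& s, std::size_t n) : _sem(&s), _n(n) {
        assert(s.available >= n);
        s.available -= n;
    }
    units(units&& o) noexcept
      : _sem(std::exchange(o._sem, nullptr)), _n(std::exchange(o._n, 0)) {}
    units& operator=(units&& o) noexcept {
        if (this != &o) {
            release();
            _sem = std::exchange(o._sem, nullptr);
            _n = std::exchange(o._n, 0);
        }
        return *this;
    }
    units(const units&) = delete;
    units& operator=(const units&) = delete;
    ~units() { release(); }

    // Return the units to the semaphore; safe to call more than once.
    void release() {
        if (_sem != nullptr) {
            _sem->available += _n;
            _sem = nullptr;
            _n = 0;
        }
    }

private:
    semaphore* _sem = nullptr;
    std::size_t _n = 0;
};

struct queued_request {
    units u;
};

// Hypothetical model of a transport with a queue of not-yet-sent requests.
class transport_model {
public:
    void enqueue(std::uint32_t id, units u) {
        _queue.emplace(id, queued_request{std::move(u)});
    }

    // Timeout fired before the send was dispatched: move the units out,
    // drop the queue entry, and let the units leave scope right away.
    void on_timeout(std::uint32_t id) {
        auto it = _queue.find(id);
        if (it == _queue.end()) {
            return;
        }
        units released = std::move(it->second.u);
        _queue.erase(it);
    } // `released` is destroyed here, while the caller is still alive

    // Later dispatch; any entries still queued release their units here.
    void dispatch_send() { _queue.clear(); }

    std::size_t queued() const { return _queue.size(); }

private:
    std::map<std::uint32_t, queued_request> _queue;
};

// Walks through the race: enqueue, time out, destroy the caller, dispatch.
// Returns the caller's available units as observed right after the timeout.
std::size_t run_timeout_scenario() {
    auto caller_sem = std::make_unique<semaphore>(semaphore{4});
    transport_model t;
    t.enqueue(1, units(*caller_sem, 3));
    t.on_timeout(1);
    std::size_t available = caller_sem->available;
    caller_sem.reset(); // the caller goes away after the timeout
    t.dispatch_send();  // nothing left in the queue that points at it
    return available;
}
```

Without the release in on_timeout, the entry would survive until dispatch_send, and destroying it there would write to a semaphore that no longer exists, which is the use-after-free described above.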
Fixes: #6711
Fixes: #5261
Backport Required
UX changes
Release notes
Bug Fixes