Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request resources may be released at the wrong time #5278

Closed
travisdowns opened this issue Jun 29, 2022 · 1 comment · Fixed by #5346 · May be fixed by #5280
Closed

Request resources may be released at the wrong time #5278

travisdowns opened this issue Jun 29, 2022 · 1 comment · Fixed by #5346 · May be fixed by #5280
Assignees
Labels
area/kafka kind/bug Something isn't working

Comments

@travisdowns
Copy link
Member

travisdowns commented Jun 29, 2022

Version & Environment

Redpanda version: 148f82aa3

What went wrong and what should have happened instead?

Request resources such as the memory semaphore units should be released when the response has been written out to the socket, and the response object and any other usespace buffers have been destroyed. However, this may not be the case because of the way we process responses in order.

Additional information

Find below the processing for the "second stage" of a request, which involves generating the response. See below for analysis.

                    /**
                     * second stage processed in background.
                     */
                    ssx::background
                      = ssx::spawn_with_gate_then(
                          _rs.conn_gate(),
                          [this, f = std::move(f), seq, correlation]() mutable {
                              return f.then([this, seq, correlation](
                                              response_ptr r) mutable {
                                  r->set_correlation(correlation);
                                  _responses.insert({seq, std::move(r)});
                                  return process_next_response();
                              });
                          })
                          .handle_exception([self](std::exception_ptr e) {
                              // snip
                          })
                          .finally([s = std::move(s), self] {});

After the response is ready, we insert it to _responses queue and call process_next_response. At the very bottom we move s into a finally clause. It is s (a session_resources object) that holds the resources associated with the request, so it is destroyed after the future from process_next_response resolves. However this future does not necessarily resolve after the response has finished (response destroyed, etc): if may in fact resolve immediately if the response isn't the next response in sequence.

For example: requests A and B are sent on the connection in that order, and request B finishes its second stage first: it will not be sent by the process_next_response call when it finishes, but remain enqueued, but its resources are destroyed at this point. Only when A completes both A and B (in that order) will be sent.

@travisdowns travisdowns added kind/bug Something isn't working area/kafka labels Jun 29, 2022
travisdowns added a commit to travisdowns/redpanda that referenced this issue Jun 30, 2022
Currently we release resources after the response is enqueued
in connection_context and response processing is called, but it
may not have been sent at this point as we require in-order responses
but second-stage processing may happen out of order.

In this change, we instead tunnel the resource object through to
the place where the response is written, and release it there.

FIxes redpanda-data#5278.
@piyushredpanda
Copy link
Contributor

I saw a draft PR already, hence assigning to you, @travisdowns

travisdowns added a commit to travisdowns/redpanda that referenced this issue Jul 5, 2022
Currently we release resources after the response is enqueued
in connection_context and response processing is called, but it
may not have been sent at this point as we require in-order responses
but second-stage processing may happen out of order.

In this change, we instead tunnel the resource object through to
the place where the response is written, and release it there.

FIxes redpanda-data#5278.
travisdowns added a commit to travisdowns/redpanda that referenced this issue Jul 14, 2022
Currently we release resources after the response is enqueued
in connection_context and response processing is called, but it
may not have been sent at this point as we require in-order responses
but second-stage processing may happen out of order.

In this change, we instead tunnel the resource object through to
the place where the response is written, and release it there.

FIxes redpanda-data#5278.
travisdowns added a commit that referenced this issue Jul 15, 2022
Per-handler memory estimation, more accurate estimate for metadata handler

Currently we estimate that metadata requests take 8000 + rsize * 2 bytes
of memory to process, where rsize is the size of the request. Since
metadata requests are very small, this end up being roughly 8000 bytes.

However, metadata requests which return information about every
partition and replica may easily be several MBs in size.

To fix this for metadata requests specifically, we use a new more
conservative estimate which uses the current topic and partition
configuration to give an upper bound on the size.

The remainder of this series sets up this change and also prepares
for a more comprehensive change where we will allow a "second
chance" allocation from the memory semaphore.

Fixes: #4804
Fixes: #5278
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kafka kind/bug Something isn't working
Projects
None yet
2 participants