Request resources may be released at the wrong time #5278

travisdowns · 2022-06-29T21:58:04Z

Version & Environment

Redpanda version: 148f82aa3

What went wrong and what should have happened instead?

Request resources such as the memory semaphore units should be released only once the response has been written out to the socket and the response object and any other userspace buffers have been destroyed. However, this may not be the case because of the way we process responses in order.

Additional information

Find below the processing for the "second stage" of a request, which involves generating the response. See below for analysis.

                    /**
                     * second stage processed in background.
                     */
                    ssx::background
                      = ssx::spawn_with_gate_then(
                          _rs.conn_gate(),
                          [this, f = std::move(f), seq, correlation]() mutable {
                              return f.then([this, seq, correlation](
                                              response_ptr r) mutable {
                                  r->set_correlation(correlation);
                                  _responses.insert({seq, std::move(r)});
                                  return process_next_response();
                              });
                          })
                          .handle_exception([self](std::exception_ptr e) {
                              // snip
                          })
                          .finally([s = std::move(s), self] {});

After the response is ready, we insert it into the _responses queue and call process_next_response. At the very bottom, we move s into a finally clause. It is s (a session_resources object) that holds the resources associated with the request, so it is destroyed once the future from process_next_response resolves. However, this future does not necessarily resolve after the response has finished (written out, response destroyed, etc.): it may in fact resolve immediately if the response isn't the next response in sequence.

For example: requests A and B are sent on the connection in that order, and request B finishes its second stage first. B's response will not be sent by the process_next_response call made when it finishes; it remains enqueued, yet its resources are destroyed at this point. Only when A completes will both A and B be sent (in that order).
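The A/B scenario can be modelled without Seastar. The following is only a minimal sketch of the idea of keeping the resources attached to the queued response until it is actually written; it is not the Redpanda implementation. All names here (memory_budget, resource_units, pending_response, ordered_writer, run_scenario) are hypothetical stand-ins, and "writing" a response is reduced to erasing it from the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for the memory semaphore: counts outstanding units.
struct memory_budget {
    std::size_t in_use = 0;
};

// Hypothetical stand-in for session_resources: gives its units back on destruction.
class resource_units {
public:
    resource_units(memory_budget& b, std::size_t n) : _b(&b), _n(n) { _b->in_use += n; }
    resource_units(resource_units&& o) noexcept : _b(o._b), _n(std::exchange(o._n, 0)) {}
    resource_units(const resource_units&) = delete;
    resource_units& operator=(const resource_units&) = delete;
    resource_units& operator=(resource_units&&) = delete;
    ~resource_units() { _b->in_use -= _n; }

private:
    memory_budget* _b;
    std::size_t _n; // zero once moved from, so a moved-from object releases nothing
};

// A queued response owns the resources of its request until it is written.
struct pending_response {
    resource_units units;
};

class ordered_writer {
public:
    // Called when the second stage of request `seq` has produced its response.
    void on_ready(int seq, resource_units units) {
        assert(seq >= _next_seq); // a response is queued at most once
        _responses.emplace(seq, pending_response{std::move(units)});
        process_next_response();
    }

    int sent() const { return _next_seq; }

private:
    // Writes every response that is next in sequence; later ones stay queued.
    void process_next_response() {
        auto it = _responses.find(_next_seq);
        while (it != _responses.end()) {
            _responses.erase(it); // "write" done: units are released here
            ++_next_seq;
            it = _responses.find(_next_seq);
        }
    }

    std::map<int, pending_response> _responses;
    int _next_seq = 0;
};

struct scenario_result {
    std::size_t held_while_b_queued;
    int sent_while_b_queued;
    std::size_t held_after_a_ready;
    int sent_after_a_ready;
};

// Requests A (seq 0) and B (seq 1) arrive in order; B finishes first.
scenario_result run_scenario() {
    memory_budget budget;
    ordered_writer writer;
    resource_units a(budget, 100);
    resource_units b(budget, 200);
    scenario_result r{};
    writer.on_ready(1, std::move(b)); // B is ready but cannot be sent yet
    r.held_while_b_queued = budget.in_use;
    r.sent_while_b_queued = writer.sent();
    writer.on_ready(0, std::move(a)); // A unblocks both A and B
    r.held_after_a_ready = budget.in_use;
    r.sent_after_a_ready = writer.sent();
    return r;
}
```

In this model, B's units remain accounted for while B sits in the queue behind A, which is the behaviour the issue asks for. With the code quoted above, the equivalent of B's units would already have been returned as soon as on_ready(1, ...) came back.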

Currently we release resources after the response is enqueued in connection_context and response processing is called. The response may not have been sent at this point, because we require in-order responses but second-stage processing may happen out of order. In this change, we instead tunnel the resource object through to the place where the response is written, and release it there. Fixes redpanda-data#5278.

piyushredpanda · 2022-07-02T15:33:59Z

I saw a draft PR already, hence assigning to you, @travisdowns

Per-handler memory estimation, more accurate estimate for metadata handler

Currently we estimate that metadata requests take 8000 + rsize * 2 bytes of memory to process, where rsize is the size of the request. Since metadata requests are very small, this ends up being roughly 8000 bytes. However, the response to a metadata request that returns information about every partition and replica may easily be several MB in size.

To fix this for metadata requests specifically, we use a new, more conservative estimate which uses the current topic and partition configuration to give an upper bound on the size.

The remainder of this series sets up this change and also prepares for a more comprehensive change where we will allow a "second chance" allocation from the memory semaphore.

Fixes: #4804
Fixes: #5278
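To make the arithmetic concrete, here is a rough sketch of the two estimates. Only the 8000 + rsize * 2 formula comes from the commit message. The shape of the upper bound, the function names, and the per-partition and per-replica byte costs are all assumptions made for illustration, and the actual handler may compute its bound differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Estimate quoted in the commit message: 8000 bytes plus twice the request size.
constexpr std::size_t default_estimate(std::size_t request_size) {
    return 8000 + request_size * 2;
}

// Hypothetical upper bound on a metadata response: every partition costs a
// fixed number of bytes plus a fixed number of bytes per replica. The byte
// costs are parameters because the source does not state them.
constexpr std::size_t metadata_upper_bound(
  std::size_t partitions,
  std::size_t replicas_per_partition,
  std::size_t bytes_per_partition,
  std::size_t bytes_per_replica) {
    return partitions
           * (bytes_per_partition + replicas_per_partition * bytes_per_replica);
}

// Assumption: never go below the default estimate, so small clusters still
// reserve the original amount.
constexpr std::size_t metadata_estimate(
  std::size_t request_size,
  std::size_t partitions,
  std::size_t replicas_per_partition,
  std::size_t bytes_per_partition,
  std::size_t bytes_per_replica) {
    return std::max(
      default_estimate(request_size),
      metadata_upper_bound(
        partitions,
        replicas_per_partition,
        bytes_per_partition,
        bytes_per_replica));
}
```

For a 100-byte request the default estimate is 8200 bytes, while 10000 partitions with 3 replicas each, at an assumed 100 bytes per partition and 8 bytes per replica, give a bound of 1240000 bytes. That is the kind of gap the commit message describes.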

travisdowns added the kind/bug and area/kafka labels Jun 29, 2022

travisdowns mentioned this issue Jun 30, 2022

Release kafka request resources at the right time #5280

Draft

piyushredpanda assigned travisdowns Jul 2, 2022

travisdowns mentioned this issue Jul 11, 2022

Per-handler memory estimation, more accurate estimate for metadata handler #5346

Merged

travisdowns closed this as completed in #5346 Jul 15, 2022
