Td 4804 oom fix v2 #5464

travisdowns · 2022-07-14T03:55:23Z

Cover letter


UX changes


Release notes

Fixes: #4804

The behavior of process_next_response is worth clarifying, as the returned future does not necessarily wait for all enqueued responses to finish before resolving. We also rename the method to better reflect its purpose.

Currently we release resources after the response is enqueued in connection_context and response processing is called. The response may not have been sent at that point, however: we require in-order responses, but second-stage processing may happen out of order. In this change, we instead tunnel the resource object through to the place where the response is written, and release it there. Fixes redpanda-data#5278.
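The idea of tunneling the resource object through to the write can be illustrated with a minimal sketch. All names below (session_resources_sketch, pending_response, enqueue, write_next) are hypothetical and not taken from the Redpanda code; the sketch only assumes that the reservation is held by a shared pointer and released when the last holder goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for the per-request resources. The reservation is
// taken in the constructor and given back in the destructor.
class session_resources_sketch {
public:
    session_resources_sketch(std::size_t& in_use, std::size_t bytes)
      : _in_use(in_use), _bytes(bytes) { _in_use += _bytes; }
    ~session_resources_sketch() { _in_use -= _bytes; }
    session_resources_sketch(const session_resources_sketch&) = delete;
    session_resources_sketch& operator=(const session_resources_sketch&) = delete;

private:
    std::size_t& _in_use;
    std::size_t _bytes;
};

// A queued response carries the resources reserved for its request.
struct pending_response {
    std::string payload;
    std::shared_ptr<session_resources_sketch> resources;
};

// Enqueueing releases nothing: the queue entry keeps the reservation alive.
inline void enqueue(std::queue<pending_response>& q, std::string payload,
                    std::shared_ptr<session_resources_sketch> res) {
    q.push(pending_response{std::move(payload), std::move(res)});
}

// "Writes" the oldest response. The resources are dropped only when this
// function returns, i.e. after the payload has been handed to the writer.
inline std::string write_next(std::queue<pending_response>& q) {
    pending_response r = std::move(q.front());
    q.pop();
    return std::move(r.payload);
}
```

In this sketch the memory accounted in in_use stays reserved for as long as the response sits in the queue, which is the property the change is after.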

This lets us share it with the request processing code which would also like to do type list based manipulation of the request types.

We already had concepts for one-phase and two-phase handlers, and this concept is simply the union of those two handler concepts, i.e., "any" type of handler.

This is a polymorphic handler class. Each instance is backed by an existing concrete handler, which lets us treat handlers generically without resorting to template functions. This reduces code bloat significantly, as we no longer duplicate code paths for our ~45 handler types. For example, requests.cc.o drops from ~11 MB to ~5 MB after it is switched to the any_handler approach.
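A rough sketch of the pattern, assuming a small virtual interface with a thin template adapter per concrete handler. The names (any_handler_sketch, handler_adapter, produce_like, dispatch) and the version numbers are hypothetical; the real any_handler interface may look quite different.

```cpp
#include <cassert>
#include <string>

// Runtime-polymorphic interface: generic code only sees this type.
struct any_handler_sketch {
    virtual ~any_handler_sketch() = default;
    virtual int min_supported() const = 0;
    virtual int max_supported() const = 0;
    virtual std::string handle(const std::string& request) const = 0;
};

// Thin adapter: the only templated piece, it forwards to the concrete handler.
template<typename H>
struct handler_adapter final : any_handler_sketch {
    int min_supported() const override { return H::min_version; }
    int max_supported() const override { return H::max_version; }
    std::string handle(const std::string& request) const override {
        return H::handle(request);
    }
};

// Hypothetical concrete handler with static members, as a template-based
// design would have it.
struct produce_like {
    static constexpr int min_version = 0;
    static constexpr int max_version = 7;
    static std::string handle(const std::string& request) {
        return "produce:" + request;
    }
};

// Non-template code path: compiled once, shared by every handler type.
inline std::string dispatch(const any_handler_sketch& h, int version,
                            const std::string& request) {
    if (version < h.min_supported() || version > h.max_supported()) {
        return "unsupported_version";
    }
    return h.handle(request);
}
```

Because dispatch is an ordinary function rather than a template, its body is emitted once instead of once per handler type, which is where the object size reduction would come from.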

The any_handler already gets good functional coverage as it is added to the core request path in requests.cc, but we also include this unit test with basic coverage.

Preceding changes in this series introduced a runtime polymorphic handler class, and this change switches most of the request handling to use it. In particular, we replace the large switch on API key, which dispatches to a template method, with a lookup of the handler followed by virtual dispatch. Some handlers that need special processing, such as the authentication/SASL-related ones, still use the old approach for now.
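The switch-to-lookup change can be sketched as a table indexed by API key. Everything here is a hypothetical illustration (handler_iface, handler_for_key, process); the only fact borrowed from the Kafka protocol is that Produce has API key 0 and Fetch has API key 1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

struct handler_iface {
    virtual ~handler_iface() = default;
    virtual std::string handle(const std::string& request) const = 0;
};

struct produce_handler_sketch final : handler_iface {
    std::string handle(const std::string& request) const override {
        return "produce:" + request;
    }
};

struct fetch_handler_sketch final : handler_iface {
    std::string handle(const std::string& request) const override {
        return "fetch:" + request;
    }
};

// Lookup by API key; returns nullptr for keys without a registered handler.
inline const handler_iface* handler_for_key(std::int16_t key) {
    static const produce_handler_sketch produce;
    static const fetch_handler_sketch fetch;
    static const std::array<const handler_iface*, 2> table{&produce, &fetch};
    if (key < 0 || static_cast<std::size_t>(key) >= table.size()) {
        return nullptr;
    }
    return table[static_cast<std::size_t>(key)];
}

// Replaces a per-key switch: one lookup, then virtual dispatch.
inline std::string process(std::int16_t key, const std::string& request) {
    const handler_iface* h = handler_for_key(key);
    if (h == nullptr) {
        return "unknown_api_key";
    }
    return h->handle(request);
}
```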

Currently we use a single memory estimate for all Kafka request types, but different API calls may use wildly different amounts of memory. This change allows each handler to perform an API-specific calculation instead.

In connection_context, we now use the handler-specific initial memory use estimate, rather than a single estimate for every handler type.
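One way to picture a per-handler estimate is a virtual method with a default that individual handlers may override. This sketch uses hypothetical names (estimating_handler_sketch, big_response_handler_sketch, try_reserve), assumes the shared default is the 8000 + rsize * 2 formula quoted for metadata requests, and replaces whatever waiting mechanism the real code uses with a simple yes/no reservation.

```cpp
#include <cassert>
#include <cstddef>

struct estimating_handler_sketch {
    virtual ~estimating_handler_sketch() = default;
    // Default estimate shared by handlers that do not override it
    // (assumed to be the 8000 + rsize * 2 formula).
    virtual std::size_t memory_estimate(std::size_t request_size) const {
        return 8000 + request_size * 2;
    }
};

// Hypothetical handler whose responses can be much larger than its requests.
struct big_response_handler_sketch final : estimating_handler_sketch {
    std::size_t memory_estimate(std::size_t) const override {
        return 8 * 1024 * 1024;
    }
};

// Reserves the handler-specific estimate from the remaining budget.
// Returns false and leaves the budget untouched if it does not fit.
inline bool try_reserve(std::size_t& available,
                        const estimating_handler_sketch& h,
                        std::size_t request_size) {
    const std::size_t needed = h.memory_estimate(request_size);
    if (needed > available) {
        return false;
    }
    available -= needed;
    return true;
}
```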

The session_resources type was a private member of connection_context, but as we want to use it more broadly, move it out as a standalone public class. Additionally, pass it by shared pointer in preparation for later changes that will feed it into requests.

Currently we estimate that metadata requests take 8000 + rsize * 2 bytes of memory to process, where rsize is the size of the request. Since metadata requests are very small, this ends up being roughly 8000 bytes. However, metadata responses which return information about every partition and replica may easily be several MBs in size. To fix this for metadata requests specifically, we use a new, more conservative estimate which uses the "maximum supported" partition count to give an upper bound on the size. Since this ends up being "only" 8 MB, and applies only to metadata requests, the performance cost (chiefly a reduced maximum number of metadata requests in flight on one shard) should be moderate.
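To make the arithmetic concrete, here is a small worked sketch. The request size (100 bytes) and the per-shard budget (1 GiB) are made-up figures for illustration, "8 MB" is assumed to mean 8 MiB, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Previous estimate quoted above: 8000 + rsize * 2 bytes.
constexpr std::size_t old_metadata_estimate(std::size_t rsize) {
    return 8000 + rsize * 2;
}

// New conservative estimate; "8 MB" is assumed to mean 8 MiB here.
constexpr std::size_t new_metadata_estimate = 8 * 1024 * 1024;

// Upper bound on concurrent requests that fit into a memory budget.
constexpr std::size_t max_in_flight(std::size_t budget,
                                    std::size_t per_request) {
    return budget / per_request;
}

// Hypothetical figures, for illustration only.
constexpr std::size_t example_request_size = 100;
constexpr std::size_t example_budget = 1024 * 1024 * 1024; // 1 GiB

static_assert(old_metadata_estimate(example_request_size) == 8200);
static_assert(max_in_flight(example_budget, 8200) == 130944);
static_assert(max_in_flight(example_budget, new_metadata_estimate) == 128);
```

With these made-up figures, the old estimate would admit 130944 concurrent metadata requests against the budget, while the new one admits 128, which is the reduction in in-flight metadata requests mentioned above.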

Single-stage handlers have a handler template, which means that handler objects can be declared in a single line specifying their API object and their min and max API versions. This change extends this nice concept to two-stage handlers as well.
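A sketch of what such one-line declarations could look like, assuming class templates parameterized on the API type and the version range. The template names, the API tag types and the version numbers are hypothetical and not copied from the code base.

```cpp
#include <cassert>

// Hypothetical template for single-stage handlers.
template<typename Api, int MinVersion, int MaxVersion>
struct single_stage_handler_sketch {
    using api = Api;
    static constexpr int min_supported = MinVersion;
    static constexpr int max_supported = MaxVersion;
    static constexpr bool two_stage = false;
    static constexpr bool supports(int version) {
        return version >= MinVersion && version <= MaxVersion;
    }
};

// The same shape, extended to two-stage handlers.
template<typename Api, int MinVersion, int MaxVersion>
struct two_stage_handler_sketch {
    using api = Api;
    static constexpr int min_supported = MinVersion;
    static constexpr int max_supported = MaxVersion;
    static constexpr bool two_stage = true;
    static constexpr bool supports(int version) {
        return version >= MinVersion && version <= MaxVersion;
    }
};

// Hypothetical API tag types.
struct heartbeat_api_sketch {};
struct produce_api_sketch {};

// Each handler is now declared in a single line (version ranges are made up).
using heartbeat_handler_sketch = single_stage_handler_sketch<heartbeat_api_sketch, 0, 3>;
using produce_handler_sketch = two_stage_handler_sketch<produce_api_sketch, 0, 7>;
```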

Passing the connection context to the estimator allows the estimator to use the various subsystems to estimate the memory use of a given request.

travisdowns added 16 commits July 5, 2022 12:07

Clarify behavior of process_next_response

8285f4f

The behavior of process_next_response is worth clarifying, as the returned future does not necessarily wait for all enqueued responses to finish before resolving. We also rename the method to better reflect its purpose.

Add comment to session resources

f74e577

Improve documentation of throttle related methods

1b07f98

Move max_api_key function to handlers header.

66ebbe8

This lets us share it with the request processing code which would also like to do type list based manipulation of the request types.

Introduce KafkaApiHandlerAny concept

9340d4a

We already had concepts for one-phase and two-phase handlers, and this concept is simply the union of those two handler concepts, i.e., "any" type of handler.

Add any_handler unit tests

2dc759c

The any_handler already gets good functional coverage as it is added to the core request path in requests.cc, but we also include this unit test with basic coverage.

Add support for memory estimation to handlers

31256f9

Currently we use a single memory estimate for all Kafka request types, but different API calls may use wildly different amounts of memory. This change allows each handler to perform an API-specific calculation instead.

Use handler specific memory estimate

172a457

In connection_context, we now use the handler-specific initial memory use estimate, rather than a single estimate for every handler type.

Sort kafka/server forward includes

0c0462a

Introduce handler template for two-stage handlers

4004d72

Single-stage handlers have a handler template, which means that handler objects can be declared in a single line specifying their API object and their min and max API versions. This change extends this nice concept to two-stage handlers as well.

Add the connection context to the memory estimator

5429713

Passing the connection context to the estimator allows the estimator to use the various subsystems to estimate the memory use of a given request.

github-actions bot added the area/redpanda label Jul 14, 2022

mmedenjak added the kind/bug and DW labels Jul 14, 2022

travisdowns closed this Jul 22, 2022

travisdowns deleted the td-4804-OOM-fix-v2 branch July 22, 2022 07:21
