
OOM due to underestimated memory use in metadata handler #4804

Closed
travisdowns opened this issue May 18, 2022 · 0 comments · Fixed by #5346
Labels: area/controller, DW, kind/bug (Something isn't working)
travisdowns commented May 18, 2022

Version & Environment

Redpanda version: dev f75ceed

What went wrong?

Redpanda node continually runs out of memory at start when joining a loaded cluster. An underlying cause is that the size of metadata requests (and probably other types) is underestimated by several orders of magnitude in our memory throttling logic. This results in many requests executing in parallel, exhausting the per-shard memory.

The crashes are correlated with (a) startup or (b) another node being killed because in those cases the metadata requests pile up as we are trying to either (a) get the first metadata refresh or (b) figuring out who the new controller is, and many requests pile up on the refresh semaphore and then are suddenly uncorked all at once, causing a temporary and perhaps fatal spike in memory use.

What should have happened instead?

Our Kafka request logic should throttle requests to an appropriate concurrency level to avoid OOM.

How to reproduce the issue?

  1. Connect many consumers to a cluster.
  2. Kill a node and restart it.
  3. Observe (with added logging) the concurrency and memory usage for metadata requests.

Additional information

In reserve_request_units we calculate an estimated "memory size" of the request. This estimate is based on the over-the-wire size of the request: specifically we estimate we need request_size * 2 + 8000 bytes. In the case of metadata requests, this request is about 26 bytes on the wire, so we estimate that 8052 bytes are needed and "reserve" that many units from the memory semaphore.
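A minimal sketch of the shape of that estimate (not the actual `reserve_request_units` code; the helper name and constant layout here are illustrative only):

```cpp
#include <cstddef>

constexpr size_t fixed_overhead = 8000;

// Hypothetical helper mirroring the formula described above:
// reserve request_size * 2 + 8000 bytes from the memory semaphore.
constexpr size_t estimate_request_memory(size_t request_size) {
    return request_size * 2 + fixed_overhead;
}

// A ~26-byte metadata request yields the 8052-byte estimate mentioned above.
static_assert(estimate_request_memory(26) == 8052);
```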

However, the actual memory size to handle the request is not related to the on-the-wire size, but rather the size of the response. With many partitions, the response can be large: at least 100 bytes per partition just in "working space" alone, plus the size of the on-the-wire response which is similar. For 20k partitions that's probably in the range of 4 MB (we see memory allocation failures for single allocations in this range). So this is about 500x larger than the estimate and the memory semaphore provides no real protection (e.g., with 600M allocated to Kafka requests, we should let at most 150 requests run concurrently, but the existing throttle will let ~75,000 requests go in parallel).
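A back-of-envelope check of those numbers (the per-partition byte count and shard budget are the assumptions stated above, not measured values):

```cpp
#include <cstddef>
#include <cstdio>

int main() {
    constexpr size_t bytes_per_partition = 200;      // assumed: ~100 B working space + similar wire size
    constexpr size_t partitions = 20'000;
    constexpr size_t actual_response = bytes_per_partition * partitions; // ~4 MB
    constexpr size_t estimated = 26 * 2 + 8000;      // 8052 bytes, per the formula above
    constexpr size_t shard_budget = 600'000'000;     // ~600 MB allocated to Kafka requests

    std::printf("actual ~%zu bytes vs estimated %zu (about %zux too low)\n",
                actual_response, estimated, actual_response / estimated);
    std::printf("allowed concurrency: %zu with the estimate vs %zu if sized by the response\n",
                shard_budget / estimated, shard_budget / actual_response);
    return 0;
}
```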

It is not clear how best to fix this. A better estimate would solve the problem, but very early in the request handling flow it is not clear we can make a much better estimate: in this example we haven't built the topics table at all as we are starting up, so the requests are already in progress before we know there are many partitions. As a workaround, we can add a second semaphore inside the metadata handler: in this location we can make a better estimate of the size of the response (before we start building it) and limit the total concurrency to a response-size-aware value.
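A minimal, self-contained sketch of that workaround. All names here (`memory_units_sem`, `estimate_metadata_response_size`, `handle_metadata_request`) are hypothetical, and the blocking semaphore stands in for Redpanda's future-based one; it only illustrates gating concurrency on a response-size-aware estimate:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// A semaphore counted in bytes rather than permits: callers reserve an
// estimated number of bytes up front and release them when done.
class memory_units_sem {
public:
    explicit memory_units_sem(size_t bytes) : available_(bytes) {}

    void acquire(size_t bytes) {
        std::unique_lock lk(m_);
        cv_.wait(lk, [&] { return available_ >= bytes; });
        available_ -= bytes;
    }
    void release(size_t bytes) {
        std::lock_guard lk(m_);
        available_ += bytes;
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    size_t available_;
};

// Assumption for illustration: ~200 bytes of working space plus wire
// bytes per partition in the response.
size_t estimate_metadata_response_size(size_t partition_count) {
    return partition_count * 200;
}

// The handler reserves units sized by the *response* estimate before it
// starts building the response, so concurrency adapts to response size.
void handle_metadata_request(memory_units_sem& sem, size_t partition_count) {
    const size_t estimate = estimate_metadata_response_size(partition_count);
    sem.acquire(estimate);
    // ... build and send the metadata response ...
    sem.release(estimate);
}
```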

@travisdowns travisdowns added kind/bug Something isn't working area/controller labels May 18, 2022
@travisdowns travisdowns self-assigned this May 18, 2022
@travisdowns travisdowns added the DW label Jun 15, 2022
travisdowns added a commit that referenced this issue Jul 15, 2022
Per-handler memory estimation, more accurate estimate for metadata handler

Currently we estimate that metadata requests take 8000 + rsize * 2 bytes
of memory to process, where rsize is the size of the request. Since
metadata requests are very small, this ends up being roughly 8000 bytes.

However, metadata requests which return information about every
partition and replica may easily be several MBs in size.

To fix this for metadata requests specifically, we use a new more
conservative estimate which uses the current topic and partition
configuration to give an upper bound on the size.

The remainder of this series sets up this change and also prepares
for a more comprehensive change where we will allow a "second
chance" allocation from the memory semaphore.

Fixes: #4804
Fixes: #5278
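A sketch of the kind of per-handler upper bound the commit message describes, derived from topic/partition configuration rather than the request size. The byte constants and names are assumptions for illustration; the real values and interface live in #5346:

```cpp
#include <cstddef>
#include <vector>

struct topic_summary {
    size_t partition_count;
    size_t replication_factor;
};

// Upper bound on the memory needed to build a full metadata response,
// based on the current topic/partition configuration.
size_t metadata_memory_estimate(const std::vector<topic_summary>& topics) {
    constexpr size_t per_topic_bytes = 512;     // assumed: topic name, error codes, framing
    constexpr size_t per_partition_bytes = 128; // assumed: per-partition metadata
    constexpr size_t per_replica_bytes = 16;    // assumed: replica/ISR node ids
    size_t total = 8000;                        // keep the old fixed overhead as a floor
    for (const auto& t : topics) {
        total += per_topic_bytes
               + t.partition_count
                     * (per_partition_bytes + t.replication_factor * per_replica_bytes);
    }
    return total;
}
```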