
Per-handler memory estimation, more accurate estimate for metadata handler #5346

Merged: 20 commits, Jul 15, 2022

Conversation

Member

@travisdowns travisdowns commented Jul 5, 2022

Currently we estimate that metadata requests take 8000 + rsize * 2 bytes
of memory to process, where rsize is the size of the request. Since
metadata requests are very small, this ends up being roughly 8000 bytes.

However, metadata requests which return information about every
partition and replica may easily be several MBs in size.

To fix this for metadata requests specifically, we use a new more
conservative estimate which uses the current topic and partition
configuration to give an upper bound on the size.
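
As a rough illustration of the two estimates being compared (a minimal sketch; the function names are hypothetical, not the actual redpanda code):

#include <cstddef>

// Old rule of thumb: fixed overhead plus twice the request size. For a tiny
// metadata request this comes out to roughly 8000 bytes.
constexpr size_t default_memory_estimate(size_t request_size) {
    return 8000 + request_size * 2;
}

// New idea: bound the response by the partition count rather than the request
// size (200 bytes per partition matches the constant discussed later in this PR).
constexpr size_t metadata_memory_estimate(size_t partition_count) {
    constexpr size_t bytes_per_partition = 200;
    return partition_count * bytes_per_partition;
}

// e.g. 50,000 partitions -> ~10 MB reserved, versus ~8 KB under the old rule.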

The remainder of this series sets up this change and also prepares
for a more comprehensive change where we will allow a "second
chance" allocation from the memory semaphore.

Fixes: #4804
Fixes: #5278

@mmedenjak mmedenjak added the kind/bug, area/controller, and DW labels and removed the area/redpanda label Jul 6, 2022
@travisdowns travisdowns marked this pull request as ready for review July 11, 2022 05:57
@travisdowns travisdowns changed the title Td 4804 oom fix Per-handler memory estimation, more convervative estimte for metadata handler Jul 11, 2022
Contributor

@jcsp jcsp left a comment

This is a nice change: clearly there are degrees of sophistication we can try to use for our guesses here, but having a big estimate for metadata requests is a good place to start.

The use of dynamic dispatch for handlers looks like a net win as well, although it does make me wish we had a microbenchmark to provide confidence that the code size reduction is beneficial enough to cancel out any overhead from using virtual functions -- I wonder if you tested that by hand?

// metadata response.
constexpr size_t bytes_per_partition = 200;

return max_clusterwide_partitions * bytes_per_partition;
Contributor

Can we get a reference to topic_table into the request processing path so that we can use the real partition count?

Member Author

"Probably" - but I explicitly wanted to avoid this. This early request processing should be as simple as possible, so use information we have statically available which is just the request size at this point.

At least in the scenario where we saw this problem, the topic metadata wasn't necessarily populated yet because the system was just starting up, so in some cases you wouldn't have a reliable estimate (though I guess you could fall back to the conservative estimate at that point). More generally this seems subject to data staleness: among other things, the OOM was also seen in cases where the controller was killed and requests piled up waiting for a new controller to be elected, although in that case the old topic table would probably be good enough.

If this were the final change, I would perhaps be in favor of this: but it is only a stopgap towards what I think should be the final solution: a two phase allocation of units where we allocate a small number of units up front to account for initial processing, then the "full amount" at the point during processing that we know the final amount, e.g., once we've queried the topic table and before we start serializing the result. I'll raise this solution up for more discussion, however.

Contributor

the topic metadata wasn't necessarily populated yet because the system was just starting up

That sounds strange: we don't start listening on the Kafka port until the controller has finished replay. The exception to that is when we are adding a new node, or a node is coming back after being offline for some time, but even in those cases the window for clients to act on a stale partition count is small (and very fixable, all we need is a way for nodes in that state to learn the controller high water mark via internal rpc).

it is only a stopgap towards what I think should be the final solution: a two phase allocation of units where we allocate a small number of units up front to account for initial processing, then the "full amount" at the point during processing that we know the final amount

I was thinking of the two phase allocation the other way around: that we would initially allocate the worst case number of units, and then release some when we knew enough about the request to do so. If we do it the other way (small allocation then big allocation), we risk deadlock (or just reduced throughput) from too many concurrent requests taking their small phase 1 units, then there not being enough units available to allow any of them to proceed into phase 2.

So I guess the key question here is whether the end state is going to be the small->big procedure (in which case the end state does not require the total partition count, it calculates its big amount from the request's partition list), or the end state is going to be the big->small procedure (in which case the end state will still require the total partition count, so we should plumb it in now)

Member Author

@travisdowns travisdowns Jul 12, 2022

That sounds strange: we don't start listening on the Kafka port until the controller

The server was up and accepting kafka connections, but the metadata requests were blocked on refreshing metadata from the controller (which is what caused requests to pile up over time). As I recall there had never been a successful metadata dissemination to this node at the time so as far as the node was concerned there were ~0 partitions, and by the time the dissemination completed the requests were already long past the initial part of the connection and were blocked on the metadata refresh semaphore.

Regardless of the specific details of that case, doing the estimation separately is always subject to races? You do the estimation and then some time later prepare the response and the number of partitions may have changed arbitrarily in the meantime.

I was thinking of the two phase allocation the other way around: that we would initially allocate the worst case number of units, and then release some when we knew enough about the request to do so.

Yes, I recall. The primary complaint (not from me) was the possible performance impact due to reduced concurrency. Though see below also.

If we do it the other way (small allocation then big allocation), we risk deadlock (or just reduced throughput) from too many concurrent requests taking their small phase 1 units, then there not being enough units available to allow any of them to proceed into phase 2.

Yes, I am aware. I don't know if this is the right place for a detailed review of this final state, but I do consider this case by including (1) a portion of the units which is guaranteed to be only for phase 2 and (2) prioritization of the units semaphore so in-progress requests always get first shot at the units. It was the realization that we need to do (2) that made me want to do this stopgap first before proceeding with the full fix.

So I guess the key question here is whether the end state is going to be the small->big procedure

My primary complaint about big -> small is that I don't think it is very general. It happens to work for a specific type of metadata request (the "empty" request which means "give me everything") which has a known max size and where the size is fairly easy to calculate based on cached metadata, but in general you might have to do a lot more work to estimate the size: e.g., any metadata request with a topic list (though an overestimate would at least work) or other types of requests. At this point we don't even have access to the body of the request (though we could change that): we've read only 4 bytes (the size).

In the limit, you have to more or less do the request twice: once to estimate the size and then once again to really do it. The small -> big approach just kind of works "naturally" in that you don't have a separate estimation phase: you'll always have some point where you know the size of the request as you are processing it (at least, one can hope, though there is the question of whether you know it before you use the memory, which is usually possible but often not how the code is currently written).

If it were only this one type of metadata request, I think this would be fine, but this is really a problem with a lot of request types, which I think favors small -> big if we can convince ourselves of deadlock freedom. Note that small -> big can really be something like small -> big -> small too, e.g., we can free units after we leave the high-memory part of the processing such as in the other comment (though to avoid deadlock there should only be 1 increase after the initial one).
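
To make the small -> big shape concrete, here is a minimal single-threaded sketch (the memory_budget type and its methods are hypothetical, not the redpanda semaphore code) of the two ingredients mentioned above: a slice of the budget reserved for phase 2, and phase-2 acquisitions getting access to that reserve:

#include <cstddef>

// Hypothetical sketch: phase-1 (up-front) grabs may not dip into the reserved
// slice, while phase-2 grabs from in-flight requests may.
class memory_budget {
public:
    memory_budget(size_t total, size_t reserved_for_phase2)
      : _available(total), _reserved(reserved_for_phase2) {}

    // Small up-front allocation when the request arrives.
    bool try_acquire_phase1(size_t bytes) {
        if (_available < _reserved + bytes) {
            return false;
        }
        _available -= bytes;
        return true;
    }

    // "Full amount" allocation once the real size is known; may use the reserve.
    bool try_acquire_phase2(size_t bytes) {
        if (_available < bytes) {
            return false;
        }
        _available -= bytes;
        return true;
    }

    void release(size_t bytes) { _available += bytes; }

private:
    size_t _available;
    size_t _reserved;
};

The real path would use an async semaphore with waiters rather than boolean try-acquires, but the reserved slice plus priority for in-flight requests is what keeps a crowd of phase-1 holders from starving everyone's phase 2.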

Member Author

@travisdowns travisdowns Jul 12, 2022

Though I wonder if we can decouple the "full fix" discussion from the "stopgap" one? Whichever way we go, I don't think this change really predisposes us either way. Most of the changes are just related to allowing per-handler estimates at all, and neither method of plumbing the units around is implemented here (well really, big -> small doesn't need any plumbing of units, it just needs a richer estimator, like a second handle-type call to do the estimation, while small -> big needs the units put into the request context so they can be adjusted).

Contributor

@jcsp jcsp Jul 12, 2022

As I recall there had never been a successful metadata dissemination to this node at the time so as far as the node was concerned there were ~0 partitions,

Ah, so this sounds like it could have been avoided by consulting topics_table (populated before kafka api comes up by controller replay) rather than any of the metadata dissemination state which is fuzzier.

Anyway. I guess this boils down to "is hardcoded 40k good enough for 22.2". I think what I'm missing is how hard it would actually be to wire through topics_table: if it's super onerous then I can accept hardcoding a magic number, but can you maybe try it and see how invasive the patch is? and/or for concerns about races, we could have a hardcoded lower bound that says "even if you didn't see partitions yet, pretend like there are at least 10k" or somesuch.

Maybe a compromise is to make it a config property rather than hardcoding it -- I was thinking of having one of these anyway, to encode the "here's how big we ever tested" number. That way folks doing scale testing beyond 40k can at least dial it up without recompiling redpanda.

Member Author

Ah, so this sounds like it could have been avoided by consulting topics_table (populated before kafka api comes up by controller replay) rather than any of the metadata dissemination state which is fuzzier.

I believe this was the first startup ever of the node so I suspect the controller state is at least initially empty, though maybe you said above that on first join we don't start accepting connections until controller state has been recovered/replayed from another node?

Anyway. I guess this boils down to "is hardcoded 40k good enough for 22.2". I think what I'm missing is how hard it would actually be to wire through topics_table: if it's super onerous then I can accept hardcoding a magic number, but can you maybe try it and see how invasive the patch is?

This should be fairly easy and yes I can do that.

we could have a hardcoded lower bound that says "even if you didn't see partitions yet, pretend like there are at least 10k" or somesuch.

Yes, though the race isn't only for the no-partitions case, it can happen at any point: from jumping from 20k to 100k partitions or whatever. I think this is a downside of big -> small in general: any estimate can always become out of date between the estimation point and when the request is processed, and even though this window might be small with constant requests we would expect it to be hit by some requests with high probability when the conditions change (though a fix for this could be to fail requests that are found to be unexpectedly large rather than forging ahead into a possible OOM).

Maybe a compromise is to make it a config property rather than hardcoding it -- I was thinking of having one of these anyway, to encode the "here's how big we ever tested" number. That way folks doing scale testing beyond 40k can at least dial it up without recompiling redpanda.

Also a good suggestion though I'll try your other suggestion first.

Member Author

@travisdowns travisdowns Jul 14, 2022

I have updated this PR to use an estimate based on the actual (worst-case) topic & partition metadata, plus the number of brokers.

// We cannot make a precise estimate of the size of a metadata response by
// examining only the size of the request (nor even by examining the entire
// request) since the response depends on the number of partitions in the
// cluster. Instead, we return a conservative estimate based on the soft
Contributor

Does it require much more plumbing to also implement the return of unneeded units once the request is read? It's not essential for robustness, but I'd like to have confidence that the code structure here is sufficient for adding that.

Member Author

There are a few places that we can release units, but I think "mostly yes".

Specifically, the main place we can release units is when the response processing has been completed (the handler is done) and we just need to finish writing out the response: at this point we know the remaining memory needed is simply the encoded size of the response (plus perhaps some padding due to allocation buckets, though we hardly handle this anywhere currently), so we can reduce the units to that amount.

This can be done generically w/o help from the handler (and in fact is quite easy, I just omitted it here out of an abundance of caution given that we'd like to get this into 22.2).

The other place would be inside the request processing, e.g., at certain points where it is known that large intermediate structures have been cleaned up, we can free units. This can be part of the dynamic two-phase allocation as described above and would be up to each handler. To do this, we need to pass the units into somewhere the handlers can access them, e.g., the request context, which is where I was heading with much of this plumbing. I don't actually plumb it into the request context yet as it is not used in this stopgap fix.
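
A minimal sketch of that first release point (the session_resources shape and shrink_to here are illustrative, not the actual class): once the handler is done, shrink the reservation down to just the encoded response bytes and signal the rest back to the memory semaphore:

#include <cstddef>

// Illustrative only: track how many bytes this request has reserved, and give
// back everything above what is still needed to finish writing the response.
struct session_resources {
    size_t reserved_bytes;

    size_t shrink_to(size_t still_needed) {
        size_t to_return = reserved_bytes > still_needed
                             ? reserved_bytes - still_needed
                             : 0;
        reserved_bytes -= to_return;
        return to_return; // caller returns this amount to the shared semaphore
    }
};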

@jcsp jcsp changed the title Per-handler memory estimation, more convervative estimte for metadata handler Per-handler memory estimation, more convervative estimate for metadata handler Jul 11, 2022
@travisdowns travisdowns changed the title Per-handler memory estimation, more convervative estimate for metadata handler Per-handler memory estimation, more conservative estimate for metadata handler Jul 11, 2022
* to handlers, generally const objects with static storage duration
* obtained from handler_for_key.
*/
using any_handler = const any_handler_t*;
Member

nit: I am wondering if we can just name this handler; this way, according to the Liskov substitution principle, each request_handler inheriting from any_handler is a handler. (This is a matter of taste, I guess.)

Member Author

@travisdowns travisdowns Jul 14, 2022

I like this name, but the problem is we already have (even before these changes) a kafka::handler symbol which is the helper template for single-pass handlers, so we can create one like:

using fetch_handler = handler<fetch_api, 4, 11>;

Both uses of the symbol can't co-exist. Would you like to rename that helper to something else?

It is also possible I did not fully understand the suggestion.

Member

so maybe handler_base or handler_interface?

Member Author

@mmaslankaprv - OK, I tried out your original suggestion of handler: to do this I renamed the existing template helper to single_stage_helper which is consistent with the other helper with the same purpose for 2-stage: two_stage_helper.

This freed up kafka::handler to be used for the pointer to the handler interface. I've pushed those changes here.
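
For readers following along, the shape under discussion is roughly the following (an illustrative sketch only; the real interface has more methods and different signatures, and request_context here is just a stand-in):

#include <cstddef>

struct request_context; // stand-in for the real context type
using api_key = short;  // simplified; the real key is a named type

// Abstract handler interface: one const instance per API, with static storage
// duration, looked up by key and passed around by pointer.
struct handler_interface {
    virtual api_key key() const = 0;
    virtual size_t memory_estimate(size_t request_size,
                                   const request_context& ctx) const = 0;
    virtual void handle(request_context& ctx) const = 0;
    virtual ~handler_interface() = default;
};

// After the rename discussed above, the pointer alias gets the short name:
using handler = const handler_interface*;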

@mmaslankaprv
Member

This mostly looks good, if we are concerned about the virtual dispatch overhead we may replace any_handler with a concept and then generate the dispatch switch()...case with template parameter pack expansion
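
For what that alternative could look like, a small sketch (the handler types and dispatch_by_key are hypothetical stand-ins; the real request_types list and concept would be richer), using a fold expression to play the role of the generated switch/case:

#include <cstdint>
#include <cstdio>

// Stand-ins for real handler types, each exposing a compile-time api_key.
struct metadata_handler { static constexpr int16_t api_key = 3; };
struct fetch_handler    { static constexpr int16_t api_key = 1; };

// A visitor whose call operator is instantiated per handler type, so the
// handler-specific work is statically dispatched once the key matches.
struct print_key_visitor {
    template<typename Handler>
    void operator()() const { std::printf("matched key %d\n", Handler::api_key); }
};

// Fold expression over the handler pack: behaves like a generated switch,
// short-circuiting on the first matching key; returns false if none match.
template<typename... Handlers, typename Fn>
bool dispatch_by_key(int16_t key, Fn&& fn) {
    return ((key == Handlers::api_key
               ? (fn.template operator()<Handlers>(), true)
               : false)
            || ...);
}

int main() {
    return dispatch_by_key<metadata_handler, fetch_handler>(3, print_key_visitor{}) ? 0 : 1;
}

The fold short-circuits like the old switch did, and the visitor body is still instantiated per handler type, so nothing is virtually dispatched.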

Contributor

@graphcareful graphcareful left a comment

A few questions and small comments, but super nice job

std::optional<any_handler> handler_for_key(kafka::api_key key) {
static constexpr auto lut = make_lut(request_types{});
if (key >= (short)0 && key < (short)lut.size()) {
if (auto handler = lut.at(key)) {
Contributor

you can prefer operator[] here instead of calling .at because you've already performed the bounds checking.

Member Author

It's true - I did it this way to quiet a linter warning against ever using [] with non-constant bounds, which wasn't smart enough to notice the earlier bounds check. I'll switch to [], linter be damned.

Contributor

// NO LINT hehe

Contributor

also you can mark this as noexcept too

Member Author

Good idea, added.
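
Putting both suggestions together, the lookup reads roughly like this (reassembled from the excerpt above and relying on the same surrounding definitions; the trailing nullopt return is assumed since the excerpt is cut off):

std::optional<any_handler> handler_for_key(kafka::api_key key) noexcept {
    static constexpr auto lut = make_lut(request_types{});
    if (key >= (short)0 && key < (short)lut.size()) {
        // bounds already checked above, so [] avoids the extra check in at()
        if (auto handler = lut[key]) {
            return handler;
        }
    }
    return std::nullopt;
}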

src/v/kafka/server/handlers/handler.h (outdated comment thread, resolved)
size_estimate += sizeof(kafka::metadata_response_topic);
size_estimate += tp_ns.tp().size();

using partition = kafka::metadata_response_partition;
Contributor

Super nice! Just a thought here. Metadata response v9 supports returning the list of brokers too. In large installations with thousands of brokers we might want to reserve extra memory for that as well. Opinions? In either case we only support this API at v7 anyway, so this would for sure be future work.

Member Author

Yeah I think even earlier versions supported that? I know returning the brokers is a key function of this API.

I was including this in the 10,000-byte slush fund, assuming something like 100 bytes per broker * 100 brokers max, but sure, I can add brokers into this calculation too.

Member Author

(it looks like brokers are unconditionally returned in all versions of metadata response)

Contributor

Ah yes, not sure why I said v9, I guess I was looking at something wrong. Here's the source:

    { "name": "Brokers", "type": "[]MetadataResponseBroker", "versions": "0+",
      "about": "Each broker in the response.", "fields": [

Member Author

Makes sense. In any case, I added an estimate of 200 bytes per broker, plus the size of the broker struct itself. The 200 bytes is for the hostname and rack, if any.
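
Roughly, the estimator described here adds per-topic, per-partition, and per-broker contributions on top of a fixed slush amount; a simplified, self-contained sketch (topic_info and the exact constants are illustrative, not the real types from the diff):

#include <cstddef>
#include <string>
#include <vector>

struct topic_info {
    std::string name;
    size_t partition_count;
};

size_t estimate_metadata_response_size(
  const std::vector<topic_info>& topics, size_t broker_count) {
    constexpr size_t per_partition_bytes = 200; // per-partition overhead guess
    constexpr size_t per_broker_bytes = 200;    // hostname + rack guess
    size_t size_estimate = 10000;               // slush for the fixed parts

    for (const auto& t : topics) {
        size_estimate += t.name.size();                      // topic name bytes
        size_estimate += per_partition_bytes * t.partition_count;
    }
    size_estimate += per_broker_bytes * broker_count;
    return size_estimate;
}

With tens of thousands of partitions this lands in the multi-MB range mentioned in the PR description, rather than the old ~8 KB guess.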

@travisdowns
Member Author

This mostly looks good, if we are concerned about the virtual dispatch overhead we may replace any_handler with a concept and then generate the dispatch switch()...case with template parameter pack expansion

@mmaslankaprv I forgot to add why I introduced the generic any_handler at all: because the existing way of using template methods generated a lot of code bloat: requests.cc dropped from 12 MB to 5 MB in size after this change. Just the one vlog call which logged the request was responsible for several MB of bloat, since the same call gets instantiated for all 30+ handlers even though it does the same thing.

Since I was adding even more handler-specific logic here, I didn't want to propagate this pattern, and I think "bad" virtual-method based polymorphism is natural here: after all, we already had a big switch to dispatch to the right template-based process function (since fundamentally the handler needs to be chosen dynamically) and we'd need it again in any place we want to do handler-specific logic. Now we can do the handler lookup once and pass the handler around, and we just have 1 virtual function call on each invocation, which has a similar cost to the switch on a call-vs-call basis (though the template approach can have fewer total calls since once you are inside the template you don't pay any more cost). The request handling path looks like it is already many 1000s of instructions at least, so I think the cost of a virtual call or two here is very small.

So in fact generating specialized per-handler code is exactly what I was trying to avoid here :).

@graphcareful
Contributor


I actually looked into Michal's comment here, realizing you did move a big conditional, and still wanted to look into the drawbacks of virtual method dispatch. As a preface, I do think the benefits outweigh the possible drawbacks, so I'm in favor of what you have here.

I was reading that since the compiler cannot know what address in the vtable is going to be called, it cannot make optimizations around the indirect call it must make. Do you know more about this? Just curious.

@travisdowns
Member Author

I was reading that since the compiler cannot know what address in the vtable is going to be called, it cannot make optimizations around the indirect call it must make. Do you know more about this? Just curious.

Thanks for your comment @graphcareful .

Right, the primary optimization it can't make is that it can't know the target of the call, so it can't inline the call or do any other IPA it might otherwise do. The key question is whether this reflects "fundamental" dynamism or not. That is, could the compiler conceivably know the target? At a high level here the answer is "no" - the API key on which we base our behavior comes over the wire, so the compiler is never going to know statically the path the request processing will take.

In the code prior to this change, this was reflected in the big switch where we move from "dynamic" type-erased handling of the request to template-based distinct per-handler instantiation. This switch is replaced by the handler_for_key method and is probably slightly worse than that method (in particular because it has to generate a lot of code for all the copies of the do_process call). The compiler can't optimize away this switch either.

So just replacing this switch with a virtual function is more or less a wash for this 1 call to handle. One big difference shows up, though, when you want to make N calls to the handler for things that vary per handler (in this case N is 2: the existing handle call and the new memory_estimate call). In the virtual case it is two virtual calls now, plus the handler lookup. In the switch case it could be two switch lookups + dispatch, or you could move the template<typename Request> stuff up one level higher to the (grand, etc.) parent function which calls all N functions, so you only have one switch at the top level and the compiler doesn't need to do any dynamic dispatch for the N handler calls since the type is known. So it is an advantage in the micro-sense if you don't consider the other impacts.

To do this here even for N=2 you'd have to make a ton of code be under the template and the code size would just explode: many MB of code probably, to duplicate the request flow for every handler type. To evaluate whether it's worth it, I think you'd want to know some order-of-magnitude costs: a virtual call is just an indirect function call, which is not much more expensive than a regular function call and is on the order of 1 ns. Full requests are probably 10 us at least (?), so 4 orders of magnitude more than a virtual call. Even a single icache miss due to code bloat probably costs 100x more. The key is that the bulk of the request is still compiled with full visibility inside the handle method specific to the request type: there aren't fine-grained virtual calls inside the handling of a single request, just 2 at the "front end", before we start the handler. The meat is unaffected.

Sorry, these are kind of hand-wavy napkin estimates: I'd like to soon be in a place where we can actually measure this stuff, but until then I think you can say with high confidence that this won't make any measurable difference, and if anything I'd put my money on "slightly faster".

It also has compile-time benefits: the requests/connection_context code doesn't need to know about each handler type any more, just the generic any_handler, so it can stop including the header for every handler, etc. (this is made possible by your request_types type_list, very handy).
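
In miniature, the two shapes being weighed look something like this (all names are hypothetical; the real flow is vastly larger, which is exactly the code-size point):

#include <cstddef>
#include <cstdio>

struct request { size_t size = 0; };

inline void log_request(const request&) { std::puts("request"); }
inline void reserve_memory(size_t) { /* take units from the memory semaphore */ }

// (a) Template the whole flow: one switch/lookup at the top, full static
// knowledge inside, but the entire path is instantiated once per handler type.
template<typename Handler>
void process_request_templated(request& r) {
    log_request(r);                              // instantiated ~45 times
    reserve_memory(Handler::memory_estimate(r)); // statically dispatched
    Handler::handle(r);
}

// (b) Virtual front end: the surrounding flow is compiled once; a couple of
// indirect calls happen per request, and each handler's handle() body is still
// compiled with full visibility into its own request type.
struct handler_iface {
    virtual size_t memory_estimate(const request&) const = 0;
    virtual void handle(request&) const = 0;
    virtual ~handler_iface() = default;
};

void process_request_virtual(const handler_iface* h, request& r) {
    log_request(r);                        // compiled once
    reserve_memory(h->memory_estimate(r)); // indirect call
    h->handle(r);
}

Shape (a) is what inflated requests.cc to the object sizes quoted above; shape (b) pays a couple of indirect calls per request instead.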

@graphcareful
Contributor

It also has compile-time benefits: the requests/connection_context code doesn't need to know about each handler type any more, just the generic any_handler, so it can stop including the header for every handler, etc. (this is made possible by your request_types type_list, very handy).

Awesome comment, thanks a bunch for the insights

static constexpr api_version min_supported = api_version(0);
static constexpr api_version max_supported = api_version(7);
static process_result_stages handle(request_context, ss::smp_service_group);
static constexpr auto despam_interval = std::chrono::minutes(5);
Member Author

despam_interval has moved inside the implementation; there was no reason, I think, to have it here in the public interface of the handler.

@@ -21,6 +21,7 @@

namespace kafka {

template<>
Member Author

These two-stage handlers now use the helper template, so these methods are template specializations, just as they are in the single-stage case.

@travisdowns travisdowns changed the title Per-handler memory estimation, more conservative estimate for metadata handler Per-handler memory estimation, more accurate estimate for metadata handler Jul 14, 2022
The behavior of process_next_response is worth clarifying as the
returned future does not necessarily wait for all enqueued responses
to be finished before resolving.

We also rename the method to better reflect its purpose.
Currently we release resources after the response is enqueued
in connection_context and response processing is called, but it
may not have been sent at this point as we require in-order responses
but second-stage processing may happen out of order.

In this change, we instead tunnel the resource object through to
the place where the response is written, and release it there.

Fixes redpanda-data#5278.
This lets us share it with the request processing code which
would also like to do type list based manipulation of the
request types.
We already had concepts for one-phase and two-phase handlers,
and this concept is simply the union of those two handler
concepts, i.e., "any" type of handler.
We wish to reclaim the name handler for the generic handler interface
introduced in the next change.
This is a polymorphic handler class (abstract class with virtual
methods), as well as concrete instantiations of the interface for each
of the existing handlers, which can be looked up by API key.

This lets us treat handlers generically without resorting to template
functions which must be specialized for each handler type.

This reduces code bloat significantly as we do not duplicate code paths
for our ~45 handler types.

For example, requests.cc.o drops from ~11 MB to ~5 MB after it is
switched to the any_handler approach.
The handler_interface already gets good functional coverage as it is
added to the core request path in requests.cc, but we also include
this unit test with basic coverage.
Preceding changes in this series introduced a runtime polymorphic
handler class, and this change switches most of the request handling
to use it.

In particular, we replace the large switch on API key which dispatches
to a template method to a lookup of the handler method and virtual
dispatch.

Some handlers that need special processing like the authentication/SASL
related ones still use the old approach for now.
Currently we use a single memory estimate for all kafka request types,
but different API calls may use wildly different amounts of memory.

This change allows each handler to perform an API-specific calculation
instead.
In connection_context, we now use the handler-specific initial memory
use estimate, rather than a single estimate for every handler type.
The session_resources type was a private member of
connection_context, but as we want to use it more
broadly, move it out as a standalone public class.

Additionally, pass it by shared_pointer in preparation
for later changes which will feed it into requests.
Single-stage handlers have a handler template which means that handler
objects can be declared in a single line specifying their api object,
min and max API versions.

This change extends this nice concept to two-stage handlers as well.
Passing the connection context to the estimator allows the
estimator to use the various subsystems to estimate the memory
use of a given request.
Currently we estimate that metadata requests take 8000 + rsize * 2 bytes
of memory to process, where rsize is the size of the request. Since
metadata requests are very small, this ends up being roughly 8000 bytes.

However, metadata requests which return information about every
partition and replica may easily be several MBs in size.

To fix this for metadata requests specifically, we use a new more
conservative estimate which uses the current topic and partition
configuration to give an upper bound on the size.
Prior to this change, we used only the topic and partition data to estimate
the size of the metadata response. Now, we also include the approximate
size of the broker metadata portion of the response, which may
be important if the list of brokers is very large or they have very long
hostnames.
We already check the bounds so the cpp core guideline presumably
does not apply and we don't want to pay the price for the additional
bounds check inside at().
graphcareful previously approved these changes Jul 14, 2022
Contributor

@graphcareful graphcareful left a comment

LGTM sweet

@travisdowns
Member Author

@travisdowns travisdowns merged commit 0b8e0c1 into redpanda-data:dev Jul 15, 2022

Successfully merging this pull request may close these issues.

Request resources may be released at the wrong time
OOM due to underestimated memory use in metadata handler