
transport: Pass Header metadata to tap handle. #6652

Merged: 3 commits merged into grpc:master from pstibrany:pass-header-to-tap-handler on Oct 5, 2023

Conversation

@pstibrany (Contributor) commented Sep 21, 2023

This PR extends the tap.Info struct with metadata and modifies http2_server.go to pass the available metadata to the ServerInHandle function.

This gives the tap handle more details that it can use to reject a request early.

RELEASE NOTES:

  • tap (experimental): Add Header metadata to tap handler
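
For illustration, a minimal sketch of how a server could consume the new field; tap.Info, tap.ServerInHandle, and grpc.InTapHandle exist in grpc-go, while the "x-tenant" header key and the rejection rule are hypothetical:

```go
package tapexample

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/grpc/tap"
)

// rejectByHeader is a hypothetical tap handle that inspects the request
// headers exposed by this change and rejects the RPC before any message
// is read or a handler goroutine is started.
func rejectByHeader(ctx context.Context, info *tap.Info) (context.Context, error) {
	// info.Header holds the metadata parsed from the incoming HTTP/2 headers.
	if len(info.Header.Get("x-tenant")) == 0 {
		// A non-nil error makes the transport refuse the stream; how the
		// client observes the rejection depends on the grpc-go version.
		return ctx, status.Error(codes.Unauthenticated, "missing x-tenant header")
	}
	return ctx, nil
}

// newServer wires the handle into a server via the InTapHandle option.
func newServer() *grpc.Server {
	return grpc.NewServer(grpc.InTapHandle(rejectByHeader))
}
```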

@linux-foundation-easycla (bot) commented Sep 21, 2023

CLA Signed: the committers listed above are authorized under a signed CLA.

@dfawley (Member) commented Sep 21, 2023

May I ask your use case for the tap package? Have you compared this to using interceptors for the same purpose, by any chance?

The other grpc languages don't have this construct (and seem to be doing just fine without it), and we recently noticed that it's excluded from metrics & tracing, which is not very desirable from an observability perspective, and we may not want to introduce metrics callouts specifically for it, either, given its sensitive location. So I was actually considering just calling it deprecated and not recommending anyone to use it.

@pstibrany (Contributor, author) commented Sep 22, 2023

> May I ask your use case for the tap package? Have you compared this to using interceptors for the same purpose, by any chance?

Hi. Our use case is to drop requests early, before spending too many resources on them. We use this when the server is already unable to accept more messages (e.g. too many inflight requests). We have tried to use interceptors before, but that did not provide adequate protection.

From what I understand, gRPC-Go processes a request in the following steps (discussing the unary case):

  1. Read the request headers.
  2. Call the tap handle, if configured. The tap handle can reject the request at this point.
  3. The request is then handed to one of the worker goroutines (if configured and any is free), or a new goroutine is spawned.
  4. The generated method handler is called. This handler receives a function to read the request into memory; after reading the request, it unmarshals the message.
  5. The unmarshaled message is passed to the interceptors and eventually to the registered handler.

As can be seen from this description, by the time the interceptors run, quite a lot of resources have already been used: possibly a new goroutine has been spawned, and the request is in memory twice (once as a bytes buffer, once as the unmarshaled message). We want to avoid all of that when we know the server can't possibly handle any more requests at the moment.

When we tried to use interceptors, we couldn't implement the protection effectively, and we could still crash the server under load, for example by sending enough big messages; the server would simply OOM.
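
To make the "reject early, before spending resources" idea concrete, here is a minimal sketch, not the actual implementation described above: a tap handle that refuses new streams once an assumed inflight limit is reached, paired with a stats handler that releases the slot when each accepted RPC ends.

```go
package tapexample

import (
	"context"
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/stats"
	"google.golang.org/grpc/status"
	"google.golang.org/grpc/tap"
)

// inflightLimiter refuses streams in the tap handle once a limit is reached,
// before the request body is read or unmarshaled. The limit value and the
// accounting via a stats handler are assumptions for this sketch.
type inflightLimiter struct {
	limit    int64
	inflight atomic.Int64
}

func (l *inflightLimiter) TapHandle(ctx context.Context, info *tap.Info) (context.Context, error) {
	if n := l.inflight.Add(1); n > l.limit {
		l.inflight.Add(-1)
		return ctx, status.Error(codes.ResourceExhausted, "too many inflight requests")
	}
	return ctx, nil
}

// stats.Handler methods: only *stats.End is used, to release the slot taken in
// TapHandle. This assumes every accepted stream eventually produces an End event.
func (l *inflightLimiter) TagRPC(ctx context.Context, _ *stats.RPCTagInfo) context.Context { return ctx }
func (l *inflightLimiter) HandleRPC(_ context.Context, s stats.RPCStats) {
	if _, ok := s.(*stats.End); ok {
		l.inflight.Add(-1)
	}
}
func (l *inflightLimiter) TagConn(ctx context.Context, _ *stats.ConnTagInfo) context.Context { return ctx }
func (l *inflightLimiter) HandleConn(context.Context, stats.ConnStats) {}

// newLimitedServer wires both pieces into a server; the limit of 1000 is illustrative.
func newLimitedServer() *grpc.Server {
	l := &inflightLimiter{limit: 1000}
	return grpc.NewServer(grpc.InTapHandle(l.TapHandle), grpc.StatsHandler(l))
}
```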

> The other grpc languages don't have this construct (and seem to be doing just fine without it), and we recently noticed that it's excluded from metrics & tracing, which is not very desirable from an observability perspective, and we may not want to introduce metrics callouts specifically for it, either, given its sensitive location. So I was actually considering just calling it deprecated and not recommending anyone to use it.

We've also noticed the exclusion from metrics, because our metrics get updated in the stats handler and interceptors. However, that's easy to fix: we can increment the relevant metric when rejecting a request in the tap handle. As for tracing, in our setup the client would log the returned gRPC error to its trace span. (But we use gRPC only for internal communication.)

I think that even with these limitations, the tap handle is a useful feature and gives us much-needed protection against overloading the server with too many messages.

@dfawley (Member) commented Sep 22, 2023

Thanks for all the detailed info - it was really helpful!

Do you happen to work with gRPC in other languages besides Go (on the server side)? I'm not sure, but I believe the recommended way of dealing with these kinds of issues would be to use a LimitingListener and MaxConcurrentStreams instead. Have you tried these, by any chance?

It's also possible the other languages tend to provide interceptors at a lower level. Especially if you're looking at the message payload size in unary RPCs, I believe the other languages provide interceptors that are more like our streaming interceptors, but for all RPCs, that do not require the payload to be received and decoded before calling them, as our unary interceptors do. Unary interceptors can be nice for usability, but otherwise have some undesirable properties (e.g. they require implementing things twice if you have both unary and streaming RPCs that want the same interceptor behavior). To that end, we do have #1805 open to provide a streaming-style interceptor even for unary RPCs. It hasn't been prioritized, but might allow for sufficiently early termination of RPCs - it would cut out steps 4 and 5, saving 2 copies of the message payload. Could something like that potentially work for you in place of the tap handle feature?
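
For reference, a minimal sketch of the suggested combination, using golang.org/x/net/netutil.LimitListener together with the grpc.MaxConcurrentStreams server option; the address and both limit values are illustrative:

```go
package main

import (
	"log"
	"net"

	"golang.org/x/net/netutil"
	"google.golang.org/grpc"
)

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	// Cap the number of simultaneously accepted connections.
	lis = netutil.LimitListener(lis, 300)

	// Cap the number of concurrent streams per connection.
	srv := grpc.NewServer(grpc.MaxConcurrentStreams(100))

	// Register services here, then serve.
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}
```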

@pstibrany (Contributor, author) commented Sep 25, 2023

Thank you for the reply!

> Do you happen to work with gRPC in other languages besides Go (on the server side)?

Not at the moment.

> I'm not sure, but I believe the recommended way of dealing with these kinds of issues would be to use a LimitingListener and MaxConcurrentStreams instead. Have you tried these, by any chance?

We have experimented with MaxConcurrentStreams (which is per-connection), but not with LimitListener.

One problem with this approach is finding the correct values. For example, let's say we know we can handle 1000 concurrent requests before OOMing. The number of clients sending requests is dynamic, and we autoscale them with the load. We would need LimitListener to allow the "max autoscale" number of clients (say 300). But to cap concurrent requests at 1000, we would then need to set MaxConcurrentStreams to 1000/300 ≈ 3. That's clearly too low, and it targets the worst-case scenario where we run at the maximum number of clients.

Furthermore, changing LimitListener's limit or MaxConcurrentStreams would require a restart of the service (at least with the current implementations).

Another problem is that this approach doesn't allow us to distinguish between methods. E.g. we never want to limit health checks. This is also the reason for this PR: we want more details to distinguish requests 😄
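
As a sketch of such a per-method distinction (the overload signal is a hypothetical stand-in, and the method name follows the standard grpc.health.v1 health service):

```go
package tapexample

import (
	"context"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/grpc/tap"
)

// serverOverloaded is a hypothetical load signal; in practice it could read
// an inflight counter or a memory/queue gauge.
func serverOverloaded() bool { return false }

// tapWithExemptions never limits health checks and applies the overload check
// to everything else, using tap.Info.FullMethodName to tell methods apart.
func tapWithExemptions(ctx context.Context, info *tap.Info) (context.Context, error) {
	if info.FullMethodName == "/grpc.health.v1.Health/Check" {
		return ctx, nil
	}
	if serverOverloaded() {
		return ctx, status.Error(codes.ResourceExhausted, "server overloaded")
	}
	return ctx, nil
}
```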

> To that end, we do have #1805 open to provide a streaming-style interceptor even for unary RPCs. It hasn't been prioritized, but might allow for sufficiently early termination of RPCs - it would cut out steps 4 and 5, saving 2 copies of the message payload. Could something like that potentially work for you in place of the tap handle feature?

I think it would be a step in the right direction, although I'm worried that step 3 -- creation of a goroutine -- may still be a problem (if many clients send requests all at once, they could create so many goroutines that the server crashes with OOM). To give you a definitive "yes", we would need to load-test it. The goroutine problem could perhaps be solved by implementing a hard limit on the number of workers, although once again that would probably be global to the gRPC server, not per-method.

@dfawley (Member) left a review comment

I think we can take this PR, but be aware that use of the tap handler feature is not recommended.

Also, can you please add a test for this? There is an existing TestTap in test/end2end_test.go that you can probably extend fairly easily:

func testTap(t *testing.T, e env) {

@pstibrany (Contributor, author) replied:

> I think we can take this PR, but be aware that use of the tap handler feature is not recommended.

👍 Understood. The best I can hope for is that it won't be removed anytime soon (before a usable alternative exists). Thank you!

> Also, can you please add a test for this? There is an existing TestTap in test/end2end_test.go that you can probably extend fairly easily:
>
> func testTap(t *testing.T, e env) {

Thank you for the pointer. I was wondering how to add a test for this change and had missed the end2end test!

@ginayeh added the Type: Feature label (new features or improvements in behavior) and removed the Type: Bug label on Oct 2, 2023
@pstibrany (Contributor, author) commented Oct 3, 2023

> Also, can you please add a test for this? There is an existing TestTap in test/end2end_test.go that you can probably extend fairly easily:

I've extended the existing test to cover this change. Please take a look again.

@dfawley requested a review from zasweq on October 5, 2023 at 16:40
@dfawley assigned zasweq and unassigned dfawley on Oct 5, 2023
@dfawley (Member) commented Oct 5, 2023

+zasweq for a second look

@dfawley added this to the 1.59 Release milestone on Oct 5, 2023
@zasweq (Contributor) left a review comment

LGTM.

@zasweq merged commit be7919c into grpc:master on Oct 5, 2023. 11 checks passed.
@pstibrany (Contributor, author) replied:

Thank you!

@pstibrany deleted the pass-header-to-tap-handler branch on October 6, 2023 at 08:15
@bboreham (Contributor) commented Oct 18, 2023

> We have experimented with MaxConcurrentStreams (which is per-connection)

Not any more: as of #6703 it is also implemented to limit the whole server.

@dfawley (Member) commented Oct 18, 2023

> Not any more: as of #6703 it is also implemented to limit the whole server.

That's not right. #6703 is still a per-connection limit. The difference is that it limits the number of running method handlers, not just the number of streams from an HTTP/2 accounting point of view.

@bboreham (Contributor) replied:

Apologies, this was a fantasy that entered my mind during intensive debugging.

@github-actions bot locked as resolved and limited conversation to collaborators on Apr 17, 2024