-
Notifications
You must be signed in to change notification settings - Fork 512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track push inflight requests via grpc server handlers. #5976
Conversation
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merging, will add integration test in separate PR. |
grafana/dskit#377 moves this code into dskit, and introduces global gRPC server inflight limit + a way of limiting individual methods (like we do in this PR, where only Ingester's Push method is limited). |
|
||
# (experimental) Use experimental method of limiting push requests | ||
# CLI flag: -ingester.limit-inflight-requests-using-grpc-handlers | ||
[limit_inflight_requests_using_grpc_tap_handle: <boolean> | default = false] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YAML parameter and CLI flag don't match. Can you fix it, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be fixed in #6073 (draft for now)
if err := i.checkRunning(); err != nil { | ||
return nil, util_log.DoNotLogError{Err: err} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was DoNotLogError{}
removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was moved from StartPushRequest
to PushWithCleanup
.
When StartPushRequest
is called from gRPC tap handle, no log messages will be logged anyway, because we do logging from interceptor, but interceptor will not run at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When StartPushRequest is called from gRPC tap handle, no log messages will be logged anyway, because we do logging from interceptor, but interceptor will not run at all.
I see. I think it's a bit obscure. Maybe we can improve the code comments to explain it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be fixed in #6073 (draft for now)
What this PR does
This PR moves tracking of inflight push requests to gRPC handlers. We install 1) tap handle and 2) stats handler, and they work together to check if another push request is allowed, and track number of ongoing push requests.
Tap handle is called as soon as request headers are received, but before request body is read into memory. At that point we can check and reject request early, or accept it and increase ongoing requests. At this point no interceptor or stats handler has run yet, so if we reject request, the only record of request will be in
cortex_ingester_instance_rejected_requests_total
metric.If tap handle allows the request to proceed, only then stats handlers and interceptors are called. Stats handler is also called after request has finished -- that is where we decrease number of inflight requests again.
Big benefit of this change is that we can reject requests early, before they are read into memory. This protects the ingester from crashing when it receives many push requests.
Since we increase number of ongoing requests in Tap handle, are we guaranteed that
StatsHandler
is called when each time when request finishes?Tap handle is called in
operateHeaders
:mimir/vendor/google.golang.org/grpc/internal/transport/http2_server.go
Lines 567 to 632 in 611e6ba
If request is not rejected, eventually
handle
function is called at the end ofoperateHeaders
.handle
is really a function passed toHandleStreams
:mimir/vendor/google.golang.org/grpc/server.go
Lines 969 to 984 in 611e6ba
This function will either pass request to worker, or start new goroutine. Either way, processing of the request moves to
handleStream
method:mimir/vendor/google.golang.org/grpc/server.go
Lines 1707 to 1770 in 611e6ba
This method can return early with error, without calling stats handler in several cases:
Once processing moves to
processUnaryRPC
orprocessStreamingRPC
, we are guaranteed that stats handlers are called when request is finished viadefer
statement. (Push requests are unary)TODO:
Which issue(s) this PR fixes or relates to
Related to
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]