
ingester: reduce default -server.grpc-max-concurrent-streams to 500 #5666

Merged
14 commits merged into main on Oct 9, 2023

Conversation

@dimitarvdimitrov (Contributor) commented on Aug 3, 2023:

Context

-server.grpc-max-concurrent-streams controls the number of concurrent streams that a single client connection can serve, but doesn't affect the total number of connections that the ingester can serve. Currently each distributor holds only a single open client connection to each ingester.
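For reference, this is the knob at the gRPC layer: a minimal Go sketch (the port and limit here are illustrative, not Mimir's actual wiring) showing that grpc-go's MaxConcurrentStreams server option caps streams per client connection, not connections overall.

package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	lis, err := net.Listen("tcp", ":9095")
	if err != nil {
		log.Fatal(err)
	}

	// Caps how many HTTP/2 streams a single client connection may have open
	// at once. A second connection from the same client gets its own budget,
	// so this alone does not bound the server's total concurrency.
	srv := grpc.NewServer(grpc.MaxConcurrentStreams(500))

	// ... register services here ...

	if err := srv.Serve(lis); err != nil {
		log.Fatal(err)
	}
}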

What this PR does

This PR sets the value of -server.grpc-max-concurrent-streams to 500. We have been running with this configuration at GL for a few weeks and haven't seen negative effects. You can see the positive effects in the benchmark below.

Why?

The reason for reducing the value is that having a high value for max-concurrent-streams may overwhelm the ingester due to too many inflight streams. It can lead to an out-of-memory error in extreme cases.

For example, with 200 distributors and 10,000 concurrent streams per distributor connection, one ingester could be serving 2,000,000 gRPC requests concurrently. gRPC requests come with overhead, and this concurrency may be enough to run the ingester out of memory.

This effectively pushes back the impact of an influx of requests to the distributor. This should be a safer option than handling it in the ingester, since scaling distributors is faster and cheaper, and since a crashing distributor recovers much more quickly than a crashing ingester.
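For illustration, the override looks like this (the YAML key follows Mimir's server block naming; verify against the configuration reference for your version):

# CLI flag, e.g. passed only to the ingester:
-server.grpc-max-concurrent-streams=500

# Equivalent YAML, applying to whichever components load this config file:
server:
  grpc_server_max_concurrent_streams: 500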

Benchmark


I ran a benchmark simulating a single distributor client and a single ingester. The distributor tries to send 100,000 requests concurrently. The ingester implementation immediately rejects each request and logs a line to stdout (using go-kit/log). I ran the benchmark with two settings for the ingester max-concurrent-streams: 100 and 10,000. Instead of recording the heap of the process, I recorded the RSS, which also includes the memory overhead of goroutines.
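A rough sketch of that setup follows (scaled down, and using the gRPC health service as a stand-in for the real ingester Push RPC; this is not the PR's actual benchmark code):

package main

import (
	"context"
	"log"
	"net"
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	lis, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		log.Fatal(err)
	}

	// "Ingester": a gRPC server with a per-connection stream cap. The health
	// service stands in for the Push RPC that the real benchmark rejects
	// immediately.
	srv := grpc.NewServer(grpc.MaxConcurrentStreams(100))
	healthpb.RegisterHealthServer(srv, health.NewServer())
	go func() {
		if err := srv.Serve(lis); err != nil {
			log.Println(err)
		}
	}()

	// "Distributor": one client connection firing many requests concurrently.
	// Requests beyond the stream cap queue on the client side rather than all
	// being in flight on the server at once. 10,000 requests here instead of
	// the benchmark's 100,000 to keep the sketch quick.
	conn, err := grpc.Dial(lis.Addr().String(), grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	client := healthpb.NewHealthClient(conn)

	var wg sync.WaitGroup
	for i := 0; i < 10_000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = client.Check(context.Background(), &healthpb.HealthCheckRequest{})
		}()
	}
	wg.Wait()
	srv.Stop()
}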

goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/ingester/client
                                    │ 10000-max-concurrent-streams.txt │   100-max-concurrent-streams.txt   │
                                    │              sec/op              │   sec/op     vs base               │
IngesterClient_ConcurrentStreams-10                        44.27 ± 32%   12.62 ± 14%  -71.50% (p=0.002 n=6)

                                    │ 10000-max-concurrent-streams.txt │    100-max-concurrent-streams.txt    │
                                    │          max-rss-bytes           │ max-rss-bytes  vs base               │
IngesterClient_ConcurrentStreams-10                      237.82Mi ± 0%    72.96Mi ± 1%  -69.32% (p=0.002 n=6)

The RSS of the ingester process was about 3x higher with the higher concurrency. Surprisingly, the latency to process 100K requests was also more than 3x higher with the increased concurrency. I suspect the latency reduction at lower concurrency is due to reduced contention and context switching, although I wasn't able to verify that.
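As a side note on the max-rss-bytes column: one way to surface peak RSS as a custom benchmark metric that benchstat can compare is via Getrusage (a sketch, not necessarily what was used here); note that darwin reports Maxrss in bytes while Linux reports it in kibibytes.

package client_test

import (
	"syscall"
	"testing"
)

// reportMaxRSS records the process's peak resident set size as a custom
// benchmark metric, producing a "max-rss-bytes" column in benchstat output.
// On darwin, Rusage.Maxrss is already in bytes; on Linux it needs *1024.
func reportMaxRSS(b *testing.B) {
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		b.Fatal(err)
	}
	b.ReportMetric(float64(ru.Maxrss), "max-rss-bytes")
}

func BenchmarkIngesterClient_ConcurrentStreams(b *testing.B) {
	// ... run the concurrent-stream workload against the test ingester ...
	reportMaxRSS(b)
}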

Checklist

  • Tests updated
  • [n/a] Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci (Collaborator) left a comment:


I recommend following the usual workflow, which is:

  • First, test the change at Grafana Labs via a CLI flag override
  • If everything is fine in production and we're happy with it, then upstream the default config change

@dimitarvdimitrov dimitarvdimitrov marked this pull request as draft August 14, 2023 12:38
@colega (Contributor) commented on Aug 29, 2023:

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase on main and, if needed, move the CHANGELOG entry added/updated in this PR to the top of the CHANGELOG document. Thanks!

benchstat 100-max-concurrent-streams.txt 10000-max-concurrent-streams.txt
100-max-concurrent-streams.txt:4: missing iteration count
10000-max-concurrent-streams.txt:4: missing iteration count
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/ingester/client
                                    │ 100-max-concurrent-streams.txt │ 10000-max-concurrent-streams.txt │
                                    │             sec/op             │     sec/op      vs base          │
IngesterClient_ConcurrentStreams-10                      5.909 ± 14%      6.066 ± 13%  ~ (p=0.529 n=10)

                                    │ 100-max-concurrent-streams.txt │  10000-max-concurrent-streams.txt  │
                                    │            max-heap            │  max-heap    vs base               │
IngesterClient_ConcurrentStreams-10                      1.942G ± 4%   2.072G ± 1%  +6.69% (p=0.001 n=10)

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov force-pushed the dimitar/ingester/too-many-grpc-streams branch from fc665cd to a9e395d on October 2, 2023 15:51
@dimitarvdimitrov dimitarvdimitrov changed the title from "ingester: reduce default -server.grpc-max-concurrent-streams to 100" to "ingester: reduce default -server.grpc-max-concurrent-streams to 500" on Oct 2, 2023
@dimitarvdimitrov dimitarvdimitrov marked this pull request as ready for review October 2, 2023 16:00
@dimitarvdimitrov (Contributor, Author) commented:

This is ready for review again. I changed the value from 100 to 500, since that's what we estimated as an upper bound on the number of gRPC streams that an ingester might have to handle from distributors.

We've been running this for a few weeks at Grafana Labs and haven't seen any problems related to latency, so it should be safe to make this change for everyone.

@pracucci (Collaborator) left a comment:


Thanks!

@@ -28,6 +28,7 @@ Entries should include a reference to the Pull Request that introduced the chang

## main / unreleased

* [CHANGE] Reduce `-server.grpc-max-concurrent-streams` from 1000 to 500 for ingester and to 100 for all components. #5666
Collaborator


and to 100 for all components

Is this changed in this PR? I can't see it.

Contributor Author


The change is because I removed the parameter from the YAML config and set it via a CLI flag in the ingester StatefulSet. The YAML config used to say 1000 and was used by all components. The CLI args in the ingester StatefulSet only apply to the ingester.
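For illustration only (the image tag and surrounding fields are hypothetical, not the actual Grafana Labs manifests), the per-ingester override as a CLI argument on the StatefulSet looks roughly like:

# Fragment of an ingester StatefulSet pod spec (illustrative):
containers:
  - name: ingester
    image: grafana/mimir:2.10.0
    args:
      - -target=ingester
      - -server.grpc-max-concurrent-streams=500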

@dimitarvdimitrov dimitarvdimitrov merged commit cdce9af into main Oct 9, 2023
29 checks passed
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/ingester/too-many-grpc-streams branch October 9, 2023 12:54