
store: add ability to limit max num of samples / concurrent queries #798

Merged: 57 commits merged into thanos-io:master on Mar 23, 2019

Conversation

@GiedriusS (Member) commented Feb 1, 2019

Add the ability to limit the number of samples that can be retrieved in a single query. We do this by checking how many chunks would have to be downloaded to satisfy any given query and multiplying that by 120 -- the new maxSamplesPerChunk constant. Why 120 was chosen is explained above the constant and in the comments below.

Add the ability to limit the number of concurrent queries - this is done by using a Prometheus gate. Another type has been added which wraps that gate and registers metrics so that users know how many queries are in the queue, what the limit is, and how long it takes for queries to get through the gate.

This is a huge pain point for us as it is impossible to do capacity planning otherwise.

Changes

Two new options:

  • store.grpc.series-sample-limit, which lets you control how many samples can be returned at most via a single Series() call. It can potentially overestimate the number of samples, but that is noted in the command line parameter's help text.
  • store.grpc.series-max-concurrency, which lets you control the maximum number of concurrent queries - they get blocked until there is free space in the channel. Users can look at the new metrics to determine whether a lot of queries are waiting to be executed. A sketch of how these flags could be registered is shown right after this list.
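For illustration only, here is a minimal standalone sketch of how the two flags above could be registered with kingpin. The flag names and the concurrency default of 20 come from this PR's discussion; the variable names, help strings, sample-limit default, and the `main` wrapper are assumptions, not the merged code.

```go
// Sketch only: flag names follow this PR, everything else is illustrative.
package main

import (
	"fmt"

	"gopkg.in/alecthomas/kingpin.v2"
)

func main() {
	app := kingpin.New("store", "sketch of the new Thanos Store limit flags")

	// 0 means "no limit" for the sample limit, per the phased-rollout discussion below.
	maxSampleCount := app.Flag("store.grpc.series-sample-limit",
		"Maximum amount of samples returned via a single Series call. 0 means no limit.").
		Default("0").Uint()
	// 20 mirrors the concurrency limit discussed for Thanos Query.
	maxConcurrent := app.Flag("store.grpc.series-max-concurrency",
		"Maximum number of concurrent Series calls.").
		Default("20").Int()

	kingpin.MustParse(app.Parse([]string{}))
	fmt.Println(*maxSampleCount, *maxConcurrent)
}
```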

New metrics:

  • thanos_bucket_store_queries_dropped_total shows how many queries were dropped due to the samples limit;
  • thanos_bucket_store_queries_limit is a constant metric which shows how many concurrent queries can come into Thanos Store;
  • thanos_bucket_store_queries_in_flight_total shows how many queries are currently "in flight" i.e. they are being executed;
  • thanos_bucket_store_gate_duration_seconds shows how many seconds it took for queries to pass through the gate, both when passing fails and when it succeeds. A sketch of the gate wrapper that exposes these metrics follows below.
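A hedged sketch of the gate wrapper described above: it wraps the Prometheus gate and exposes the in-flight gauge and gate-duration histogram. The metric names come from this PR; the type, field, and method names, and the import path for the Prometheus gate package, are assumptions and may differ from the merged code.

```go
// Sketch of a metrics-wrapping gate; not the exact merged implementation.
package store

import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/prometheus/pkg/gate"
)

// Gate wraps the Prometheus gate with metrics about in-flight and queued queries.
type Gate struct {
	g               *gate.Gate
	inflightQueries prometheus.Gauge
	gateTiming      prometheus.Histogram
}

// NewGate returns a gate limiting maxConcurrent queries and registers its metrics.
func NewGate(maxConcurrent int, reg prometheus.Registerer) *Gate {
	g := &Gate{
		g: gate.New(maxConcurrent),
		inflightQueries: prometheus.NewGauge(prometheus.GaugeOpts{
			Name: "thanos_bucket_store_queries_in_flight_total",
			Help: "Number of queries that are currently in flight.",
		}),
		gateTiming: prometheus.NewHistogram(prometheus.HistogramOpts{
			Name: "thanos_bucket_store_gate_duration_seconds",
			Help: "How many seconds it took for queries to wait at the gate.",
		}),
	}
	if reg != nil {
		reg.MustRegister(g.inflightQueries, g.gateTiming)
	}
	return g
}

// IsMyTurn blocks until a slot is free (or ctx is done) and records how long that took.
func (g *Gate) IsMyTurn(ctx context.Context) error {
	start := time.Now()
	defer func() { g.gateTiming.Observe(time.Since(start).Seconds()) }()

	if err := g.g.Start(ctx); err != nil {
		return err
	}
	g.inflightQueries.Inc()
	return nil
}

// Done releases a slot at the gate.
func (g *Gate) Done() {
	g.inflightQueries.Dec()
	g.g.Done()
}
```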

Verification

E2E tests that exercise the new limits pass, plus manual testing locally.

@bwplotka (Member) commented Feb 1, 2019

Hey, the idea is super nice and we definitely need this, but I will have time to review only from next week.

I would aim for the same options as Prometheus has, so a sample limit but also a concurrent-queries limit. We should aim for the same experience basically (:

Thanks for this!

@GiedriusS GiedriusS changed the title [WIP] store: add ability to limit max num of samples store: add ability to limit max num of samples / concurrent queries Feb 4, 2019
@GiedriusS (Member, Author) commented Feb 4, 2019

Sorry for the many commits - I noticed after the fact that the index reader only returns chunk references, not data about the chunks themselves. I moved the check down to the place where the retrieved chunks are converted into the "proper" format - it seems Prometheus does the same, I took a look at it too. It would be nice to extend the e2e test to exercise this spot, and a new metric showing how many queries are currently being executed would probably be nice too.

Convert raw chunks into XOR encoded chunks and call the NumSamples()
method on them to calculate the number of samples. Rip out the samples
calculation into a different function because it is used in two
different places.
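A sketch of what the approach in this commit message amounts to, assuming the TSDB chunkenc package (this decode-and-count path was later replaced by the 120-samples-per-chunk estimate discussed further down); the function name and surrounding structure are illustrative.

```go
// Sketch: decode raw chunk bytes as XOR chunks and sum their samples.
// Assumes github.com/prometheus/tsdb/chunkenc; not the exact merged code.
package store

import (
	"github.com/prometheus/tsdb/chunkenc"
)

// countSamples decodes every raw chunk and adds up the samples it contains.
func countSamples(rawChunks [][]byte) (int, error) {
	total := 0
	for _, raw := range rawChunks {
		c, err := chunkenc.FromData(chunkenc.EncXOR, raw)
		if err != nil {
			return 0, err
		}
		total += c.NumSamples()
	}
	return total, nil
}
```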
@GiedriusS force-pushed the feature/store_sample_limit branch 2 times, most recently from cd6cb48 to e87f763 on February 5, 2019 at 09:23
@GiedriusS (Member, Author) commented Feb 5, 2019

I believe this is ready for a review. My only concern is that, AFAICT, there can be huge compacted chunks made by thanos compact; those would be downloaded, and only after the fact could we check that we have exceeded the limit. It seems the index reader only stores the minimum and maximum time and the chunk ref, not the other metadata - we have to download the time series data first and decode it to check the number of samples. Any thoughts?

@bwplotka (Member) commented Feb 5, 2019

Well, each chunk has at most ~120 samples, so I think it is fine if we just estimate this.
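For illustration, a hedged sketch of the estimate this suggests: count the chunks a Series() call would touch and reject the query before downloading anything. The function and parameter names here are made up, not the PR's actual code.

```go
// Sketch of the chunk-count-based estimate discussed here; names are
// illustrative, not the merged implementation.
package store

import "fmt"

const maxSamplesPerChunk = 120

// estimateAndCheck returns an error if fetching numChunks chunks could
// exceed the configured sample limit.
func estimateAndCheck(numChunks, sampleLimit uint64) error {
	if sampleLimit == 0 {
		return nil // 0 means no limit.
	}
	// Each chunk holds at most ~120 samples, so this can only overestimate.
	if estimated := numChunks * maxSamplesPerChunk; estimated > sampleLimit {
		return fmt.Errorf("estimated %d samples exceeds limit of %d", estimated, sampleLimit)
	}
	return nil
}
```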

@bwplotka (Member) commented Feb 5, 2019

Compaction compacts blocks by compacting the index file. Chunks stay the same (they just end up in different "chunk files"), but the chunk (or segment) length stays the same.

@GiedriusS (Member, Author) commented Feb 6, 2019

Thank you for clarifying the part about TSDB/compaction - I wasn't sure since I don't yet have a good mental model of how it all works. Then it is all good, as we will not have to download huge chunks.

As for the first message - are you saying that we should hardcode the number 120 inside the code? It doesn't seem like a good idea to me. Let me explain why:

  • I think it is much better if we overestimate the number of samples that would be in the response by ~120 than underestimate it by thousands. It seems to me that a lot of metrics nowadays are short-lived, or in other words spread over a lot of different chunks - especially in "service discovery" setups. For example, Consul nodes could stay around for only ~10 minutes, go away, and come back after ~2 hours. At worst we could be off by ~119 samples for each chunk, which seems like a lot to me, especially if the query's range is, say, 1.5 years. What is also important to me and to users, IMHO, is getting the bang for the buck - I wouldn't want our servers to sit unused just because the software that we use overestimates the load :(
  • Also, what happens if at some point in the future the default length of a TSDB block changes from ~2 hours to some other duration? The estimation would be even more off;
  • Also, I don't see a big problem with downloading the chunks until the limit has been reached. It seems to me that with the other option --chunk-pool-size we could achieve very predictable RAM usage, since Go's GC should quickly determine that the RAM has been deallocated because we aren't allocating ad-hoc objects in this process. The limit is RAM, not network bandwidth. Furthermore, I think the most important thing here is that the query gets stopped at the lowest stage possible so that Thanos Query doesn't receive this huge block of data.

Let me know what you think of my comments. If you really think this is the way to go forward and it's the only way it could get merged then I can change it.

@bwplotka (Member) left a comment:

Great overall, some suggestions though. Three major ones regarding:

  • unnecessary decoding and even downloading;
  • the gate queueing time metric;
  • the sample limit code.

> As for the first message - are you saying that we should hardcode the number 120 inside the code? It doesn't seem like a good idea to me.

> I wouldn't want our servers to sit unused just because the software that we use overestimates the load :(

Agreed, but look at my comment - we can estimate with great confidence by checking the length in bytes.

> Also, what happens if at some point in the future the default length of a TSDB block changes from ~2 hours to some other duration? The estimation would be even more off;

Block length can be 200 years, but chunks will still be 120 samples maximum unless we change that in TSDB. If we do, we can think about a better solution (: But it won't happen anytime soon. See why here: prometheus-junkyard/tsdb#397

> Also, I don't see a big problem with downloading the chunks until the limit has been reached. It seems to me that with the other option --chunk-pool-size we could achieve very predictable RAM usage, since Go's GC should quickly determine that the RAM has been deallocated because we aren't allocating ad-hoc objects in this process. The limit is RAM, not network bandwidth. Furthermore, I think the most important thing here is that the query gets stopped at the lowest stage possible so that Thanos Query doesn't receive this huge block of data.

Sure, but we can avoid downloading everything - see my comment.

(Inline review comments on pkg/store/bucket.go and pkg/store/gate.go - all resolved.)
@FUSAKLA (Member) left a comment:

Generally OK from me, and definitely a useful feature to have, great! :)

Just a few nits and suggestions.

(Inline review comments on cmd/thanos/store.go, pkg/store/gate.go, pkg/store/bucket.go, and pkg/store/bucket_e2e_test.go - all resolved.)
@GiedriusS (Member, Author) commented Feb 7, 2019

Ah, I see where that 120 value comes from now after looking at prometheus-junkyard/tsdb#397. It makes much more sense to me and I agree 👍 I will just add a stern note to that option so that users know it may, in unlikely cases, overestimate the number of samples a single Series() request will take. I also didn't want to put more work into maintaining this in the future, but now I'm sure it will almost certainly stay the same. Will rework that part to calculate with this assumption.
As for the gating tests - I'm afraid that would be impossible, since it is infeasible to generate a big enough TSDB in the unit tests due to the increased times and load. I'm sure we can trust that code since it has been battle-tested in Prometheus and we are reusing it completely, just with the addition of a metric.
I agree with all of the other comments - will fix them :) Thank you again for the reviews! Will cook up the second version of this change soon.

@GiedriusS force-pushed the feature/store_sample_limit branch 2 times, most recently from cd6cb48 to 9d0b8a7 on February 8, 2019 at 09:29
@bwplotka (Member) left a comment:

LGTM

efficiency we take 120 as number of samples in
chunk, so the actual number of samples might be
lower, even though maximum could be hit. Cannot
be bigger than 120.
A reviewer (Member) commented:

I'm confused. You say it cannot be bigger than 120 but default is 50000000 ?

@GiedriusS (Member, Author) replied Mar 17, 2019:
It means that the upper bound of samples per chunk is 120. Maybe we should move this part into parentheses in the former sentence: "NOTE: for efficiency we take 120 as the number of samples in a chunk (it cannot be bigger than that), so the actual number of samples might be lower, even though the maximum could be hit."
Would that be clearer?

Help: "Number of queries that were dropped due to the sample limit.",
})
m.queriesLimit = prometheus.NewGauge(prometheus.GaugeOpts{
Name: "thanos_bucket_store_queries_limit",
A reviewer (Member) commented:
The Prometheus team uses prometheus_engine_queries_concurrent_max. Naming is hard, but I figure we could be consistent with them?

https://github.com/prometheus/prometheus/blob/8155cc49924cd48ec552506b4e5519ebafa8b722/promql/engine.go#L234-L239

@GiedriusS (Member, Author) replied:
Good point. I will rename this to thanos_bucket_store_queries_concurrent_max.
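For reference, a small sketch of what the renamed gauge could look like, mirroring the hunk quoted above; the Help string and the use of `m` and `maxConcurrent` from the surrounding code are assumptions.

```go
// Sketch only: mirrors the quoted hunk, with the metric renamed as discussed.
m.queriesLimit = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "thanos_bucket_store_queries_concurrent_max",
	Help: "Number of maximum concurrent queries.",
})
m.queriesLimit.Set(float64(maxConcurrent))
```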

@domgreen (Contributor) left a comment:

Great stuff, I really like adding this... I think in a future PR we could probably apply it to some of the wider components (query could probably do with limiting), so making the flags more generic would help with this in the future.
Comments are mainly nits around conforming with some of the styles and conventions that are already in the Prometheus codebase.
With the default limit we could also set it to 0, meaning no limit, and then do a phased rollout: set it to 0 for the initial release so we know we will not break anyone, and make sure the release notes say that the following release will set a limit. This way people can do some planning and testing before it is set hard.

@@ -36,6 +36,12 @@ func registerStore(m map[string]setupFunc, app *kingpin.Application, name string
chunkPoolSize := cmd.Flag("chunk-pool-size", "Maximum size of concurrently allocatable bytes for chunks.").
Default("2GB").Bytes()

maxSampleCount := cmd.Flag("grpc-sample-limit",
@domgreen (Contributor) commented:

Maybe align the flags with how Prometheus names its flags? grpc-sample-limit > {query|store|storage}.grpc.read-sample-limit?

See storage.remote.read-sample-limit https://github.com/prometheus/prometheus/blob/master/cmd/prometheus/main.go#L206

"Maximum amount of samples returned via a single Series call. 0 means no limit. NOTE: for efficiency we take 120 as number of samples in chunk, so the actual number of samples might be lower, even though maximum could be hit. Cannot be bigger than 120.").
Default("50000000").Uint()

maxConcurrent := cmd.Flag("grpc-concurrent-limit", "Maximum number of concurrent Series calls. 0 means no limit.").Default("20").Int()
@domgreen (Contributor) commented:

As above grpc-concurrent-limit > query.grpc.max-concurrency?

See query.max-concurrency https://github.com/prometheus/prometheus/blob/master/cmd/prometheus/main.go#L233

@GiedriusS (Member, Author) replied:

Will rename to store.grpc.max-concurrency as we are talking about Thanos Store here :P

@@ -42,6 +42,14 @@ import (
"google.golang.org/grpc/status"
)

// Approximately this is the max number of samples that we may have in any given chunk. This is needed
// for precalculating the number of samples that we may have to retrieve and decode for any given query
// without downloading them. Please take a look at https://github.com/prometheus/tsdb/pull/397 to know
@domgreen (Contributor) commented:

Think this is fine 👍

(Resolved inline comment on pkg/store/bucket.go.)
// Query gate which limits the maximum amount of concurrent queries.
queryGate *Gate

// Samples limiter which limits the number of samples per each Series() call.
@domgreen (Contributor) commented:

Samples limiter which > samplesLimiter

g: gate.New(maxConcurrent),
}
g.inflightQueries = prometheus.NewGauge(prometheus.GaugeOpts{
Name: "queries_in_flight",
@domgreen (Contributor) commented:
queries_in_flight > thanos_store_queries_in_flight_total?

Subsystem: reg.Subsystem(),
})
g.gateTiming = prometheus.NewHistogram(prometheus.HistogramOpts{
Name: "gate_seconds",
@domgreen (Contributor) commented:

gate_seconds > thanos_store_gate_duration_seconds?

CHANGELOG.md Outdated
New tracing span:
* `store_query_gate_ismyturn` shows how long it took for a query to pass (or not) through the gate.

:warning: **WARNING** :warning: #798 adds new default limits for max samples per one Series() gRPC method call and the maximum number of concurrent Series() gRPC method calls. Consider increasing them if you have a very huge deployment.
@domgreen (Contributor) commented:

Let's do a phased rollout of this... can we have 0 mean off? That way we can phase the rollout by warning people that we will be setting a sensible default and that they should ensure their system can deal with it or set it explicitly for their needs.

@GiedriusS (Member, Author) replied:

Sure, perhaps most users won't want this.

(Resolved inline comments on CHANGELOG.md.)
@GiedriusS (Member, Author) commented:

You had already reviewed it again, @domgreen, before I had fixed all of the places 😄 I have only made naming and default-value changes since @bwplotka's review, so I think this is ready for merging.

@bwplotka (Member) left a comment:

Well, I already approved a week ago, so feel free to merge. Found one nit, though.

(Resolved inline comment on pkg/store/bucket.go.)
Giedrius Statkevičius and others added 5 commits March 22, 2019 17:20
The original error already informs us about what is going wrong.
It's still useful to know that we are talking about samples here
exactly.
Setting it to 0 by default doesn't make sense since the Go channel
becomes unbuffered and all queries will time out. Set it to 20 by default
since that's the limit on Thanos Query, so naturally there won't be more
than 20 concurrent queries by default.
@GiedriusS (Member, Author) commented Mar 23, 2019

I reverted the changes to the default value of --store.grpc.series-max-concurrency to make it 20, as it is on Thanos Query and Prometheus. If the value is 0 then an unbuffered channel is created and all queries start failing. That seemed fine to @bwplotka some iterations before this, so I think it's good, as long as we mention it in the changelog, and because we still haven't reached 1.0. Also, in general, it doesn't make sense to provide such "unlimited" interfaces.
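To make the unbuffered-channel behavior concrete, here is a minimal sketch of how the underlying gate is assumed to work (not Thanos code): Start() sends into a channel buffered by the concurrency limit, so with a limit of 0 every Start() blocks until the context times out.

```go
// Minimal sketch of the assumed gate mechanics: with maxConcurrent == 0 the
// channel is unbuffered, so Start blocks and every query hits its deadline.
package main

import (
	"context"
	"fmt"
	"time"
)

type gate struct{ ch chan struct{} }

func newGate(maxConcurrent int) *gate {
	return &gate{ch: make(chan struct{}, maxConcurrent)}
}

func (g *gate) Start(ctx context.Context) error {
	select {
	case g.ch <- struct{}{}: // Succeeds only while there is buffer space.
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func (g *gate) Done() { <-g.ch }

func main() {
	g := newGate(0) // Unbuffered: no query can ever acquire a slot.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	fmt.Println(g.Start(ctx)) // Prints "context deadline exceeded".
}
```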

@GiedriusS (Member, Author) commented:

Thank you all for the reviews and comments, merging (:

@GiedriusS GiedriusS merged commit f24d555 into thanos-io:master Mar 23, 2019
@GiedriusS GiedriusS deleted the feature/store_sample_limit branch March 23, 2019 10:27