storegateway: Merge series concurrently #7456

Merged
merged 4 commits into main from storegw-merge-iter-concurrently on Mar 13, 2024

Conversation

@narqo (Contributor) commented on Feb 23, 2024

What this PR does

These changes are based on the observation that mergedSeriesChunkRefsSet only fetches series from its underlying set iterators when one of its cursors has no items left to read. Only when both cursors are empty does it make sense to pre-populate them concurrently. Otherwise, the additional cost of the goroutine switch outweighs the cost of two sequential calls to ensureItemAvailableToRead.
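
For illustration, here is a minimal, self-contained sketch of that strategy. The cursor type and all names are hypothetical stand-ins, not the actual mergedSeriesChunkRefsSet implementation:

package main

import (
	"fmt"
	"sync"
)

// cursor is a hypothetical stand-in for one side of the merge: it buffers
// batches fetched from an underlying, possibly IO-bound, set iterator.
type cursor struct {
	buf  []int
	next func() []int
}

func (c *cursor) empty() bool { return len(c.buf) == 0 }
func (c *cursor) fill()       { c.buf = c.next() }

// ensureItemsAvailable mirrors the observation above: only when both
// cursors are empty is it worth paying for a goroutine to overlap the two
// fetches; when at most one cursor is empty, a synchronous fill is cheaper
// than a goroutine switch.
func ensureItemsAvailable(a, b *cursor) {
	if a.empty() && b.empty() {
		var wg sync.WaitGroup
		wg.Add(1)
		go func() {
			defer wg.Done()
			a.fill()
		}()
		b.fill() // fill the second cursor on the current goroutine
		wg.Wait()
		return
	}
	if a.empty() {
		a.fill()
	}
	if b.empty() {
		b.fill()
	}
}

func main() {
	a := &cursor{next: func() []int { return []int{1, 2} }}
	b := &cursor{next: func() []int { return []int{3, 4} }}
	ensureItemsAvailable(a, b)
	fmt.Println(a.buf, b.buf) // prints: [1 2] [3 4]
}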

Below are the benchmarks that show the effect of these changes:

BenchmarkBucketStore_Series
› benchstat ~/tmp/{1before,2after}.txt
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
                                                                                                                                        │ /Users/v/tmp/1before.txt │       /Users/v/tmp/2after.txt       │
                                                                                                                                        │          sec/op          │   sec/op     vs base                │
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4                           129.3m ± 0%   127.6m ± 0%   -1.33% (p=0.000 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/streamingBatchSize=0/10000000of10000000-4                          157.3m ± 0%   136.0m ± 0%  -13.53% (p=0.000 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_and_caches_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4                132.7m ± 0%   131.3m ± 0%   -1.04% (p=0.000 n=10)
geomean                                                                                                                                                139.2m        131.6m        -5.48%

                                                                                                                                        │ /Users/v/tmp/1before.txt │       /Users/v/tmp/2after.txt       │
                                                                                                                                        │           B/op           │     B/op      vs base               │
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4                          246.3Mi ± 0%   246.3Mi ± 0%  +0.03% (p=0.035 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/streamingBatchSize=0/10000000of10000000-4                         253.2Mi ± 0%   255.5Mi ± 0%  +0.90% (p=0.000 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_and_caches_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4               262.2Mi ± 0%   262.2Mi ± 0%       ~ (p=0.529 n=10)
geomean                                                                                                                                               253.8Mi        254.6Mi       +0.31%

                                                                                                                                        │ /Users/v/tmp/1before.txt │      /Users/v/tmp/2after.txt       │
                                                                                                                                        │        allocs/op         │  allocs/op   vs base               │
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4                           3.135M ± 0%   3.135M ± 0%       ~ (p=0.247 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/streamingBatchSize=0/10000000of10000000-4                          3.123M ± 0%   3.123M ± 0%       ~ (p=0.280 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_and_caches_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4                3.235M ± 0%   3.236M ± 0%       ~ (p=0.139 n=10)
geomean                                                                                                                                                3.164M        3.164M       +0.01%

Note that BenchmarkBucketStore_Series involves a lot of work on top of what is done per BucketStore.Series call: the benchmarks literally call gRPC over the local network, so they show a mix of results from the client's and the server's point of view.

Also, I updated the BenchmarkMergedSeriesChunkRefsSetIterators benchmarks, adding one extra dimension: with and without IO in the set iterators. This allows comparing the effect of the changes in both pure CPU-bound and IO-bound scenarios. The CPU-bound workload was very snappy before, but that alone can be misleading, since real workloads always include some amount of IO.

BenchmarkMergedSeriesChunkRefsSetIterators
› benchstat ~/tmp/merge-iter-{1before,2after}.txt
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
                                                                                                    │ /Users/v/tmp/merge-iter-1before.txt │ /Users/v/tmp/merge-iter-2after.txt  │
                                                                                                    │               sec/op                │   sec/op     vs base                │
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_false-2                            11.84µ ± 1%   19.22µ ± 1%  +62.35% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_true-2                            1321.0µ ± 0%   358.4µ ± 0%  -72.87% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_false-2                           21.16µ ± 1%   28.16µ ± 1%  +33.08% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_true-2                            4.180m ± 0%   1.796m ± 0%  -57.04% (p=0.000 n=10)
geomean                                                                                                                       192.9µ        136.6µ       -29.16%

                                                                                                    │ /Users/v/tmp/merge-iter-1before.txt │  /Users/v/tmp/merge-iter-2after.txt   │
                                                                                                    │                B/op                 │     B/op      vs base                 │
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_false-2                          1.528Ki ±  0%   3.385Ki ± 0%  +121.47% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_true-2                           1.516Ki ± 17%   4.283Ki ± 2%  +182.60% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_false-2                         1.532Ki ±  1%   3.392Ki ± 1%  +121.35% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_true-2                          1.518Ki ±  0%   4.797Ki ± 8%  +216.12% (p=0.000 n=10)
geomean                                                                                                                     1.523Ki         3.919Ki       +157.25%

                                                                                                    │ /Users/v/tmp/merge-iter-1before.txt │ /Users/v/tmp/merge-iter-2after.txt  │
                                                                                                    │              allocs/op              │ allocs/op   vs base                 │
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_false-2                             35.00 ± 0%   80.00 ± 0%  +128.57% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_true-2                              35.00 ± 0%   92.00 ± 0%  +162.86% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_false-2                            35.00 ± 0%   80.00 ± 0%  +128.57% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_true-2                             35.00 ± 0%   94.00 ± 0%  +168.57% (p=0.000 n=10)
geomean                                                                                                                        35.00        86.25       +146.44%

Which issue(s) this PR fixes or relates to

Fixes #4596

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@dimitarvdimitrov (Contributor) left a comment:

Changes make sense. 13% is quite a lot for such a small change on such a high-level benchmark.

@narqo force-pushed the storegw-merge-iter-concurrently branch 2 times, most recently from 522d7f0 to 821383b, on February 26, 2024 12:43
@narqo marked this pull request as ready for review on February 26, 2024 12:43
@narqo requested a review from a team as a code owner on February 26, 2024 12:43
Comment on lines 709 to 717
batch := make([]iterator[seriesChunkRefsSet], len(iterators))
for i := 0; i < numIterators; i++ {
	if withIO {
		// Delay only 2% of iterations to mimic IO ops happening during real set iterations.
		batch[i] = newDelayedMaybeIterator(2, time.Millisecond, iterators[i])
	} else {
		batch[i] = iterators[i]
	}
}
@narqo (Contributor, Author):

This is the chunk of the diff that actually changed here: for the withIO dimension, we wrap the iterators with a delayedIterator to add some amount of IO to the iterations.

func (s *delayedIterator[S]) delayMaybe() {
	if s.chance == 0 || rand.Intn(100) <= s.chance {
		time.Sleep(s.delay)
	}
}
Contributor:

wouldn't this randomness make the benchmarks less reproducible/consistent?

@narqo (Contributor, Author):

In theory, no. The chance of the delay is the same for all b.N iterations (e.g. "5% of the iterations in the benchmark sleep"). I'm not sure, but statistically this should be fine, right?

Contributor:

The total run of these benchmarks looks to be at most 10ms, so missing one or a few of these time.Sleep calls can make a significant difference (I consider +/- 10% to be significant).

Would it be better to apply a constant delay on every Next() invocation, but reduce the duration? If we are aiming for a 1ms sleep on 5% of invocations, that is statistically the same total time slept as a 50µs sleep on every invocation. The average time slept is the same in both cases, but with a constant delay the best and worst cases match the average, so runs are more consistent.
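
A quick back-of-the-envelope check of that equivalence (purely illustrative):

package main

import (
	"fmt"
	"time"
)

func main() {
	p := 0.05                 // probability that a given Next() sleeps
	d := 1 * time.Millisecond // delay when the sleep fires
	// Expected sleep per invocation: p * d = 0.05 * 1ms = 50µs.
	fmt.Println(time.Duration(p * float64(d)))
}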

@narqo (Contributor, Author) commented on Feb 28, 2024:

As discussed in DMs, my initial idea was to avoid having IO (represented by a sleep) on every call to the underlying iterator's Next(). That shouldn't have been a goal in the first place, and it only made the changes confusing.

I updated this chunk to use a fixed delay. The results of these benchmarks are still the same (I also updated the PR description).
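
For reference, a constant-delay wrapper could look roughly like this (a rough reconstruction for illustration; the actual diff may differ in names and details):

package sketch

import "time"

// iterator is a minimal stand-in for the iterator[S] interface used in the
// benchmarks above.
type iterator[S any] interface {
	Next() bool
	At() S
}

// delayedIterator sleeps a constant, reduced duration on every Next() call,
// so the total time slept per run is deterministic rather than random.
type delayedIterator[S any] struct {
	delay time.Duration
	it    iterator[S]
}

func (s *delayedIterator[S]) Next() bool {
	time.Sleep(s.delay)
	return s.it.Next()
}

func (s *delayedIterator[S]) At() S {
	return s.it.At()
}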

pkg/storegateway/series_refs.go (review thread resolved)
@duricanikolic (Contributor) commented:
The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase on main and, if necessary, move the CHANGELOG entry added/updated in this PR to the top of the CHANGELOG document. Thanks!

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
@narqo force-pushed the storegw-merge-iter-concurrently branch from 5e946f2 to 7a5b628 on March 12, 2024 18:55
@dimitarvdimitrov merged commit 1ba965a into main on Mar 13, 2024
29 checks passed
@dimitarvdimitrov deleted the storegw-merge-iter-concurrently branch on March 13, 2024 10:02
Successfully merging this pull request may close these issues:

store-gateway: merged series from different blocks concurrently

4 participants