storegateway: Merge series concurrently #7456

Merged
merged 4 commits into main from storegw-merge-iter-concurrently on Mar 13, 2024

Conversation

@narqo (Contributor) commented on Feb 23, 2024

What this PR does

These changes are based on the observation that mergedSeriesChunkRefsSet only fetches series from its underlying set iterators when one of its cursors has no items left to read. Only when both cursors are empty does it make sense to pre-populate them concurrently. Otherwise, the additional cost of the goroutine switch outweighs the cost of two sequential calls to ensureItemAvailableToRead.
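
For illustration, here is a minimal, self-contained sketch of that strategy. The cursor type and all names are hypothetical stand-ins, not the actual mergedSeriesChunkRefsSet implementation:

package main

import (
	"fmt"
	"sync"
)

// cursor is a hypothetical stand-in for one side of the merge: it buffers
// batches fetched from an underlying, possibly IO-bound, set iterator.
type cursor struct {
	buf  []int
	next func() []int
}

func (c *cursor) empty() bool { return len(c.buf) == 0 }
func (c *cursor) fill()       { c.buf = c.next() }

// ensureItemsAvailable mirrors the observation above: only when both
// cursors are empty is it worth paying for a goroutine to overlap the two
// fetches; when at most one cursor is empty, a synchronous fill is cheaper
// than a goroutine switch.
func ensureItemsAvailable(a, b *cursor) {
	if a.empty() && b.empty() {
		var wg sync.WaitGroup
		wg.Add(1)
		go func() {
			defer wg.Done()
			a.fill()
		}()
		b.fill() // fill the second cursor on the current goroutine
		wg.Wait()
		return
	}
	if a.empty() {
		a.fill()
	}
	if b.empty() {
		b.fill()
	}
}

func main() {
	a := &cursor{next: func() []int { return []int{1, 2} }}
	b := &cursor{next: func() []int { return []int{3, 4} }}
	ensureItemsAvailable(a, b)
	fmt.Println(a.buf, b.buf) // prints: [1 2] [3 4]
}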

Below are the benchmarks that show the effect of these changes:

BenchmarkBucketStore_Series
› benchstat ~/tmp/{1before,2after}.txt
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
                                                                                                                                        │ /Users/v/tmp/1before.txt │       /Users/v/tmp/2after.txt       │
                                                                                                                                        │          sec/op          │   sec/op     vs base                │
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4                           129.3m ± 0%   127.6m ± 0%   -1.33% (p=0.000 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/streamingBatchSize=0/10000000of10000000-4                          157.3m ± 0%   136.0m ± 0%  -13.53% (p=0.000 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_and_caches_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4                132.7m ± 0%   131.3m ± 0%   -1.04% (p=0.000 n=10)
geomean                                                                                                                                                139.2m        131.6m        -5.48%

                                                                                                                                        │ /Users/v/tmp/1before.txt │       /Users/v/tmp/2after.txt       │
                                                                                                                                        │           B/op           │     B/op      vs base               │
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4                          246.3Mi ± 0%   246.3Mi ± 0%  +0.03% (p=0.035 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/streamingBatchSize=0/10000000of10000000-4                         253.2Mi ± 0%   255.5Mi ± 0%  +0.90% (p=0.000 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_and_caches_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4               262.2Mi ± 0%   262.2Mi ± 0%       ~ (p=0.529 n=10)
geomean                                                                                                                                               253.8Mi        254.6Mi       +0.31%

                                                                                                                                        │ /Users/v/tmp/1before.txt │      /Users/v/tmp/2after.txt       │
                                                                                                                                        │        allocs/op         │  allocs/op   vs base               │
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4                           3.135M ± 0%   3.135M ± 0%       ~ (p=0.247 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_(10K_per_batch)/streamingBatchSize=0/10000000of10000000-4                          3.123M ± 0%   3.123M ± 0%       ~ (p=0.280 n=10)
BucketStore_Series/100000SeriesWith100Samples/with_series_streaming_and_caches_(1K_per_batch)/streamingBatchSize=0/10000000of10000000-4                3.235M ± 0%   3.236M ± 0%       ~ (p=0.139 n=10)
geomean                                                                                                                                                3.164M        3.164M       +0.01%

Note that BenchmarkBucketStore_Series involves a lot of work on top of what is done per BucketStore.Series call: the benchmarks literally call gRPC over the local network, so they show a mix of results from the client's and the server's point of view.

Also, I updated the BenchmarkMergedSeriesChunkRefsSetIterators benchmarks, adding one extra dimension: with and without IO in the set iterators. This allows comparing the effect of the changes in both pure CPU-bound and IO-bound scenarios. The CPU-bound workload was very snappy before, but that alone can be misleading, since real workloads always include some amount of IO.

BenchmarkMergedSeriesChunkRefsSetIterators
› benchstat ~/tmp/merge-iter-{1before,2after}.txt
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway
                                                                                                    │ /Users/v/tmp/merge-iter-1before.txt │ /Users/v/tmp/merge-iter-2after.txt  │
                                                                                                    │               sec/op                │   sec/op     vs base                │
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_false-2                            11.84µ ± 1%   19.22µ ± 1%  +62.35% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_true-2                            1321.0µ ± 0%   358.4µ ± 0%  -72.87% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_false-2                           21.16µ ± 1%   28.16µ ± 1%  +33.08% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_true-2                            4.180m ± 0%   1.796m ± 0%  -57.04% (p=0.000 n=10)
geomean                                                                                                                       192.9µ        136.6µ       -29.16%

                                                                                                    │ /Users/v/tmp/merge-iter-1before.txt │  /Users/v/tmp/merge-iter-2after.txt   │
                                                                                                    │                B/op                 │     B/op      vs base                 │
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_false-2                          1.528Ki ±  0%   3.385Ki ± 0%  +121.47% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_true-2                           1.516Ki ± 17%   4.283Ki ± 2%  +182.60% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_false-2                         1.532Ki ±  1%   3.392Ki ± 1%  +121.35% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_true-2                          1.518Ki ±  0%   4.797Ki ± 8%  +216.12% (p=0.000 n=10)
geomean                                                                                                                     1.523Ki         3.919Ki       +157.25%

                                                                                                    │ /Users/v/tmp/merge-iter-1before.txt │ /Users/v/tmp/merge-iter-2after.txt  │
                                                                                                    │              allocs/op              │ allocs/op   vs base                 │
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_false-2                             35.00 ± 0%   80.00 ± 0%  +128.57% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_true/with_IO_=_true-2                              35.00 ± 0%   92.00 ± 0%  +162.86% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_false-2                            35.00 ± 0%   80.00 ± 0%  +128.57% (p=0.000 n=10)
MergedSeriesChunkRefsSetIterators/number_of_iterators_=_4/with_duplicates_=_false/with_IO_=_true-2                             35.00 ± 0%   94.00 ± 0%  +168.57% (p=0.000 n=10)
geomean                                                                                                                        35.00        86.25       +146.44%

Which issue(s) this PR fixes or relates to

Fixes #4596

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@dimitarvdimitrov (Contributor) left a comment:

Changes make sense. 13% is quite a lot for such a small change on such a high-level benchmark.

@narqo force-pushed the storegw-merge-iter-concurrently branch 2 times, most recently from 522d7f0 to 821383b, on February 26, 2024 12:43
@narqo marked this pull request as ready for review on February 26, 2024 12:43
@narqo requested a review from a team as a code owner on February 26, 2024 12:43
Comment on lines 709 to 717
batch := make([]iterator[seriesChunkRefsSet], len(iterators))
for i := 0; i < numIterators; i++ {
	if withIO {
		// Delay only 2% of iterations to mimic IO ops happening during real set iterations.
		batch[i] = newDelayedMaybeIterator(2, time.Millisecond, iterators[i])
	} else {
		batch[i] = iterators[i]
	}
}
@narqo (Contributor, Author):

This is the chunk of the diff that actually changed here: for the withIO dimension, we wrap the iterators with a delayedIterator to add some amount of IO to the iterations.

func (s *delayedIterator[S]) delayMaybe() {
	if s.chance == 0 || rand.Intn(100) <= s.chance {
		time.Sleep(s.delay)
	}
}
Contributor:

wouldn't this randomness make the benchmarks less reproducible/consistent?

@narqo (Contributor, Author):

In theory, no. The chance of the delay is the same for all b.N iterations (e.g. "5% of the iterations in the benchmark sleep"). I'm not sure, but statistically this should be fine, right?

Contributor:

The total run of these benchmarks looks to be at most 10ms, so missing one or a few of these time.Sleep calls can make a significant difference (I consider +/- 10% to be significant).

Would it be better to apply a constant delay on every Next() invocation, but reduce the duration? If we are aiming for a 1ms sleep on 5% of invocations, that is statistically the same total time slept as a 50µs sleep on every invocation. The average time slept is the same in both cases, but with a constant delay the best and worst cases match the average, so runs are more consistent.
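
A quick back-of-the-envelope check of that equivalence (purely illustrative):

package main

import (
	"fmt"
	"time"
)

func main() {
	p := 0.05                 // probability that a given Next() sleeps
	d := 1 * time.Millisecond // delay when the sleep fires
	// Expected sleep per invocation: p * d = 0.05 * 1ms = 50µs.
	fmt.Println(time.Duration(p * float64(d)))
}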

@narqo (Contributor, Author) commented on Feb 28, 2024:

As discussed in DMs, my initial idea was to avoid having IO (represented by a sleep) on every call to the underlying iterator's Next(). That shouldn't have been a goal in the first place, and it only made the changes confusing.

I updated this chunk to use a fixed delay. The results of these benchmarks are still the same (I also updated the PR description).
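
For reference, a constant-delay wrapper could look roughly like this (a rough reconstruction for illustration; the actual diff may differ in names and details):

package sketch

import "time"

// iterator is a minimal stand-in for the iterator[S] interface used in the
// benchmarks above.
type iterator[S any] interface {
	Next() bool
	At() S
}

// delayedIterator sleeps a constant, reduced duration on every Next() call,
// so the total time slept per run is deterministic rather than random.
type delayedIterator[S any] struct {
	delay time.Duration
	it    iterator[S]
}

func (s *delayedIterator[S]) Next() bool {
	time.Sleep(s.delay)
	return s.it.Next()
}

func (s *delayedIterator[S]) At() S {
	return s.it.At()
}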

pkg/storegateway/series_refs.go (review thread resolved)
@duricanikolic (Contributor) commented:
The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase on main and, if necessary, move the CHANGELOG entry added/updated in this PR to the top of the CHANGELOG document. Thanks!

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
@narqo force-pushed the storegw-merge-iter-concurrently branch from 5e946f2 to 7a5b628 on March 12, 2024 18:55
@dimitarvdimitrov merged commit 1ba965a into main on Mar 13, 2024
29 checks passed
@dimitarvdimitrov deleted the storegw-merge-iter-concurrently branch on March 13, 2024 10:02
Successfully merging this pull request may close these issues:

store-gateway: merged series from different blocks concurrently

4 participants