Mimir query engine: add memory consumption per query limit #8230

charleskorn · 2024-05-31T06:44:27Z

What this PR does

This PR implements a per-query estimated memory consumption limit for the Mimir query engine.

The estimate is based on the primary contributors to memory consumption: samples (e.g. promql.FPoints) and running totals (e.g. the slices of float64s used by sum() aggregations).

The estimate ignores other contributions to a query's memory consumption like chunks and series labels. These could be added in the future if need be.

The limit is enforced as slices are created during the query, and is based on the capacity of the slice created, not the size requested. These are not necessarily the same: we use bucketed pools for each of these slice types, and the pool allocates a slice whose capacity equals the size of the bucket that holds the requested size, which is always greater than or equal to the requested size. This means the estimate more closely tracks the actual memory utilisation of the query, but it may be slightly higher than the requested sizes alone would suggest.
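The behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `limitingPool` type, its method names, the power-of-two bucket sizes and the use of `make` in place of a real pool are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

var errLimitExceeded = errors.New("query exceeded the estimated memory consumption limit")

const float64Size = 8 // bytes per float64 element

// bucketCapacity returns the smallest power of two that is >= size.
// Power-of-two buckets are an assumption made for this sketch.
func bucketCapacity(size int) int {
	c := 1
	for c < size {
		c *= 2
	}
	return c
}

// limitingPool is a hypothetical per-query tracker of estimated bytes in use.
type limitingPool struct {
	maxBytes     uint64
	currentBytes uint64
}

// getFloatSlice charges the query for the capacity of the bucket that holds
// the requested size, then returns a slice with that capacity.
func (p *limitingPool) getFloatSlice(size int) ([]float64, error) {
	capacity := bucketCapacity(size)
	estimated := uint64(capacity) * float64Size
	if p.currentBytes+estimated > p.maxBytes {
		return nil, errLimitExceeded
	}
	p.currentBytes += estimated
	// A real implementation would take the slice from a bucketed pool.
	return make([]float64, 0, capacity), nil
}

// putFloatSlice releases the bytes charged for a slice obtained from getFloatSlice.
func (p *limitingPool) putFloatSlice(s []float64) {
	p.currentBytes -= uint64(cap(s)) * float64Size
}

func main() {
	p := &limitingPool{maxBytes: 1024}

	// 100 floats are requested (800 bytes), but the 128-element bucket is charged (1024 bytes).
	s, err := p.getFloatSlice(100)
	fmt.Println(cap(s), p.currentBytes, err)

	// Any further allocation now exceeds the limit.
	_, err = p.getFloatSlice(1)
	fmt.Println(err)

	p.putFloatSlice(s)
	fmt.Println(p.currentBytes)
}
```

In this sketch a request for 100 elements counts as 1024 bytes rather than 800, which is why the tracked figure can be higher than the requested sizes would suggest.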

The estimate is generally accurate, except for:

* For native histograms, we assume a fixed size per histogram sample (see nativeHistogramSampleSizeFactor in limiting_pool.go), as tracking the true size of each native histogram would be very expensive. A future improvement would be to make nativeHistogramSampleSizeFactor configurable, but I think this is fine for now.
* The size of slice headers is ignored: the estimate only considers the memory used by slice elements.
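To illustrate the first exception, the sketch below charges every histogram sample the same fixed amount, regardless of how many buckets the histogram actually has. The constant `histogramSampleSizeFactor`, its value of 8, the 16-byte float sample size and the choice to express the fixed size as a multiple of a float sample are all placeholders chosen for this example, not values taken from limiting_pool.go.

```go
package main

import "fmt"

const (
	// floatSampleSize is the assumed size in bytes of one float sample
	// (an 8-byte timestamp plus an 8-byte value).
	floatSampleSize = 16

	// histogramSampleSizeFactor is a placeholder: each histogram sample is
	// assumed to cost this many times the size of a float sample.
	histogramSampleSizeFactor = 8
)

// estimatedHistogramBytes returns the estimate for a slice of histogram
// samples with the given capacity. The real size of each histogram (which
// depends on its bucket count) is deliberately not inspected.
func estimatedHistogramBytes(capacity int) uint64 {
	return uint64(capacity) * floatSampleSize * histogramSampleSizeFactor
}

func main() {
	// Four histogram samples are always estimated at 4 * 16 * 8 bytes,
	// whether each histogram has 2 buckets or 200.
	fmt.Println(estimatedHistogramBytes(4))
}
```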

Enforcing the limit adds up to 1% latency overhead to some benchmarks, but this seems worthwhile.

This change also required some shuffling of types between packages to help ensure that the underlying pools are not accessed directly and all allocations go through the limit-enforcing methods. In the interests of keeping this PR as small as possible, I haven't done all the refactoring I'd like to do and will do this in a future PR. In particular, I'd like to move the Operator interface and RingBuffer type to the types package, and rename the operator package to operators.

I'd also like to use the PeakEstimatedMemoryConsumptionBytes in a metric and log it on query traces, but this too will come in a follow-up PR.

Which issue(s) this PR fixes or relates to

(none)

Checklist

* Tests updated.
* Documentation added.
* CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
* about-versioning.md updated with experimental features.


jhesketh

Awesome stuff, looks really good to me. I agree with the planned work (specifically exposing the metric), but it makes sense to split it up like this.


charleskorn added 25 commits May 30, 2024 14:24

Make formatting consistent

3a79cfd

Initial version of LimitingPool

40c427a

Move to operator package

7e0a53e

Move FPoint and HPoint pools next to LimitingPool

ebecf9e

Add methods for slices of HPoint to LimitingPool

0b03553

Use LimitingPool everywhere

27f24fc

Move pool to its own package and introduce interface

0493b93

Move pool interface to types package

1f3490e

Add documentation for err-mimir-max-in-memory-samples-per-query

033fe4a

Add limit CLI flag and config option

b0c807b

Add (failing) tests

f89d7c4

Fix linting warnings

3241525

Add another test case

d86a2ab

Add more slice types to LimitedPool.

c9e6ba0

Rework limit to use estimated memory consumption, rather than a number of samples

290f3fe

Ensure float and bool slices are cleared.

29ccb77

Update tests to use bytes rather than samples limit

f620085

Add limit to list of experimental features

6f105c0

Add changelog entry

4baab11

Fix linting warning

da341b8

Fix description of error

0b8b82d

Remove unnecessary interface and early enforcement of limit

c69cc08

Fix flag name

499991e

Remove unnecessary interface

fa70180

Remove unused methods

479bf0c

charleskorn marked this pull request as ready for review June 2, 2024 23:29

charleskorn requested review from jdbaldry and a team as code owners June 2, 2024 23:29

jhesketh approved these changes Jun 3, 2024

Address PR feedback

86649ca

charleskorn enabled auto-merge (squash) June 3, 2024 04:48

charleskorn merged commit 45a683e into main Jun 3, 2024
29 checks passed

charleskorn deleted the charleskorn/max-samples-limit branch June 3, 2024 05:00

charleskorn mentioned this pull request Jun 3, 2024

Mimir query engine: finish reorganisation started in #8230 #8247

Merged

2 tasks

charleskorn added a commit that referenced this pull request Jun 4, 2024

Mimir query engine: finish reorganisation started in #8230 (#8247)

006ee9c

* Move interface definitions to `types` package * Move operators to `operators` package * Add changelog entry

charleskorn mentioned this pull request Jun 4, 2024

Mimir query engine: report estimated peak memory consumption as a metric and in traces #8270

Merged

2 tasks

charleskorn mentioned this pull request Jun 7, 2024

Mimir query engine: report queries rejected due to hitting the memory consumption limit in the cortex_querier_queries_rejected_total metric #8303

Merged

2 tasks
