Mimir query engine: add memory consumption per query limit (#8230)

* Make formatting consistent

* Initial version of `LimitingPool`

* Move to `operator` package

* Move `FPoint` and `HPoint` pools next to `LimitingPool`

* Add methods for slices of `HPoint` to `LimitingPool`

* Use `LimitingPool` everywhere

* Move pool to its own package and introduce interface

* Move pool interface to `types` package

* Add documentation for `err-mimir-max-in-memory-samples-per-query`

* Add limit CLI flag and config option

* Add (failing) tests

* Fix linting warnings

* Add another test case

* Add more slice types to `LimitingPool`

* Rework limit to use estimated memory consumption, rather than a number of samples

* Ensure float and bool slices are cleared

* Update tests to use bytes rather than samples limit

* Add limit to list of experimental features

* Add changelog entry

* Fix linting warning

* Fix description of error

* Remove unnecessary interface and early enforcement of limit

* Fix flag name

* Remove unnecessary interface

* Remove unused methods

* Address PR feedback

charleskorn committed Jun 3, 2024
1 parent c212608 commit 45a683e

Showing 32 changed files with 1,101 additions and 475 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -17,7 +17,7 @@
* [FEATURE] Continuous-test: now runnable as a module with `mimir -target=continuous-test`. #7747
* [FEATURE] Store-gateway: Allow specific tenants to be enabled or disabled via `-store-gateway.enabled-tenants` or `-store-gateway.disabled-tenants` CLI flags or their corresponding YAML settings. #7653
* [FEATURE] New `-<prefix>.s3.bucket-lookup-type` flag configures lookup style type, used to access bucket in s3 compatible providers. #7684
* [FEATURE] Querier: add experimental streaming PromQL engine, enabled with `-querier.promql-engine=streaming`. #7693 #7898 #7899 #8023 #8058 #8096 #8121 #8197
* [FEATURE] Querier: add experimental streaming PromQL engine, enabled with `-querier.promql-engine=streaming`. #7693 #7898 #7899 #8023 #8058 #8096 #8121 #8197 #8230
* [FEATURE] New `/ingester/unregister-on-shutdown` HTTP endpoint allows dynamic access to ingesters' `-ingester.ring.unregister-on-shutdown` configuration. #7739
* [FEATURE] Server: added experimental [PROXY protocol support](https://www.haproxy.org/download/2.3/doc/proxy-protocol.txt). The PROXY protocol support can be enabled via `-server.proxy-protocol-enabled=true`. When enabled, the support is added both to HTTP and gRPC listening ports. #7698
* [FEATURE] mimirtool: Add `runtime-config verify` sub-command, for verifying Mimir runtime config files. #8123
11 changes: 11 additions & 0 deletions cmd/mimir/config-descriptor.json
@@ -3556,6 +3556,17 @@
"fieldFlag": "querier.max-fetched-chunk-bytes-per-query",
"fieldType": "int"
},
{
"kind": "field",
"name": "max_estimated_memory_consumption_per_query",
"required": false,
"desc": "The maximum estimated memory a single query can consume at once, in bytes. This limit is only enforced when Mimir's query engine is in use. This limit is enforced in the querier. 0 to disable.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "querier.max-estimated-memory-consumption-per-query",
"fieldType": "int",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "max_query_lookback",
2 changes: 2 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
@@ -1743,6 +1743,8 @@ Usage of ./cmd/mimir/mimir:
The number of workers running in each querier process. This setting limits the maximum number of concurrent queries in each querier. (default 20)
-querier.max-estimated-fetched-chunks-per-query-multiplier float
[experimental] Maximum number of chunks estimated to be fetched in a single query from ingesters and store-gateways, as a multiple of -querier.max-fetched-chunks-per-query. This limit is enforced in the querier. Must be greater than or equal to 1, or 0 to disable.
-querier.max-estimated-memory-consumption-per-query uint
[experimental] The maximum estimated memory a single query can consume at once, in bytes. This limit is only enforced when Mimir's query engine is in use. This limit is enforced in the querier. 0 to disable.
-querier.max-fetched-chunk-bytes-per-query int
The maximum size of all chunks in bytes that a query can fetch from ingesters and store-gateways. This limit is enforced in the querier and ruler. 0 to disable.
-querier.max-fetched-chunks-per-query int
1 change: 1 addition & 0 deletions docs/sources/mimir/configure/about-versioning.md
@@ -133,6 +133,7 @@ The following features are currently experimental:
- Enable PromQL experimental functions (`-querier.promql-experimental-functions-enabled`)
- Allow streaming of `/active_series` responses to the frontend (`-querier.response-streaming-enabled`)
- Streaming PromQL engine (`-querier.promql-engine=streaming` and `-querier.enable-promql-engine-fallback`)
- Maximum estimated memory consumption per query limit (`-querier.max-estimated-memory-consumption-per-query`)
- Query-frontend
- `-query-frontend.querier-forget-delay`
- Instant query splitting (`-query-frontend.split-instant-queries-by-interval`)
@@ -3169,6 +3169,12 @@ The `limits` block configures default and per-tenant limits imposed by component
# CLI flag: -querier.max-fetched-chunk-bytes-per-query
[max_fetched_chunk_bytes_per_query: <int> | default = 0]
# (experimental) The maximum estimated memory a single query can consume at
# once, in bytes. This limit is only enforced when Mimir's query engine is in
# use. This limit is enforced in the querier. 0 to disable.
# CLI flag: -querier.max-estimated-memory-consumption-per-query
[max_estimated_memory_consumption_per_query: <int> | default = 0]
# Limit how long back data (series and metadata) can be queried, up until
# <lookback> duration ago. This limit is enforced in the query-frontend, querier
# and ruler. If the requested time range is outside the allowed range, the
17 changes: 17 additions & 0 deletions docs/sources/mimir/manage/mimir-runbooks/_index.md
@@ -2004,6 +2004,23 @@ How to **fix** it:
- Consider increasing the global limit by using the `-querier.max-fetched-chunk-bytes-per-query` option.
- Consider increasing the limit on a per-tenant basis by using the `max_fetched_chunk_bytes_per_query` per-tenant override in the runtime configuration.

### err-mimir-max-estimated-memory-consumption-per-query

This error occurs when execution of a query exceeds the limit on the maximum estimated memory consumed by a single query.

This limit protects the system’s stability from potential abuse or mistakes when a single query fetches a very large amount of data.
This limit only applies when Mimir's query engine is in use (that is, when `-querier.promql-engine=streaming` is set).
To configure the limit on a global basis, use the `-querier.max-estimated-memory-consumption-per-query` option.
To configure the limit on a per-tenant basis, set the `max_estimated_memory_consumption_per_query` per-tenant override in the runtime configuration.

How to **fix** it:

- Consider reducing the time range of the query.
- Consider reducing the cardinality of the query, for example by adding label matchers that restrict the set of matching series.
- Consider applying aggregations such as `sum` or `avg` to the query.
- Consider increasing the global limit by using the `-querier.max-estimated-memory-consumption-per-query` option.
- Consider increasing the limit on a per-tenant basis by using the `max_estimated_memory_consumption_per_query` per-tenant override in the runtime configuration.

### err-mimir-max-query-length

This error occurs when the time range of a partial (after possible splitting, sharding by the query-frontend) query exceeds the configured maximum length. For a limit on the total query length, see [err-mimir-max-total-query-length](#err-mimir-max-total-query-length).
16 changes: 15 additions & 1 deletion pkg/querier/querier.go
@@ -164,7 +164,8 @@ func New(cfg Config, limits *validation.Overrides, distributor Distributor, stor
case standardPromQLEngine:
eng = promql.NewEngine(opts)
case streamingPromQLEngine:
streamingEngine, err := streamingpromql.NewEngine(opts)
limitsProvider := &tenantQueryLimitsProvider{limits: limits}
streamingEngine, err := streamingpromql.NewEngine(opts, limitsProvider)
if err != nil {
return nil, nil, nil, err
}
@@ -623,3 +624,16 @@ func logClampEvent(spanLog *spanlogger.SpanLogger, originalT, clampedT int64, mi
"updated", util.TimeFromMillis(clampedT).String(),
)
}

type tenantQueryLimitsProvider struct {
limits *validation.Overrides
}

func (p *tenantQueryLimitsProvider) GetMaxEstimatedMemoryConsumptionPerQuery(ctx context.Context) (uint64, error) {
tenantID, err := tenant.TenantID(ctx)
if err != nil {
return 0, err
}

return p.limits.MaxEstimatedMemoryConsumptionPerQuery(tenantID), nil
}
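
For reference, a minimal sketch (not part of this commit) of the same pattern outside Mimir: a hypothetical provider that resolves the tenant from the request context and looks the limit up in a map. Only the `GetMaxEstimatedMemoryConsumptionPerQuery` signature and `tenant.TenantID` come from the code above; the type name, the map, and the import path are illustrative assumptions.

```go
package example

import (
	"context"

	"github.com/grafana/dskit/tenant" // assumed import path for the tenant helper used above
)

// mapQueryLimitsProvider is a hypothetical per-tenant provider, shown only to
// illustrate the context-based tenant resolution used by tenantQueryLimitsProvider.
type mapQueryLimitsProvider struct {
	limits map[string]uint64 // tenant ID -> max estimated bytes; 0 disables the limit
}

// GetMaxEstimatedMemoryConsumptionPerQuery returns the configured limit for the
// tenant carried in ctx, or an error if no tenant ID is present.
func (p mapQueryLimitsProvider) GetMaxEstimatedMemoryConsumptionPerQuery(ctx context.Context) (uint64, error) {
	tenantID, err := tenant.TenantID(ctx)
	if err != nil {
		return 0, err
	}

	return p.limits[tenantID], nil
}
```
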
6 changes: 3 additions & 3 deletions pkg/streamingpromql/benchmarks/comparison_test.go
@@ -44,7 +44,7 @@ func BenchmarkQuery(b *testing.B) {

opts := streamingpromql.NewTestEngineOpts()
prometheusEngine := promql.NewEngine(opts)
streamingEngine, err := streamingpromql.NewEngine(opts)
streamingEngine, err := streamingpromql.NewEngine(opts, streamingpromql.NewStaticQueryLimitsProvider(0))
require.NoError(b, err)

// Important: the names below must remain in sync with the names used in tools/benchmark-query-engine.
@@ -96,7 +96,7 @@ func TestBothEnginesReturnSameResultsForBenchmarkQueries(t *testing.T) {

opts := streamingpromql.NewTestEngineOpts()
prometheusEngine := promql.NewEngine(opts)
streamingEngine, err := streamingpromql.NewEngine(opts)
streamingEngine, err := streamingpromql.NewEngine(opts, streamingpromql.NewStaticQueryLimitsProvider(0))
require.NoError(t, err)

ctx := user.InjectOrgID(context.Background(), UserID)
@@ -123,7 +123,7 @@ func TestBenchmarkSetup(t *testing.T) {
q := createBenchmarkQueryable(t, []int{1})

opts := streamingpromql.NewTestEngineOpts()
streamingEngine, err := streamingpromql.NewEngine(opts)
streamingEngine, err := streamingpromql.NewEngine(opts, streamingpromql.NewStaticQueryLimitsProvider(0))
require.NoError(t, err)

ctx := user.InjectOrgID(context.Background(), UserID)
42 changes: 33 additions & 9 deletions pkg/streamingpromql/engine.go
@@ -17,7 +17,7 @@ import (

const defaultLookbackDelta = 5 * time.Minute // This should be the same value as github.com/prometheus/prometheus/promql.defaultLookbackDelta.

func NewEngine(opts promql.EngineOpts) (promql.QueryEngine, error) {
func NewEngine(opts promql.EngineOpts, limitsProvider QueryLimitsProvider) (promql.QueryEngine, error) {
lookbackDelta := opts.LookbackDelta
if lookbackDelta == 0 {
lookbackDelta = defaultLookbackDelta
@@ -36,21 +36,23 @@ func NewEngine(opts promql.EngineOpts) (promql.QueryEngine, error) {
}

return &Engine{
lookbackDelta: lookbackDelta,
timeout: opts.Timeout,
lookbackDelta: lookbackDelta,
timeout: opts.Timeout,
limitsProvider: limitsProvider,
}, nil
}

type Engine struct {
lookbackDelta time.Duration
timeout time.Duration
lookbackDelta time.Duration
timeout time.Duration
limitsProvider QueryLimitsProvider
}

func (e *Engine) NewInstantQuery(_ context.Context, q storage.Queryable, opts promql.QueryOpts, qs string, ts time.Time) (promql.Query, error) {
return newQuery(q, opts, qs, ts, ts, 0, e)
func (e *Engine) NewInstantQuery(ctx context.Context, q storage.Queryable, opts promql.QueryOpts, qs string, ts time.Time) (promql.Query, error) {
return newQuery(ctx, q, opts, qs, ts, ts, 0, e)
}

func (e *Engine) NewRangeQuery(_ context.Context, q storage.Queryable, opts promql.QueryOpts, qs string, start, end time.Time, interval time.Duration) (promql.Query, error) {
func (e *Engine) NewRangeQuery(ctx context.Context, q storage.Queryable, opts promql.QueryOpts, qs string, start, end time.Time, interval time.Duration) (promql.Query, error) {
if interval <= 0 {
return nil, fmt.Errorf("%v is not a valid interval for a range query, must be greater than 0", interval)
}
@@ -59,5 +61,27 @@ func (e *Engine) NewRangeQuery(ctx context.Context, q storage.Queryable, opts prom
return nil, fmt.Errorf("range query time range is invalid: end time %v is before start time %v", end.Format(time.RFC3339), start.Format(time.RFC3339))
}

return newQuery(q, opts, qs, start, end, interval, e)
return newQuery(ctx, q, opts, qs, start, end, interval, e)
}

type QueryLimitsProvider interface {
// GetMaxEstimatedMemoryConsumptionPerQuery returns the maximum estimated memory allowed to be consumed by a query in bytes, or 0 to disable the limit.
GetMaxEstimatedMemoryConsumptionPerQuery(ctx context.Context) (uint64, error)
}

// NewStaticQueryLimitsProvider returns a QueryLimitsProvider that always returns the provided limits.
//
// This should generally only be used in tests.
func NewStaticQueryLimitsProvider(maxEstimatedMemoryConsumptionPerQuery uint64) QueryLimitsProvider {
return staticQueryLimitsProvider{
maxEstimatedMemoryConsumptionPerQuery: maxEstimatedMemoryConsumptionPerQuery,
}
}

type staticQueryLimitsProvider struct {
maxEstimatedMemoryConsumptionPerQuery uint64
}

func (p staticQueryLimitsProvider) GetMaxEstimatedMemoryConsumptionPerQuery(_ context.Context) (uint64, error) {
return p.maxEstimatedMemoryConsumptionPerQuery, nil
}
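
For context, a hedged sketch of wiring the new `NewEngine` signature together: constructing the streaming engine with a fixed per-query memory limit, much as the benchmark tests in this commit do with `NewStaticQueryLimitsProvider(0)` (limit disabled). `NewTestEngineOpts` and `NewStaticQueryLimitsProvider` appear in this diff; the 128 MiB figure and the import paths are illustrative assumptions.

```go
package example

import (
	"github.com/prometheus/prometheus/promql"

	"github.com/grafana/mimir/pkg/streamingpromql" // assumed import path for the package shown above
)

// newLimitedEngine builds the streaming engine with an estimated memory limit of
// 128 MiB per query. Passing 0 instead disables the limit, as the tests above do.
func newLimitedEngine() (promql.QueryEngine, error) {
	opts := streamingpromql.NewTestEngineOpts()
	limits := streamingpromql.NewStaticQueryLimitsProvider(128 * 1024 * 1024)

	return streamingpromql.NewEngine(opts, limits)
}
```
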