Mimir query engine: add memory consumption per query limit #8230

Merged
merged 26 commits into from
Jun 3, 2024
3a79cfd
Make formatting consistent
charleskorn May 30, 2024
40c427a
Initial version of `LimitingPool`
charleskorn May 30, 2024
7e0a53e
Move to `operator` package
charleskorn May 30, 2024
ebecf9e
Move `FPoint` and `HPoint` pools next to `LimitingPool`
charleskorn May 30, 2024
0b03553
Add methods for slices of `HPoint` to `LimitingPool`
charleskorn May 30, 2024
27f24fc
Use `LimitingPool` everywhere
charleskorn May 30, 2024
0493b93
Move pool to its own package and introduce interface
charleskorn May 30, 2024
1f3490e
Move pool interface to `types` package
charleskorn May 30, 2024
033fe4a
Add documentation for `err-mimir-max-in-memory-samples-per-query`
charleskorn May 30, 2024
b0c807b
Add limit CLI flag and config option
charleskorn May 30, 2024
f89d7c4
Add (failing) tests
charleskorn May 30, 2024
3241525
Fix linting warnings
charleskorn May 30, 2024
d86a2ab
Add another test case
charleskorn May 30, 2024
c9e6ba0
Add more slice types to `LimitedPool`.
charleskorn May 30, 2024
290f3fe
Rework limit to use estimated memory consumption, rather than a numbe…
charleskorn May 31, 2024
29ccb77
Ensure float and bool slices are cleared.
charleskorn May 31, 2024
f620085
Update tests to use bytes rather than samples limit
charleskorn May 31, 2024
6f105c0
Add limit to list of experimental features
charleskorn May 31, 2024
4baab11
Add changelog entry
charleskorn May 31, 2024
da341b8
Fix linting warning
charleskorn May 31, 2024
0b8b82d
Fix description of error
charleskorn May 31, 2024
c69cc08
Remove unnecessary interface and early enforcement of limit
charleskorn Jun 2, 2024
499991e
Fix flag name
charleskorn Jun 2, 2024
fa70180
Remove unnecessary interface
charleskorn Jun 2, 2024
479bf0c
Remove unused methods
charleskorn Jun 2, 2024
86649ca
Address PR feedback
charleskorn Jun 3, 2024
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -15,7 +15,7 @@
* [FEATURE] Continuous-test: now runable as a module with `mimir -target=continuous-test`. #7747
* [FEATURE] Store-gateway: Allow specific tenants to be enabled or disabled via `-store-gateway.enabled-tenants` or `-store-gateway.disabled-tenants` CLI flags or their corresponding YAML settings. #7653
* [FEATURE] New `-<prefix>.s3.bucket-lookup-type` flag configures lookup style type, used to access bucket in s3 compatible providers. #7684
* [FEATURE] Querier: add experimental streaming PromQL engine, enabled with `-querier.promql-engine=streaming`. #7693 #7898 #7899 #8023 #8058 #8096 #8121 #8197
* [FEATURE] Querier: add experimental streaming PromQL engine, enabled with `-querier.promql-engine=streaming`. #7693 #7898 #7899 #8023 #8058 #8096 #8121 #8197 #8230
* [FEATURE] New `/ingester/unregister-on-shutdown` HTTP endpoint allows dynamic access to ingesters' `-ingester.ring.unregister-on-shutdown` configuration. #7739
* [FEATURE] Server: added experimental [PROXY protocol support](https://www.haproxy.org/download/2.3/doc/proxy-protocol.txt). The PROXY protocol support can be enabled via `-server.proxy-protocol-enabled=true`. When enabled, the support is added both to HTTP and gRPC listening ports. #7698
* [FEATURE] mimirtool: Add `runtime-config verify` sub-command, for verifying Mimir runtime config files. #8123
11 changes: 11 additions & 0 deletions cmd/mimir/config-descriptor.json
@@ -3545,6 +3545,17 @@
"fieldFlag": "querier.max-fetched-chunk-bytes-per-query",
"fieldType": "int"
},
{
"kind": "field",
"name": "max_estimated_memory_consumption_per_query",
"required": false,
"desc": "The maximum estimated memory a single query can consume at once, in bytes. This limit is only enforced when Mimir's query engine is in use. This limit is enforced in the querier. 0 to disable.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "querier.max-estimated-memory-consumption-per-query",
"fieldType": "int",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "max_query_lookback",
2 changes: 2 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
@@ -1737,6 +1737,8 @@ Usage of ./cmd/mimir/mimir:
The number of workers running in each querier process. This setting limits the maximum number of concurrent queries in each querier. (default 20)
-querier.max-estimated-fetched-chunks-per-query-multiplier float
[experimental] Maximum number of chunks estimated to be fetched in a single query from ingesters and store-gateways, as a multiple of -querier.max-fetched-chunks-per-query. This limit is enforced in the querier. Must be greater than or equal to 1, or 0 to disable.
-querier.max-estimated-memory-consumption-per-query uint
[experimental] The maximum estimated memory a single query can consume at once, in bytes. This limit is only enforced when Mimir's query engine is in use. This limit is enforced in the querier. 0 to disable.
-querier.max-fetched-chunk-bytes-per-query int
The maximum size of all chunks in bytes that a query can fetch from ingesters and store-gateways. This limit is enforced in the querier and ruler. 0 to disable.
-querier.max-fetched-chunks-per-query int
1 change: 1 addition & 0 deletions docs/sources/mimir/configure/about-versioning.md
@@ -131,6 +131,7 @@ The following features are currently experimental:
- Enable PromQL experimental functions (`-querier.promql-experimental-functions-enabled`)
- Allow streaming of `/active_series` responses to the frontend (`-querier.response-streaming-enabled`)
- Streaming PromQL engine (`-querier.promql-engine=streaming` and `-querier.enable-promql-engine-fallback`)
- Maximum estimated memory consumption per query limit (`-querier.max-estimated-memory-consumption-per-query`)
- Query-frontend
- `-query-frontend.querier-forget-delay`
- Instant query splitting (`-query-frontend.split-instant-queries-by-interval`)
@@ -3164,6 +3164,12 @@ The `limits` block configures default and per-tenant limits imposed by component
# CLI flag: -querier.max-fetched-chunk-bytes-per-query
[max_fetched_chunk_bytes_per_query: <int> | default = 0]

# (experimental) The maximum estimated memory a single query can consume at
# once, in bytes. This limit is only enforced when Mimir's query engine is in
# use. This limit is enforced in the querier. 0 to disable.
# CLI flag: -querier.max-estimated-memory-consumption-per-query
[max_estimated_memory_consumption_per_query: <int> | default = 0]

# Limit how long back data (series and metadata) can be queried, up until
# <lookback> duration ago. This limit is enforced in the query-frontend, querier
# and ruler. If the requested time range is outside the allowed range, the
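The `limits` block holds defaults that per-tenant overrides in the runtime configuration can replace, and the querier change in this PR reads the value through `p.limits.MaxEstimatedMemoryConsumptionPerQuery(tenantID)`. A minimal, hypothetical sketch of that default-plus-override lookup follows, with a plain map standing in for Mimir's `validation.Overrides`, whose internals are not shown in this diff. The `tenantLimits` and `overrides` types are illustrative only.

```go
package main

import "fmt"

// tenantLimits is a hypothetical subset of the limits block.
type tenantLimits struct {
	MaxEstimatedMemoryConsumptionPerQuery uint64 // bytes; 0 disables the limit
}

// overrides resolves limits for a tenant, falling back to the defaults when
// the runtime configuration has no entry for that tenant.
type overrides struct {
	defaults  tenantLimits
	perTenant map[string]tenantLimits
}

func (o *overrides) MaxEstimatedMemoryConsumptionPerQuery(tenantID string) uint64 {
	if l, ok := o.perTenant[tenantID]; ok {
		return l.MaxEstimatedMemoryConsumptionPerQuery
	}
	return o.defaults.MaxEstimatedMemoryConsumptionPerQuery
}

func main() {
	o := &overrides{
		defaults: tenantLimits{MaxEstimatedMemoryConsumptionPerQuery: 1 << 30}, // 1 GiB
		perTenant: map[string]tenantLimits{
			"big-tenant": {MaxEstimatedMemoryConsumptionPerQuery: 4 << 30}, // 4 GiB
			"no-limit":   {MaxEstimatedMemoryConsumptionPerQuery: 0},       // disabled
		},
	}
	for _, id := range []string{"big-tenant", "no-limit", "other"} {
		fmt.Println(id, o.MaxEstimatedMemoryConsumptionPerQuery(id))
	}
}
```

Note that an explicit override of 0 disables the limit for that tenant rather than falling back to the default.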
17 changes: 17 additions & 0 deletions docs/sources/mimir/manage/mimir-runbooks/_index.md
@@ -1992,6 +1992,23 @@ How to **fix** it:
- Consider reducing the time range and/or cardinality of the query. To reduce the cardinality of the query, you can add more label matchers to the query, restricting the set of matching series.
- Consider increasing the per-tenant limit by using the `-querier.max-fetched-chunk-bytes-per-query` option (or `max_fetched_chunk_bytes_per_query` in the runtime configuration).

### err-mimir-max-estimated-memory-consumption-per-query

This error occurs when execution of a query exceeds the limit on the maximum estimated memory consumed by a single query.

This limit protects the system’s stability from potential abuse or mistakes, such as a query that fetches a huge amount of data.
This limit only applies when Mimir's query engine is used (that is, when `-querier.promql-engine=streaming` is set).
To configure the limit on a global basis, use the `-querier.max-estimated-memory-consumption-per-query` option.
To configure the limit on a per-tenant basis, set the `max_estimated_memory_consumption_per_query` per-tenant override in the runtime configuration.

How to **fix** it:

- Consider reducing the time range of the query.
- Consider reducing the cardinality of the query. To reduce the cardinality of the query, you can add more label matchers to the query, restricting the set of matching series.
- Consider applying aggregations such as `sum` or `avg` to the query.
- Consider increasing the global limit by using the `-querier.max-estimated-memory-consumption-per-query` option.
- Consider increasing the limit on a per-tenant basis by using the `max_estimated_memory_consumption_per_query` per-tenant override in the runtime configuration.

### err-mimir-max-query-length

This error occurs when the time range of a partial (after possible splitting, sharding by the query-frontend) query exceeds the configured maximum length. For a limit on the total query length, see [err-mimir-max-total-query-length](#err-mimir-max-total-query-length).
16 changes: 15 additions & 1 deletion pkg/querier/querier.go
@@ -164,7 +164,8 @@ func New(cfg Config, limits *validation.Overrides, distributor Distributor, stor
case standardPromQLEngine:
eng = promql.NewEngine(opts)
case streamingPromQLEngine:
streamingEngine, err := streamingpromql.NewEngine(opts)
limitsProvider := &tenantQueryLimitsProvider{limits: limits}
streamingEngine, err := streamingpromql.NewEngine(opts, limitsProvider)
if err != nil {
return nil, nil, nil, err
}
@@ -623,3 +624,16 @@ func logClampEvent(spanLog *spanlogger.SpanLogger, originalT, clampedT int64, mi
"updated", util.TimeFromMillis(clampedT).String(),
)
}

type tenantQueryLimitsProvider struct {
limits *validation.Overrides
}

func (p *tenantQueryLimitsProvider) GetMaxEstimatedMemoryConsumptionPerQuery(ctx context.Context) (uint64, error) {
tenantID, err := tenant.TenantID(ctx)
if err != nil {
return 0, err
}

return p.limits.MaxEstimatedMemoryConsumptionPerQuery(tenantID), nil
}
6 changes: 3 additions & 3 deletions pkg/streamingpromql/benchmarks/comparison_test.go
@@ -44,7 +44,7 @@ func BenchmarkQuery(b *testing.B) {

opts := streamingpromql.NewTestEngineOpts()
prometheusEngine := promql.NewEngine(opts)
streamingEngine, err := streamingpromql.NewEngine(opts)
streamingEngine, err := streamingpromql.NewEngine(opts, streamingpromql.NewStaticQueryLimitsProvider(0))
require.NoError(b, err)

// Important: the names below must remain in sync with the names used in tools/benchmark-query-engine.
@@ -96,7 +96,7 @@ func TestBothEnginesReturnSameResultsForBenchmarkQueries(t *testing.T) {

opts := streamingpromql.NewTestEngineOpts()
prometheusEngine := promql.NewEngine(opts)
streamingEngine, err := streamingpromql.NewEngine(opts)
streamingEngine, err := streamingpromql.NewEngine(opts, streamingpromql.NewStaticQueryLimitsProvider(0))
require.NoError(t, err)

ctx := user.InjectOrgID(context.Background(), UserID)
@@ -123,7 +123,7 @@ func TestBenchmarkSetup(t *testing.T) {
q := createBenchmarkQueryable(t, []int{1})

opts := streamingpromql.NewTestEngineOpts()
streamingEngine, err := streamingpromql.NewEngine(opts)
streamingEngine, err := streamingpromql.NewEngine(opts, streamingpromql.NewStaticQueryLimitsProvider(0))
require.NoError(t, err)

ctx := user.InjectOrgID(context.Background(), UserID)
42 changes: 33 additions & 9 deletions pkg/streamingpromql/engine.go
@@ -17,7 +17,7 @@ import (

const defaultLookbackDelta = 5 * time.Minute // This should be the same value as github.com/prometheus/prometheus/promql.defaultLookbackDelta.

func NewEngine(opts promql.EngineOpts) (promql.QueryEngine, error) {
func NewEngine(opts promql.EngineOpts, limitsProvider QueryLimitsProvider) (promql.QueryEngine, error) {
lookbackDelta := opts.LookbackDelta
if lookbackDelta == 0 {
lookbackDelta = defaultLookbackDelta
@@ -36,21 +36,23 @@ func NewEngine(opts promql.EngineOpts) (promql.QueryEngine, error) {
}

return &Engine{
lookbackDelta: lookbackDelta,
timeout: opts.Timeout,
lookbackDelta: lookbackDelta,
timeout: opts.Timeout,
limitsProvider: limitsProvider,
}, nil
}

type Engine struct {
lookbackDelta time.Duration
timeout time.Duration
lookbackDelta time.Duration
timeout time.Duration
limitsProvider QueryLimitsProvider
}

func (e *Engine) NewInstantQuery(_ context.Context, q storage.Queryable, opts promql.QueryOpts, qs string, ts time.Time) (promql.Query, error) {
return newQuery(q, opts, qs, ts, ts, 0, e)
func (e *Engine) NewInstantQuery(ctx context.Context, q storage.Queryable, opts promql.QueryOpts, qs string, ts time.Time) (promql.Query, error) {
return newQuery(ctx, q, opts, qs, ts, ts, 0, e)
}

func (e *Engine) NewRangeQuery(_ context.Context, q storage.Queryable, opts promql.QueryOpts, qs string, start, end time.Time, interval time.Duration) (promql.Query, error) {
func (e *Engine) NewRangeQuery(ctx context.Context, q storage.Queryable, opts promql.QueryOpts, qs string, start, end time.Time, interval time.Duration) (promql.Query, error) {
if interval <= 0 {
return nil, fmt.Errorf("%v is not a valid interval for a range query, must be greater than 0", interval)
}
@@ -59,5 +61,27 @@ func (e *Engine) NewRangeQuery(_ context.Context, q storage.Queryable, opts prom
return nil, fmt.Errorf("range query time range is invalid: end time %v is before start time %v", end.Format(time.RFC3339), start.Format(time.RFC3339))
}

return newQuery(q, opts, qs, start, end, interval, e)
return newQuery(ctx, q, opts, qs, start, end, interval, e)
}

type QueryLimitsProvider interface {
// GetMaxEstimatedMemoryConsumptionPerQuery returns the maximum estimated memory allowed to be consumed by a query in bytes, or 0 to disable the limit.
GetMaxEstimatedMemoryConsumptionPerQuery(ctx context.Context) (uint64, error)
}

// NewStaticQueryLimitsProvider returns a QueryLimitsProvider that always returns the provided limits.
//
// This should generally only be used in tests.
func NewStaticQueryLimitsProvider(maxEstimatedMemoryConsumptionPerQuery uint64) QueryLimitsProvider {
return staticQueryLimitsProvider{
maxEstimatedMemoryConsumptionPerQuery: maxEstimatedMemoryConsumptionPerQuery,
}
}

type staticQueryLimitsProvider struct {
maxEstimatedMemoryConsumptionPerQuery uint64
}

func (p staticQueryLimitsProvider) GetMaxEstimatedMemoryConsumptionPerQuery(_ context.Context) (uint64, error) {
return p.maxEstimatedMemoryConsumptionPerQuery, nil
}