Query-Frontend Prometheus Codec: Metrics Query MinTime and MaxTime, Apply Fix to Query Component Assignment #7742

francoposa · 2024-03-27T21:39:32Z

What this PR does

Prometheus Codec for Metrics Query types only expose the Start and End, with no understanding of the query itself or how a range vector or offset can change the actual time range queried.

First noticed as part of an issue where the frontend's query scheduler adapter was assigning the expected query component to be ingesters only, based on start and end timestamps, but the query was actually looking back over many hours or days, meaning the expected query component should have been assigned as ingesters and store gateways.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.

francoposa · 2024-03-29T17:13:57Z

pkg/frontend/querymiddleware/codec.go

@@ -270,16 +273,17 @@ func (prometheusCodec) decodeRangeQueryRequest(r *http.Request) (MetricsQueryReq
 		return nil, err
 	}

-	result.Query = r.FormValue("query")


any use of formvalue is a potential bug as it consumes the body on a POST.
everything should use util.ParseRequestFormWithoutConsumingBody, then work with the map of values

Should we have a linting rule for this?

pkg/frontend/querymiddleware/codec.go

pkg/frontend/querymiddleware/model_extra.go

francoposa · 2024-04-03T23:39:40Z

pkg/frontend/v2/frontend_scheduler_adapter.go

 		if err != nil {
 			return nil, err
 		}
-		return a.queryComponentQueueDimensionFromTimeParams(tenantIDs, time, time, now), nil
+
+		return a.queryComponentQueueDimensionFromTimeParams(tenantIDs, minT, maxT, now), nil


this is the actual bugfix - instead of using start/end or time for range and instant queries, we are determining this query component with actual minT and maxT for the query.

charleskorn

Overall approach looks good to me.

pkg/frontend/querymiddleware/codec.go

pkg/frontend/querymiddleware/codec_test.go

pkg/frontend/querymiddleware/model_extra.go

francoposa · 2024-04-04T21:06:06Z

Overall approach looks good to me.

thanks for the feedback!

Will end up needing a rebase or re-do here if I get #7810 through - point is to abandon proto representations so we can enforce relationships between minT, maxT, query, etc.

francoposa · 2024-04-08T21:39:50Z

@charleskorn updated approach to only parse query once, though it still walks the queryExpr tree to get min and max each time - I think I am fine with that for now.
Unfortunately this created a bunch of boilerplate test changes throughout the frontend since WithQuery can return an error for a bad parse now.

We could actually set minT and maxT on the object and only walk the tree on object creation or the WithStartEnd/WithQuery transforms, but for that we would want to create a constructor which doesn't currently exist and would probably create a bunch more boilerplate changes.

charleskorn

We could actually set minT and maxT on the object and only walk the tree on object creation or the WithStartEnd/WithQuery transforms, but for that we would want to create a constructor which doesn't currently exist and would probably create a bunch more boilerplate changes.

Given we only call GetMinT() and GetMaxT() in one place, and call both at the same time, what if we had a single GetTimeRange() method that returns both? Then we won't pay the cost of walking the tree twice.

charleskorn · 2024-04-09T00:13:37Z

pkg/frontend/querymiddleware/codec.go

@@ -270,16 +273,17 @@ func (prometheusCodec) decodeRangeQueryRequest(r *http.Request) (MetricsQueryReq
 		return nil, err
 	}

-	result.Query = r.FormValue("query")


Should we have a linting rule for this?

francoposa · 2024-04-09T01:14:21Z

Given we only call GetMinT() and GetMaxT() in one place, and call both at the same time, what if we had a single GetTimeRange() method that returns both? Then we won't pay the cost of walking the tree twice.

I wanted to keep GetMinT and GetMaxT separate given the existing style of GetStart and GetEnd.

If we collapsed GetMinT and GetMaxT together I would want to do the same for GetStart and GetEnd.

Alternatively I could just go with the route mentioned of moving to a constructor and only parsing and walking on constructors or transforms. I think this is a bit more of a complete solution.

Either option would be some more boilerplate-y changes - up to you if you want them to come in this PR or a follow-on one

charleskorn · 2024-04-09T01:26:13Z

I wanted to keep GetMinT and GetMaxT separate given the existing style of GetStart and GetEnd.

If we collapsed GetMinT and GetMaxT together I would want to do the same for GetStart and GetEnd.

Alternatively I could just go with the route mentioned of moving to a constructor and only parsing and walking on constructors or transforms. I think this is a bit more of a complete solution.

I'm OK with GetStart/GetEnd and GetTimeRange having different styles, but either solution (single method or constructor) is fine with me.

Either option would be some more boilerplate-y changes - up to you if you want them to come in this PR or a follow-on one

I think it'd be best to do whichever choice you choose as part of this PR.

francoposa · 2024-04-12T17:25:35Z

pkg/frontend/querymiddleware/stats_test.go

@@ -62,47 +62,16 @@ func Test_queryStatsMiddleware_Do(t *testing.T) {
 				Step:         step,
 			},
 		},
-		{


no longer a case as the query will have already been validated and parsed before it is in the request object

francoposa · 2024-04-12T21:47:26Z

change to use constructor and only parse query on construct or transform is complete for the range requests - need to do instant now, should be shorter since there's a lot fewer components and tests using instant queries

francoposa · 2024-04-12T22:54:02Z

it's a doozy with all the test fixtures to update, but it's all ready.

Constructors NewPrometheusRangeQueryRequest and NewPrometheusInstantQueryRequest, as well as each type'
s respective WithStartEnd and WithQuery implementation now calculate minT and maxT and set them as fields.

All type fields are now private and exposed through getters and setters to avoid being able to set query, start, end, step, or lookback delta without the derived minT and maxT being updated as well.

WithStartEnd and WithQuery transforms are tested for consistency.

the only functional things that should have changed are

the original bug to fix, which is determining expected query component in frontend scheduler adapter with minT and maxT, not start and end
query blocker and caching should actually be a bit more stable now because they generate cache keys from the parsed-then-stringified query, rather than the unvalidated string query, as the parsed query's String() method always formats it in the same way, regardless of original query format.
i. I doubt there were ever that many cases of this in the wild
ii. it reorders operators, like sum(container_memory_rss) by (namespace) becomes sum by (namespace) (container_memory_rss)
iii. and removes braces from metrics names when there is no selector - foo{} becomes foo

charleskorn

Overall LGTM, thanks for adding the constructor.

query blocker and caching should actually be a bit more stable now because they generate cache keys from the parsed-then-stringified query, rather than the unvalidated string query, as the parsed query's String() method always formats it in the same way, regardless of original query format.

I doubt there were ever that many cases of this in the wild

it reorders operators, like sum(container_memory_rss) by (namespace) becomes sum by (namespace) (container_memory_rss)

and removes braces from metrics names when there is no selector - foo{} becomes foo

Might be worth calling out in the changelog entry that this will cause some query result cache churn when first deployed.

Will the pretty-printed version of the expression be logged anywhere? If not, it's going to be difficult to block problematic queries. If it is logged somewhere, we should mention both that the pretty-printed query is used and how to see the pretty-printed query in the docs for query blocking.

charleskorn · 2024-04-15T13:24:25Z

pkg/frontend/querymiddleware/split_by_instant_interval.go

@@ -171,7 +171,8 @@ func (s *splitInstantQueryByIntervalMiddleware) Do(ctx context.Context, req Metr
 	s.metrics.splitQueriesPerQuery.Observe(float64(mapperStats.GetSplitQueries()))

 	// Send hint with number of embedded queries to the sharding middleware
-	req = req.WithQuery(instantSplitQuery.String()).WithTotalQueriesHint(int32(mapperStats.GetSplitQueries()))
+	req, _ = req.WithQuery(instantSplitQuery.String()) // expect no error as instantSplitQuery is already a valid prometheus query


What if we added a WithExpr method here? Then we wouldn't need to pay the price of turning the existing Expr into a string and then parsing it again

francoposa · 2024-04-25T21:20:01Z

Will the pretty-printed version of the expression be logged anywhere? If not, it's going to be difficult to block problematic queries. If it is logged somewhere, we should mention both that the pretty-printed query is used and how to see the pretty-printed query in the docs for query blocking.

So I looked at this from a few different angles, and the query is logged when it's blocked, but not when it's not blocked.
So the primary purpose is to enable people to block the correct queries:

cost of blocking something you didn't mean to is low
cost of not blocking something you need to block is potentially very high

In either case we should enable the user to enter the query block correctly.
I checked if anything existed to do this, and the latest version of promtool does it, behind an experimental flag:

$ promtool --experimental promql format "foo{}"
foo

however this only pretty-prints it. There are two methods for string-formatting a promql expression: String and Pretty.
We only use String in the frontend, and in any case it would probably be a poor experience to put pretty-printed queries with tons of whitespace in the blocker YAML.

thankfully there's nothing magical in the promtool version we can't do ourselves in mimirtool:
https://github.com/prometheus/prometheus/blob/0305490e4e7baf9cb69e4bf5b131238fb38ffa77/cmd/promtool/main.go#L1221-L1229

func formatPromQL(query string) error {
	expr, err := parser.ParseExpr(query)
	if err != nil {
		return err
	}

	fmt.Println(expr.Pretty(0))
	return nil
}

I think the best approach here is to add something like this like mimirtool promql format [--pretty] expr, and put it in the query blocking docs that you can use it to see how you format your query blocking entries

…arser as a function

… MaxT as methods; add test cases for instant queries

…an up test structure

This reverts commit f7212cd18d11bb8df5a3b2a52995911956b1fa41.

…e fields

…locking

francoposa · 2024-04-25T23:20:48Z

everything I mentioned above is done, and WithExpr is tested alongside WithQuery.

Feedback welcome on changelog wording of course

CHANGELOG.md

charleskorn · 2024-04-29T01:52:06Z

docs/sources/mimir/configure/configure-blocked-queries.md

+## Formatting queries to block
+
+Queries received by Mimir are parsed into PromQL expressions before blocking is applied.
+The `pattern` from the blocked queries is compared against Prometheus' string format representation of the parsed query,


[nit] Maybe this would be clearer?

Suggested change

The `pattern` from the blocked queries is compared against Prometheus' string format representation of the parsed query,

The `pattern` from the blocked queries is compared against the formatted representation of the parsed query,

If so, I would use the same term below (rather than referring to Prometheus).

docs/sources/mimir/configure/configure-blocked-queries.md

charleskorn · 2024-04-29T02:06:15Z

pkg/mimirtool/commands/promql.go

+
+	// PromQL Format Query Command
+	promqlFormatCmd := promqlCmd.Command("format", "Format PromQL query with Prometheus' string formatter; wrap query in quotes for CLI parsing.").Action(c.formatQuery)
+	promqlFormatCmd.Flag("pretty", "use Prometheus' pretty-print formatter").BoolVar(&c.prettyPrint)


Is Prometheus relevant here? I think we can just say something like "pretty-print expression"

I think the fact that it's using upstream code is relevant for a user technical enough to be doing this. It implies a level of stability and consistency with other tooling like promtool - using the --pretty flag here is identical to the promtool command. The promtool version just doesn't allow non-pretty at this point.

I'm not sure that the fact we're using the upstream code is so relevant here or in the docs above - query blocking is a Mimir-only feature, so alignment with how Prometheus formats queries isn't so important, but consistency with how the query-blocking feature formats queries is important.

pkg/mimirtool/commands/promql.go

pkg/frontend/querymiddleware/codec_test.go

jdbaldry

I think we may need to reorder the conceptual material to focus on why the user wants to perform this task up front, after that is done, I can review the language specifically.

nit: the arbitrary line breaks are worse for line based diffing in GitHub in the future, consider using semantic line breaks instead.

docs/sources/mimir/configure/configure-blocked-queries.md

francoposa · 2024-04-30T21:26:14Z

docs shell style has been updated.

The line breaks are already semantic, not arbitrary they all break at the latest possible sentence clause break while still keeping line length <= 120 chars.

There are non-semantic breaks in the existing language in the doc but I left them in order not to create further meaningless changes in this diff.

…-time-from-query-parsing

francoposa · 2024-04-30T21:35:49Z

tests were split, Changelog feedback added.

Resolved everything but the ideas about not referring to using the Prometheus formatter.
As mentioned above I personally feel that using explicitly using the upstream is an advantage from the perspective of an end user technical enough to use query blocking.
It implies that it should be a pretty stable and consistent formatting, not Mimir diverging from what upstream does and then trying to achieve consistency later. If the (currently-experimental) new promtool promql format supported non-pretty string formatting, we probably would direct users there as an alternative to mimirtool.

pkg/frontend/querymiddleware/codec_test.go

charleskorn

🚀

francoposa commented Mar 29, 2024

View reviewed changes

pkg/frontend/querymiddleware/codec.go Show resolved Hide resolved

francoposa force-pushed the francoposa/frontend-prometheus-codec-min-max-time-from-query-parsing branch from 3188129 to b3979cd Compare April 3, 2024 20:55

francoposa commented Apr 3, 2024

View reviewed changes

pkg/frontend/querymiddleware/model_extra.go Outdated Show resolved Hide resolved

francoposa commented Apr 3, 2024

View reviewed changes

francoposa changed the title ~~Query-Frontend Prometheus Codec Metrics Query MinTime and MaxTime~~ Query-Frontend Prometheus Codec Metrics Query MinTime and MaxTime, Apply Fix to Query Component Assignment Apr 3, 2024

francoposa marked this pull request as ready for review April 3, 2024 23:40

francoposa requested a review from a team as a code owner April 3, 2024 23:40

francoposa changed the title ~~Query-Frontend Prometheus Codec Metrics Query MinTime and MaxTime, Apply Fix to Query Component Assignment~~ Query-Frontend Prometheus Codec: Metrics Query MinTime and MaxTime, Apply Fix to Query Component Assignment Apr 3, 2024

charleskorn reviewed Apr 4, 2024

View reviewed changes

francoposa mentioned this pull request Apr 4, 2024

refactor: query frontend prometheus codec convert from proto to standard types #7810

Merged

4 tasks

francoposa force-pushed the francoposa/frontend-prometheus-codec-min-max-time-from-query-parsing branch 2 times, most recently from fbfa226 to eec43e1 Compare April 8, 2024 17:50

charleskorn reviewed Apr 9, 2024

View reviewed changes

francoposa commented Apr 12, 2024

View reviewed changes

charleskorn reviewed Apr 15, 2024

View reviewed changes

francoposa requested a review from jdbaldry as a code owner April 25, 2024 22:37

francoposa added 6 commits April 25, 2024 15:52

implement prometheus codec metrics query mintime maxtime with query p…

9b1f3c3

…arser as a function

convert MetricsQueryRequest and implementing types to expose MinT and…

6e83dd3

… MaxT as methods; add test cases for instant queries

apply codec minT and maxT to frontend scheduler adapter

35ab76b

add lookback delta to query frontend config and prometheus codec; cle…

4686d18

…an up test structure

fix test

7d4e6e9

add test cases for prometheus at-modifier

2d9c462

francoposa added 6 commits April 25, 2024 15:52

Revert "convert to usages of constructor to fix codec tests"

9ace77f

This reverts commit f7212cd18d11bb8df5a3b2a52995911956b1fa41.

fix possibility for nil panics and update tests

f3f7b47

convert instant request to construct / transform approach with privat…

38a9ae4

…e fields

cleanup

1e62a97

mimirtool command for query formatting and documenting it for query b…

d2d216e

…locking

CHANGELOG

f66a66e

francoposa force-pushed the francoposa/frontend-prometheus-codec-min-max-time-from-query-parsing branch from b53fab6 to f66a66e Compare April 25, 2024 22:54

WithExpr tested

41e377d

francoposa requested a review from charleskorn April 25, 2024 23:20

francoposa added 2 commits April 25, 2024 16:29

lint

8849881

backticks for formatting in changelog for mimirtool command

f051d1c

charleskorn reviewed Apr 29, 2024

View reviewed changes

jdbaldry reviewed Apr 29, 2024

View reviewed changes

docs/sources/mimir/configure/configure-blocked-queries.md Outdated Show resolved Hide resolved

docs/sources/mimir/configure/configure-blocked-queries.md Outdated Show resolved Hide resolved

docs/sources/mimir/configure/configure-blocked-queries.md Outdated Show resolved Hide resolved

francoposa added 2 commits April 30, 2024 14:05

split tests between WithStartEnd and WithQuery/WithExpr

5e87581

docs style update

7c35a21

Merge branch 'main' into francoposa/frontend-prometheus-codec-min-max…

ee35638

…-time-from-query-parsing

francoposa added 2 commits April 30, 2024 14:36

make doc

88bb1d3

utilize lookback duration in WithStartEnd tests

a856aeb

francoposa requested review from charleskorn and jdbaldry April 30, 2024 21:51

charleskorn reviewed May 1, 2024

View reviewed changes

pkg/frontend/querymiddleware/codec_test.go Outdated Show resolved Hide resolved

pkg/frontend/querymiddleware/codec_test.go Outdated Show resolved Hide resolved

pkg/frontend/querymiddleware/codec_test.go Outdated Show resolved Hide resolved

PR feedback: clean up tests after split

28fec1d

francoposa requested a review from charleskorn May 1, 2024 01:28

massaging docs wording for clarity

1d208f2

charleskorn approved these changes May 1, 2024

View reviewed changes

francoposa merged commit f43c62e into main May 1, 2024
29 checks passed

francoposa deleted the francoposa/frontend-prometheus-codec-min-max-time-from-query-parsing branch May 1, 2024 03:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query-Frontend Prometheus Codec: Metrics Query MinTime and MaxTime, Apply Fix to Query Component Assignment #7742

Query-Frontend Prometheus Codec: Metrics Query MinTime and MaxTime, Apply Fix to Query Component Assignment #7742

francoposa commented Mar 27, 2024 •

edited

Loading

francoposa Mar 29, 2024

charleskorn Apr 9, 2024

francoposa Apr 3, 2024

charleskorn left a comment

francoposa commented Apr 4, 2024

francoposa commented Apr 8, 2024

charleskorn left a comment

charleskorn Apr 9, 2024

francoposa commented Apr 9, 2024

charleskorn commented Apr 9, 2024

francoposa Apr 12, 2024

francoposa commented Apr 12, 2024

francoposa commented Apr 12, 2024 •

edited

Loading

charleskorn left a comment

charleskorn Apr 15, 2024

francoposa commented Apr 25, 2024

francoposa commented Apr 25, 2024

charleskorn Apr 29, 2024

charleskorn Apr 29, 2024

francoposa Apr 30, 2024

charleskorn May 1, 2024

jdbaldry left a comment

francoposa commented Apr 30, 2024

francoposa commented Apr 30, 2024

charleskorn left a comment

	The `pattern` from the blocked queries is compared against Prometheus' string format representation of the parsed query,
	The `pattern` from the blocked queries is compared against the formatted representation of the parsed query,

Query-Frontend Prometheus Codec: Metrics Query MinTime and MaxTime, Apply Fix to Query Component Assignment #7742

Query-Frontend Prometheus Codec: Metrics Query MinTime and MaxTime, Apply Fix to Query Component Assignment #7742

Conversation

francoposa commented Mar 27, 2024 • edited Loading

What this PR does

Which issue(s) this PR fixes or relates to

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

charleskorn left a comment

Choose a reason for hiding this comment

francoposa commented Apr 4, 2024

francoposa commented Apr 8, 2024

charleskorn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

francoposa commented Apr 9, 2024

charleskorn commented Apr 9, 2024

Choose a reason for hiding this comment

francoposa commented Apr 12, 2024

francoposa commented Apr 12, 2024 • edited Loading

charleskorn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

francoposa commented Apr 25, 2024

francoposa commented Apr 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdbaldry left a comment

Choose a reason for hiding this comment

francoposa commented Apr 30, 2024

francoposa commented Apr 30, 2024

charleskorn left a comment

Choose a reason for hiding this comment

francoposa commented Mar 27, 2024 •

edited

Loading

francoposa commented Apr 12, 2024 •

edited

Loading