Query-Frontend Prometheus Codec: Metrics Query MinTime and MaxTime, Apply Fix to Query Component Assignment #7742
@@ -270,16 +273,17 @@ func (prometheusCodec) decodeRangeQueryRequest(r *http.Request) (MetricsQueryReq
return nil, err
}

result.Query = r.FormValue("query")
Any use of `FormValue` is a potential bug, as it consumes the body on a POST.
Everything should use `util.ParseRequestFormWithoutConsumingBody`, then work with the map of values.
Should we have a linting rule for this?
3188129
to
b3979cd
Compare
if err != nil {
return nil, err
}
return a.queryComponentQueueDimensionFromTimeParams(tenantIDs, time, time, now), nil

return a.queryComponentQueueDimensionFromTimeParams(tenantIDs, minT, maxT, now), nil
This is the actual bugfix: instead of using start/end (for range queries) or time (for instant queries), we now determine the query component from the query's actual minT and maxT.
Overall approach looks good to me.
Thanks for the feedback! This will end up needing a rebase or re-do if I get #7810 through; the point of that change is to abandon the proto representations so we can enforce relationships between minT, maxT, query, etc.
fbfa226
to
eec43e1
Compare
@charleskorn updated the approach to only parse the query once, though it still walks the …
We could actually set `minT` and `maxT` on the object and only walk the tree on object creation or in the `WithStartEnd`/`WithQuery` transforms, but for that we would want to create a constructor, which doesn't currently exist and would probably create a bunch more boilerplate changes.
Given we only call `GetMinT()` and `GetMaxT()` in one place, and call both at the same time, what if we had a single `GetTimeRange()` method that returns both? Then we won't pay the cost of walking the tree twice.
I wanted to keep GetMinT and GetMaxT separate given the existing style of GetStart and GetEnd. If we collapsed GetMinT and GetMaxT together, I would want to do the same for GetStart and GetEnd. Alternatively, I could go the route mentioned above: move to a constructor and only parse and walk the tree in constructors or transforms. I think that is a more complete solution. Either option would mean some more boilerplate-y changes; it's up to you whether they come in this PR or a follow-on one.
I'm OK with
I think it'd be best to do whichever choice you choose as part of this PR.
@@ -62,47 +62,16 @@ func Test_queryStatsMiddleware_Do(t *testing.T) {
Step: step,
},
},
{
This is no longer a valid test case, as the query will already have been validated and parsed before it is placed in the request object.
The change to use a constructor and only parse the query on construction or transform is complete for the range requests. I need to do instant queries now, which should be shorter since far fewer components and tests use instant queries.
It's a doozy with all the test fixtures to update, but it's all ready.
Constructors …
All type fields are now private and exposed through getters and setters, so that query, start, end, step, or lookback delta cannot be set without the derived minT and maxT being updated as well.
The only functional things that should have changed are …
Overall LGTM, thanks for adding the constructor.
- query blocker and caching should actually be a bit more stable now because they generate cache keys from the parsed-then-stringified query, rather than the unvalidated string query, as the parsed query's String() method always formats it in the same way, regardless of original query format.
- I doubt there were ever that many cases of this in the wild
- it reorders operators, like sum(container_memory_rss) by (namespace) becomes sum by (namespace) (container_memory_rss)
- and removes braces from metrics names when there is no selector - foo{} becomes foo
Might be worth calling out in the changelog entry that this will cause some query result cache churn when first deployed.
Will the pretty-printed version of the expression be logged anywhere? If not, it's going to be difficult to block problematic queries. If it is logged somewhere, we should mention both that the pretty-printed query is used and how to see the pretty-printed query in the docs for query blocking.
@@ -171,7 +171,8 @@ func (s *splitInstantQueryByIntervalMiddleware) Do(ctx context.Context, req Metr
s.metrics.splitQueriesPerQuery.Observe(float64(mapperStats.GetSplitQueries()))

// Send hint with number of embedded queries to the sharding middleware
req = req.WithQuery(instantSplitQuery.String()).WithTotalQueriesHint(int32(mapperStats.GetSplitQueries()))
req, _ = req.WithQuery(instantSplitQuery.String()) // expect no error as instantSplitQuery is already a valid prometheus query
What if we added a `WithExpr` method here? Then we wouldn't need to pay the price of turning the existing `Expr` into a string and then parsing it again.
So I looked at this from a few different angles, and the query is logged when it's blocked, but not when it's not blocked.
In either case we should enable the user to enter the query block correctly.
$ promtool --experimental promql format "foo{}"
foo
However, this only pretty-prints it. There are two methods for string-formatting a PromQL expression:
Thankfully there's nothing magical in the promtool version we can't do ourselves in mimirtool:
I think the best approach here is to add something like this …
…arser as a function
… MaxT as methods; add test cases for instant queries
…an up test structure
This reverts commit f7212cd18d11bb8df5a3b2a52995911956b1fa41.
b53fab6
to
f66a66e
Compare
Everything I mentioned above is done. Feedback welcome on the changelog wording, of course.
## Formatting queries to block

Queries received by Mimir are parsed into PromQL expressions before blocking is applied.
The `pattern` from the blocked queries is compared against Prometheus' string format representation of the parsed query,
[nit] Maybe this would be clearer?
The `pattern` from the blocked queries is compared against Prometheus' string format representation of the parsed query,
The `pattern` from the blocked queries is compared against the formatted representation of the parsed query,
If so, I would use the same term below (rather than referring to Prometheus).
// PromQL Format Query Command
promqlFormatCmd := promqlCmd.Command("format", "Format PromQL query with Prometheus' string formatter; wrap query in quotes for CLI parsing.").Action(c.formatQuery)
promqlFormatCmd.Flag("pretty", "use Prometheus' pretty-print formatter").BoolVar(&c.prettyPrint)
Is Prometheus relevant here? I think we can just say something like "pretty-print expression"
I think the fact that it's using upstream code is relevant for a user technical enough to be doing this. It implies a level of stability and consistency with other tooling like promtool - using the `--pretty` flag here is identical to the promtool command. The promtool version just doesn't allow non-pretty at this point.
I'm not sure that the fact we're using the upstream code is so relevant here or in the docs above - query blocking is a Mimir-only feature, so alignment with how Prometheus formats queries isn't so important, but consistency with how the query-blocking feature formats queries is important.
I think we may need to reorder the conceptual material to focus up front on why the user wants to perform this task. After that is done, I can review the language specifically.
nit: the arbitrary line breaks are worse for line based diffing in GitHub in the future, consider using semantic line breaks instead.
The docs shell style has been updated. The line breaks are already semantic, not arbitrary: they all break at the latest possible sentence or clause boundary while still keeping line length <= 120 chars. There are non-semantic breaks in the existing language in the doc, but I left them in so as not to create further meaningless changes in this diff.
…-time-from-query-parsing
Tests were split and changelog feedback added. Resolved everything but the ideas about not referring to using the Prometheus formatter.
🚀
What this PR does
The Prometheus codec's metrics query types only expose Start and End, with no understanding of the query itself or of how a range vector or offset can change the actual time range queried.
First noticed as part of an issue where the frontend's query scheduler adapter was assigning the expected query component to be ingesters only, based on start and end timestamps, but the query was actually looking back over many hours or days, meaning the expected query component should have been assigned as ingesters and store gateways.
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.