Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update mimir-prometheus to 320f0c9c4a88 #6085

Merged
merged 34 commits into from
Sep 28, 2023
Merged

Conversation

aknuds1
Copy link
Contributor

@aknuds1 aknuds1 commented Sep 21, 2023

What this PR does

Update mimir-prometheus to latest main, i.e. 320f0c9c4a88.

The main change in Prometheus is that the context argument is moved from storage.Queryable.Querier and instead to individual query methods (Select, LabelValues, LabelNames). This requires some adapting of Mimir code, especially that which relies on the context being a querier field (e.g. tenant federation).

Also, due to (certain) Prometheus query methods gaining a context argument, they now respect context cancellation properly, unlike before.

TODO:

  • Add changes to the changelog
  • Verify whether query cancellation is properly implemented
  • Add tests to verify that queries respect context cancellation?
  • Check query traces - we do some things to workaround the previous state to ensure the context is propagated correctly (cc @charleskorn)
  • Make sure tenant federation works with GEM (cc @pstibrany)

Which issue(s) this PR fixes or relates to

Fixes #5314.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@aknuds1 aknuds1 force-pushed the arve/revendor-prometheus branch 4 times, most recently from 984e312 to 3621699 Compare September 21, 2023 14:11
@aknuds1 aknuds1 force-pushed the arve/revendor-prometheus branch 5 times, most recently from 75d7793 to ae9b4c1 Compare September 22, 2023 14:31
@aknuds1 aknuds1 changed the title WIP: Update mimir-prometheus to 320f0c9c4a88 Update mimir-prometheus to 320f0c9c4a88 Sep 22, 2023
@aknuds1 aknuds1 force-pushed the arve/revendor-prometheus branch 2 times, most recently from b2212fc to 9b33724 Compare September 22, 2023 14:53
pkg/querier/querier.go Outdated Show resolved Hide resolved
aknuds1 and others added 13 commits September 25, 2023 15:42
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, tenantfederation changes make sense to me.

pkg/querier/tenantfederation/merge_queryable.go Outdated Show resolved Hide resolved
pkg/querier/tenantfederation/merge_queryable.go Outdated Show resolved Hide resolved
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to look over the changes in the ingester, tenantfederation, and frontend/* packages. Unfortunately my battery will die soon, so I will need to get back to the rest later.

There were quite a few stylistic changes that make this already large change set even larger. For me at least this makes the review more error-prone. Perhaps doing them in a follow-up PR would have been easier to review. I don't think you should undo them at this point, but wanted to point this out

pkg/mimir/modules.go Outdated Show resolved Hide resolved
@@ -27,7 +27,7 @@ import (
// NewQueryable returns a queryable that iterates through all the tenant IDs
// that are part of the request and aggregates the results from each tenant's
// Querier by sending of subsequent requests.
// By setting bypassWithSingleQuerier to true the mergeQuerier gets bypassed
// By setting bypassWithSingleID to true the mergeQuerier gets bypassed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i also don't understand the need for this set of changes, but i suppose they're not wrong. Can you also update the next line? It still talks about a querier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitarvdimitrov changing to bypassWithSingleID makes sense, since the number of queriers had to be abstracted away (only GEM will need multiple queriers). The doc strings need fixing up. I was waiting for Peter's input first :)

)
func NewQueryable(upstream storage.Queryable, bypassWithSingleID bool, maxConcurrency int, logger log.Logger) storage.Queryable {
callbacks := MergeQuerierCallbacks{
IDs: func(ctx context.Context) ([]string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wy do we need to call this via a callback? Can't we just call tenants.TenantIDs(ctx) in each of LabelValues, LabelNames, and Select?

Copy link
Contributor Author

@aknuds1 aknuds1 Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary to abstract since the enterprise code needs to do it differently (hence the callbacks). I'll give it another look to make sure I'm not missing something.

pkg/querier/tenantfederation/merge_queryable.go Outdated Show resolved Hide resolved

// tenantQuerier implements MergeQuerierUpstream, wrapping a storage.Querier.
// The federation ID gets injected into the context as a tenant ID.
type tenantQuerier struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just call user.InjectOrgID and inject the tenatn id in the context in the three call sites of tenantQuerier's methods? like here

return m.upstream.LabelValues(ctx, id, name, filteredMatchers...)

If that's possible, then we can remove tenantQuerier, MergeQuerierUpstream and MergeQuerierCallbacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't just call user.InjectOrgID, because the enterprise product doesn't deal with federation IDs in the same way. I.e., it doesn't deal with tenant IDs in the context.

pkg/querier/tenantfederation/merge_queryable.go Outdated Show resolved Hide resolved
pkg/querier/tenantfederation/merge_queryable_test.go Outdated Show resolved Hide resolved
pkg/querier/tenantfederation/merge_queryable_test.go Outdated Show resolved Hide resolved
@dimitarvdimitrov
Copy link
Contributor

looks like others are happy with the change; i couldn't find any bugs in the code i've reviewed so far, so if you're confident, you can merge this without waiting for my full review (which might only come next week or on Sunday)

@aknuds1
Copy link
Contributor Author

aknuds1 commented Sep 27, 2023

There were quite a few stylistic changes that make this already large change set even larger. For me at least this makes the review more error-prone

@dimitarvdimitrov are you referring to the tenantfederation package? Changes were necessary there to support our enterprise product, which customizes federation logic. In general, I do try to avoid purely stylistic changes.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
@pracucci pracucci self-requested a review September 28, 2023 09:37
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job Arve! I can feel your pain while dealing with this PR 🤯 The changes to the query federation are quite involved, but I can't see anything wrong just looking at the diff.

I left few comments I would be glad if you could take a look before merging.

CHANGELOG.md Outdated Show resolved Hide resolved
@@ -2214,7 +2212,9 @@ func (i *Ingester) createTSDB(_ context.Context, userID string, walReplayConcurr
// this will actually create the blocks. If there is no data (empty TSDB), this is a no-op, although
// local blocks compaction may still take place if configured.
level.Info(userLogger).Log("msg", "Running compaction after WAL replay")
err = db.Compact()
// Note that we want to let TSDB creation finish without being interrupted by eventual context cancellation,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessarily true. createTSDB is also done when ingester is starting, and if we get signal to stop ingester in that time, we don't need to wait until compaction finishes.

But I think it may be safer to wait until Compact() has done. I don't think there's a urgency of shutting down ingester, but there's an high importancy of doing it safely.

I personally agree with the comment and using context.Background() to not introduce more chaos to Compact().

pkg/ingester/label_names_and_values.go Show resolved Hide resolved
pkg/ingester/user_tsdb.go Outdated Show resolved Hide resolved
aknuds1 and others added 4 commits September 28, 2023 13:05
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
@aknuds1 aknuds1 merged commit d7567db into main Sep 28, 2023
28 checks passed
@aknuds1 aknuds1 deleted the arve/revendor-prometheus branch September 28, 2023 11:43
aknuds1 added a commit that referenced this pull request Oct 18, 2023
In Ingester.QueryStream, ignore context cancellation wrt. chunk
queriers, the way it worked prior to
#6085. If the context is canceled
though, log it with full diagnostics.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
aknuds1 added a commit that referenced this pull request Oct 19, 2023
…or chunk queriers (#6408)

* Ingester.QueryStream: Add support for ignoring context cancellation for chunk queriers

In Ingester.QueryStream, support ignoring of context cancellation wrt. chunk
queriers, the way it worked prior to
#6085. If the context is canceled
though, (span) log it with full diagnostics.

Also wrap some chunk querying errors.

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
grafanabot pushed a commit that referenced this pull request Oct 19, 2023
…or chunk queriers (#6408)

* Ingester.QueryStream: Add support for ignoring context cancellation for chunk queriers

In Ingester.QueryStream, support ignoring of context cancellation wrt. chunk
queriers, the way it worked prior to
#6085. If the context is canceled
though, (span) log it with full diagnostics.

Also wrap some chunk querying errors.

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
(cherry picked from commit f737a96)
aknuds1 added a commit that referenced this pull request Oct 19, 2023
…or chunk queriers (#6408) (#6440)

* Ingester.QueryStream: Add support for ignoring context cancellation for chunk queriers

In Ingester.QueryStream, support ignoring of context cancellation wrt. chunk
queriers, the way it worked prior to
#6085. If the context is canceled
though, (span) log it with full diagnostics.

Also wrap some chunk querying errors.

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
(cherry picked from commit f737a96)

Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Querier spans are not nested correctly
7 participants