Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for errors when iterating through chunks or samples in response to a query #6451

Merged
merged 5 commits into from
Oct 24, 2023

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented Oct 23, 2023

What this PR does

This PR fixes an issue where ingesters could ignore errors encountered while iterating through chunks or samples in response to a query request.

Note to reviewers: I haven't added a test for this case as introducing an error does not seem straightforward with how we test the ingester query methods. I'm open to feedback or suggestions on how to do this though.

Which issue(s) this PR fixes or relates to

(none)

Checklist

  • [n/a] Tests updated
  • [n/a] Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@charleskorn charleskorn marked this pull request as ready for review October 23, 2023 01:14
@charleskorn charleskorn requested a review from a team as a code owner October 23, 2023 01:14
@aknuds1 aknuds1 added component/ingester bug Something isn't working labels Oct 23, 2023
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks correct to me, but I'd also like to see tests. Not sure off the top of my head how to implement tests though, I'll try to see if I can figure out how.

@aknuds1 aknuds1 requested a review from a team October 23, 2023 06:48
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. I agree that it's hard to test this. I think the direct usage of *userTSDB makes it difficult to inject behaviour.

What if we cancelled the context of the Query operation? Would that be susceptible to a race condition? It's a bit of a contrived test, but it might help uncover this bug.

@charleskorn
Copy link
Contributor Author

What if we cancelled the context of the Query operation? Would that be susceptible to a race condition? It's a bit of a contrived test, but it might help uncover this bug.

I think that wouldn't guarantee we're exercising the changes in this PR - there are a number of different places that might respond to the context cancellation depending on when it occurs.

@charleskorn charleskorn mentioned this pull request Oct 24, 2023
1 task
@charleskorn
Copy link
Contributor Author

Given the approval from @dimitarvdimitrov (and offline discussion with @aknuds1), I'm going to set this to merge once CI is passing - but if anyone has any post-merge ideas on how to test this (or any other feedback), I'd love to hear them.

@charleskorn charleskorn enabled auto-merge (squash) October 24, 2023 09:13
@charleskorn charleskorn merged commit 83d0d7b into main Oct 24, 2023
28 checks passed
@charleskorn charleskorn deleted the charleskorn/check-for-errors branch October 24, 2023 09:27
@dimitarvdimitrov
Copy link
Contributor

What if we cancelled the context of the Query operation? Would that be susceptible to a race condition? It's a bit of a contrived test, but it might help uncover this bug.

I think that wouldn't guarantee we're exercising the changes in this PR - there are a number of different places that might respond to the context cancellation depending on when it occurs.

my approach would be to have a for loop in the test and run it enough times to reliably trigger the bug at least once during one go test. Not the best approach and it's fine by me if you choose not to add tests for this change

@charleskorn
Copy link
Contributor Author

my approach would be to have a for loop in the test and run it enough times to reliably trigger the bug at least once during one go test. Not the best approach and it's fine by me if you choose not to add tests for this change

https://github.com/grafana/mimir/compare/charleskorn/test-for-6451 is the most reliable test I've been able to find. I don't love it: it's tied very closely to the current structure of the Ingester type, and I'd need to refactor the other two code paths (samples and non-streaming chunks) to be able to do something similar for them. I don't think it's worth it, but open to other opinions or ideas.

pstibrany added a commit that referenced this pull request Nov 6, 2023
@pstibrany pstibrany mentioned this pull request Nov 6, 2023
@pstibrany
Copy link
Member

Reverted in #6576.

pstibrany added a commit that referenced this pull request Nov 6, 2023
grafanabot pushed a commit that referenced this pull request Nov 6, 2023
…response to a query (#6451)" (#6576)

This reverts commit 83d0d7b.

(cherry picked from commit ad50cda)
grafanabot pushed a commit that referenced this pull request Nov 6, 2023
…response to a query (#6451)" (#6576)

This reverts commit 83d0d7b.

(cherry picked from commit ad50cda)
pstibrany added a commit that referenced this pull request Nov 6, 2023
…response to a query (#6451)" (#6576) (#6578)

This reverts commit 83d0d7b.

(cherry picked from commit ad50cda)

Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
pstibrany added a commit that referenced this pull request Nov 6, 2023
…response to a query (#6451)" (#6576) (#6577)

This reverts commit 83d0d7b.

(cherry picked from commit ad50cda)

Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
charleskorn added a commit that referenced this pull request Nov 9, 2023
…ples in response to a query (#6451)" (#6576) (#6578)"

This reverts commit 83dc62b.
charleskorn added a commit that referenced this pull request Nov 29, 2023
…ples in response to a query (#6451)" (#6576)"

This reverts commit ad50cda.
@charleskorn charleskorn mentioned this pull request Nov 29, 2023
charleskorn added a commit that referenced this pull request Nov 30, 2023
* Revert "Revert "Check for errors when iterating through chunks or samples in response to a query (#6451)" (#6576)"

This reverts commit ad50cda.

* Update CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working component/ingester
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants