Ingester remains in the LEAVING state if starting() terminates #6923
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
I think the test is still doing too much.
Why do we need to start and stop the ingester first? We can set up the ring in the initial state that we want to test.
I think we should also add a scenario where there is no ring entry before the ingester starts, and it fails on the first start.
Thinking about it, setting up the ring may end up being the same amount of code :(
@pstibrany I can add this scenario, but it will further increase the size of this source file.
We can move all the auxiliary structs, as well as the methods used for test initialization, into a separate source file (e.g., …
Fix lgtm.
What this PR does
If starting the ingester service (`Ingester.starting()`) fails for any reason, the corresponding ingester might be left in the ring in an inconsistent state. For example, if an ingester receives SIGTERM while opening TSDBs (replaying the WAL), TSDB opening intentionally does not stop even though the context is cancelled, to reduce the likelihood of on-disk corruption. Nevertheless, the ring lifecycler gets started, and the ingester switches from the `LEAVING` to the `ACTIVE` state in the ring, even though `Ingester.starting()` was interrupted by the context cancellation. The ingester, although terminated, remains in the `ACTIVE` state because `Ingester.stopping()` is not called after a failed `Ingester.starting()`.

This PR fixes the problem explained above by explicitly stopping the ingester's lifecycler if an error occurs during `Ingester.starting()`.
Which issue(s) this PR fixes or relates to
Fixes #6753
Checklist
- `CHANGELOG.md` updated: the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`.
- `about-versioning.md` updated with experimental features.