Unhealthy Compactors Stay in the Ring #142

joe-elliott · 2020-08-21T17:13:14Z

I've seen an unhealthy compactor stay in the ring for hours after it was gone. Research this and see if it's a matter of configuration or actually a bug of some kind.

Should we use a simpler discovery mechanism like DNS?

joe-elliott · 2020-09-16T13:27:54Z

This appears to happen occasionally on rollout. Also, even after manually forgetting them, they come back. Perhaps due to gossip?

joe-elliott · 2020-11-03T15:56:24Z

If you are seeing this issue and are unable to successfully forget a compactor it is recommended to click the "Forget" button, wait a full 10 seconds, stand up, stretch, get all your grocery shopping done, come back and then hit F5. The compactor should be forgotten.

If you quickly spam "Forget" then old compactors seem to stay in the ring. This is believed to be an issue with the memberlist propagation of the ring.
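
To illustrate how gossip propagation could bring a forgotten entry back, here is a minimal sketch. It is not Cortex or Tempo code: the `entry`, `ring`, `merge`, `forgetByDelete` and `forgetByTombstone` names are all hypothetical, and the "newest timestamp wins" merge rule is an assumption made for the example. It compares deleting an entry outright with keeping a tombstone that is newer than any peer's stale copy.

```go
package main

import "fmt"

// entry is a hypothetical ring entry (not the Cortex type).
type entry struct {
	heartbeat int64 // timestamp of the last update, in seconds
	left      bool  // tombstone: the instance was forgotten or has left
}

// ring maps an instance ID to its entry.
type ring map[string]entry

// merge folds a remote view into the local one.
// Assumed rule: unknown entries are adopted, otherwise the newer timestamp wins.
func merge(local, remote ring) {
	for id, r := range remote {
		l, ok := local[id]
		if !ok || r.heartbeat > l.heartbeat {
			local[id] = r
		}
	}
}

// forgetByDelete removes the entry outright, so nothing records the removal.
func forgetByDelete(r ring, id string) {
	delete(r, id)
}

// forgetByTombstone keeps the entry, marks it as left and bumps its timestamp
// so that it wins later merges against stale copies.
func forgetByTombstone(r ring, id string, now int64) {
	r[id] = entry{heartbeat: now, left: true}
}

func main() {
	// A peer that still holds the stale entry for the dead compactor.
	peer := ring{"compactor-1": {heartbeat: 100}}

	// Case 1: delete locally, then receive gossip from the peer.
	a := ring{"compactor-1": {heartbeat: 100}}
	forgetByDelete(a, "compactor-1")
	merge(a, peer)
	_, back := a["compactor-1"]
	fmt.Println("delete, then gossip: entry present =", back)

	// Case 2: tombstone locally, then exchange gossip in both directions.
	b := ring{"compactor-1": {heartbeat: 100}}
	forgetByTombstone(b, "compactor-1", 200)
	merge(b, peer)
	merge(peer, b)
	fmt.Println("tombstone, then gossip: local left =", b["compactor-1"].left,
		"peer left =", peer["compactor-1"].left)
}
```

In this toy model a plain delete is undone by the next merge from any peer that still holds the old entry, while a tombstone survives the merge and spreads to the peer. Whether this matches the actual cause of the behaviour described here is not confirmed.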

annanay25 · 2020-12-17T14:05:07Z

Forget behaviour may be fixed with cortexproject/cortex#3603 and will reflect once we vendor in the latest cortex version.

We should also research a way to not care about a compactor disappearing from the ring.

gouthamve · 2021-01-22T19:22:19Z

Can this be closed?

joe-elliott · 2021-01-22T19:26:31Z

We believe that #442 fixed this issue, but have not seen it again in our internal cluster to confirm. I'd rather keep this open until we verify.

slim-bean · 2021-02-03T13:39:57Z

This happened again, could not get the unhealthy compactor to leave, ended up port-forwarding 4-5 compactors between two people and clicking forget a lot and eventually it went away.

joe-elliott · 2021-02-03T13:43:18Z

@pstibrany has reported he feels the issue will be fixed in Cortex 1.7.0. We will keep an eye on it after the upgrade.

pstibrany · 2021-02-03T13:48:16Z

It's the same cortexproject/cortex#3603 fix, but Tempo currently doesn't use a Cortex version that includes it.

joe-elliott · 2021-02-09T19:12:01Z

Confirmed fixed in our environment by #512

Thanks @pstibrany!

joe-elliott · 2021-02-17T14:47:36Z

We've seen this again, but found a way to mitigate it. The changes in Cortex have certainly made it easier to deal with, but it does still happen occasionally.

Details have been added to the appropriate runbook entries: #532

joe-elliott · 2021-03-09T20:27:24Z

Further updates on this. We have since switched to using these values in our memberlist config and have not been able to trigger this issue since:

memberlist:
    left_ingesters_timeout: 30m
    pull_push_interval: 15s

Still keeping an eye on things.

joe-elliott · 2021-08-13T12:51:59Z

Possible fixes going into Cortex now:

cortexproject/cortex#4420
cortexproject/cortex#4419

TODO:

revendor Cortex with these changes and confirm it fixes the issue
find sensible defaults for memberlist now that propagation is reduced
update runbooks to remove mention of this issue.
close this issue!!!

joe-elliott mentioned this issue Oct 1, 2020

Added override ring keys #202

Merged

joe-elliott mentioned this issue Oct 19, 2020

Clean up Gossip Shutdown #127

Closed

joe-elliott closed this as completed Feb 9, 2021

joe-elliott reopened this Feb 17, 2021

mdisibio added this to the kv store complete milestone Feb 25, 2021

joe-elliott mentioned this issue Aug 16, 2021

Upgrade Cortex to latest #878

Merged

joe-elliott closed this as completed in #878 Aug 16, 2021
