Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow disabling of ring heartbeats by setting relevant options to zero. #4344

Merged
merged 3 commits into from
Jul 9, 2021

Conversation

stevesg
Copy link
Contributor

@stevesg stevesg commented Jul 7, 2021

What this PR does:
Allow disabling of ring heartbeats by setting relevant options to zero.

Depends on #4342

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution you've picked should work.

Initially, I thought that we should not set Timestamp when the instance is registered in the ring and we should not update it when an InstanceDesc is updated, but then I've realised memberlist actually requires that Timestamp change each time we do update InstanceDesc so we have to keep all that logic and just disabled the hearbeating part.

However, I would update the ring web page to clearly state when heartbeating is disabled.

CHANGELOG.md Show resolved Hide resolved
Copy link
Contributor

@tomwilkie tomwilkie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM;

+1 to Marco's suggestion on including the heartbeat value / disabled on the ring info page.

Not related to this change, but why is there so much duplication of ring config now?

@stevesg
Copy link
Contributor Author

stevesg commented Jul 8, 2021

Initially, I thought that we should not set Timestamp when the instance is registered in the ring and we should not update it when an InstanceDesc is updated, but then I've realised memberlist actually requires that Timestamp change each time we do update InstanceDesc so we have to keep all that logic and just disabled the hearbeating part.

I had the same worry that memberlist relies on the Timestamp to resolve conflicts but it looks that the Timestamp is updated whenever the ring state is updated (example). This makes sense as in my testing with very long heartbeats, I didn't encounter any issues.

Will update the status page

@stevesg
Copy link
Contributor Author

stevesg commented Jul 8, 2021

It doesn't seem trivial to add the heartbeat period to the status page, as the configuration actually exists in the lifecycler :/ Any suggestions?

@pracucci
Copy link
Contributor

pracucci commented Jul 8, 2021

It doesn't seem trivial to add the heartbeat period to the status page, as the configuration actually exists in the lifecycler :/ Any suggestions?

The easiest (but probably) hackiest one is to add the heartbeat period to ring.Config too, but not registered via a CLI flag and not exposed via YAML. Then, in the code, we do set it based on the setting coming from the lifecycler.

@bboreham
Copy link
Contributor

bboreham commented Jul 8, 2021

Not related to this change, but why is there so much duplication of ring config now?

Can't speak to the history but I agree there is a lot of duplication?
Maybe because the first variant didn't want the "lifecycler" behaviour?

@pracucci
Copy link
Contributor

pracucci commented Jul 9, 2021

Not related to this change, but why is there so much duplication of ring config now?

Can't speak to the history but I agree there is a lot of duplication?
Maybe because the first variant didn't want the "lifecycler" behaviour?

I'm the guilty one. Long time ago (when we built the blocks storage) we duplicated (once) the ring config to slim it down. The ring.Config has a bunch of options we didn't need for the compactor so to make the config more clear (instead of reusing the same config and having many options not working for compactor) we duplicated it. The same pattern than has been applied for other rings. It's something we can definitely cleanup.

Signed-off-by: Steve Simpson <steve.simpson@grafana.com>
Signed-off-by: Steve Simpson <steve.simpson@grafana.com>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The improvement to the HTTP ring page can be done in a follow up PR.

@pracucci pracucci enabled auto-merge (squash) July 9, 2021 11:29
@pracucci pracucci merged commit 6b8bd5a into cortexproject:master Jul 9, 2021
@edma2
Copy link
Contributor

edma2 commented Jul 20, 2021

I'd love to learn more about the motivation behind this PR (and #4342). I couldn't find much docs on this feature. Does this improve availability by ignoring heartbeat failures?

@bboreham
Copy link
Contributor

Updating a KV store every few seconds is difficult to scale past 100-200 ingesters.
As of #741 we have a heartbeat directly from distributors to ingesters, so the one via the ring should not be necessary.

Some of the story is here: https://grafana.com/blog/2020/02/11/how-were-abusing-hashicorps-consul-at-grafana-labs/

alvinlin123 pushed a commit to ac1214/cortex that referenced this pull request Jan 14, 2022
…o. (cortexproject#4344)

* Allow disabling of ring heartbeats by setting relevant options to zero.

Signed-off-by: Steve Simpson <steve.simpson@grafana.com>

* Review comments.

Signed-off-by: Steve Simpson <steve.simpson@grafana.com>
Signed-off-by: Alvin Lin <alvinlin@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants