
receive: Hashring Update Improvements #3141

Open
squat opened this issue Sep 8, 2020 · 17 comments

Comments

@squat
Member

squat commented Sep 8, 2020

Currently, any change to the hashring configuration file triggers all Thanos Receive nodes to flush their multi-TSDBs, causing them to enter an unready state until the flush is complete. This unavailability during a flush allows for a clear state transition; however, it can result in downtime on the order of five minutes for every configuration change. Moreover, during configuration changes, the hashring goes through an even longer period of partial unreadiness, in which some nodes begin and finish flushing before others. During this partial unreadiness, the hashring can expect high internal request failure rates, which cause clients to retry their requests, resulting in even higher load. Therefore, when the hashring configuration is changed due to automatic horizontal scaling of a set of Thanos Receivers, the system can expect higher-than-normal resource utilization, which can create a positive feedback loop that continuously scales the hashring.

We propose modifying how the Thanos Receive component re-configures itself after the hashring configuration file has changed so that the system experiences no downtime. Our plan is for Thanos Receive to create a new multi-TSDB instance to replace the multi-TSDB instance it is using to ingest data. Once the swap has been completed in a concurrent-safe manner, the old multi-TSDB can be flushed. This live swap has the benefit of eliminating the unready state that would otherwise have occurred due to the configuration change. Furthermore, any partial unreadiness in the hashring as a whole will be shortened and limited exclusively to the instant when some nodes have loaded the new configuration before others. The duration of this configuration discrepancy can be further reduced in cloud-native environments using sidecars that watch an API for updates to the configuration and write them to disk as soon as a change is identified.
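
For illustration, here is a minimal Go sketch of what the proposed live swap could look like, assuming a hypothetical multiTSDB type with stubbed ingest, flush, and close methods (none of these names are Thanos' actual API):

```go
package main

import "sync"

// multiTSDB stands in for the real multi-TSDB; its methods are stubs.
type multiTSDB struct{}

func (m *multiTSDB) ingest(samples []float64) error { return nil }
func (m *multiTSDB) flush() error                   { return nil }
func (m *multiTSDB) close() error                   { return nil }

// swapper guards the active multi-TSDB so that writers never observe a
// half-replaced instance.
type swapper struct {
	mu     sync.RWMutex
	active *multiTSDB
}

// Ingest takes the read lock, so many remote-write requests can proceed
// concurrently.
func (s *swapper) Ingest(samples []float64) error {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.active.ingest(samples)
}

// Reload swaps in a fresh multi-TSDB under the write lock, then flushes
// and closes the old one in the background; the node stays ready throughout.
func (s *swapper) Reload() {
	fresh := &multiTSDB{}

	s.mu.Lock()
	old := s.active
	s.active = fresh
	s.mu.Unlock()

	go func() {
		// A real implementation would log and retry these errors.
		_ = old.flush()
		_ = old.close()
	}()
}

func main() {
	s := &swapper{active: &multiTSDB{}}
	_ = s.Ingest([]float64{1.0})
	s.Reload() // e.g. triggered by a change to the hashring config file
	_ = s.Ingest([]float64{2.0})
}
```

The write lock is held only for the pointer swap itself, so ingestion is never blocked for the duration of a flush.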

A major benefit of avoiding unreadiness during the application of configuration changes is that the generation of the configuration itself can now safely be based upon the readiness of the individual nodes in the hashring without causing a feedback loop. This means that as a hashring is incrementally scaled up, only nodes that have finished starting up will be considered for membership in the hashring, avoiding black holes in the internal request-forwarding logic.

A downside of this multi-multi-TSDB approach is that the resource utilization of the Receive is now dependent on the frequency with which the configuration is changed, as frequent updates to the configuration would mean many multi-TSDB instances are open concurrently. This is likely a safe trade-off, given that short-lived multi-TSDB instances will hold very little data in memory and will require relatively few resources to flush and close.

cc @thanos-io/thanos-maintainers
cc @brancz

@jaybatra26

Hi! Can I take this up as part of my Community Bridge program?

@stale

stale bot commented Nov 20, 2020

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Nov 20, 2020
@kakkoyun kakkoyun removed the stale label Nov 20, 2020
@stale stale bot added the stale label Jan 19, 2021
@jmichalek132
Contributor

Still needed.

@stale stale bot removed the stale label Jan 20, 2021
@kakkoyun kakkoyun changed the title Thanos Receive: Hashring Update Improvements receive: Hashring Update Improvements Feb 12, 2021
@stale stale bot added the stale label Apr 18, 2021
@yashrsharma44
Contributor

Our plan is for Thanos Receive to create a new multi-TSDB instance to replace the multi-TSDB instance it is using to ingest data. Once the swap has been completed in a concurrent-safe manner, the old multi-TSDB can be flushed.

Could you elaborate on why we need to swap the TSDB data before we can flush it?

@stale stale bot removed the stale label May 31, 2021
@onprem
Member

onprem commented Jun 1, 2021

Could you elaborate on why we need to swap the TSDB data before we can flush it?

When we are flushing a TSDB instance, it can't ingest any new samples. This means that during such situations (when we are flushing the TSDB), the Receiver becomes unready. To avoid this, we can start a new multiTSDB and switch to that for ingestion while, in the background, we flush the old multiTSDB.

@yashrsharma44
Contributor

When we are flushing a TSDB instance, it can't ingest any new samples. This means that during such situations (when we are flushing the TSDB), the Receiver becomes unready. To avoid this, we can start a new multiTSDB and switch to that for ingestion while, in the background, we flush the old multiTSDB.

So effectively we are switching to a new multiTSDB rather than swapping data; the original statement was a little misleading.

@squat
Member Author

squat commented Jun 1, 2021

Let's be careful with our words here: I don't think there is anything "misleading" in the text, as that implies negative intent.

"Our plan is for Thanos Receive to create a new multi-TSDB instance to replace the multi-TSDB instance it is using to ingest data."

To me, this says exactly what you paraphrased from Prem. It never mentions swapping data, only swapping, i.e. replacing, TSDBs.

Maybe it was unclear to you? Or perhaps the word "swap" is confusing because of its use in memory management? Could you share which part of the text in your mind suggests copying data?

@yashrsharma44
Contributor

Sure, I didn't mean that the statement was "misleading", but rather "unclear"; I should have used the correct adjective 😅.

Regarding the swap, I got confused with swapping the data of oldTsdb into newTsdb.
Especially this statement:

Once the swap has been completed in a concurrent-safe manner,

suggests that we might be moving data or switching to a new TSDB, which is not clear, hence the confusion 😛

@yashrsharma44
Contributor

Our plan is for Thanos Receive to create a new multi-TSDB instance to replace the multi-TSDB instance it is using to ingest data.

Regarding the newTSDB, how are we planning to switch to it in a concurrent-safe manner? Should we proceed as follows (see the sketch below)?

  1. Get a reference to the oldMultiTSDB and start flushing the old TSDB using that reference.
  2. Create a newMultiTSDB and store its reference in place of the oldMultiTSDB's.
  3. We might need an RWLock while we perform step 2.

Ideas?
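
One hedged way to sketch those steps in Go, with the order inverted so that the swap (step 2) happens before the flush (step 1) and ingestion is never blocked; all names here are hypothetical, and an atomic pointer stands in for the RWLock from step 3:

```go
package main

import "sync/atomic"

// multiTSDB stands in for the real multi-TSDB; flush is a stub.
type multiTSDB struct{}

func (m *multiTSDB) flush() error { return nil }

// active holds the multi-TSDB currently used for ingestion.
var active atomic.Pointer[multiTSDB]

// reload swaps in a fresh instance first, then flushes the old one in the
// background, so the ingestion path never waits on a lock.
func reload() {
	old := active.Swap(new(multiTSDB))
	go func() {
		_ = old.flush() // errors would be surfaced in a real implementation
	}()
}

func main() {
	active.Store(new(multiTSDB))
	reload()
}
```

Note that a bare pointer swap does not wait for in-flight writes on the old instance to drain before flushing; that is what the RWLock in step 3 (or a per-instance WaitGroup) would buy you.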

@stale stale bot added the stale label Aug 3, 2021
@GiedriusS GiedriusS removed the stale label Aug 3, 2021
@stale stale bot added the stale label Oct 11, 2021
@stale

stale bot commented Oct 30, 2021

Closing for now as promised; let us know if you need this to be reopened! 🤗

@kakkoyun kakkoyun removed the stale label Nov 18, 2021
@stale stale bot added the stale label Mar 2, 2022

@stale stale bot closed this as completed Apr 17, 2022
@GiedriusS GiedriusS reopened this Apr 17, 2022
@stale stale bot removed the stale label Apr 17, 2022
@stale stale bot added the stale label Sep 21, 2022