
Thanos planner component proposal #4458

Closed

Conversation

andrejbranch

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

@jeromeinsf

Looks related to cortexproject/cortex#4272.
Might be worth aligning the two.

Member

@bwplotka bwplotka left a comment


Amazing work! Thanks for this; sorry for the late review - I was on PTO.

@thanos-io/thanos-maintainers PTAL 🤗

Some initial comments, otherwise it looks good. Let's indeed check the Cortex plan - we are touching the same part of the codebase, and it looks like we want to solve the same problem.


## Goals

* To separate planner into its own component and have compactors communicate with the planner to get the next available plan.
Member


I think there are bigger goals behind this one. It would be nice to outline our motivation here. I think it's a prerequisite for scaling the compactor horizontally, especially within the same tenant/external labels... no? 🤗 - which essentially means breaking away from the singleton model.

CompactionPlan = 1
}
```

Member


Can we mention some alternatives?

I can think of one: coordination-free, gossip-based planning and compaction.

## Proposal
![where](../img/thanos_proposal_planner_component.png)

### Separate planner into its own component.
Member


Suggested change:
- ### Separate planner into its own component.
+ ### Step 1: Separate planner into its own component.

### Separate planner into its own component.
For the initial implementation, a reasonable amount of the current planner code could be reused. One difference is that the new planner will need to produce plans for all tenants. The planner should maintain a priority queue of available plans with fair ordering between tenants. It should also be aware of which plans are currently running and be able to handle the failure of a compaction plan gracefully. After the completion of this proposal, the planner can be updated to improve single-tenant compaction performance, following the goals of the [compaction group concurrency pull request](https://github.com/thanos-io/thanos/pull/3807).

### GRPC pubsub (bidirectional streaming)
Member


Suggested change:
- ### GRPC pubsub (bidirectional streaming)
+ ### Step 2: GRPC pubsub (bidirectional streaming)
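For illustration, here is a minimal Go sketch of what the compactor side of such a bidirectional plan stream could look like. The `Plan`, `PlanStatus`, and `PlanStream` types are hypothetical stand-ins for the proto-generated messages and stream client, not the API defined in this proposal:

```go
package compactor

// Plan and PlanStatus are illustrative stand-ins for the messages the
// planner and compactor would exchange over the stream.
type Plan struct {
	ID     string
	Tenant string
	Blocks []string
}

type PlanStatus struct {
	PlanID string
	Done   bool
	Err    string
}

// PlanStream mimics the Send/Recv pair that a generated gRPC
// bidirectional-streaming client exposes.
type PlanStream interface {
	Recv() (*Plan, error)
	Send(*PlanStatus) error
}

// Run pulls plans off the stream, executes them, and reports results back,
// so the planner always knows which plans are in flight and which failed.
func Run(stream PlanStream, compact func(Plan) error) error {
	for {
		plan, err := stream.Recv()
		if err != nil {
			return err // Stream closed or planner unreachable.
		}
		status := &PlanStatus{PlanID: plan.ID, Done: true}
		if err := compact(*plan); err != nil {
			status.Done = false
			status.Err = err.Error()
		}
		if err := stream.Send(status); err != nil {
			return err
		}
	}
}
```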


* Horizontally scaling compaction for a single tenant will be addressed following the completion of this proposal

## Proposal
Member


Can we have a ### Risks section where we explain what we do to ensure the Planner is not a single point of failure? (HA / scalability)

Contributor

@bill3tt bill3tt left a comment


Nice proposal 👍 As a community, agreeing upon an approach to refactoring and improving the compactor is long overdue.

Personally, I don't think implementing this as a separate Thanos component is the right approach. Critically, we would break existing users when they upgraded. Preferably, we would follow the pattern set by receive.

By default it will run as a single instance, but it can also be configured in separate routing and ingesting modes for advanced users. Scaling up the compactor would be an advanced feature, so we could reasonably expect the user to understand the differences and be able to configure it accordingly.

The default behaviour would be to run this work in a configurable number of local worker goroutines, so vertical scaling would bring performance benefits; the advanced feature would be to offload work to remote workers, so users could horizontally scale workloads with an HPA.

With this in mind, I think there are two sub-problems we should solve in order:

  1. Make the compactor scale vertically on one node by parallelizing compactions (see the sketch below).
  2. Make the compactor offload compaction tasks to workers.
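As a rough illustration of sub-problem 1, here is a minimal sketch of running independent compaction groups across a bounded pool of local worker goroutines. It uses golang.org/x/sync/errgroup; the `compactGroup` callback and the assumption that groups are independent are illustrative, not the current Thanos compactor code:

```go
package compactor

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// compactAllGroups runs one compaction per group, with at most `concurrency`
// compactions in flight at a time. It assumes groups are independent
// (disjoint external label sets / time ranges), which is what makes running
// them in parallel safe.
func compactAllGroups(ctx context.Context, groups []string, concurrency int, compactGroup func(context.Context, string) error) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(concurrency) // Bound the number of local worker goroutines.

	for _, group := range groups {
		group := group // Capture the loop variable for the goroutine below.
		g.Go(func() error {
			return compactGroup(ctx, group)
		})
	}
	// Wait returns the first error (if any) once all compactions finish.
	return g.Wait()
}
```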


## Goals

* To separate planner into its own component and have compactors communicate with the planner to get the next available plan.
Contributor


IMO this is not a goal of the proposal per se but how you propose satisfying the goal - reading this, it sounds like the goal is to make the compactor run faster? Or perhaps to increase block compaction throughput?

![where](../img/thanos_proposal_planner_component.png)

### Separate planner into its own component.
For the initial implementation, a reasonable amount of the current planner code could be reused. One difference is that the new planner will need to produce plans for all tenants. The planner should maintain a priority queue of available plans with fair ordering between tenants. It should also be aware of which plans are currently running and be able to handle the failure of a compaction plan gracefully. After the completion of this proposal, the planner can be updated to improve single-tenant compaction performance, following the goals of the [compaction group concurrency pull request](https://github.com/thanos-io/thanos/pull/3807).
Contributor


How would the planner prioritise plans? What would it be optimising for?

Member


I think we can do that in later stages - there are many ideas we want to pursue (:
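To make the queueing idea in the quoted paragraph concrete: one simple way to get fair ordering between tenants is round-robin over per-tenant FIFO queues. The sketch below assumes that model; the `Plan` type and method names are illustrative, not the proposed planner API, and prioritisation within a tenant is omitted:

```go
package planner

// Plan is a unit of compaction work for a single tenant; fields are illustrative.
type Plan struct {
	Tenant string
	Blocks []string
}

// fairQueue hands out plans round-robin across tenants so one busy tenant
// cannot starve the others.
type fairQueue struct {
	order   []string          // Tenants in round-robin order.
	pending map[string][]Plan // Per-tenant FIFO of plans not yet handed out.
	next    int               // Index of the tenant to serve next.
}

func newFairQueue() *fairQueue {
	return &fairQueue{pending: map[string][]Plan{}}
}

// Push enqueues a plan for its tenant, registering the tenant if it is new.
func (q *fairQueue) Push(p Plan) {
	if _, ok := q.pending[p.Tenant]; !ok {
		q.order = append(q.order, p.Tenant)
	}
	q.pending[p.Tenant] = append(q.pending[p.Tenant], p)
}

// Pop returns the next plan, cycling through tenants; ok is false when empty.
func (q *fairQueue) Pop() (Plan, bool) {
	for range q.order {
		tenant := q.order[q.next%len(q.order)]
		q.next++
		if plans := q.pending[tenant]; len(plans) > 0 {
			q.pending[tenant] = plans[1:]
			return plans[0], true
		}
	}
	return Plan{}, false
}
```

A real planner would additionally track which plans are currently running and requeue a plan whose compaction failed, as the proposal paragraph describes.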

@stale

stale bot commented Oct 2, 2021

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Oct 2, 2021
@stale stale bot closed this Oct 11, 2021
@yeya24 yeya24 reopened this Nov 9, 2021
@yeya24
Contributor

yeya24 commented Nov 22, 2021

It would be nice to revisit this one someday.
This is very similar to https://rockset.com/blog/remote-compactions-in-rocksdb-cloud/.
We definitely want to make the planner a separate component. And there are multiple ways to implement the workers:

  1. Have separate compaction workers listen on some compaction plan RPCs
  2. For k8s-specific environments, workers can be k8s Jobs that download blocks, compact them, and upload the final block (see the sketch below)
  3. ...
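For option 2, the worker's lifecycle is essentially download → compact → upload. A minimal sketch of that flow follows; the `BlockStore` interface and `compactFn` callback are hypothetical stubs standing in for the Thanos objstore client and TSDB compaction, not real APIs:

```go
package worker

import (
	"context"
	"fmt"
)

// BlockStore abstracts the object storage operations the worker needs;
// in Thanos this role is played by the objstore bucket client.
type BlockStore interface {
	Download(ctx context.Context, blockID, dir string) error
	Upload(ctx context.Context, dir string) error
}

// runPlan executes a single compaction plan end to end: fetch the source
// blocks, compact them into one output block, and push the result back.
// compactFn stands in for the actual TSDB compaction call.
func runPlan(ctx context.Context, store BlockStore, blockIDs []string, workDir string, compactFn func(srcDir string) (outDir string, err error)) error {
	for _, id := range blockIDs {
		if err := store.Download(ctx, id, workDir); err != nil {
			return fmt.Errorf("download block %s: %w", id, err)
		}
	}
	outDir, err := compactFn(workDir)
	if err != nil {
		return fmt.Errorf("compact: %w", err)
	}
	return store.Upload(ctx, outDir)
}
```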

@roystchiang
Contributor

Is it possible for me to continue this work?

I would like the planner part to be compatible with Cortex, so that both systems can enjoy the benefits it brings. While we flesh out whether a dedicated planner component is required, I believe both projects can already enjoy some improvement if the compactor is able to produce parallelizable plans.

@bwplotka
Member

Go for it, @roystchiang! Looks like it went stale. Feel free to take @andrejbranch's work and open your own PR; we can close this one 🤗 Thanks! We definitely need help on this!

@stale

stale bot commented Apr 16, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Apr 16, 2022
@stale stale bot closed this Apr 25, 2022