
Vertical Block Sharding Proposal #4191

Open · wants to merge 8 commits into base: cap-size-com

Conversation

Biswajitghosh98 (Contributor):

This is the first draft of the vertical block sharding proposal. Open for reviews :)

@bwplotka (Member) left a comment:

Looks actually quite good, especially after you elaborated it to me! (:

Some ideas during our 1:1

  • Add a diagram to make it easier to understand (similar to what you drew for me)
  • Elaborate on the test cases we discussed offline
    • How many resources and how much time would be needed to perform such a multi-shard compaction? Is there anything blocking us from making it streaming?
  • Plus minor suggestions below.

docs/proposals/202011_vertical_block_sharding.md
The current method ignores a block for further compaction once its index reaches a limit. We'd let that stay as it is, with a caveat: whenever a block's index size hits the limit, we'd find the first 0-valued `hash_number` (let's say i), further sub-divide (shard) the existing block with the corresponding `hash_function_i`, and then let the compaction process run as usual.
Here also, we'd allow the grouper and planner to compact (horizontally and vertically) only those blocks together whose `hash_number` external labels match completely.
The advantage of this method is that it would let the block's time range (`maxt - mint`) grow exponentially, so we can estimate that n would not exceed 3 or 4.
The downside of this approach is that, as blocks grow larger, it becomes difficult to find a subsequent block to compact them with.
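As a rough illustration of what such a `hash_function_i` could look like, here is a minimal Go sketch; the seeded FNV-1a hash, the `shardOf` helper, and its signature are assumptions for illustration, not part of the proposal.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardOf is a hypothetical helper showing how the i-th hash function could
// map a series (identified by its label set) to one of two sub-shards. The
// real hash functions are not specified in the proposal; a seeded FNV-1a hash
// modulo 2 is only an assumption.
func shardOf(seriesLabels string, i uint32) uint32 {
	h := fnv.New32a()
	// Seed the hash with the function index so each hash_function_i splits a
	// block along a different dimension.
	fmt.Fprintf(h, "%d/%s", i, seriesLabels)
	return h.Sum32() % 2
}

func main() {
	// Example: deciding which sub-shard a series falls into when the block is
	// split with hash_function_3.
	fmt.Println(shardOf(`{__name__="up",job="node"}`, 3))
}
```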
Member:

I would also add the hash function to the metadata.

Contributor Author:

Yes, makes sense to me.

docs/proposals/202011_vertical_block_sharding.md

For this approach, we'd need to add two extra external labels: `hash_number`, initialized to 0, and `shard_level`, initialized to 0, denoting the number of times the block has been sharded during the compaction process, plus a set of 16 `hash_function`s. The reason for choosing 16 is that 2^16 = 65536, which is a logical upper bound for the number of shards a user might want to have.
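For concreteness, a minimal Go sketch of how the proposed labels might sit alongside a block's existing external labels (Thanos keeps external labels under the `thanos.labels` section of `meta.json`); the `cluster` label and all values are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Sketch of the proposed extra external labels next to an existing one.
	// Label names follow the proposal text; everything else is made up.
	labels := map[string]string{
		"cluster":     "prod-1", // pre-existing external label, left untouched
		"hash_number": "0",      // which shard of its level this block belongs to
		"shard_level": "0",      // how many times the block has been sharded so far
	}
	out, _ := json.MarshalIndent(map[string]interface{}{
		"thanos": map[string]interface{}{"labels": labels},
	}, "", "  ")
	fmt.Println(string(out))
}
```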
Contributor:

Are hash_level and shard_level the same thing?

Contributor Author:

@hitanshu-mehta Yes. Apologies if it was unclear from my end :)

Contributor:

No worries :) I think we should use the same term everywhere in the proposal because it can lead to confusion sometimes. WDYT?

Contributor Author:

@hitanshu-mehta Yes, definitely. I'll change it and add more pictures ASAP

* If the compacted block is valid, then continue the usual compaction process until it results in an invalid block.
* If the compacted block is invalid, then add the no-compact filter to all the participating blocks.

We can keep growing the binary tree by recursively applying the same logic. The reason for calling this adaptive is that at any point in time, the total number of shards is equal to the number of nodes in the tree (say x). If any of the leaves overloads (a block in that particular shard hits the limit), that leaf splits into 2, effectively increasing the number of shards to x+2, hence providing just the right number of shards to get the job done.
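A small Go sketch of the adaptive shard tree described above; the `shard` type, the `(2n, 2n+1)` numbering, and the counting helper are assumptions made only to illustrate the node-count vs. leaf-count question discussed in the thread below.

```go
package main

import "fmt"

// shard is a node in the hypothetical shard tree; only leaves currently
// receive blocks. This is an illustrative model, not Thanos code.
type shard struct {
	level, number int
	children      []*shard
}

// split models a leaf overloading: a block in this shard hit the limit, so two
// child shards appear at the next hash level.
func (s *shard) split() {
	s.children = []*shard{
		{level: s.level + 1, number: 2 * s.number},
		{level: s.level + 1, number: 2*s.number + 1},
	}
}

// count returns the total number of nodes and the number of leaves, the two
// quantities debated below.
func count(s *shard) (nodes, leaves int) {
	if len(s.children) == 0 {
		return 1, 1
	}
	nodes = 1
	for _, c := range s.children {
		n, l := count(c)
		nodes += n
		leaves += l
	}
	return nodes, leaves
}

func main() {
	root := &shard{} // the single initial shard (0,0)
	root.split()     // first overload: shards at level 1 appear
	root.children[1].split()
	nodes, leaves := count(root)
	fmt.Printf("nodes=%d leaves=%d\n", nodes, leaves) // nodes=5 leaves=3
}
```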
Contributor:

I think the total number of shards should be equal to the number of leaf nodes and every vertical sharding will increase the number of shards to x + 1.

Contributor Author:

@hitanshu-mehta Say we have 10 blocks in shard (0,0), and we're compacting 3 blocks out of those together. Suppose the resultant block hits the limit, then we'll shard the 3 participating blocks, leaving 7 blocks at shard (0,0), 3 at (1,0) and 3 at (1,1), increasing the number of shards from 1 to 3

@hitanshu-mehta (Contributor) commented on May 17, 2021:

Thank you for clearing things up! I was calculating shards a different (maybe wrong) way. To give a rough idea of how I calculated:

```
block_number(hash_number, hash_level)

       1(0,0)             2(0,0)            3(0,0)         *4(0,0)    Compaction of blocks 1(0,0), 2(0,0) and 3(0,0) hits the limit
     /        \         /        \         /       \
  1(0,1)   *1(1,1)   2(0,1)   *2(1,1)     3(0,1)  *3(1,1)             Compaction of blocks 1(0,1), 2(0,1) and 3(0,1) hits the limit
 /      \           /    \              /    \
*1(0,2) *1(1,2)   *2(0,2) *2(1,2)     *3(0,2) *3(1,2)
```

The total number of starred (*) blocks is the number of shards.

Contributor Author:

@hitanshu-mehta Oh, I see. But this is different from what I had in mind. The proposal would need more pictures and re-wording to make it clearer for everyone, I suppose :)

docs/proposals/202011_vertical_block_sharding.md
Co-authored-by: Hitanshu Mehta <44025541+hitanshu-mehta@users.noreply.github.com>
stale bot commented on Jul 16, 2021:

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label on Jul 16, 2021
stale bot closed this on Jul 24, 2021
@yeya24 reopened this on Jul 24, 2021
stale bot removed the stale label on Jul 24, 2021
stale bot commented on Sep 22, 2021:

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label on Sep 22, 2021
stale bot closed this on Oct 11, 2021
@yeya24 (Contributor) commented on May 28, 2022:

Can we revisit this proposal?

@bwplotka reopened this on Jun 23, 2022
stale bot removed the stale label on Jun 23, 2022
@bwplotka (Member):

Yes, it still needs some work.

@bwplotka added the dont-go-stale label (for important issues which tells the stalebot not to close them) on Jun 23, 2022

* Special metadata in meta.json indicating a shard.
* We'll be using special metadata in meta.json to indicate a shard.
Contributor:

Could this be another external label?

Contributor:

Yes, I think hash_number and shard_level are external labels as mentioned below.


## Alternatives
We'll be using `hash_level` and `hash_number` to group and plan blocks to compact together. The way it'd work is: we'll allow the grouper to group only those blocks together whose `hash_level`s are the same, and we'd allow the planner to compact only those blocks together that share the same `hash_number`. So, with this, the compactor would run as usual until a compacted block hits the limit for the first time.
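As a sketch of the grouping rule, the hypothetical `groupKey` helper below buckets blocks together only when all of their external labels, including the proposed `hash_level` and `hash_number` labels, match exactly; this mirrors the clarification in the thread below and is not actual Thanos grouper code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupKey builds a deterministic key from a block's external labels. Blocks
// compact together only when their keys are identical, so differing
// hash_level or hash_number values keep blocks apart.
func groupKey(extLabels map[string]string) string {
	keys := make([]string, 0, len(extLabels))
	for k := range extLabels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+extLabels[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	a := map[string]string{"cluster": "prod-1", "hash_level": "1", "hash_number": "0"}
	b := map[string]string{"cluster": "prod-1", "hash_level": "1", "hash_number": "1"}
	// Different hash_number: these two blocks would not be compacted together.
	fmt.Println(groupKey(a) == groupKey(b)) // false
}
```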
Contributor:

Does this mean blocks can get merged even if they're not part of the same stream (have the same external labels)?

@yeya24 (Contributor) commented on Jun 24, 2022:

No, I think it is not mentioned explicitly but all external labels should be the same during the grouping.

* Shard number stored in external label.
![vbs-image1](../img/vertical_block_sharding_image1.png)

We're allowing a compacted block to hit the limit a maximum of 16 times (if not specified otherwise by the user), and at the 16th level, if/when it further compacts and hits the limit, we mark it for no further compactions. The sharding algorithm comes into play once a compacted block is declared invalid or its size hits the limit. If the compacted block is still valid, but its index size is too large (or the number of series it holds exceeds the limit decided by the user), we look at the current `hash_level` (say `hl`) of the compacted block (if valid), or of the blocks participating in the compaction process (if the compacted block is invalid). If `hl` is not equal to `max_shard_level_limit` (16 by default), we shard either the compacted block or the participating blocks (depending on the situation) using `hash_function[hl+1]`, set the `hash_level` of the newly created blocks to `hl+1`, and upload them for further compaction. If `hl` is equal to `max_shard_level_limit`, we take one of the 2 decisions depending on whether the compacted block is valid or invalid.
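A hedged Go sketch of the decision logic in the paragraph above; the `decide` helper, its return strings, and the constant name are placeholders for illustration, not the real compactor implementation.

```go
package main

import "fmt"

const maxShardLevelLimit = 16 // proposal default; configurable by the user

// decide sketches what happens when a compaction output hits the index/series
// limit: below the maximum level the blocks get sharded with the next hash
// function; at the maximum level the two fallback decisions from the proposal
// apply (keep compacting if valid, mark no-compact if invalid).
func decide(hashLevel int, compactedBlockValid bool) string {
	if hashLevel < maxShardLevelLimit {
		if compactedBlockValid {
			return fmt.Sprintf("shard the compacted block with hash_function[%d], set hash_level=%d", hashLevel+1, hashLevel+1)
		}
		return fmt.Sprintf("shard the participating blocks with hash_function[%d], set hash_level=%d", hashLevel+1, hashLevel+1)
	}
	if compactedBlockValid {
		return "continue the usual compaction process until it results in an invalid block"
	}
	return "add the no-compact filter to all the participating blocks"
}

func main() {
	fmt.Println(decide(3, false))
	fmt.Println(decide(16, true))
}
```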
Contributor:

What does it mean for an index size to be too large? Will this be configurable?

Contributor:

Yeah, I think it should be configurable. We have supported planning that considers index size since #3410, and the index limit is configurable.

@yeya24 (Contributor) commented on Jun 24, 2022:

Can you describe more about the cuckoo filter part? How can we use it to improve the query performance at store gateway? Is it related to this proposal?

@@ -87,17 +87,49 @@ index size).

## Proposal

**We propose to vertically shard all blocks which go beyond the specified index size limit during the compaction process.**
**We propose to vertically shard blocks when they reach the cap limit in terms of number of series or size of the index file, and then adaptively decide the number of shards depending on the user's requirements.**
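A minimal sketch of the cap check implied by the proposed wording; the `exceedsCap` helper and the threshold values are placeholders, since the proposal leaves the exact limits user-configurable.

```go
package main

import "fmt"

// exceedsCap is a hypothetical check: a block qualifies for vertical sharding
// once either its index file size or its number of series crosses a
// user-configured limit. The numbers below are placeholders, not defaults.
func exceedsCap(indexSizeBytes, numSeries, maxIndexSizeBytes, maxSeries int64) bool {
	return indexSizeBytes > maxIndexSizeBytes || numSeries > maxSeries
}

func main() {
	fmt.Println(exceedsCap(70<<30, 4_000_000, 64<<30, 50_000_000)) // true: index too large
}
```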
Contributor:

Do we still want to wait for the block to reach the cap limit before sharding? With vertical sharding, I think we want to shard the blocks eagerly to get better query performance.


#### Decision 1: Using size of index file vs number of metrics in a block as an indicator for sharding

The pros and cons of selecting one design choice over the other are yet to be discovered, and would become clearer after implementation and testing.
@fpetkovski (Contributor) commented on Jun 27, 2022:

I wonder how the proposed approach compares to the current workaround of setting a maximum index size per block.
