
Query: parallelize query expression execution and sharding #5748

Closed
yeya24 opened this issue Oct 1, 2022 · 5 comments

Comments

@yeya24
Contributor

yeya24 commented Oct 1, 2022

Is your proposal related to a problem?

PR #5342 introduces the feature of sharding by the grouping clause. However, this requires grouping and limits the types of queries we can shard.

For a non-shardable query like http_requests_total{code="400"} / http_requests_total, execution still happens in a single querier. The PromQL query engine analyzes the query, executes two Select calls, one for {__name__="http_requests_total", code="400"} and one for {__name__="http_requests_total"}, and finally performs the binary-operator calculation.

The issue is that the two Select calls are still executed in the same querier.

Describe the solution you'd like

On the query frontend side, we can likewise analyze the query, split the original query into two smaller queries, http_requests_total{code="400"} and http_requests_total, and let downstream queriers execute the two queries separately. The query frontend then performs the final calculation and returns the result. This effectively makes the query frontend act as the query engine, performing the query analysis and result merging, while the Select work is distributed across different querier instances.
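
The merge step described above can be sketched as follows. This is a minimal illustration, not the actual query frontend API: it assumes each downstream querier returns its selector's instant-vector result as a map from a label-set key to a sample value, with the key already excluding __name__ (as PromQL binary-operator matching does).

```go
package main

import "fmt"

// Vector is a simplified instant-vector result: label-set key -> value.
// Keys are assumed to exclude __name__ so that matching series on both
// sides of "/" share the same key.
type Vector map[string]float64

// divide performs the final "/" the query frontend would run after the
// two sub-queries return, matching series by label-set key.
func divide(lhs, rhs Vector) Vector {
	out := Vector{}
	for key, lv := range lhs {
		if rv, ok := rhs[key]; ok && rv != 0 {
			out[key] = lv / rv
		}
	}
	return out
}

func main() {
	// Result of sub-query http_requests_total{code="400"}.
	errorsVec := Vector{`{code="400",instance="a"}`: 5}
	// Result of sub-query http_requests_total (relevant series only).
	totalVec := Vector{`{code="400",instance="a"}`: 50}
	fmt.Println(divide(errorsVec, totalVec)) // one ratio per matching series
}
```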

Since the query frontend splits the query and generates jobs for each downstream querier, we can also parallelize each expression further with sharding. For example, with a per-expression shard count of 4, the query http_requests_total{code="400"} / http_requests_total can be sharded into 8 small queries and executed in parallel.

http_requests_total{code="400"} shard 1 of 4 -> series hash % 4 = 0
http_requests_total{code="400"} shard 2 of 4 -> series hash % 4 = 1
http_requests_total{code="400"} shard 3 of 4 -> series hash % 4 = 2
http_requests_total{code="400"} shard 4 of 4 -> series hash % 4 = 3
http_requests_total shard 1 of 4 -> series hash % 4 = 0
http_requests_total shard 2 of 4 -> series hash % 4 = 1
http_requests_total shard 3 of 4 -> series hash % 4 = 2
http_requests_total shard 4 of 4 -> series hash % 4 = 3
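
The shard assignment above can be sketched as a hash of the series' label set modulo the shard count. This is only an illustration of the idea; the actual hashing scheme is an assumption here (FNV-1a over sorted label pairs), and the real implementation may hash labels differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// shardFor assigns a series to one of n shards by hashing its sorted
// label pairs with FNV-1a. Sorting makes the hash independent of map
// iteration order; NUL separators keep key/value boundaries unambiguous.
func shardFor(labels map[string]string, n uint32) uint32 {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := fnv.New32a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(labels[k]))
		h.Write([]byte{0})
	}
	return h.Sum32() % n
}

func main() {
	series := map[string]string{
		"__name__": "http_requests_total",
		"code":     "400",
		"instance": "10.0.0.1:9090",
	}
	fmt.Printf("series goes to shard %d of 4\n", shardFor(series, 4))
}
```

Every downstream querier then filters its Select results to the series whose hash lands in its assigned shard, so the union of all shard results covers each selector exactly once.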

Describe alternatives you've considered

I think this is also doable in the new experimental engine https://github.com/thanos-community/promql-engine. But having the feature there would lose the ability to cache at the query frontend.


@bwplotka
Member

bwplotka commented Oct 4, 2022

Hm, I am missing the benefit of splitting those two into separate queries. It feels as if we can shard the same way whether or not the query is split. On top of that, I think it will be more efficient (and faster) to do the / operation on one shard in the same engine and step, rather than performing the / operation once all vector results are complete (unless we stream those, which we cannot at the PromQL API level).

2c: Generally, I see your point, but I feel doing optimization and caching on the query frontend is not ideal, as it treats PromQL as a closed box. Instead, I would lean towards an architecture where there is NO query frontend, and all the frontend logic is done in the Querier and its PromQL engine. The Querier can then offload work to other queriers or some queue. It is capable of caching and distributing the work among its peers. Think of this as receiver peers and querier peers. Fewer components and a more efficient solution.

cc @fpetkovski

@yeya24
Contributor Author

yeya24 commented Oct 4, 2022

Overall I think they are all very good points, thanks for the feedback!

It feels as if we can shard the same way whether or not the query is split. On top of that, I think it will be more efficient (and faster) to do the / operation on one shard in the same engine and step, rather than performing the / operation once all vector results are complete (unless we stream those, which we cannot at the PromQL API level).

We can also do the / per shard at the query frontend and then merge the results, so that should be the same? It basically comes down to the order of the two middlewares: do we shard first or split by expression first? But one point is that the QF now needs to deal with more data.
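
The claim that per-shard division followed by a merge gives the same result as one global division can be illustrated with a small sketch. A key assumption, made explicit here, is that the shard hash excludes __name__, so matching series from both sides of the / always land in the same shard; the simplified Vector type and hashing are illustrative, not the actual Thanos code.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Vector is a simplified instant-vector: label-set key -> value.
// Keys are assumed to already exclude __name__, so matching series on
// both sides of "/" share a key and hash to the same shard.
type Vector map[string]float64

func shardOf(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}

// divide matches series by label-set key and divides their values.
func divide(lhs, rhs Vector) Vector {
	out := Vector{}
	for k, lv := range lhs {
		if rv, ok := rhs[k]; ok && rv != 0 {
			out[k] = lv / rv
		}
	}
	return out
}

// shardedDivide runs "/" independently per shard and unions the
// results, mimicking "shard first, divide per shard, then merge".
func shardedDivide(lhs, rhs Vector, n uint32) Vector {
	out := Vector{}
	for s := uint32(0); s < n; s++ {
		l, r := Vector{}, Vector{}
		for k, v := range lhs {
			if shardOf(k, n) == s {
				l[k] = v
			}
		}
		for k, v := range rhs {
			if shardOf(k, n) == s {
				r[k] = v
			}
		}
		for k, v := range divide(l, r) {
			out[k] = v
		}
	}
	return out
}

func main() {
	errorsVec := Vector{`{code="400",instance="a"}`: 5, `{code="400",instance="b"}`: 2}
	totalVec := Vector{`{code="400",instance="a"}`: 50, `{code="400",instance="b"}`: 40}
	// Both approaches produce the same per-series ratios.
	fmt.Println(divide(errorsVec, totalVec))
	fmt.Println(shardedDivide(errorsVec, totalVec, 4))
}
```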

Instead, I would lean towards an architecture where there is NO query frontend, and all the frontend logic is doable in Querier and its PromQL engine. Then Querier can offload work to other queries or some queue.

It would be ideal to have that kind of distributed query engine.

@matej-g
Collaborator

matej-g commented Oct 5, 2022

2c: Generally, I see your point, but I feel doing optimization and caching on the query frontend is not ideal, as it treats PromQL as a closed box. Instead, I would lean towards an architecture where there is NO query frontend, and all the frontend logic is done in the Querier and its PromQL engine. The Querier can then offload work to other queriers or some queue. It is capable of caching and distributing the work among its peers. Think of this as receiver peers and querier peers. Fewer components and a more efficient solution.

I mean, don't we already do quite a bit of this frontend logic in the query frontend? If we could eventually move to the architecture you're suggesting, that would be beneficial, but until we're able to do this at the Querier / PromQL level, I guess the Query Frontend is the second-best option.

@fpetkovski
Contributor

My take on this is that if we are going to add additional computation to the query frontend in the form of a PromQL engine, it might be better to work towards a general solution in the QFE, similar to what Mimir does.

But in general, I also tend to lean in the direction of doubling down on the new engine and adding functionality there. I think it is now much easier to extend the engine with distributed computation than to add additional logic to the Query Frontend. It should also be more easily testable and maintainable. I agree that caching is still missing, and maybe even out of scope for an engine component, but I also don't have a good sense of how useful it is to cache a single subtree.

@yeya24
Contributor Author

yeya24 commented Jan 12, 2023

I will close this one as I feel it's not worth pursuing for now.

@yeya24 yeya24 closed this as completed Jan 12, 2023