
apply multidimensional query request queuing: supply queue dimensions from frontend & utilize in scheduler #6772

Conversation

@francoposa (Member) commented Nov 28, 2023:

Changelog to be done once we solidify the shape of this.

This is the second part of our multi-dimensional queuing project to mitigate the issue where a single slow query component (like store-gateway) can also slow down a tenant's requests that don't hit that component (ingesters-only queries), or vice versa.

The first pull request #6533 introduced the TreeQueue data structures, which can handle an arbitrary number of queue dimensions, but kept usage to the single queue dimension of tenantID.

This pull request adds code to the query-frontend to assign the additional queue dimension of query component for a query (ingesters, store-gateway, or both), and adds the corresponding handling to the scheduler as well.

Various design discussions went into this, but the most critical was the decision that the calculation of additional queue dimensions should take place in the query-frontend.
The idea is that the query-frontend is already Prometheus-aware - that is, it understands the shape of Prometheus queries and can make decisions based on Prometheus query attributes. The scheduler needs to utilize some arbitrary information to create additional queue dimensions, but it should not need to know anything about the shape of the items in its queue, Prometheus queries or otherwise.
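To make the split of responsibilities concrete, here is a minimal sketch of the frontend-side decision, assuming the split is driven by the query's time range relative to a query-ingesters-within cutoff. All names here are illustrative, not the exact Mimir API:

```go
package frontend

import "time"

const (
	ingesterQueueDimension                = "ingester"
	storeGatewayQueueDimension            = "store-gateway"
	ingesterAndStoreGatewayQueueDimension = "ingester-and-store-gateway"
)

// queryComponentQueueDimension guesses which read-path components a query's
// time range will touch, assuming ingesters serve only data newer than
// queryIngestersWithin and store-gateways serve everything older.
func queryComponentQueueDimension(start, end time.Time, queryIngestersWithin time.Duration, now time.Time) []string {
	ingestersCutoff := now.Add(-queryIngestersWithin)
	switch {
	case start.After(ingestersCutoff):
		// Entire range is recent: ingesters only.
		return []string{ingesterQueueDimension}
	case end.Before(ingestersCutoff):
		// Entire range is old: store-gateways only.
		return []string{storeGatewayQueueDimension}
	default:
		// Range spans the cutoff: both components.
		return []string{ingesterAndStoreGatewayQueueDimension}
	}
}
```

The scheduler only ever sees the resulting strings, so it stays agnostic to Prometheus query semantics.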

As noted in the comments below, as of PR submission, this is only applied to the v2 frontend - there were some conflicting ideas on whether the functionality should be added to the v1 frontend.

The multi-dimension queue implementation is robust to enqueuing and fairly dequeuing a tenant's requests even when the requests are mixed between having an additional queue dimension or not.
This means the scheduler can tolerate failures to calculate additional queue dimensions, as well as receiving requests from a frontend which has not implemented the additional queue dimension calculations at the same time as receiving requests from a frontend which has.
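As a rough illustration of that tolerance (names are mine, not the exact scheduler code), queue path construction can simply append whatever dimensions arrive, falling back to the tenant-only path when none were supplied:

```go
// queuePathForRequest builds the tree-queue path for a request. A request
// from a v1 frontend, or one whose additional dimensions could not be
// calculated, arrives with no additional dimensions and is enqueued
// directly at the tenant node rather than in a subqueue.
func queuePathForRequest(tenantID string, additionalDimensions []string) []string {
	return append([]string{tenantID}, additionalDimensions...)
}
```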

What this PR does

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@francoposa force-pushed the francoposa/query-scheduler-query-component-hints-for-multidimensional-queueing branch from 782a372 to af50eaa on November 29, 2023 17:27
@francoposa (Member, Author) commented Dec 4, 2023:

Saving this benchmarking output here:

I thought this didn't look great at first, but the increase in memory usage from adding the second queue dimension scales sublinearly with the number of tenants.

There is also a smaller linear increase with the number of requests, due to adding a field to the SchedulerRequest type, but in production this would be dwarfed by the size of the rest of the SchedulerRequest, most of which is the actual Prometheus request.
I updated the benchmark test to use a larger request so the % deltas weren't so shocking, but it is probably still much smaller than a typical request size.

With our worst case, 1000_tenants/25_concurrent_producers/1600_concurrent_consumers-16, we have less than 300 bytes added per operation.
Production scheduler pods' memory usage varies widely, ranging from tens of MiB to over 1 GiB.

benchstat one-queue-dims-nil-map-alloc-then-one-map-alloc-longnames-heavier-request.txt two-queue-dims-nil-map-alloc-then-one-map-alloc-longnames-heavier-request.txt
name                                                                                         old time/op    new time/op    delta
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/16_concurrent_consumers-16         2.62µs ± 4%    2.65µs ± 3%   +1.25%  (p=0.000 n=49+50)
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/160_concurrent_consumers-16        2.99µs ± 9%    3.00µs ±16%     ~     (p=0.932 n=48+50)
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/1600_concurrent_consumers-16       3.45µs ±12%    3.60µs ±17%   +4.29%  (p=0.020 n=49+50)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/16_concurrent_consumers-16         2.56µs ± 3%    2.59µs ± 3%   +1.19%  (p=0.000 n=49+49)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/160_concurrent_consumers-16        3.04µs ±10%    2.94µs ±21%   -3.15%  (p=0.006 n=50+50)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/1600_concurrent_consumers-16       3.50µs ± 9%    3.69µs ±17%   +5.56%  (p=0.000 n=49+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/16_concurrent_consumers-16        2.62µs ± 5%    2.78µs ±10%   +5.87%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/160_concurrent_consumers-16       2.91µs ±16%    3.12µs ±14%   +7.29%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/1600_concurrent_consumers-16      3.49µs ±12%    3.78µs ±14%   +8.05%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/16_concurrent_consumers-16        2.64µs ± 5%    2.78µs ± 8%   +5.35%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/160_concurrent_consumers-16       2.93µs ±15%    3.11µs ±14%   +6.06%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/1600_concurrent_consumers-16      3.53µs ±13%    3.76µs ±14%   +6.70%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/16_concurrent_consumers-16      2.83µs ± 4%    3.05µs ± 3%   +7.80%  (p=0.000 n=45+49)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/160_concurrent_consumers-16     3.10µs ±11%    3.35µs ±10%   +7.91%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/1600_concurrent_consumers-16    3.71µs ± 8%    3.93µs ± 7%   +5.94%  (p=0.000 n=50+43)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/16_concurrent_consumers-16      2.83µs ± 6%    3.04µs ± 9%   +7.50%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/160_concurrent_consumers-16     3.15µs ± 9%    3.39µs ± 9%   +7.75%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/1600_concurrent_consumers-16    3.72µs ± 7%    4.05µs ± 7%   +8.81%  (p=0.000 n=50+50)

name                                                                                         old alloc/op   new alloc/op   delta
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/16_concurrent_consumers-16           941B ± 1%      950B ± 5%   +1.02%  (p=0.001 n=47+50)
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/160_concurrent_consumers-16        1.06kB ± 6%    1.14kB ±25%   +7.78%  (p=0.003 n=49+50)
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/1600_concurrent_consumers-16       1.09kB ±11%    1.27kB ±26%  +17.01%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/16_concurrent_consumers-16           941B ± 1%      946B ± 3%   +0.58%  (p=0.009 n=50+50)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/160_concurrent_consumers-16        1.06kB ± 6%    1.13kB ±24%   +6.53%  (p=0.002 n=50+50)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/1600_concurrent_consumers-16       1.09kB ± 9%    1.30kB ±27%  +19.24%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/16_concurrent_consumers-16          930B ± 3%      982B ±10%   +5.63%  (p=0.000 n=49+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/160_concurrent_consumers-16       1.00kB ±12%    1.17kB ±24%  +17.27%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/1600_concurrent_consumers-16      1.07kB ±13%    1.31kB ±27%  +22.04%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/16_concurrent_consumers-16          930B ± 3%      974B ±13%   +4.70%  (p=0.000 n=50+49)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/160_concurrent_consumers-16       1.01kB ±12%    1.15kB ±24%  +13.91%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/1600_concurrent_consumers-16      1.06kB ±13%    1.26kB ±25%  +18.90%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/16_concurrent_consumers-16        963B ± 3%     1069B ± 8%  +11.11%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/160_concurrent_consumers-16     1.05kB ±10%    1.24kB ±17%  +18.88%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/1600_concurrent_consumers-16    1.11kB ±13%    1.35kB ±20%  +21.89%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/16_concurrent_consumers-16        958B ± 4%     1062B ±14%  +10.81%  (p=0.000 n=49+50)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/160_concurrent_consumers-16     1.05kB ± 9%    1.26kB ±20%  +19.82%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/1600_concurrent_consumers-16    1.09kB ±10%    1.38kB ±20%  +26.77%  (p=0.000 n=50+50)

@francoposa changed the title from "query scheduler query component hints for multidimensional queueing" to "Apply MultiDimensional Request Queuing: Supply Queue Dimensions from Frontend & Utilize in Scheduler" on Dec 5, 2023
@francoposa changed the title from "Apply MultiDimensional Request Queuing: Supply Queue Dimensions from Frontend & Utilize in Scheduler" to "apply multidimensional query request queuing: supply queue dimensions from frontend & utilize in scheduler" on Dec 5, 2023
@charleskorn (Contributor) left a comment:

Overall design looks good to me, thanks for working on this @francoposa.

You've already pointed out that there are still some tests you want to add - one in particular that I'd like to see is a test that shows the max queue length is enforced regardless of the distribution of requests across dimensions. (For example, if the max queue length is 100, then it shouldn't matter if there are 100 ingester-only requests waiting, or 50 for ingesters and 50 for store-gateways: either way, another request should be rejected.)
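For illustration, a rough sketch of the kind of test I mean, with hypothetical helper names (newTestRequestQueue, enqueue) rather than the real queue API:

```go
package queue

import (
	"testing"

	"github.com/stretchr/testify/require"
)

func TestMaxQueueLengthEnforcedAcrossQueueDimensions(t *testing.T) {
	const maxOutstandingPerTenant = 100
	queue := newTestRequestQueue(t, maxOutstandingPerTenant) // hypothetical test helper

	// Fill the tenant's limit across two different query-component subqueues.
	for i := 0; i < maxOutstandingPerTenant/2; i++ {
		require.NoError(t, queue.enqueue("tenant-1", []string{"ingester"}))
		require.NoError(t, queue.enqueue("tenant-1", []string{"store-gateway"}))
	}

	// The limit must count requests across all of the tenant's subqueues,
	// so the next request is rejected regardless of its dimension.
	require.Error(t, queue.enqueue("tenant-1", []string{"ingester"}))
}
```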

Comment on lines 82 to 93:

```go
	default:
		// no query time params to parse; cannot infer query component
		return nil, nil
```
@francoposa (Member, Author):

The main open question here is: do we give this case no additional queue dimension, or do we make "ingester-and-store-gateway" the "default"/"we don't know" queue?

If we do not give it any additional queue dimensions, it would get enqueued with only the tenant dimension, not into the subqueues. This works just fine, as mentioned before - any requests from the v1 frontend will work this way as well.

But I see a case for doing our best to place everything coming in from the v2 frontend into one of the subqueues, and it seems appropriate to make "ingester-and-store-gateway" the default.

@charleskorn (Contributor):

What queries would end up in this bucket?

If we're not expecting anything to end up in this bucket, then logging that we got a type of query we didn't understand and assigning them to the tenant-only dimension seems reasonable to me.

@francoposa (Member, Author):

Added a warning log.
The only thing that would end up in this bucket is something that's not recognized as a range, instant, labels, or cardinality query, but that somehow passed all validation above this point in the query-frontend stack. I do not know if such a query exists.
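For illustration, a minimal sketch of the resulting behavior. The dimensionsFromTimeParams helper and the function name are hypothetical stand-ins for the real adapter code; the logger is the go-kit logger used throughout Mimir:

```go
package frontend

import (
	"net/http"
	"strings"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

// dimensionsFromTimeParams is a hypothetical stand-in for the real
// time-range inspection logic.
func dimensionsFromTimeParams(r *http.Request) ([]string, error) {
	return []string{"ingester-and-store-gateway"}, nil // placeholder
}

// additionalQueueDimensions sketches the classification: recognized query
// types get dimensions inferred from time params; anything else logs a
// warning and falls back to the tenant-only queue.
func additionalQueueDimensions(logger log.Logger, r *http.Request) ([]string, error) {
	switch {
	case strings.HasSuffix(r.URL.Path, "/query_range"),
		strings.HasSuffix(r.URL.Path, "/query"),
		strings.HasSuffix(r.URL.Path, "/labels"),
		strings.Contains(r.URL.Path, "/cardinality"):
		return dimensionsFromTimeParams(r)
	default:
		// Not a recognized range, instant, labels, or cardinality query;
		// warn and enqueue with the tenant dimension only.
		level.Warn(logger).Log("msg", "unrecognized query type; no additional queue dimensions", "path", r.URL.Path)
		return nil, nil
	}
}
```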

@francoposa marked this pull request as ready for review on December 6, 2023 23:09
@francoposa requested a review from a team as a code owner on December 6, 2023 23:09
@francoposa (Member, Author) commented:

This should be ready, except for the open question on whether the default additional queue dimension should be none or ingester-and-store-gateway, plus a CHANGELOG blurb.

Tests were added to cover the adapter pretty completely, as well as the issue of the multi-dimensional queues respecting the tenant-level max requests limit.

"github.com/stretchr/testify/require"
)

const rangeURLFormat = "/api/v1/query_range?end=%d&query=&start=%d&step=%d"
@charleskorn (Contributor):

[nit] Do these need to be constants? Looks like they're only used in one place

@francoposa force-pushed the francoposa/query-scheduler-query-component-hints-for-multidimensional-queueing branch from 084c9bd to 4280867 on December 11, 2023 17:20
@francoposa requested a review from a team as a code owner on December 12, 2023 18:01
@francoposa (Member, Author) commented:

Feature flagging, all additional test cases, and the warning log should be done.

I have continued to run it locally under load gen, with and without the feature flags, and everything seems happy.

@francoposa force-pushed the francoposa/query-scheduler-query-component-hints-for-multidimensional-queueing branch from 6845899 to 6a15ef0 on December 12, 2023 19:57
@charleskorn (Contributor) left a comment:

LGTM modulo some minor comments below - thanks for working on this @francoposa!

Comment on lines +173 to +177:

```go
	if qb.additionalQueueDimensionsEnabled {
		if schedulerRequest, ok := request.req.(*SchedulerRequest); ok {
			return append(QueuePath{string(request.tenantID)}, schedulerRequest.AdditionalQueueDimensions...), nil
		}
	}
```
@charleskorn (Contributor):

Do we need this check (and the corresponding query-scheduler config flag) here? I think the feature flag on query-frontends is enough: if that flag is disabled, frontends won't send any extra dimensions, so the path created here will just have the tenant dimension.

@francoposa (Member, Author):

Context from Slack - we just decided to leave it as is for now:

We don’t technically need it, but I had the separate flag because:
a) it seemed odd to just have the flag in the frontend when all the important/complicated stuff is happening in the scheduler, and
b) it just seemed more proper/less coupled. If something needs troubleshooting, we could use one flag to stop sending extra dimensions from the frontend, or another flag for the scheduler to just ignore the extra dimensions.

Don’t feel strongly though. I scanned the config docs and there are other multi-component features that need a flag on both components to work.

@francoposa (Member, Author) commented:

The only member of docs-metrics is out for some time, so I will need to get a Mimir repo admin to override the merge requirements here.

@lwandz13 self-requested a review on December 13, 2023 22:29
@lwandz13 left a comment:

The index file looks good.

@francoposa merged commit e14de95 into main on Dec 13, 2023 - 28 checks passed
@francoposa deleted the francoposa/query-scheduler-query-component-hints-for-multidimensional-queueing branch on December 13, 2023 22:32
@osg-grafana added the "type/docs" (Improvements or additions to documentation) label on Jan 1, 2024
@osg-grafana (Contributor) commented:

@francoposa and @lwandz13, thank you for your work on this while I was on PTO. I have added the type/docs label and added it to the Docs Squad project as Done in case there is a need to look back in history.
