
apply multidimensional query request queuing: supply queue dimensions from frontend & utilize in scheduler #6772

Conversation

@francoposa (Member) commented Nov 28, 2023:

Changelog to be done once we solidify the shape of this.

This is the second part of our multi-dimensional queuing project to mitigate the issue where a single slow query component (like store-gateway) can also slow down a tenant's requests that don't hit that component (ingesters-only queries), or vice versa.

The first pull request #6533 introduced the TreeQueue data structures, which can handle an arbitrary number of queue dimensions, but kept usage to the single queue dimension of tenantID.

This pull request adds code to the query-frontend to assign the additional queue dimension of query component for a query (ingesters, store-gateway, or both), and adds the corresponding handling to the scheduler as well.

Various design discussions went into this, but the most critical was the decision that the calculation of additional queue dimensions should take place in the query-frontend.
The idea is that the query-frontend is already Prometheus-aware - that is, it understands the shape of Prometheus queries and can make decisions based on Prometheus query attributes. The scheduler needs to utilize some arbitrary information to create additional queue dimensions, but it should not need to know anything about the shape of the items in its queue, Prometheus queries or otherwise.
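To make the split of responsibilities concrete, here is a minimal sketch of the frontend-side decision, assuming the split is driven by the query's time range relative to a query-ingesters-within cutoff. All names here are illustrative, not the exact Mimir API:

```go
package frontend

import "time"

const (
	ingesterQueueDimension                = "ingester"
	storeGatewayQueueDimension            = "store-gateway"
	ingesterAndStoreGatewayQueueDimension = "ingester-and-store-gateway"
)

// queryComponentQueueDimension guesses which read-path components a query's
// time range will touch, assuming ingesters serve only data newer than
// queryIngestersWithin and store-gateways serve everything older.
func queryComponentQueueDimension(start, end time.Time, queryIngestersWithin time.Duration, now time.Time) []string {
	ingestersCutoff := now.Add(-queryIngestersWithin)
	switch {
	case start.After(ingestersCutoff):
		// Entire range is recent: ingesters only.
		return []string{ingesterQueueDimension}
	case end.Before(ingestersCutoff):
		// Entire range is old: store-gateways only.
		return []string{storeGatewayQueueDimension}
	default:
		// Range spans the cutoff: both components.
		return []string{ingesterAndStoreGatewayQueueDimension}
	}
}
```

The scheduler only ever sees the resulting strings, so it stays agnostic to Prometheus query semantics.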

As noted in the comments below, as of PR submission, this is only applied to the v2 frontend - there were some conflicting ideas on whether the functionality should be added to the v1 frontend.

The multi-dimension queue implementation is robust to enqueuing and fairly dequeuing a tenant's requests even when the requests are mixed between having an additional queue dimension or not.
This means the scheduler can tolerate failures to calculate additional queue dimensions, as well as receiving requests from a frontend which has not implemented the additional queue dimension calculations at the same time as receiving requests from a frontend which has.
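As a rough illustration of that tolerance (names are mine, not the exact scheduler code), queue path construction can simply append whatever dimensions arrive, falling back to the tenant-only path when none were supplied:

```go
// queuePathForRequest builds the tree-queue path for a request. A request
// from a v1 frontend, or one whose additional dimensions could not be
// calculated, arrives with no additional dimensions and is enqueued
// directly at the tenant node rather than in a subqueue.
func queuePathForRequest(tenantID string, additionalDimensions []string) []string {
	return append([]string{tenantID}, additionalDimensions...)
}
```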

What this PR does

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@francoposa force-pushed the francoposa/query-scheduler-query-component-hints-for-multidimensional-queueing branch from 782a372 to af50eaa on November 29, 2023 17:27
@francoposa (Member, Author) commented Dec 4, 2023:

Saving this benchmarking output here:

I thought this didn't look great at first, but the increase in memory usage from adding the second queue dimension scales sublinearly with the number of tenants.

There is also a smaller linear increase with the number of requests, due to adding a field to the SchedulerRequest type, but in production this would be dwarfed by the size of the rest of the SchedulerRequest, most of which is the actual Prometheus request.
I updated the benchmark test to use a larger request so the % deltas weren't so shocking, but it is probably still much smaller than a typical request size.

With our worst case, 1000_tenants/25_concurrent_producers/1600_concurrent_consumers-16, we have less than 300 bytes added per operation.
Production scheduler pods' memory usage varies widely, ranging from tens of MiB to over 1 GiB.

benchstat one-queue-dims-nil-map-alloc-then-one-map-alloc-longnames-heavier-request.txt two-queue-dims-nil-map-alloc-then-one-map-alloc-longnames-heavier-request.txt
name                                                                                         old time/op    new time/op    delta
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/16_concurrent_consumers-16         2.62µs ± 4%    2.65µs ± 3%   +1.25%  (p=0.000 n=49+50)
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/160_concurrent_consumers-16        2.99µs ± 9%    3.00µs ±16%     ~     (p=0.932 n=48+50)
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/1600_concurrent_consumers-16       3.45µs ±12%    3.60µs ±17%   +4.29%  (p=0.020 n=49+50)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/16_concurrent_consumers-16         2.56µs ± 3%    2.59µs ± 3%   +1.19%  (p=0.000 n=49+49)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/160_concurrent_consumers-16        3.04µs ±10%    2.94µs ±21%   -3.15%  (p=0.006 n=50+50)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/1600_concurrent_consumers-16       3.50µs ± 9%    3.69µs ±17%   +5.56%  (p=0.000 n=49+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/16_concurrent_consumers-16        2.62µs ± 5%    2.78µs ±10%   +5.87%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/160_concurrent_consumers-16       2.91µs ±16%    3.12µs ±14%   +7.29%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/1600_concurrent_consumers-16      3.49µs ±12%    3.78µs ±14%   +8.05%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/16_concurrent_consumers-16        2.64µs ± 5%    2.78µs ± 8%   +5.35%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/160_concurrent_consumers-16       2.93µs ±15%    3.11µs ±14%   +6.06%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/1600_concurrent_consumers-16      3.53µs ±13%    3.76µs ±14%   +6.70%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/16_concurrent_consumers-16      2.83µs ± 4%    3.05µs ± 3%   +7.80%  (p=0.000 n=45+49)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/160_concurrent_consumers-16     3.10µs ±11%    3.35µs ±10%   +7.91%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/1600_concurrent_consumers-16    3.71µs ± 8%    3.93µs ± 7%   +5.94%  (p=0.000 n=50+43)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/16_concurrent_consumers-16      2.83µs ± 6%    3.04µs ± 9%   +7.50%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/160_concurrent_consumers-16     3.15µs ± 9%    3.39µs ± 9%   +7.75%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/1600_concurrent_consumers-16    3.72µs ± 7%    4.05µs ± 7%   +8.81%  (p=0.000 n=50+50)

name                                                                                         old alloc/op   new alloc/op   delta
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/16_concurrent_consumers-16           941B ± 1%      950B ± 5%   +1.02%  (p=0.001 n=47+50)
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/160_concurrent_consumers-16        1.06kB ± 6%    1.14kB ±25%   +7.78%  (p=0.003 n=49+50)
ConcurrentQueueOperations/1_tenants/10_concurrent_producers/1600_concurrent_consumers-16       1.09kB ±11%    1.27kB ±26%  +17.01%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/16_concurrent_consumers-16           941B ± 1%      946B ± 3%   +0.58%  (p=0.009 n=50+50)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/160_concurrent_consumers-16        1.06kB ± 6%    1.13kB ±24%   +6.53%  (p=0.002 n=50+50)
ConcurrentQueueOperations/1_tenants/25_concurrent_producers/1600_concurrent_consumers-16       1.09kB ± 9%    1.30kB ±27%  +19.24%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/16_concurrent_consumers-16          930B ± 3%      982B ±10%   +5.63%  (p=0.000 n=49+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/160_concurrent_consumers-16       1.00kB ±12%    1.17kB ±24%  +17.27%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/10_concurrent_producers/1600_concurrent_consumers-16      1.07kB ±13%    1.31kB ±27%  +22.04%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/16_concurrent_consumers-16          930B ± 3%      974B ±13%   +4.70%  (p=0.000 n=50+49)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/160_concurrent_consumers-16       1.01kB ±12%    1.15kB ±24%  +13.91%  (p=0.000 n=50+50)
ConcurrentQueueOperations/10_tenants/25_concurrent_producers/1600_concurrent_consumers-16      1.06kB ±13%    1.26kB ±25%  +18.90%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/16_concurrent_consumers-16        963B ± 3%     1069B ± 8%  +11.11%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/160_concurrent_consumers-16     1.05kB ±10%    1.24kB ±17%  +18.88%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/10_concurrent_producers/1600_concurrent_consumers-16    1.11kB ±13%    1.35kB ±20%  +21.89%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/16_concurrent_consumers-16        958B ± 4%     1062B ±14%  +10.81%  (p=0.000 n=49+50)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/160_concurrent_consumers-16     1.05kB ± 9%    1.26kB ±20%  +19.82%  (p=0.000 n=50+50)
ConcurrentQueueOperations/1000_tenants/25_concurrent_producers/1600_concurrent_consumers-16    1.09kB ±10%    1.38kB ±20%  +26.77%  (p=0.000 n=50+50)

@francoposa changed the title from "query scheduler query component hints for multidimensional queueing" to "Apply MultiDimensional Request Queuing: Supply Queue Dimensions from Frontend & Utilize in Scheduler" on Dec 5, 2023
@francoposa changed the title from "Apply MultiDimensional Request Queuing: Supply Queue Dimensions from Frontend & Utilize in Scheduler" to "apply multidimensional query request queuing: supply queue dimensions from frontend & utilize in scheduler" on Dec 5, 2023
@charleskorn (Contributor) left a comment:

Overall design looks good to me, thanks for working on this @francoposa.

You've already pointed out that there are still some tests you want to add - one in particular that I'd like to see is a test that shows the max queue length is enforced regardless of the distribution of requests across dimensions. (For example, if the max queue length is 100, then it shouldn't matter if there are 100 ingester-only requests waiting, or 50 for ingesters and 50 for store-gateways: either way, another request should be rejected.)
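For illustration, a rough sketch of the kind of test I mean, with hypothetical helper names (newTestRequestQueue, enqueue) rather than the real queue API:

```go
package queue

import (
	"testing"

	"github.com/stretchr/testify/require"
)

func TestMaxQueueLengthEnforcedAcrossQueueDimensions(t *testing.T) {
	const maxOutstandingPerTenant = 100
	queue := newTestRequestQueue(t, maxOutstandingPerTenant) // hypothetical test helper

	// Fill the tenant's limit across two different query-component subqueues.
	for i := 0; i < maxOutstandingPerTenant/2; i++ {
		require.NoError(t, queue.enqueue("tenant-1", []string{"ingester"}))
		require.NoError(t, queue.enqueue("tenant-1", []string{"store-gateway"}))
	}

	// The limit must count requests across all of the tenant's subqueues,
	// so the next request is rejected regardless of its dimension.
	require.Error(t, queue.enqueue("tenant-1", []string{"ingester"}))
}
```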

Comment on lines 82 to 93:

```go
	default:
		// no query time params to parse; cannot infer query component
		return nil, nil
```
@francoposa (Member, Author):

The main open question here is: do we give this case no additional queue dimension, or do we make "ingester-and-store-gateway" the "default"/"we don't know" queue?

If we do not give it any additional queue dimensions, it would get enqueued with only the tenant dimension, not into the subqueues. This works just fine, as mentioned before - any requests from the v1 frontend will work this way as well.

But I see a case for doing our best to place everything coming in from the v2 frontend into one of the subqueues, and it seems appropriate to make "ingester-and-store-gateway" the default.

@charleskorn (Contributor):

What queries would end up in this bucket?

If we're not expecting anything to end up in this bucket, then logging that we got a type of query we didn't understand and assigning them to the tenant-only dimension seems reasonable to me.

@francoposa (Member, Author):

Added a warning log.
The only thing that would end up in this bucket is something that's not recognized as a range, instant, labels, or cardinality query, but that somehow passed all validation above this point in the query-frontend stack. I do not know if such a query exists.
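For illustration, a minimal sketch of the resulting behavior. The dimensionsFromTimeParams helper and the function name are hypothetical stand-ins for the real adapter code; the logger is the go-kit logger used throughout Mimir:

```go
package frontend

import (
	"net/http"
	"strings"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

// dimensionsFromTimeParams is a hypothetical stand-in for the real
// time-range inspection logic.
func dimensionsFromTimeParams(r *http.Request) ([]string, error) {
	return []string{"ingester-and-store-gateway"}, nil // placeholder
}

// additionalQueueDimensions sketches the classification: recognized query
// types get dimensions inferred from time params; anything else logs a
// warning and falls back to the tenant-only queue.
func additionalQueueDimensions(logger log.Logger, r *http.Request) ([]string, error) {
	switch {
	case strings.HasSuffix(r.URL.Path, "/query_range"),
		strings.HasSuffix(r.URL.Path, "/query"),
		strings.HasSuffix(r.URL.Path, "/labels"),
		strings.Contains(r.URL.Path, "/cardinality"):
		return dimensionsFromTimeParams(r)
	default:
		// Not a recognized range, instant, labels, or cardinality query;
		// warn and enqueue with the tenant dimension only.
		level.Warn(logger).Log("msg", "unrecognized query type; no additional queue dimensions", "path", r.URL.Path)
		return nil, nil
	}
}
```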

@francoposa marked this pull request as ready for review on December 6, 2023 23:09
@francoposa requested a review from a team as a code owner on December 6, 2023 23:09
@francoposa (Member, Author) commented:

This should be ready, except for the open question on whether the default additional queue dimension should be none or ingester-and-store-gateway, plus a CHANGELOG blurb.

Tests were added to cover the adapter pretty completely, as well as the issue of the multi-dimensional queues respecting the tenant-level max requests limit.

"github.com/stretchr/testify/require"
)

const rangeURLFormat = "/api/v1/query_range?end=%d&query=&start=%d&step=%d"
@charleskorn (Contributor):

[nit] Do these need to be constants? Looks like they're only used in one place

@francoposa force-pushed the francoposa/query-scheduler-query-component-hints-for-multidimensional-queueing branch from 084c9bd to 4280867 on December 11, 2023 17:20
@francoposa requested a review from a team as a code owner on December 12, 2023 18:01
@francoposa (Member, Author) commented:

Feature flagging, all additional test cases, and the warning log should be done.

I have continued to run it locally under load gen, with and without the feature flags, and everything seems happy.

@francoposa force-pushed the francoposa/query-scheduler-query-component-hints-for-multidimensional-queueing branch from 6845899 to 6a15ef0 on December 12, 2023 19:57
@charleskorn (Contributor) left a comment:

LGTM modulo some minor comments below - thanks for working on this @francoposa!

Comment on lines +173 to +177:

```go
	if qb.additionalQueueDimensionsEnabled {
		if schedulerRequest, ok := request.req.(*SchedulerRequest); ok {
			return append(QueuePath{string(request.tenantID)}, schedulerRequest.AdditionalQueueDimensions...), nil
		}
	}
```
@charleskorn (Contributor):

Do we need this check (and the corresponding query-scheduler config flag) here? I think the feature flag on query-frontends is enough: if that flag is disabled, frontends won't send any extra dimensions, so the path created here will just have the tenant dimension.

@francoposa (Member, Author):

Context from Slack - we just decided to leave it as is for now:

We don’t technically need it, but I had the separate flag because:
a) it seemed odd to just have the flag in the frontend when all the important/complicated stuff is happening in the scheduler, and
b) it just seemed more proper/less coupled. If something needs troubleshooting, we could use one flag to stop sending extra dimensions from the frontend, or another flag for the scheduler to just ignore the extra dimensions.

Don’t feel strongly though. I scanned the config docs and there are other multi-component features that need a flag on both components to work.

@francoposa (Member, Author) commented:

The only member of docs-metrics is out for some time, so I will need to get a Mimir repo admin to override the merge requirements here.

@lwandz13 self-requested a review on December 13, 2023 22:29
@lwandz13 left a comment:

The index file looks good.

@francoposa merged commit e14de95 into main on Dec 13, 2023 - 28 checks passed
@francoposa deleted the francoposa/query-scheduler-query-component-hints-for-multidimensional-queueing branch on December 13, 2023 22:32
@osg-grafana added the "type/docs" (Improvements or additions to documentation) label on Jan 1, 2024
@osg-grafana (Contributor) commented:

@francoposa and @lwandz13, thank you for your work on this while I was on PTO. I have added the type/docs label and added it to the Docs Squad project as Done in case there is a need to look back in history.
