add additional queue dimensions to query scheduler queue duration histogram #6960
LGTM
 Name: "cortex_query_scheduler_queue_duration_seconds",
 Help: "Time spent by requests in queue before getting picked up by a querier.",
 Buckets: prometheus.DefBuckets,
-})
+}, []string{"user", "additional_queue_dimensions"})
Something shorter for this label, like "dimensions", would be my preference.
@@ -399,7 +400,8 @@ func (s *Scheduler) QuerierLoop(querier schedulerpb.SchedulerForQuerier_QuerierL
 r := req.(*queue.SchedulerRequest)

 queueTime := time.Since(r.EnqueueTime)
-s.queueDuration.Observe(queueTime.Seconds())
+additionalQueueDimensionLabels := strings.Join(r.AdditionalQueueDimensions, ":")
Is there any guarantee that these are always in the same order?
No, but we don't want to sort either, because order does matter: the additional dimensions represent a path through the tree.
Sending additional queue dimensions ["A" "B"] would represent path root->tenantID->A->B, which is a different queue than additional queue dimensions ["B" "A"], which would represent path root->tenantID->B->A.
But for right now we send either no additional dimensions (a nil slice) or one additional dimension representing the expected query component(s) used, making the path through the tree essentially root -> tenantID -> queryComponent.
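To make the ordering point concrete, here is a minimal Go sketch. The helper name queuePath and the tenant ID value are hypothetical and made up for illustration; only the idea that the tenant ID comes first and the additional dimensions follow in the order sent is taken from the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// queuePath is a hypothetical helper (not part of the scheduler code). It
// renders the tree path described above: root, then the tenant ID, then each
// additional queue dimension in the order it was sent. Nothing is sorted.
func queuePath(tenantID string, additionalQueueDimensions []string) string {
	nodes := append([]string{"root", tenantID}, additionalQueueDimensions...)
	return strings.Join(nodes, "->")
}

func main() {
	// Same dimensions, different order: two different paths (queues).
	fmt.Println(queuePath("tenant-1", []string{"A", "B"})) // root->tenant-1->A->B
	fmt.Println(queuePath("tenant-1", []string{"B", "A"})) // root->tenant-1->B->A

	// A nil slice means no additional dimensions, so the path ends at the tenant.
	fmt.Println(queuePath("tenant-1", nil)) // root->tenant-1
}
```

Sorting the slice before joining would collapse the first two paths into one, which is why the order is left as sent.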
The fact that we always have the first queuing dimension as tenant ID for now is also what makes me hesitant to remove "additional" from the variable and label naming.
This is to allow us to observe the effects of turning on multidimensional queueing, by breaking out the queue duration metric used in our Mimir / Reads Latency (Time in Queue) panel as well as alerts.
Went back and forth on how to label this, but stuck with the idea that we shouldn't label the additional queue dimensions with a specific meaning in Mimir by doing something like "take the first additional queue dimension and assign it to a label named query_component".
Instead I am just concatenating and shipping the additional queue dimensions as is, and we can use the alerts & dashboards to assign meanings to the labels.
Open to feedback on that choice of approach!
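As a rough illustration of the "concatenate and ship as is" choice, the sketch below joins the dimensions with ":" as in the diff and shows one way a consumer could split the label value back apart. Both function names are hypothetical, and the split assumes that no dimension itself contains ":".

```go
package main

import (
	"fmt"
	"strings"
)

const dimensionSeparator = ":"

// dimensionsLabelValue mirrors the strings.Join call in the diff: dimensions
// are concatenated in the order given, and a nil slice yields "".
func dimensionsLabelValue(dims []string) string {
	return strings.Join(dims, dimensionSeparator)
}

// splitLabelValue is a hypothetical inverse for consumers of the label.
// strings.Split("", ":") returns a slice holding one empty string, so the
// empty label value is handled explicitly and mapped back to no dimensions.
func splitLabelValue(value string) []string {
	if value == "" {
		return nil
	}
	return strings.Split(value, dimensionSeparator)
}

func main() {
	value := dimensionsLabelValue([]string{"A", "B"})
	fmt.Println(value)                    // A:B
	fmt.Println(splitLabelValue(value))   // [A B]
	fmt.Println(len(splitLabelValue(""))) // 0
}
```

With this approach the metric itself carries no Mimir-specific meaning for any position in the list; whoever reads the label decides what the first element stands for.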
What this PR does
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.