Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add additional queue dimensions to query scheduler queue duration histogram #6960

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion pkg/frontend/v2/frontend_scheduler_adapter.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ package v2

import (
"context"
"fmt"
"time"

"github.com/go-kit/log"
Expand Down Expand Up @@ -89,7 +90,7 @@ func (a *frontendToSchedulerAdapter) extractAdditionalQueueDimensions(
return []string{ShouldQueryIngestersQueueDimension}, nil
default:
// no query time params to parse; cannot infer query component
level.Warn(a.log).Log("msg", "unsupported request type", "query", httpRequest)
level.Warn(a.log).Log("msg", "unsupported request type", "query", fmt.Sprintf("%+v", httpRequest))
return nil, nil
}
}
Expand Down
10 changes: 6 additions & 4 deletions pkg/scheduler/scheduler.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ import (
"flag"
"io"
"net/http"
"strings"
"sync"
"time"

Expand Down Expand Up @@ -72,7 +73,7 @@ type Scheduler struct {
cancelledRequests *prometheus.CounterVec
connectedQuerierClients prometheus.GaugeFunc
connectedFrontendClients prometheus.GaugeFunc
queueDuration prometheus.Histogram
queueDuration *prometheus.HistogramVec
inflightRequests prometheus.Summary
}

Expand Down Expand Up @@ -145,11 +146,11 @@ func NewScheduler(cfg Config, limits Limits, log log.Logger, registerer promethe
})
s.requestQueue = queue.NewRequestQueue(s.log, cfg.MaxOutstandingPerTenant, cfg.AdditionalQueryQueueDimensionsEnabled, cfg.QuerierForgetDelay, s.queueLength, s.discardedRequests, enqueueDuration)

s.queueDuration = promauto.With(registerer).NewHistogram(prometheus.HistogramOpts{
s.queueDuration = promauto.With(registerer).NewHistogramVec(prometheus.HistogramOpts{
Name: "cortex_query_scheduler_queue_duration_seconds",
Help: "Time spent by requests in queue before getting picked up by a querier.",
Buckets: prometheus.DefBuckets,
})
}, []string{"user", "additional_queue_dimensions"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something shorter for this label like dimensions would be my preference

s.connectedQuerierClients = promauto.With(registerer).NewGaugeFunc(prometheus.GaugeOpts{
Name: "cortex_query_scheduler_connected_querier_clients",
Help: "Number of querier worker clients currently connected to the query-scheduler.",
Expand Down Expand Up @@ -399,7 +400,8 @@ func (s *Scheduler) QuerierLoop(querier schedulerpb.SchedulerForQuerier_QuerierL
r := req.(*queue.SchedulerRequest)

queueTime := time.Since(r.EnqueueTime)
s.queueDuration.Observe(queueTime.Seconds())
additionalQueueDimensionLabels := strings.Join(r.AdditionalQueueDimensions, ":")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This there any guarantee that these are always in the same order?

Copy link
Member Author

@francoposa francoposa Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but we don't want to sort either because order does matter, as the additional dimensions represent a path through the tree.

Sending additional queue dimensions ["A" "B"] would represent path root->tenantID->A->B, which is a different queue than additional queue dimensions ["B" "A"], which would represent path root->tenantID->B->A.

But for right now we send no (nil slice) additional dimensions ,or one additional dimension representing the expected query component(s) used, making the path through the tree essentially like root -> tenantID -> queryComponent.

The fact that we always have the first queuing dimension as tenant ID for now is also what makes me hesitant to remove "additional" from the variable and label naming.

s.queueDuration.WithLabelValues(r.UserID, additionalQueueDimensionLabels).Observe(queueTime.Seconds())
r.QueueSpan.Finish()

/*
Expand Down
Loading