Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decouple workqueue metrics from prometheus #33792

Conversation

caesarxuchao
Copy link
Member

@caesarxuchao caesarxuchao commented Sep 29, 2016

What this PR does / why we need it:
We want to include the workqueue in client-go, but do not want to having to import Prometheus. This PR decouples the workqueue from prometheus.

Which issue this PR fixes (optional, in fixes #<issue number>(, #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Partially address #33497
User requested for workqueue in client-go: kubernetes/client-go#4 (comment)

Special notes for your reviewer:

Release note:

The implicit registration of Prometheus metrics for workqueue has been removed, and a plug-able interface was added. If you were using workqueue in your own binaries and want these metrics, add the following to your imports in the main package: "k8s.io/pkg/util/workqueue/prometheus".

This change is Reviewable

@caesarxuchao caesarxuchao added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 29, 2016
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Sep 29, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GCE e2e failed for commit 1e8c071e1910525b6651d888f876a4bda9283ac4. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@krousey
Copy link
Contributor

krousey commented Sep 29, 2016

The new public types you're introducing lack documentation, so it's hard to try to infer what they're supposed to be. Like PartialDefaultMetrics. Why is it partial, and defaulted? Also you seem to have exposed a factory variable that no one outside of the queue package can effectively use.

It would be a lot simpler to just have a private interface variable that defaulted to a no-op, and a public registration function. Other than the registration function, and the interface's methods, everything else should be private. e.g.

var metricsProvider MetricsProvider = &noopProvider{}

func SetProvider(p MetricsProvider) { // set once }

It also seems like the recording of queue and processing start times are being conflated with the metrics provider here. They should be separated out because one is how you calculate metrics, the other is how you expose and record them.

@caesarxuchao
Copy link
Member Author

Sorry for the confusion, I'll add the comments.

Other than the registration function, and the interface's methods, everything else should be private.

I thought of a better pattern, similar to what you asked for. I'll update the code.

@caesarxuchao
Copy link
Member Author

@krousey I changed the pattern to similar to what you suggested. I also added comments. Hopefully it's easier to read now.

It also seems like the recording of queue and processing start times are being conflated with the metrics provider here. They should be separated out because one is how you calculate metrics, the other is how you expose and record them.

The new version should have addressed this comment, if I understood you correctly.

@k8s-ci-robot
Copy link
Contributor

Jenkins verification failed for commit 21b46bf. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

type queueMetrics interface {
add(item t)
get(item t)
done(item t)
}

// GaugeMetric that represents a single numerical value that can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// GaugeMetric represents ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

type queueMetrics interface {
add(item t)
get(item t)
done(item t)
}

// GaugeMetric that represents a single numerical value that can
// arbitrarily go up and down.
// Its methods are a subset of prometheus.Gauge.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to call out prometheus here. Other implementations could exist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

)

// You need to import the prometheus package to create and register prometheus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make this more generic and not prometheus specific.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Dec()
}

type noopGaugeMetric struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your no-ops can be condensed to one type:

type noopMetric struct{}

func (noopMetric) Inc() {}
func (noopMetric) Dec() {}
func (noopMetric) Observe(float64) {}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return ret
}
// DepthMetricProvider creates DepthMetric.
type DepthMetricProvider interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be condensed to the types of metrics they provide, not what the metrics are used for. i.e.

GaugeMetricProvider
ConterMetricProvider
SummaryMetricProvider

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I condensed them all to a MetricsProvider interface.

addTimes: map[t]time.Time{},
processingStartTimes: map[t]time.Time{},
}
type noopDepthMetricProvider struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These noop providers can also be condensed to one type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// prometheus metrics. To use this package, you just have to import it.

func init() {
workqueue.DefaultMetricsFactory.SetProviders(
Copy link
Contributor

@krousey krousey Sep 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling a method on a type whose definition is private to another package. Not that it won't work, it just feels wrong. Like if I wanted to search godoc.org for the documentation on SetProviders, I wouldn't find it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation.

@@ -151,3 +188,63 @@ func (m *defaultRetryMetrics) retry() {

m.retries.Inc()
}

type metricsFactory struct {
Copy link
Contributor

@krousey krousey Sep 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be easiest for me to just show what I meant by not needing a factory variable exposed.

var metricsFactory = struct {
    counterMetricProvider CounterMetricProvider
    gaugeMetricProvider   GaugeMetricProvider
    summaryMetricProvider SummaryMetricProvider
    setProviders          sync.Once
}{
    counterMetricProvider: noopProvider{},
    gaugeMetricProvider:   noopProvider{},
    summaryMetricProvider: noopProvider{},
}

func newQueueMetrics(name string) queueMetrics {
    var ret *defaultQueueMetrics
    if len(name) == 0 {
        return ret
    }
    return &defaultQueueMetrics{
        depth:                metricsFactory.gaugeMetricProvider.NewGuageMetric(name, "depth"),
        adds:                 metricsFactory.counterMetricProvider.NewCounterMetric(name, "adds"),
        latency:              metricsFactory.summaryMetricProvider.NewSummaryMetric(name, "latency"),
        workDuration:         metricsFactory.summaryMetricProvider.NewSummaryMetric(name, "duration"),
        addTimes:             map[t]time.Time{},
        processingStartTimes: map[t]time.Time{},
    }
}

func newRetryMetrics(name string) retryMetrics {
    var ret *defaultRetryMetrics
    if len(name) == 0 {
        return ret
    }
    return &defaultRetryMetrics{
        retries: metricsFactory.counterMetricProvider.NewCounterMetric(name, "retries"),
    }
}

// SetProviders sets the metrics providers of the metricsFactory.
func SetProviders(
    counterMetricProvider CounterMetricProvider,
    gaugeMetricProvider GaugeMetricProvider,
    summaryMetricProvider SummaryMetricProvider,
) {
    metricsFactory.setProviders.Do(func() {
        metricsFactory.counterMetricProvider = counterMetricProvider
        metricsFactory.gaugeMetricProvider = gaugeMetricProvider
        metricsFactory.summaryMetricProvider = summaryMetricProvider
    })
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

)
}

type prometheusDepthMetricProvider struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can also be condensed into one struct with multiple methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -43,6 +43,7 @@ import (
"k8s.io/kubernetes/pkg/util/flowcontrol"
"k8s.io/kubernetes/pkg/util/wait"
"k8s.io/kubernetes/pkg/util/workqueue"
_ "k8s.io/kubernetes/pkg/util/workqueue/prometheus"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you try to put this import in package main instead of in these libraries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new github review UI is confusing. I definitely didn't type "Done" to this comment..

Anyway, do you mean importing it in pkg/controller? I'd rather keep this line next to the workqueue import line for easier tracking.

Copy link
Contributor

@krousey krousey Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant importing it in the relevant main packages in cmd/... Like the throttle metrics are done. Perhaps we should consolidate these metrics implementations into one package. Just import that one package and you get prometheus everywhere.

Another PR of course.

@caesarxuchao caesarxuchao force-pushed the decouple-workqueue-prometheus branch 3 times, most recently from f4864f1 to b094d6c Compare September 30, 2016 22:10
@k8s-ci-robot
Copy link
Contributor

Jenkins Kubemark GCE e2e failed for commit b094d6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GKE smoke e2e failed for commit b094d6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@caesarxuchao
Copy link
Member Author

@k8s-bot unit test this, issue #33838

@k8s-ci-robot
Copy link
Contributor

Jenkins unit/integration failed for commit b094d6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@@ -43,6 +43,7 @@ import (
"k8s.io/kubernetes/pkg/util/flowcontrol"
"k8s.io/kubernetes/pkg/util/wait"
"k8s.io/kubernetes/pkg/util/workqueue"
_ "k8s.io/kubernetes/pkg/util/workqueue/prometheus"
Copy link
Contributor

@krousey krousey Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant importing it in the relevant main packages in cmd/... Like the throttle metrics are done. Perhaps we should consolidate these metrics implementations into one package. Just import that one package and you get prometheus everywhere.

Another PR of course.

var metricsFactory = struct {
metricsProvider MetricsProvider
setProviders sync.Once
}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should initialize metricsProvider here to noopMetricsProvider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@caesarxuchao
Copy link
Member Author

I centralized the imports. I removed the import in the scheduler because it's just using workqueue.Parallelize(), so it does not need the metrics. PTAL. Thanks.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE e2e failed for commit feb0d1d. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@caesarxuchao
Copy link
Member Author

@k8s-bot e2e test this, issue #33867

@k8s-ci-robot
Copy link
Contributor

Jenkins GKE smoke e2e failed for commit feb0d1d. Full PR test history.

The magic incantation to run this job again is @k8s-bot gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@caesarxuchao
Copy link
Member Author

@k8s-bot gce e2e test this

@caesarxuchao
Copy link
Member Author

@krousey can I get the lgtm? Thanks.

@caesarxuchao
Copy link
Member Author

@lavalamp could you apply the label for this one? I think @krousey is not available.

Copy link
Contributor

@krousey krousey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry... was going through email and this was at the bottom of it... until you bumped it that is

@krousey krousey added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 5, 2016
@k8s-github-robot k8s-github-robot added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Oct 5, 2016
@caesarxuchao caesarxuchao added release-note-none Denotes a PR that doesn't merit a release note. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. release-note-label-needed release-note-none Denotes a PR that doesn't merit a release note. labels Oct 5, 2016
@caesarxuchao
Copy link
Member Author

Thanks @krousey. I just added a release note (copied from yours), let me know if you have comments on that.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE Node e2e failed for commit feb0d1d. Full PR test history.

The magic incantation to run this job again is @k8s-bot node e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@caesarxuchao
Copy link
Member Author

@k8s-bot node e2e test this, issue #32430

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 45e5719 into kubernetes:master Oct 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants