Promtail default scrape configuration labels should match Kubernetes conventions #2070

Closed
chancez opened this issue May 13, 2020 · 8 comments

@chancez
Contributor

chancez commented May 13, 2020

Is your feature request related to a problem? Please describe.
Currently the default configurations provided by the ksonnet/tanka and Helm charts for promtail use label conventions that differ from Kubernetes. The labels mostly follow the pre-1.15/1.16 metrics-overhaul conventions, or simply omit some labels entirely. See kubernetes/enhancements#1206 and https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/20181106-kubernetes-metrics-overhaul.md for when Kubernetes began moving to an "official" way to name metrics and their labels. The Kubernetes instrumentation guidelines that came out of that work, which include conventions for label names, are here: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/instrumentation.md

You can see the "semi-standard" configuration that's used for Prometheus here: https://github.com/kubernetes-monitoring/kubernetes-mixin

This may not be exhaustive, but here are the label inconsistencies I've noticed so far:

  • container_name should be container
  • pod label is missing, but instance has the value of the pod.
  • instance is currently the pod name, but if a pod label were added, then instance should likely be the source file name (eg __path__) or something besides the pod name, to avoid duplicating label values.
  • job is currently namespace/pod_name, but in Prometheus the job label roughly identifies the exporter/collector of the metric, e.g. job="prometheus" for a Prometheus metric, job="kube-state-metrics" for a metric from KSM, and job="node-exporter" for node metrics. By analogy, logs collected by promtail should arguably have job="promtail", so we can distinguish logs collected by promtail from logs sent to Loki by other collectors. As it stands, the information in the job label is just redundant and provides nothing that doesn't already exist in other labels. The comparison isn't perfect, but I think the point stands. (See the sketch after this list.)
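
For illustration, here's a rough sketch of relabel rules that would follow these suggestions. This is not the current default config; it only assumes the standard __meta_kubernetes_* labels that Prometheus-style pod discovery exposes:

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # `container` rather than the old `container_name`
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
      # an explicit `pod` label instead of overloading `instance`
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      # identify the collector instead of repeating namespace/pod
      - target_label: job
        replacement: promtail
```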

Describe the solution you'd like

I've provided suggestions alongside each of the problematic labels described above.

Describe alternatives you've considered

  • Don't provide any default configurations and be less opinionated about "how to label" (This seems like a poor choice)
  • Provide documentation on why the current labeling scheme is used. Currently I can't tell why it was done this way, but maybe there's a reason?
@chancez
Contributor Author

chancez commented May 13, 2020

Also worth discussing: if any of these changes are adopted, how should the transition be handled?

  • Add new labels, and keep existing labels as-is, but deprecate them. Remove them in a future release and/or update existing label values to match? (Breaking change with deprecation)
  • Change existing labels, add missing labels. (Breaking change)
  • Add new labels, but do not change existing labels (no breaking changes)

I think adding new labels and deprecating the redundant old ones is probably the "correct" path, but I'm unsure about updating the values of existing labels such as instance and job: you can't roll that out as gradually with a deprecation period, since the new value can't coexist with the old one under the same label name.

@owen-d
Member

owen-d commented May 15, 2020

I think this is pretty well-reasoned and ecosystem parity is a nice benefit. @slim-bean @cyriltovena @rfratto WDYT?

We may need to find a contributor willing to shepherd these changes.

@rfratto
Member

rfratto commented May 15, 2020

This would be a huge breaking change for us and would require a lot of cooperation to update everything. The reason the Loki k8s labels are the way they are is that they're copied from our Prometheus Ksonnet, which all of the mixins we've produced depend on.

I'm open to being more consistent with the overall ecosystem, but if we blindly change this without changing our other uses of the label names, we're going to lose the ability to switch between metrics and logs on the fly, so we should be very careful here.

@rfratto
Member

rfratto commented May 15, 2020

job is currently namespace/pod_name, but in Prometheus the job label roughly identifies the exporter/collector of the metric.

FWIW, this isn't true: job is namespace/ plus the name of the pod's creator (e.g. Deployment, DaemonSet, ReplicaSet, etc.). Specifically, it takes that value from the pod's name label. job is how we do most of our ad-hoc querying and is really helpful for quickly aggregating multiple pods together.
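
For reference, a simplified sketch of a relabel rule that assembles a value like that; the actual ksonnet config may differ, and the __meta_kubernetes_* source labels shown are just the standard pod-discovery ones:

```yaml
relabel_configs:
  # job = <namespace>/<value of the pod's `name` label>
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_name]
    separator: /
    target_label: job
```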

@chancez
Contributor Author

chancez commented May 15, 2020

This would be a huge breaking change for us and would require a lot of cooperation to update everything. The reason the Loki k8s labels are the way they are is that they're copied from our Prometheus Ksonnet, which all of the mixins we've produced depend on.

I'm open to being more consistent with the overall ecosystem, but if we blindly change this without changing our other uses of the label names, we're going to lose the ability to switch between metrics and logs on the fly, so we should be very careful here.

I understand that Grafana (Labs) configures Prometheus one way, but that setup still doesn't follow the conventions upstream Kubernetes sets, and it's the odd one out. I guarantee you most people using Prometheus with Kubernetes are using the prometheus/prometheus-operator Helm chart, https://github.com/kubernetes-monitoring/kubernetes-mixin, or something generated from it. That means that with the out-of-the-box promtail configs provided, people end up with different labels for logs and metrics. This also applies to people not using those, since the built-in metrics from the Kubernetes API server, controller manager, and scheduler are emitted with labels different from what promtail's example configurations use.

I'm open to being more consistent with the overall ecosystem, but if we blindly change this without changing our other uses of the label names, we're going to lose the ability to switch between metrics and logs on the fly,

I agree this would need to be done carefully; however, without these changes, basically anyone using Loki on K8s with one of the production install methods has already lost the ability to switch between metrics and logs on the fly, because the Loki labels do not match the metrics coming from Kubernetes.

job is how we do most of our ad-hoc querying

I get that, but it's still redundant information, which is explicitly an anti-pattern in Prometheus. Any query using job could use namespace and pod instead. It may be "more convenient", but that's not really a good enough argument for duplicating data across multiple labels, IMO.

I understand Grafana has its own mixins for your jsonnet-based configuration, but I don't know that it's the best example to follow, since it isn't compatible with the labels upstream Kubernetes metrics use.

@rfratto
Member

rfratto commented May 15, 2020

Yeah, like I said, I'm totally in support of this idea, but I was trying to make it clear that it's:

a) something we should do ourselves and not have someone external contribute (since we're most aware of our own usage of Jsonnet across the board)
b) going to need the approval of other Grafana Labs teams
c) something not insignificant in scope that will take time to do correctly

@chancez
Contributor Author

chancez commented May 15, 2020

That makes sense.

I suggested some ideas on how to go about it in my 2nd comment. If you're concerned about breakage, it would be best to start by just adding the new labels (pod, container) while keeping the existing labels as-is; at a later point the old labels can be removed or changed. This is what I've been doing, roughly like the sketch below.
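
A minimal sketch of that additive approach, assuming the usual __meta_kubernetes_* pod-discovery labels; the existing rule is shown only to make clear it stays untouched:

```yaml
relabel_configs:
  # existing rule, left as-is (likewise for instance, job, ...)
  - source_labels: [__meta_kubernetes_pod_container_name]
    target_label: container_name
  # new convention-aligned labels added alongside; nothing breaks
  - source_labels: [__meta_kubernetes_pod_container_name]
    target_label: container
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```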

@rfratto
Member

rfratto commented May 31, 2020

@chancez Can this be closed now that #2091 is merged? We still have the job label but we're in sync with everything else now.

@chancez chancez closed this as completed Jun 1, 2020