Clarify sdk prometheus exporter resource attributes #2640

Closed
jack-berg opened this issue Jul 1, 2022 · 20 comments
Labels: area:sdk (Related to the SDK), needs discussion (Need more information before all suitable labels can be applied), spec:metrics (Related to the specification/metrics directory), triaged-accepted (The issue is triaged and accepted by the OTel community; one can proceed with creating a PR proposal)

Comments

@jack-berg (Member) commented Jul 1, 2022

The spec says the following about the SDK prometheus exporter and resource attribute:

In SDK Prometheus (pull) exporters, resource attributes SHOULD be converted to the "target" info metric family; otherwise, they MUST be dropped, and MUST NOT be attached as labels to other metric families. The "target" info metric family MUST be an info-typed metric family whose labels MUST include the resource attributes, and MUST NOT include any other labels. There MUST be at most one "target" info metric family exposed on a Prometheus endpoint.
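For concreteness, the exposition the spec describes looks roughly like the following. This is a hypothetical, minimal sketch in Python (not the Java SDK under discussion); the attribute names and values are illustrative, and a real exporter would also need to sanitize label names and escape label values.

```python
# Minimal sketch of rendering resource attributes as the "target" info
# metric family in Prometheus text format. Illustrative only: real
# exporters must also sanitize label names and escape label values.
def render_target_info(resource_attrs: dict) -> str:
    labels = ",".join(f'{k}="{v}"' for k, v in sorted(resource_attrs.items()))
    return f"target_info{{{labels}}} 1"

line = render_target_info({
    "service_name": "checkout",          # from service.name
    "service_instance_id": "5587-ab12",  # from service.instance.id
})
print(line)
# target_info{service_instance_id="5587-ab12",service_name="checkout"} 1
```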

This initially seems like a good idea, but after implementing in java and talking it over with other java maintainers, I'm not so sure.

The primary concern of including resource attributes is increasing cardinality. Resource attributes will tend to be unique for each restart of the app due to service.instance.id. Prometheus automatically appends a job and instance label to scraped time series, which are likely to stay constant across restarts of an app. This means a casual user ingesting the target info metric would be exposed to higher cardinality than they would otherwise expect.

Given that in pull-based metric systems ingestors are expected to know the target metadata a priori, it seems odd that the SDK prometheus exporters should include target info by default.

Instead, target info could be opt in with a configuration option. We also might consider making the set of resource attributes exposed through target info configurable.

@jack-berg jack-berg added the spec:metrics Related to the specification/metrics directory label Jul 1, 2022
@arminru arminru assigned reyang and unassigned arminru Jul 5, 2022
@arminru arminru added the area:sdk Related to the SDK label Jul 5, 2022
@jsuereth (Contributor) commented Jul 8, 2022

I think the fundamental issue here is that we've been "Loose" with resource attributes, and it causes problems for Metrics (vs. traces).

Effectively, Metrics have WAY MORE concern about the cardinality of all attributes used to identify a timeseries. We have a few open bugs/issues from before about knowing which resource attributes are important, or about avoiding resource attributes that have high cardinality (like the ones you list).

e.g. #1782 was a very lengthy discussion on this.

Fundamentally I think we need to "fix" a few design considerations that we've talked about:

  1. Do we want to limit Resource attributes to only attributes useful to metrics (i.e. only identifying, low-cardinality attributes)?
  2. Do we want to create a distinction in attributes such that we understand "identifying" vs. "descriptive" attributes. I.e. we have a specified, built-in way to prune high-cardinality attributes that would be useful to Trace/Logs but harmful for metrics

@jsuereth jsuereth added needs discussion Need more information before all suitable labels can be applied triaged-accepted The issue is triaged and accepted by the OTel community, one can proceed with creating a PR proposal labels Jul 8, 2022
@tigrannajaryan (Member)

Do we want to create a distinction in attributes such that we understand "identifying" vs. "descriptive" attributes. I.e. we have a specified, built-in way to prune high-cardinality attributes that would be useful to Trace/Logs but harmful for metrics

It has been my long-held view that a definition of a Resource as just a bag of key-value attributes is not sufficient. We need to be able to distinguish between identifying and non-identifying attributes.

I see 3 possible ways to improve the situation:

  1. Add a notion of a non-identifying flag to Resource attributes. This can probably be done in a backwards-compatible way both in the Resource creation API and in OTLP (although in a somewhat awkward way, using parallel arrays in Protobuf). Absent the flag, the attributes will be considered identifying, which is the closest to current semantics.
  2. Do not touch the API or OTLP. Instead, make the identifying and non-identifying flag part of the semantic conventions. Any party that needs to know how to interpret a particular Resource and its attributes needs access to the semantic convention definitions to know which attributes in the attributes bag are identifying. If we add full semantic conventions to the Schema Files, then this information will be machine readable. Essentially, we will extract the metadata about attributes into a separate place (Schema Files) and avoid transmitting it in OTLP every time. The downside of this approach is that the Resource creation API is unaware of this notion, so any custom attributes added to the Resource will belong to an undefined bucket (we won't know whether the attribute is identifying or not).
  3. Introduce the Entity concept, where Resource is properly split into its constituent entities, each entity recording its identifying and non-identifying attributes and optionally also relations between entities. This is the most complicated approach. If we succeed it will result in the most expressive and structured data available. The difficulty here is introducing Entity in a way that does not disrupt our entire APIs, OTLP and all the infrastructure we have built so far. I don't see an easy, incremental, backward compatible way of doing so (but maybe there is - I haven't spent enough time thinking about it).

@jsuereth (Contributor) commented Jul 8, 2022

I like the direction of (3), particularly in the light that I've been viewing resources lately:

Specifically - for OTEL resource to work (as designed) we almost REQUIRE an external metadata / service-discovery mechanism

A few points:

  • k8s resource semantic conventions today require the downward API to gain access (in-process). In practice, we use the otel-operator to gain access to our resource attributes related to "what we are in k8s". Alternatively, we use an OTel collector (sidecar) that injects the k8s-based resource attributes (but this still requires some external-to-the-process service-information registry).
  • Resource detection is composed of many disparate options in an attempt to piece together an identifying set of attributes. However, for any given environment we don't know what will be identifying without viewing the whole picture. E.g. if we have access to k8s control plane information, we can identify our pod, container, etc. If we don't, then we'll need to guess based on our process name and maybe IP address? What we do today is pull all of this information, every time, and bundle it all together in a big blob. In reality, if we had k8s identifiers we could likely drop process identifiers.
  • Most "cloud" resource attributes rely on the availability of a metadata service that can describe the current "runner" of the workload (or injects ENV variables that describe the current thing).

This leads me to believe that, in practice, we need a few key features around Resource:

  • A mechanism that bundles resource attributes by semantics. E.g. process-identifying resource attributes vs. network-identifying resource attributes vs. k8s-identifying resource attributes.
    • This allows us to interact with resource attributes in more meaningful ways. I believe this is what @tigrannajaryan means as "entity"
    • This allows us to decide some bundles are higher priority than others and reduce labels intelligently.
  • A mechanism that allows (e.g. service-discovery) to replace raw identifying attributes with better identifying attributes (e.g. the k8s scenario of understanding where we may start from port/IP address and use service-discovery to turn this into a pod/container in a deployment, possibly with descriptive annotations of what k8s service that deployment fulfills).
    • This allows, e.g. Prometheus service discovery to OVERRIDE resource attributes. It means job and instance labels from prometheus can override resource labels for metrics within our model.
    • This opens the door for service-discovery aspects of the collector to simplify / make consistent resource detection where external to the process lookups are needed (e.g. k8s).
  • A distinction between descriptive and identifying resource attributes where identifying resource attributes come with strict cardinality limitations.
    • This allows Logs/Traces to include key descriptive attributes without causing cardinality issues for Metrics.
    • This could allow us, e.g. to mark an entire "bundle" of resource attributes as descriptive because a more identifying bundle exists. E.g. if we have k8s resource attributes, we can mark process/network attributes as descriptive.
    • I believe we are all aligned on this particular point, but none of us believe this fully solves the problem.

TL;DR: regarding @tigrannajaryan's three options, I don't think option 1 or 2 makes enough room for service discovery within our notion of resource.

@tigrannajaryan (Member)

Let me post this old proposal to model entities in the Collector (precisely for discovery purposes): https://docs.google.com/document/d/1Tg18sIck3Nakxtd3TFFcIjrmRO_0GLMdHXylVqBQmJA/edit#

There are a couple diagrams in the doc that show examples of "entities". In my mind there are different kinds of entities, some are defined at build time (such as the Service or Container Image), some are defined at run time (such as K8s Pod, K8s Node, Host, Service Instance, etc).

Here is a bit more comprehensive case (boxes are entities, arrows are relationships):

[diagram: entities connected by relationship arrows]

Today we lump all information about these entities into a single bag of attributes and call it a Resource. By doing so we lose important information. The irony is that our resource detectors often detect entities separately, so our implementations in the Collector, for example, could create distinct entities and even link them by the correct relationships. Instead we flatten this information into a single bag of attributes and call it a Resource.

@jsuereth (Contributor) commented Jul 8, 2022

@tigrannajaryan This is in line with my thinking as well. I think maybe we should open a (non-Prometheus-SDK-focused) feature request / proposal around resources that can help resolve the specific issues called out in this bug. Is it worth discussing in a Spec SIG before working on an entity proposal, or should we flesh out this idea more (I'm happy to outline the requirements section)?

@tigrannajaryan (Member)

@tigrannajaryan This is in line with my thinking as well. I think maybe we should open a (non-Prometheus-SDK-focused) feature request / proposal around resources that can help resolve the specific issues called out in this bug. Is it worth discussing in a Spec SIG before working on an entity proposal, or should we flesh out this idea more (I'm happy to outline the requirements section)?

I think it is worth discussing first. It is a relatively big commitment IMO. I personally would love to work on it, but have a few other things in progress right now, so would prefer to finish those first.

@dashpole (Contributor)

Do we want to limit Resource attributes to only attributes useful to metrics (i.e. only identifying, low-cardinality attributes)

@jsuereth maybe I'm misreading, but this seems contradictory. If a set of attributes is uniquely identifying, it has maximum cardinality. Put another way, adding non-identifying attributes does not increase cardinality. The concern here seems to be that prometheus keeps the same resource (i.e. job + instance) across restarts, but OTel does not, which results in higher cardinality. The benefit of OTel's model is that we can tell when something restarted, but it comes at the cost of an extra metric stream.

This means a casual user ingesting the target info metric would be exposed to higher cardinality than they would otherwise expect.

A single metric stream per-restart of an application doesn't really seem like a cardinality issue, as a single prometheus histogram often has hundreds, or even thousands of streams. Are you concerned about specific scenarios (e.g. crashlooping)?

I'm definitely on-board with it being configurable, and like the idea of a filter for the resource attributes added, but it does seem nice to preserve resource information by default.

@tigrannajaryan (Member)

If a set of attributes is uniquely identifying, it has maximum cardinality. Put another way, adding non-identifying attributes does not increase cardinality.

@dashpole initially I thought so too, but I don't think this is always true. For example, imagine we are reporting host metrics. The host is uniquely identified by its SMBIOS hardware machine ID, which is the only and sufficient identifying attribute, and we add it to the Resource as, say, a "host.id" attribute. However, we also decide to add the IP address as a Resource attribute "host.ip" (sounds like a reasonable idea for a Host, right?). The host also happens to be using DHCP, so its IP address changes over time (this can happen on restarts or even periodically). So, the combination of the machine ID and IP address is higher cardinality than the machine ID alone.

Arguably, not only is it higher cardinality, it also results in discontinuities in the metrics reported by the same host, so it is logically undesirable even if the higher cardinality itself is not a problem.

So, to make it clear: a non-identifying attribute does not mean "a derived" attribute that can be inferred from identifying attributes. Non-identifying really means any value that should not be used for identification purposes because it is superfluous and results in an incorrect identity for the timeseries.
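The host.id/host.ip example above can be sketched numerically. This toy snippet (not OTel code, with made-up IDs and addresses) counts distinct series identities as a DHCP lease changes the host's IP:

```python
# Toy model: a timeseries identity is the tuple of its identifying
# label values. Adding a churning label multiplies the series count.
machine_id = "8f3c-a1b2"
observed_ips = ["10.0.0.7", "10.0.0.9", "10.0.0.4"]  # DHCP reassignments

series_by_id_only = {(machine_id,) for _ in observed_ips}
series_with_ip = {(machine_id, ip) for ip in observed_ips}

print(len(series_by_id_only), len(series_with_ip))  # 1 3
```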

@fstab (Member) commented Jul 19, 2022

I'm wondering how much of a problem this actually is. Even if some attributes change occasionally, target_info series still have a constant value of 1, which should be fairly cheap to store in Prometheus.

@jack-berg (Member, Author)

I'm wondering how much of a problem this actually is. Even if some attributes change occasionally, target_info series still have a constant value of 1, which should be fairly cheap to store in Prometheus.

It will depend on the frequency of the restarts and how resource-constrained the prometheus instances are, but in general I think I agree that it shouldn't be a big problem most of the time. One solution would be to give the prometheus exporter a configurable disallow list, where target_info includes all resource attributes except those on the disallowed list.
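A disallow list of this kind could look roughly like the following sketch; the function name, attribute keys, and API shape are illustrative, not from any real SDK:

```python
# Hypothetical sketch of the proposed disallow list: target_info keeps
# every resource attribute except the (typically high-churn) disallowed ones.
def filter_resource_attributes(resource_attrs: dict, disallowed: set) -> dict:
    return {k: v for k, v in resource_attrs.items() if k not in disallowed}

attrs = {
    "service.name": "checkout",
    "service.instance.id": "5587-ab12",  # changes on every restart
    "telemetry.sdk.language": "java",
}
kept = filter_resource_attributes(attrs, disallowed={"service.instance.id"})
print(sorted(kept))  # ['service.name', 'telemetry.sdk.language']
```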

The other question implied in this issue but not explicitly stated is: what use case does including target_info solve? Reiterating that in pull-based metric systems ingestors are expected to know the target metadata a priori, I can't think of a sensible setup where a user would be able to take advantage of this information.

@tigrannajaryan (Member)

I'm wondering how much of a problem this actually is. Even if some attributes change occasionally, target_info series still have a constant value of 1, which should be fairly cheap to store in Prometheus.

It will depend on the frequency of the restarts and how resource-constrained the prometheus instances are, but in general I think I agree that it shouldn't be a big problem most of the time.

Yes, it is likely not a problem most of the time from performance perspective.

However, to emphasize what I wrote above: I think it is still a problem from a timeseries-continuity perspective. If I am looking at the CPU usage timeseries for a particular host, I don't expect the timeseries to terminate and a new timeseries to be created when the IP address of the machine changes (well, assuming that I believe the host is identified solely by its machine ID and not by its IP address). It is important that the timeseries is identified by the dimensions that the user expects to be the identity of the entity being observed (sorry, that was a mouthful). When irrelevant dimensions are added on top of the correct ones, it can result in surprising behavior.

@jack-berg (Member, Author)

I think it is still a problem from a timeseries-continuity perspective. If I am looking at the CPU usage timeseries for a particular host, I don't expect the timeseries to terminate and a new timeseries to be created when the IP address of the machine changes (well, assuming that I believe the host is identified solely by its machine ID and not by its IP address).

I agree in general that we need to find some mechanism to indicate which resource attributes are identifying vs descriptive. However the prometheus SDK exporter doesn't necessarily have to address this question since in pull based exporters scrapers have the responsibility of identifying entities. Including resource attributes in target_info conflates responsibilities, and it's not clear which use case(s) stand to benefit.

To put it another way: if we were to remove the language around SDKs exposing resource attributes in the prometheus exporter via target_info, what would we lose? The target info is aimed at push-based approaches, and at pull-based approaches that expose several targets (i.e. a collector exposing the metrics of several individual SDKs). The SDK prometheus exporter doesn't fall under either of those use cases, so why include target info at all?

@tigrannajaryan (Member)

However the prometheus SDK exporter doesn't necessarily have to address this question since in pull based exporters scrapers have the responsibility of identifying entities.

I understand the desire to limit the scope but I think the solution may need to be broader than what is necessary just for Prometheus SDK exporter. Exporting into other non-OTLP formats will likely face similar challenges.
Also, what about Prometheus remote-write exporter, which is push, not pull?

@jack-berg (Member, Author)

Also, what about Prometheus remote-write exporter, which is push, not pull?

The SDK prometheus exporter "MUST NOT support Push mode".

@fstab (Member) commented Jul 20, 2022

in general I think I agree that it shouldn't be a big problem most of the time. One solution would be to have the prometheus exporter have a configurable disallow list, where target_info includes all resource attributes except those on the disallowed list.

👍 this sounds good: if target_info isn't a problem for most users, it should be present by default. Semantic resource conventions include useful information, like the command line for processes, or the JVM version for Java runtimes. It would be sad if this were unavailable in Prometheus by default.

why include target info at all?

There are different ways to set up an OTel metrics pipeline with Prometheus. You could scrape metrics directly from the applications. Or you could have the applications push metrics to the collector via OTLP, and have the collector forward these metrics to the Prometheus server via remote write.

It would be good if users could see the same metrics in Prometheus independent of the pipeline. It would be weird if the target_info with attributes like the process command line or the JVM version was present in a collector-based pipeline but missing if applications are scraped directly.

@dashpole (Contributor)

There are a few motivations behind target_info:

  1. Limiting data loss when round-tripping between Prometheus and OpenTelemetry.
  2. (mentioned by @fstab) Parity between the data produced by a Prometheus exporter and an OTLP exporter
  3. Providing users with resource attributes which are not generally available in pull systems (e.g. runtime information)

For the OpenMetrics specification: it is true that pull-based systems are expected to have some resource information, but metadata the exposer is aware of is still expected to be useful when using pull-based systems. Quoting the spec: "even though pull-style ingestors should use their own target metadata, it is still often useful to have access to the metadata the exposer itself is aware of."

@jack-berg (Member, Author)

  1. Limiting data loss when round-tripping between Prometheus and OpenTelemetry.

On this note, the round-trip behavior of SDK w/ prometheus exporter -> collector w/ prometheus receiver -> OTLP is actually pretty strange right now for resource attributes. I talk about it in this comment, but the TL;DR is that the collector prometheus receiver assigns a service.name and service.instance.id based on the configured prometheus job and instance, but it also assigns a service_name and service_instance_id based on what it scrapes from target_info. So we get this lossy translation where dot-delimited resource attributes change to snake case, and are sometimes duplicated. Not great.
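The dot-to-snake-case translation happens because Prometheus label names cannot contain dots. A simplified sketch of the kind of sanitization involved (real implementations also handle leading digits and name collisions):

```python
import re

# Simplified sketch of Prometheus label-name sanitization: every
# character that is not a letter, digit, or underscore becomes "_".
def sanitize_label_name(name: str) -> str:
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)

print(sanitize_label_name("service.instance.id"))  # service_instance_id
```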

  3. Providing users with resource attributes which are not generally available in pull systems (e.g. runtime information)

A question I keep asking is how users are actually expected to use these. To my knowledge, you can't join in prometheus so the data isn't accessible at query time. And I also couldn't find any way to configure a prometheus scrape job to extract target_info attributes and add them to the scraped series.

Overall though, I'm happy enough if the SDK prometheus exporter has an optional configurable disallow list. Or perhaps even simpler: an option to toggle target_info off altogether.

@gouthamve (Member)

To my knowledge, you can't join in prometheus so the data isn't accessible at query time.

This is not true tbh; joining on _info metrics is common. I am working on a blog post that illustrates this specifically for target_info from the OTel Collector, but you can also see this in action here: https://grafana.com/blog/2021/08/04/how-to-use-promql-joins-for-more-effective-queries-of-prometheus-metrics-at-scale/
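The join technique being referred to looks roughly like this in PromQL; the metric name and the joined label names are illustrative, not taken from this thread:

```promql
# Attach resource attributes from target_info to a scraped series by
# joining on the job/instance labels. Since target_info always has the
# value 1, multiplying leaves the left-hand values unchanged.
rate(http_server_requests_total[5m])
  * on (job, instance) group_left (service_name, service_version)
  target_info
```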

The Prometheus community has also been talking about a simpler syntax for joins: prometheus/prometheus#3094

And on top of that, metric discontinuity is not that much of an issue in Prometheus, because you never select on labels that are going to differ across restarts or change frequently. If you don't select on them, the queries should all work the same with and without the metric discontinuities.

Further, @fstab mentioned in today's Prometheus WG call that we can always use metric_relabel_configs to drop the labels that have a lot of churn. This is equivalent to the configurable disallow list. I am not a fan of completely turning it off, though.
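For reference, the Prometheus configuration key for this is metric_relabel_configs. A hedged sketch of dropping a high-churn label at scrape time (the job name, target, and label are illustrative):

```yaml
scrape_configs:
  - job_name: otel-sdk
    static_configs:
      - targets: ["app:9464"]
    metric_relabel_configs:
      # Drop the high-churn service_instance_id label from all scraped
      # series (only target_info carries it in this setup).
      - regex: service_instance_id
        action: labeldrop
```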

@jack-berg (Member, Author)

This is not true tbh; joining on _info metrics is common.

Good to know!

we can always use metric_relabel_configs to drop the labels that have a lot of churn.

Another great point.

So with joins available to access target_info attributes I'm convinced of the usefulness of making the resource attributes available via target_info. And with metric_relabel_rules available to perform the same functionality as a disallow list would provide, I'm in favor of encouraging users to use that existing functionality rather than adding something new.

I still think the round tripping is weird, and that we generally need some way to indicate resource attributes as identifying or descriptive, but I'm happy enough to close this specific issue.

@lucabrunox commented Aug 25, 2022

I agree in general that we need to find some mechanism to indicate which resource attributes are identifying vs descriptive.

To plot exactly one metric in Prometheus, all the labels of that metric are needed, which to me sounds like all label key-value pairs are needed to identify a metric. Does that mean those labels should be in the "resource" field of OTel instead of in "datapoint attributes"?


9 participants