Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize metric names for the Prometheus client #4866

Merged

Conversation

jonatan-ivanov
Copy link
Member

We need to sanitize metric names for the Prometheus client otherwise it fails with something like this:

java.lang.IllegalArgumentException: 'jvm_info': Illegal metric name. The metric name must not include the '_info' suffix.

This can be reproduced with:

Gauge.builder("test.info", () -> 1).register(registry);

The name of this Gauge will be test_info in Prometheus that is invalid according to the new client, see PrometheusNaming.

We have built-in instrumentation that fails this validation too (see JvmInfoMetrics):

new JvmInfoMetrics().bindTo(registry);

If we don't want to break this, we need to sanitize the invalid suffixes.

Please check the tests because the behavior can be unexpected sometimes:

  • If you create a Micrometer Gauge named test.info, the name of Prometheus gauge will be test_info (since the registry removes but the Prometheus client adds back the _info suffix).
  • If you create a Micrometer Counter named test.total, the name of the Prometheus counter will be test_total (since the registry removes but the Prometheus client adds back the _total suffix).
  • If you create a Micrometer Counter named test.info, the name of the Prometheus counter will be test_total (since the registry removes the _info suffix and the Prometheus client adds the _total suffix for counters).
  • If you create a Micrometer Gauge named test.total, the name of the Prometheus counter will be test (since the registry removes the _total suffix and the Prometheus client does not add _info suffix for gauges).
  • Similarly, _created and _bucket will be removed otherwise the Prometheus client will fail.

We need to sanitize metric names for the Prometheus client otherwise
it fails with something like this:
java.lang.IllegalArgumentException: 'jvm_info': Illegal metric name.
The metric name must not include the '_info' suffix.

This can be reproduced with:
Gauge.builder("test.info", () -> 1).register(registry);

The name of this Gauge will be test_info in Prometheus that is invalid
according to the new client, see PrometheusNaming:
https://github.com/prometheus/client_java/blob/8bb75446c6dca2a4e2c074e39fe34c1b061b028a/prometheus-metrics-model/src/main/java/io/prometheus/metrics/model/snapshots/PrometheusNaming.java#L37-L40

We have built-in instrumentation that fails this validation too, see JvmInfoMetrics:
https://github.com/micrometer-metrics/micrometer/blob/966f5fe953a69c315b44bf4f63adf101a060bbfb/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/jvm/JvmInfoMetrics.java#L32

If we don't want to break this, we need to sanitize the invalid suffixes.

Please check the tests because the behavior can be unexpected sometimes:
- If you create a Micrometer Gauge named test.info,
the name of the Prometheus gauge will be test_info (since the registry
removes but the Prometheus client adds back the `_info` suffix).
- If you create a Micrometer Counter named test.total,
the name of the Prometheus counter will be test_total (since the
registry removes but the Prometheus client adds back the _total suffix).
- If you create a Micrometer Counter named test.info,
the name of the Prometheus counter will be test_total (since the
registry removes the _info suffix and the Prometheus client adds the
_total suffix for counters).
- If you create a Micrometer Gauge named test.total,
the name of the Prometheus counter will be test (since the registry
removes the _total suffix and the Prometheus client does not add
_info suffix for gauges).
- Similarly, _created and _bucket will be removed otherwise
the the Prometheus client will fail.
@fstab
Copy link
Contributor

fstab commented Mar 21, 2024

The client_java provides methods

  • PrometheusNaming.sanitizeMetricName(String)
  • PrometheusNaming.sanitizeLabelName(String)

I think the best solution is to call these before passing Micrometer names to the Prometheus library.

For context here are two subtle facts about OpenMetric naming:

Metric Name != Sample Name

There's a difference between the "metric name" and the "sample name". Here's an example from the OpenMetrics spec:

# TYPE foo info
foo_info{name="pretty name",version="8.2.7"} 1

The metric name is foo (the name behind the TYPE identifier), and the sample name is foo_info.

This is more obvious when you have different sample names for one metric, like with counters:

# TYPE foo counter
foo_total 17.0
foo_created 1520430000.123

Here, the metric name is foo, and the sample names are foo_total and foo_created.

The Prometheus client library stores the metric name (without suffix) and appends suffixes when metrics are exposed in OpenMetrics text format.

Prometheus will support arbitrary UTF-8 Characters in the Future

Prometheus metric and label names currently allow only a very restricted set of characters, while OpenTelemetry metric and label names support arbitrary UTF-8 characters. This is annoying for users who store OpenTelemetry metrics in a Prometheus database, because the metric names get converted under that hood (for example . is replaced with _).

The Prometheus community want to fix this by allowing arbitrary UTF-8 characters in Prometheus. There is currently discussion how to implement this (see this doc).

The Prometheus client_java library is prepared for this (internally it stores a name that could allow arbitrary UTF-8 characters in the future and a prometheusName for the restricted character set). Once there is consensus how to implement this, the client_java library will offer different exposition formats (current format with classic Prometheus names, new format with UTF-8 names). The format can be selected depending on the Accepts: header of the scrape request.

Anyway, if you use PrometheusNaming.sanitizeMetricName(String) and PrometheusNaming.sanitizeLabelName(String) you don't need to care. Once there is consensus, we'll update these methods and support the same character set as the Prometheus server. If you use the sanitize...() methods in PrometheusNaming your code will be forward-compatible with future versions when the supported character set changes.

@fstab
Copy link
Contributor

fstab commented Mar 21, 2024

Info metrics:

This is tricky, because afaik Micrometer does not have Info metrics and uses Gauge to represent them.

Possible solutions:

  • Heuristic: If a Gauge has an .info or _info suffix and a value of 1, it is mapped to a Prometheus Info rather than a Prometheus Gauge.
  • Hard-coded exception for jvm.info: Afaik this is the only built-in info metric, so we could just hard-code it to be mapped to a Prometheus info rather than a Gauge.
  • Explicit configuration: Provide a config option where users can specify metrics that should be represented as Info in Prometheus. By default jvm.info should be in the list of metrics that should be represented as Info.

@shakuzen
Copy link
Member

Thanks for the review @fstab. I agree we still need to update our PrometheusNamingConvention to delegate to the Prometheus client's PrometheusNaming.sanitize* methods. That was left as a TODO when initial support for the latest client was merged. I think that will alleviate most of the issues. @jonatan-ivanov do you want to try that in this PR? Or I could make a separate PR to try that out. What do you think?

Heuristic: If a Gauge has an .info or _info suffix and a value of 1, it is mapped to a Prometheus Info rather than a Prometheus Gauge.

The PR right now does this except it does not check the value of the gauge. This could break some users who don't want that behavior, but it seems like a reasonable assumption. We'll have to see if we can get some early feedback from users on this.

@jonatan-ivanov
Copy link
Member Author

I think the best solution is to call these before passing Micrometer names to the Prometheus library.

I think it is a good idea but that will not change the weird behavior above. I updated this PR with it, it should mean less code but similar behavior.
(The behavior is slightly different see PrometheusNamingConventionTest.)

There's a difference between the "metric name" and the "sample name".

Oops, I meant to say a third name above: Meter name. :D In the PR, I'm sanitizing the Micrometer Meter name that is passed to MetricMetadata.

The issue here is that we have a Micrometer Gauge with the name jvm.info. With the old client this was converted to a time series:jvm_info{...} 1. I think I should not be too worried about the metric name but if the time series is different, that's a breaking change, i.e. I think this should be ok:

# TYPE jvm info
# HELP jvm JVM version info
jvm_info{...} 1

Even if the metric name is jvm and the type is info, the time series is jvm_info{...} and I assume it can be used as a gauge, the Prometheus text format is also a gauge:

# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{...} 1

In order to do this, we need to remove the _info suffix and create an InfoDataPointSnapshot so that _info will be appended back. This leads us to the unexpected behavior.
If this is a Micrometer Gauge with the name test.info, the name of the time series will be test_info which is good but if this is a Micrometer Counter with the name test.info, the name of the time series will be test_total which is somewhat unexpected.

Prometheus will support arbitrary UTF-8 Characters in the Future

I want to do emoji-metrics. ❤️ 😄

@shakuzen

do you want to try that in this PR? Or I could make a separate PR to try that out. What do you think?

I updated the PR. It does not solve the weird behavior but it means less code for us and we will follow the changes in the sanitization logic out of the box.

@jonatan-ivanov jonatan-ivanov merged commit 8e05395 into micrometer-metrics:main Mar 22, 2024
6 checks passed
@jonatan-ivanov jonatan-ivanov deleted the prometheus-sanitization branch March 22, 2024 17:59
@shakuzen shakuzen added the registry: prometheus A Prometheus Registry related issue label Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A general enhancement registry: prometheus A Prometheus Registry related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants