http_server_duration_bucket generates high cardinality metrics #5753

Closed
irizzant opened this issue Apr 5, 2022 · 9 comments · Fixed by #5819
Labels
bug Something isn't working


@irizzant

irizzant commented Apr 5, 2022

Describe the bug
The Java agent, when configured to export metrics with the Prometheus exporter, creates the http_server_duration_bucket histogram series, as expected.

This metric also carries the http_route label, which produces high cardinality because it contains full request paths, for example:

http_server_duration_bucket{container="xxx", endpoint="xxx", http_flavor="1.1", http_host="xxx", http_method="GET", http_route="/app/wsrest/userlogo/holding.png", http_scheme="http", http_status_code="200", instance="xxx:9464", job="xxx", le="7500.0", namespace="wms", pod="xxxx", service="wms"}

In my cluster this increased the number of time series by ~300k!
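For a sense of scale: in the Prometheus text format, a histogram exports, for every distinct label combination, one _bucket series per bucket boundary, one more for le="+Inf", and a _sum and a _count series. Every new http_route value therefore multiplies into the total. The sketch below only does that arithmetic; the 14 bucket boundaries and all route and label counts are hypothetical inputs, not figures from this report, and the class name is made up.

```java
/** Rough series-count arithmetic for one Prometheus histogram (all inputs hypothetical). */
public class HistogramCardinality {

    /** Series per distinct label set: one _bucket per boundary, one for +Inf, plus _sum and _count. */
    static long seriesPerLabelSet(int bucketBoundaries) {
        return bucketBoundaries + 1L + 2L;
    }

    /** Total = distinct route values x combinations of the other labels x series per label set. */
    static long totalSeries(long distinctRoutes, long otherLabelCombinations, int bucketBoundaries) {
        return distinctRoutes * otherLabelCombinations * seriesPerLabelSet(bucketBoundaries);
    }

    public static void main(String[] args) {
        int boundaries = 14;        // assumption: 14 explicit bucket boundaries
        long otherCombos = 4;       // hypothetical: e.g. 2 methods x 2 status codes
        long templatedRoutes = 20;  // hypothetical: templated routes such as /app/*
        long rawPaths = 1_000;      // hypothetical: distinct concrete request paths

        System.out.println("templated: " + totalSeries(templatedRoutes, otherCombos, boundaries)); // templated: 1360
        System.out.println("raw paths: " + totalSeries(rawPaths, otherCombos, boundaries));        // raw paths: 68000
    }
}
```

These numbers are per scrape target; labels such as pod and instance multiply them again across replicas.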

Steps to reproduce

  1. Instrument a Java app to export metrics with the Prometheus exporter
  2. Set up a Prometheus instance to scrape those metrics
  3. Query Prometheus for http_server_duration_bucket

What did you expect to see?
http_server_duration_bucket should export lower-cardinality metrics

What did you see instead?
http_server_duration_bucket exports high-cardinality metrics

What version are you using?
1.12.1

Environment
Compiler: (e.g., "AdoptOpenJDK 11.0.6")
OS: Ubuntu 20.04
Runtime (if different from JDK above): (e.g., "Oracle JRE 8u251")
OS (if different from OS compiled on): (e.g., "Windows Server 2019")

Additional context
Add any other context about the problem here.

@irizzant irizzant added the bug Something isn't working label Apr 5, 2022
@trask
Member

trask commented Apr 5, 2022

Hi @irizzant!

I believe http_route=/app/wsrest/userlogo/holding.png is a bug. Can you enable otel.javaagent.debug=true and post the full span details for a similar span that produces an http_route like this? That should include the instrumentationLibrary that produced it, which will give us a clue where to look.

@irizzant
Author

irizzant commented Apr 5, 2022

Hey @trask!
I think it's the Undertow instrumentation:

     [exec] 20220405_200339_754 ERROR #[[default task-11@srv=xxx]]# #[[stderr]]# [otel.javaagent 2022-04-05 20:03:39:754 +0200] [default task-11] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - '/app/wsrest/userlogo/holding.png' : a1e2248697dec2b96e89e045c57e035f 850b5b7800abcbd0 SERVER [tracer: io.opentelemetry.undertow-1.4:1.12.1] AttributesMap{data={http.client_ip=xxx, http.host=xxx, http.status_code=200, net.peer.ip=10.42.0.157, thread.name=default I/O-1, http.response_content_length=10566, http.flavor=1.1, http.target=/app/wsrest/userlogo/holding.png?userId=1253&loggedInTime=1649181814859, net.transport=ip_tcp, thread.id=121, http.scheme=http, http.method=GET, net.peer.port=51774, http.route=/app/wsrest/userlogo/holding.png, http.user_agent=Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0}, capacity=128, totalAddedValues=15} 

@irizzant
Author

irizzant commented Apr 5, 2022

I also noticed traces like these, which make me think this is also a problem on the client side:

     [exec] 20220405_200339_947 ERROR #[[default task-8@srv=xxxxx]]# #[[stderr]]# [otel.javaagent 2022-04-05 20:03:39:947 +0200] [default task-8] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - '/app/*' : 1185f729216b84c0043dc4624ed42917 09aecfa98759b24d SERVER [tracer: io.opentelemetry.undertow-1.4:1.12.1] AttributesMap{data={http.client_ip=xxxx, http.host=xxxx, http.status_code=200, net.peer.ip=xxx, thread.name=default I/O-1, http.response_content_length=3251, http.flavor=1.1, http.target=/app/resources/sdb/css/icons_svg/search.svg, net.transport=ip_tcp, thread.id=121, http.scheme=http, http.method=GET, net.peer.port=51774, http.route=/app/*, http.user_agent=Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0}, capacity=128, totalAddedValues=15} 

You can see http.target=/app/resources/sdb/css/icons_svg/search.svg, which is the same full-path problem we have with http.route, but for http_client_duration_bucket.
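When reading debug output like the two log lines in this thread, one rough check is whether http.route simply repeats the request path, i.e. http.target with its query string stripped. The class and method names below are hypothetical, and the check is only a heuristic: an exact-match mapping can legitimately produce a route equal to the path.

```java
/** Hypothetical helper: does http.route just repeat the concrete request path? */
public class RawRouteCheck {

    /** Strips the query string (everything from the first '?') from an http.target value. */
    static String pathOf(String httpTarget) {
        int q = httpTarget.indexOf('?');
        return q >= 0 ? httpTarget.substring(0, q) : httpTarget;
    }

    /** True when the route equals the target's path, i.e. the route carries no templating. */
    static boolean isRawPathRoute(String httpRoute, String httpTarget) {
        return httpRoute.equals(pathOf(httpTarget));
    }

    public static void main(String[] args) {
        // Values copied from the two debug log lines in this issue.
        System.out.println(isRawPathRoute(
                "/app/wsrest/userlogo/holding.png",
                "/app/wsrest/userlogo/holding.png?userId=1253&loggedInTime=1649181814859")); // true
        System.out.println(isRawPathRoute(
                "/app/*",
                "/app/resources/sdb/css/icons_svg/search.svg")); // false
    }
}
```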

@trask
Member

trask commented Apr 5, 2022

What is your servlet mapping for /app/wsrest/userlogo/holding.png? (e.g. from your web.xml or equivalent annotation-based configuration)

On the client side, http.target shouldn't appear in metrics anymore (since #5081). Can you confirm?
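The underlying idea is that request-specific attributes such as http.target stay on spans but are left off metric attributes. A minimal illustration of that idea (not the actual change in #5081): copy only allowlisted keys from the span attributes into the metric attributes. The allowlist is assumed from the labels on http_server_duration_bucket shown in the issue description (http_flavor, http_host, http_method, http_route, http_scheme, http_status_code, which Prometheus writes with underscores), and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: keep only allowlisted span attributes as metric attributes. */
public class MetricAttributeFilter {

    // Assumed allowlist, reconstructed from the labels on http_server_duration_bucket in this issue.
    static final Set<String> SERVER_METRIC_KEYS = Set.of(
            "http.flavor", "http.host", "http.method",
            "http.route", "http.scheme", "http.status_code");

    /** Returns a copy of spanAttributes with only allowlisted keys, preserving insertion order. */
    static Map<String, String> toMetricAttributes(Map<String, String> spanAttributes) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : spanAttributes.entrySet()) {
            if (SERVER_METRIC_KEYS.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> span = new LinkedHashMap<>();
        span.put("http.method", "GET");
        span.put("http.target", "/app/resources/sdb/css/icons_svg/search.svg");
        span.put("http.route", "/app/*");
        span.put("http.status_code", "200");
        span.put("net.peer.port", "51774");
        // http.target and net.peer.port are dropped; the allowlisted keys remain.
        System.out.println(toMetricAttributes(span)); // {http.method=GET, http.route=/app/*, http.status_code=200}
    }
}
```

Such a filter cannot help with this issue on its own: http.route is on the allowlist by design, so when it carries a raw path, every distinct path still becomes a new series.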

@irizzant
Author

irizzant commented Apr 6, 2022

@trask
I can confirm that I see http.target in the debug-level logs, but I don't see an http_target label on http_client_duration_bucket. Sorry for the false alarm.
My servlet mapping is the following:

<servlet-mapping>
    <servlet-name>JAX-RS Servlet</servlet-name>
    <url-pattern>/wsrest/*</url-pattern>
</servlet-mapping>
<servlet>
    <servlet-name>JAX-RS Servlet</servlet-name>
    <servlet-class>it.sdb.jee.rest.application.ApplicationSDB</servlet-class>
    <load-on-startup>1</load-on-startup>
</servlet>

@trask
Member

trask commented Apr 7, 2022

Is /app/wsrest/userlogo/holding.png served by your JAX-RS servlet at /wsrest/*, or is it a static resource served by the underlying servlet container?
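For reference when answering that question, the Servlet specification resolves a request to a servlet by trying, in order: an exact url-pattern match, the longest path-prefix match (/foo/*), an extension match (*.png), and finally the default servlet (/). The sketch below implements only those four rules on the servlet path (the request URI with the context path removed). It assumes the context path is /app, adds a hypothetical default servlet "/" for static content, and ignores filters and framework-level routing; the class name is made up.

```java
import java.util.List;
import java.util.Optional;

/** Sketch of the Servlet spec's url-pattern selection: exact, longest prefix, extension, default. */
public class ServletPatternMatcher {

    /** Returns the url-pattern that would handle servletPath (request path without context path). */
    static Optional<String> match(List<String> patterns, String servletPath) {
        // 1. Exact match against patterns that are neither wildcards nor the default servlet.
        for (String p : patterns) {
            if (!p.endsWith("/*") && !p.startsWith("*.") && !p.equals("/") && p.equals(servletPath)) {
                return Optional.of(p);
            }
        }
        // 2. Longest path-prefix match: "/foo/*" matches "/foo" and anything under "/foo/".
        String best = null;
        for (String p : patterns) {
            if (p.endsWith("/*")) {
                String prefix = p.substring(0, p.length() - 2);
                boolean hit = servletPath.equals(prefix) || servletPath.startsWith(prefix + "/");
                if (hit && (best == null || p.length() > best.length())) {
                    best = p;
                }
            }
        }
        if (best != null) {
            return Optional.of(best);
        }
        // 3. Extension match on the last path segment, e.g. "*.png".
        String lastSegment = servletPath.substring(servletPath.lastIndexOf('/') + 1);
        for (String p : patterns) {
            if (p.startsWith("*.") && lastSegment.endsWith(p.substring(1))) {
                return Optional.of(p);
            }
        }
        // 4. Default servlet, if one is mapped.
        return patterns.contains("/") ? Optional.of("/") : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: context path "/app"; "/" is a hypothetical default servlet for static files.
        List<String> patterns = List.of("/wsrest/*", "/");
        System.out.println(match(patterns, "/wsrest/userlogo/holding.png").orElse("none"));            // /wsrest/*
        System.out.println(match(patterns, "/resources/sdb/css/icons_svg/search.svg").orElse("none")); // /
    }
}
```

Under those assumptions the sketch maps /wsrest/userlogo/holding.png to the /wsrest/* pattern and /resources/sdb/css/icons_svg/search.svg to the default servlet; a real container may behave differently because of filters or other configuration not shown in the web.xml snippet.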

@irizzant
Author

irizzant commented Apr 7, 2022

@trask No, the PNG is a static resource; it's not served by JAX-RS.

@mateuszrzeszutek
Member

Hey @irizzant,
Would it be possible for you to extract part of your application into a repro app? Without a working example it's really hard to reason about the http.route attribute, since it usually isn't set by the server instrumentation but by one or more controller/MVC framework instrumentations.

@irizzant
Copy link
Author

irizzant commented Apr 12, 2022

Hi @mateuszrzeszutek @trask

Would it be possible for you to extract a part of your application as a repro app?

This request took me a lot of work!

Anyway, please check https://github.com/irizzant/otel-java-instrumentation-5753

If you run the reproducer, you will see that the exported Undertow metrics have http.route values that go down to individual resources; for example, http.route reports the single page /jsf-demo/hello.xhtml.
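To measure the effect on a running reproducer, one option is to fetch the exporter's scrape output and count the distinct http_route values on http_server_duration_bucket. A minimal sketch, assuming Java 11+ (java.net.http) and that the Prometheus exporter is reachable at http://localhost:9464/metrics (9464 is the port visible in the metric labels in the issue description). The class name is hypothetical, and the regex does not handle escaped quotes inside label values.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: list the distinct http_route label values on http_server_duration_bucket. */
public class RouteCardinality {

    private static final Pattern ROUTE = Pattern.compile("http_route=\"([^\"]*)\"");

    /** Parses Prometheus text-format output and collects http_route values of the bucket series. */
    static Set<String> distinctRoutes(String exposition) {
        Set<String> routes = new TreeSet<>();
        for (String line : exposition.split("\n")) {
            if (!line.startsWith("http_server_duration_bucket{")) {
                continue; // skip # HELP / # TYPE comments and all other metrics
            }
            Matcher m = ROUTE.matcher(line);
            if (m.find()) {
                routes.add(m.group(1));
            }
        }
        return routes;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: exporter on localhost:9464; pass another URL as the first argument if needed.
        String url = args.length > 0 ? args[0] : "http://localhost:9464/metrics";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        Set<String> routes = distinctRoutes(body);
        System.out.println(routes.size() + " distinct http_route values");
        routes.forEach(System.out::println);
    }
}
```

It can be launched directly with java RouteCardinality.java; a route list that keeps growing as you browse, with entries such as /jsf-demo/hello.xhtml, would reproduce the symptom.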
