Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak with Java 21 #9805

Closed
anuragagarwal561994 opened this issue Nov 2, 2023 · 10 comments
Closed

Memory Leak with Java 21 #9805

anuragagarwal561994 opened this issue Nov 2, 2023 · 10 comments
Labels
bug Something isn't working needs triage New issue that requires triage

Comments

@anuragagarwal561994
Copy link
Contributor

anuragagarwal561994 commented Nov 2, 2023

Describe the bug

We recently migrated to java 21 and tried Virtual Thread executor provided by tomcat v10.

We fine tuned the application to get things working, but saw ever increasing old generation space wih 3.2 MB/s promotion rate while in general our application has close to 200-500 KB / s promotion rate.

When we tried to disable multiple components like logging or further fine tuning things we were not able to identify any reason except for when we removed otel agent and everything started working fine.

I have not been able to identify what the root cause of the issue is, but I could observe two things:

  • higher promotion rate
  • ever increasing direct buffer size

Steps to reproduce

Our application is serving around 2K qps with 2 GB/s allocation rate.

Java 21
Tomcat with Virtual Thread Executor v10

Expected behavior

Memory should not increase unboundedly

Actual behavior

Direct Buffer Size is increasing unboundedly

Old Gen space increasing unboundedly

Promotion rate increased

Javaagent or library instrumentation version

1.28

Environment

openjdk version "21" 2023-09-19 LTS
OpenJDK Runtime Environment Temurin-21+35 (build 21+35-LTS)
OpenJDK 64-Bit Server VM Temurin-21+35 (build 21+35-LTS, mixed mode, sharing)

Linux 5.15.120+ SMP Sat Sep 30 10:17:59 UTC 2023 x86_64 Linux

Additional context

Key libraries we use include:
kafka
apache http async client
grpc
tomcat

@anuragagarwal561994 anuragagarwal561994 added bug Something isn't working needs triage New issue that requires triage labels Nov 2, 2023
@anuragagarwal561994
Copy link
Contributor Author

Before:
Screenshot 2023-11-03 at 3 22 53 AM
Screenshot 2023-11-03 at 3 23 09 AM

After:
Screenshot 2023-11-03 at 3 24 32 AM
Screenshot 2023-11-03 at 3 24 48 AM

@trask
Copy link
Member

trask commented Nov 2, 2023

hi @anuragagarwal561994, does before and after refer to before Java 21 and after Java 21? or before OTel Java agent and after OTel Java agent?

@anuragagarwal561994
Copy link
Contributor Author

anuragagarwal561994 commented Nov 3, 2023

@trask this is before and after Otel Agent on Java 21 and when using virtual thread.

Without virtual threads things look like the second graph itself

@laurit
Copy link
Contributor

laurit commented Nov 3, 2023

@anuragagarwal561994 It is unlikely that we will be able to help you based on just the information that you have provided. If you are seeking for a speedy resolution for this issue then please provide a sample app that allows for reproducing this issue or use the tools available to you (heap dumps, jfr, whatever) to figure out what objects are being leaked or what is causing the direct byte buffers to accumulate.

@anuragagarwal561994
Copy link
Contributor Author

@laurit we are trying to figure it out ourselves will be able to provide more information on this soon.

Just wanted to check if we have benchmarked the agent with hava 21 and virtual thread kind of scenarios and if not we can try building that use case

For now what we are doing is trying to disable all instruments and enable them one by one to identify what may be leaking.

@anuragagarwal561994
Copy link
Contributor Author

@trask @laurit we have found the instrumentation with issue and it actually makes sense.\

[otel.javaagent 2023-11-03 16:19:33:793 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: executors [class io.opentelemetry.javaagent.instrumentation.executors.ExecutorsInstrumentationModule] on io.opentelemetry.javaagent.bootstrap.AgentClassLoader$BootstrapClassLoaderProxy@299a06ac
[otel.javaagent 2023-11-03 16:19:33:912 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: internal-lambda [class io.opentelemetry.javaagent.instrumentation.internal.lambda.LambdaInstrumentationModule] on io.opentelemetry.javaagent.bootstrap.AgentClassLoader$BootstrapClassLoaderProxy@299a06ac
[otel.javaagent 2023-11-03 16:19:33:927 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: internal-reflection [class io.opentelemetry.javaagent.instrumentation.internal.reflection.ReflectionInstrumentationModule] on io.opentelemetry.javaagent.bootstrap.AgentClassLoader$BootstrapClassLoaderProxy@299a06ac
[otel.javaagent 2023-11-03 16:19:33:932 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: internal-class-loader [class io.opentelemetry.javaagent.instrumentation.internal.classloader.ClassLoaderInstrumentationModule] on io.opentelemetry.javaagent.bootstrap.AgentClassLoader$BootstrapClassLoaderProxy@299a06ac
[otel.javaagent 2023-11-03 16:19:34:144 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: internal-url-class-loader [class io.opentelemetry.javaagent.instrumentation.internal.urlclassloader.UrlClassLoaderInstrumentationModule] on io.opentelemetry.javaagent.bootstrap.AgentClassLoader$BootstrapClassLoaderProxy@299a06ac
[otel.javaagent 2023-11-03 16:19:34:350 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: servlet [class io.opentelemetry.javaagent.instrumentation.servlet.v5_0.JakartaServletInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:34:403 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: log4j-appender [class io.opentelemetry.javaagent.instrumentation.log4j.appender.v2_17.Log4jAppenderInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:34:505 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: executors [class io.opentelemetry.javaagent.instrumentation.executors.ExecutorsInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:34:689 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: log4j-context-data [class io.opentelemetry.javaagent.instrumentation.log4j.contextdata.v2_17.Log4j2InstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:34:761 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: http-url-connection [class io.opentelemetry.javaagent.instrumentation.httpurlconnection.HttpUrlConnectionInstrumentationModule] on io.opentelemetry.javaagent.bootstrap.AgentClassLoader$BootstrapClassLoaderProxy@299a06ac
[otel.javaagent 2023-11-03 16:19:35:196 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: java-util-logging [class io.opentelemetry.javaagent.instrumentation.jul.JavaUtilLoggingInstrumentationModule] on io.opentelemetry.javaagent.bootstrap.AgentClassLoader$BootstrapClassLoaderProxy@299a06ac
[otel.javaagent 2023-11-03 16:19:35:202 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: java-util-logging [class io.opentelemetry.javaagent.instrumentation.jul.JavaUtilLoggingInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:35:376 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: kafka-clients [class io.opentelemetry.javaagent.instrumentation.kafkaclients.v0_11.KafkaClientsInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:36:478 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: tomcat [class io.opentelemetry.javaagent.instrumentation.tomcat.v10_0.Tomcat10InstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:36:567 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: internal-class-loader [class io.opentelemetry.javaagent.instrumentation.internal.classloader.ClassLoaderInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:36:738 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: quartz [class io.opentelemetry.javaagent.instrumentation.quartz.v2_0.QuartzInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:36:827 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: guava [class io.opentelemetry.javaagent.instrumentation.guava.v10_0.GuavaInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:37:040 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: opentelemetry-instrumentation-annotations [class io.opentelemetry.javaagent.instrumentation.instrumentationannotations.AnnotationInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:38:229 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: rmi [class io.opentelemetry.javaagent.instrumentation.rmi.server.RmiServerInstrumentationModule] on io.opentelemetry.javaagent.bootstrap.AgentClassLoader$BootstrapClassLoaderProxy@299a06ac
[otel.javaagent 2023-11-03 16:19:38:230 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: rmi [class io.opentelemetry.javaagent.instrumentation.rmi.server.RmiServerInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:38:399 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: grpc [class io.opentelemetry.javaagent.instrumentation.grpc.v1_6.GrpcInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:38:745 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: jdbc [class io.opentelemetry.javaagent.instrumentation.jdbc.JdbcInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:38:763 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: hikaricp [class io.opentelemetry.javaagent.instrumentation.hikaricp.v3_0.HikariCpInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:39:349 +0000] [main] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: apache-httpclient [class io.opentelemetry.javaagent.instrumentation.apachehttpclient.v5_0.ApacheHttpClientInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:43:463 +0000] [QuartzScheduler_Worker-7] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: jedis [class io.opentelemetry.javaagent.instrumentation.jedis.v1_4.JedisInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a
[otel.javaagent 2023-11-03 16:19:43:478 +0000] [QuartzScheduler_Worker-4] DEBUG io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule - Applying instrumentation: opentelemetry-api [class io.opentelemetry.javaagent.instrumentation.opentelemetryapi.OpenTelemetryApiInstrumentationModule] on jdk.internal.loader.ClassLoaders$AppClassLoader@4e0e2f2a

These are the instrumentations that are being applied in our application, we tried doing binary searching for disabling instrumentations and then found executors to be responsible for leaking memory.

[otel.javaagent 2023-11-03 13:39:37:859 +0000] [grpc-default-worker-ELG-1-11] DEBUG io.opentelemetry.javaagent.bootstrap.executors.PropagatedContext - Failed to propagate context because previous propagated context is already set;

We are getting repeated such errors messages when we enabled debug logs for otel.

This instrumentation is not virtual thread compatible and hence creating issues. Let me know if more details are needed, but I think this can be a good start to reproduce the issue.

@trask
Copy link
Member

trask commented Nov 3, 2023

can you try with the latest version (1.31.0), which includes #9616?

@laurit
Copy link
Contributor

laurit commented Nov 6, 2023

@anuragagarwal561994 knowing that the issue has something to do with executor instrumentation is a start, but it really isn't all that useful. I still think that examining the heap dump could give you more information. The log line about grpc that you pasted might not be related to this issue at all.

@anuragagarwal561994
Copy link
Contributor Author

Yes @laurit I actually checked forgot to update here, we did a binary search in the instrumentations in use and identified that executors instrumentation is the one causing the issue. We are checking with the latest version if everything looks fine, will take 1-2 days to give further updates

@anuragagarwal561994
Copy link
Contributor Author

@laurit @trask we have tested with 1.31 version, this is working fine. Thanks for helping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage New issue that requires triage
Projects
None yet
Development

No branches or pull requests

3 participants