Run micrometer on virtual threads #4852

Closed
wyhasany opened this issue Mar 13, 2024 · 6 comments
Labels
question A user question, probably better suited for StackOverflow

Comments

@wyhasany

Using virtual threads for publishing metrics should improve application performance, because the system would not need to schedule additional platform threads on the CPU. In my Spring Boot application, I attempted to replace the ThreadFactory for OtlpMeterRegistry:

    @Bean
    @ConditionalOnClass(OtlpMeterRegistry.class)
    @ConditionalOnEnabledMetricsExport("otlp")
    public OtlpMeterRegistryPostProcessor otlpMeterRegistryPostProcessor() {
        return new OtlpMeterRegistryPostProcessor();
    }

    static class OtlpMeterRegistryPostProcessor implements BeanPostProcessor {

        @Override
        public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException {
            if (bean instanceof OtlpMeterRegistry otlpMeterRegistry) {
                otlpMeterRegistry.stop();
                otlpMeterRegistry.start(
                    Thread
                        .ofVirtual()
                        .name("otlp-metrics-publisher-", 0L)
                        .inheritInheritableThreadLocals(true)
                        .factory()
                );
            }
            return bean;
        }
    }

Unfortunately, this causes the JVM to freeze. It would be beneficial to run Micrometer publications on virtual threads and allow for the configuration of a custom ThreadFactory from the client's perspective.

@jonatan-ivanov
Member

I doubt that using virtual threads for publishing metrics would have a positive impact on performance, because virtual threads are also mounted on (and unmounted from) platform threads. Virtual threads work on top of platform threads, and managing them adds a little overhead.

Please look at the official docs and JEP-444:

Use virtual threads in high-throughput concurrent applications, especially those that consist of a great number of concurrent tasks that spend much of their time waiting.

Also:

Virtual threads are not faster threads; they do not run code any faster than platform threads. They exist to provide scale (higher throughput), not speed (lower latency).

Please look into the registry, you should find this in the codebase: Executors.newSingleThreadScheduledExecutor(threadFactory)

We have one platform thread that we use for publishing, and by default that thread does some work once per minute. So that part of Micrometer is not concurrent, does not need to scale, and runs very infrequently. Doing this with one platform thread seems totally fine to me instead of creating a new virtual thread every time.
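As a rough illustration of the pattern described above, here is a minimal sketch of a single-thread scheduled executor that reuses one platform thread for every run. It is not Micrometer's actual code: the class name, the thread name `demo-metrics-publisher`, the no-op publish task, and the short period are all assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SinglePublisherSketch {

    /**
     * Schedules a stand-in "publish" task at a fixed rate, waits until it has
     * run `publishes` times, and returns how many threads the factory created.
     */
    public static int runPublishes(int publishes, long periodMillis) {
        AtomicInteger threadsCreated = new AtomicInteger();
        // Hypothetical factory: a named daemon platform thread, counted on creation.
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, "demo-metrics-publisher");
            thread.setDaemon(true);
            threadsCreated.incrementAndGet();
            return thread;
        };
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(factory);
        CountDownLatch done = new CountDownLatch(publishes);
        try {
            // The stand-in publish task only counts down; a real one would export metrics.
            scheduler.scheduleAtFixedRate(done::countDown, 0, periodMillis, TimeUnit.MILLISECONDS);
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("publish task did not run in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
        return threadsCreated.get();
    }

    public static void main(String[] args) {
        System.out.println("threads created: " + runPublishes(3, 50));
    }
}
```

In this sketch the same worker thread serves every scheduled run, and between runs it simply waits in the executor's delay queue.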

What do you think?

@jonatan-ivanov added the "waiting for feedback" label and removed the "waiting-for-triage" label Mar 13, 2024
@wyhasany
Author

It's evident that Micrometer's current use of two platform threads, which perform intermittent work and then sleep until the next metrics collection, is not optimal. These threads remain mostly idle, resulting in inefficient CPU resource utilization:

this.meterPollingService.scheduleAtFixedRate(this::pollMetersToRollover, getInitialDelay(),

scheduledExecutorService.scheduleAtFixedRate(this::publishSafelyOrSkipIfInProgress, initialDelayMillis,

Transitioning this task to Java's virtual threads, such as those provided by ForkJoinPool, could significantly enhance efficiency. By leveraging virtual threads, we could reuse existing threads from the ForkJoinPool for the duration of the metric collection task, minimizing resource wastage.

Furthermore, reducing the number of platform threads in the JVM process aligns with the goal of minimizing system scheduling overhead, which can be significant. While it's important to acknowledge that the effectiveness of this approach may vary across different environments, exploring the use of virtual threads as an alternative to platform threads is a worthwhile consideration.

In practical scenarios where the number of available cores matches the workload, employing application threads instead of an additional Micrometer thread could lead to improved performance. This optimization could be particularly beneficial for systems with limited resources or high concurrency demands.

Ultimately, it's essential for users to evaluate and determine the most suitable approach based on their specific environment and requirements. Virtual threads present a promising option for optimizing CPU resource utilization in Micrometer, and their adoption warrants further exploration and consideration.

@jonatan-ivanov
Member

If I want to talk to ChatGPT, I can just open it. ;)

It's evident that Micrometer's current use of two platform threads, which perform intermittent work and then sleep until the next metrics collection, is not optimal.

Oh yepp, you are right, there are two platform threads.
I'm happy this is evident to you (or to ChatGPT), but I'm not sure it is evident to me. Could you please elaborate? You are claiming that using platform threads in this scenario has a performance impact and is not optimal, but you haven't explained why.
Also, if my understanding is correct, the docs I linked disagree with you.

These threads remain mostly idle, resulting in inefficient CPU resource utilization

I'm not sure how you drew that conclusion and also I don't get what you mean by "inefficient CPU resource utilization". Let's say you have 8 cores. If you create 8 threads and block all of them, the CPU will continue its work happily, those blocked threads will not really affect it.
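The point about blocked threads can be observed with a small experiment. The sketch below is only an illustration under assumed names (`BlockedThreadsSketch`, `demo-blocked-N`): it parks several platform threads on a latch, runs CPU-bound work on the calling thread, and then checks that the parked threads are still in the WAITING state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class BlockedThreadsSketch {

    /**
     * Parks `count` platform threads on a latch, does CPU work on the calling
     * thread, and returns how many of the parked threads were WAITING afterwards.
     */
    public static int countWaitingWhileWorking(int count) {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> blocked = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread thread = new Thread(() -> {
                try {
                    release.await(); // parked here until the latch is released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "demo-blocked-" + i);
            thread.setDaemon(true);
            thread.start();
            blocked.add(thread);
        }
        try {
            // Give the threads up to 10 seconds in total to reach the latch and park.
            long deadline = System.nanoTime() + 10_000_000_000L;
            for (Thread thread : blocked) {
                while (thread.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                    Thread.sleep(1);
                }
            }
            // CPU-bound work on the caller proceeds while the other threads stay parked.
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i;
            }
            System.out.println("work finished, sum = " + sum);
            int waiting = 0;
            for (Thread thread : blocked) {
                if (thread.getState() == Thread.State.WAITING) {
                    waiting++;
                }
            }
            return waiting;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the parked threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println("waiting threads: " + countWaitingWhileWorking(8));
    }
}
```

This does not measure scheduling overhead; it only shows that parked threads stay parked while other work runs. A real comparison would need a proper benchmark.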

Transitioning this task to Java's virtual threads, such as those provided by ForkJoinPool, could significantly enhance efficiency. By leveraging virtual threads, we could reuse existing threads from the ForkJoinPool for the duration of the metric collection task, minimizing resource wastage.

Again, you are claiming something ("significantly enhance efficiency") without explaining why. Btw those virtual threads now are competing with other virtual threads that are likely processing your business logic so metrics publishing can now disrupt your app which is not the case with platform threads.

Furthermore, reducing the number of platform threads in the JVM process aligns with the goal of minimizing system scheduling overhead, which can be significant.

Can you back this claim: "which can be significant"? I think we are still talking about two platform threads. If that significantly impacts your application, you might have way bigger problems.

While it's important to acknowledge that the effectiveness of this approach may vary across different environments, exploring the use of virtual threads as an alternative to platform threads is a worthwhile consideration.

It's worthwhile if your use case justifies using virtual threads. Please check the docs I linked above. I don't think using virtual threads is justified here.

In practical scenarios where the number of available cores matches the workload, employing application threads instead of an additional Micrometer thread could lead to improved performance. This optimization could be particularly beneficial for systems with limited resources or high concurrency demands.

Ultimately, it's essential for users to evaluate and determine the most suitable approach based on their specific environment and requirements. Virtual threads present a promising option for optimizing CPU resource utilization in Micrometer, and their adoption warrants further exploration and consideration.

Please back your claims with explanation/evidence.

Do you want to create a perf test where you can show us the inefficient CPU resource utilization, metrics collection being not optimal, the significantly enhanced efficiency, minimizing resource wastage, the significant system scheduling overhead and all the other claims?

@wyhasany
Author

Hi @jonatan-ivanov,

I'd like to apologize for the tone of my last message. It was a mistake to write it so late in the evening. 😞

I acknowledge that my previous assumptions were too strong without sufficient evidence, which I currently lack. Here's my understanding of the situation, but please correct me if I'm mistaken.

Let's consider a scenario where I have 8 cores, which I primarily utilize with 8 active worker threads from ForkJoinPool (virtual threads). Additionally, I run 2 extra threads for micrometer once a minute. Although these micrometer threads complete their work in milliseconds, they occupy platform threads continuously. The kernel, using timers, schedules these threads, momentarily taking one from the ForkJoinPool threads and allocating it to a micrometer thread. Once its task or time quantum is complete, it switches back to the ForkJoinPool, resulting in unnecessary context switches and time wastage. Scheduling these threads on virtual threads should prevent context switches in the middle of processing client requests (as far as I understand how non-preemptive threads work). For instance, if a client request thread is blocked on an IO operation, the Java scheduler can allocate work to the micrometer thread at a more suitable time, potentially enhancing efficiency. Furthermore, in this setup, the system scheduler doesn't need to replace the entire thread on the CPU.

The latest version of Spring Boot creates a scheduled executor based on virtual threads. I interpret this as a method to minimize the usage of platform threads and allow Java to schedule tasks more efficiently.

In both cases, we inevitably impact the application's performance by collecting and sending metrics. When using platform threads, the system schedules a thread to collect and send metrics, whereas with virtual threads, Java handles this task. According to JEP-444:

To put it another way, virtual threads can significantly improve application throughput when

  • The number of concurrent tasks is high (more than a few thousand), and
  • The workload is not CPU-bound, since having many more threads than processor cores cannot improve throughput in that case.

Given that these Micrometer threads are not CPU-bound (is this a correct assumption?), it seems reasonable to consider this approach. While there might not be many such threads, it could be beneficial not to interrupt client request threads to handle metrics but rather wait until these threads are blocked on IO operations.

Please correct any misunderstandings in my reasoning. Thank you for your time.

@wyhasany
Author

I believe virtual threads may also degrade performance because they are not preemptive, meaning they are not scheduled after a specific time quantum. In such a scenario, if the collection and sending of metrics take a long time, client-requested virtual threads will have to wait for these processes to finish. If platform threads were used instead, Micrometer would be scheduled multiple times to complete their tasks. Therefore, there is no single best solution that fits all scenarios. This is why I think it would be beneficial to allow clients to configure the thread factory.
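To make the idea of a caller-supplied thread factory concrete, here is a minimal sketch that builds a single-thread scheduler from whichever `ThreadFactory` it is given and reports the kind of thread the task ran on. It assumes Java 21 or later, and the class name, method name, and `demo-publisher-` thread names are hypothetical. It does not use Micrometer and does not reproduce or explain the freeze reported above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ConfigurableFactorySketch {

    /**
     * Runs one scheduled task on a single-thread scheduler built from the given
     * factory and returns true if the task executed on a virtual thread.
     */
    public static boolean ranOnVirtualThread(ThreadFactory factory) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(factory);
        // Stand-in for a publish task: it only reports what kind of thread it runs on.
        Callable<Boolean> task = () -> Thread.currentThread().isVirtual();
        try {
            return scheduler.schedule(task, 10, TimeUnit.MILLISECONDS).get(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // The caller decides which kind of thread backs the scheduler.
        ThreadFactory platform = Thread.ofPlatform().daemon(true).name("demo-publisher-", 0).factory();
        ThreadFactory virtual = Thread.ofVirtual().name("demo-publisher-", 0).factory();
        System.out.println("platform factory ran on virtual thread: " + ranOnVirtualThread(platform));
        System.out.println("virtual factory ran on virtual thread: " + ranOnVirtualThread(virtual));
    }
}
```

Whether either choice performs better for a given application is exactly the open question in this thread; the sketch only shows that the scheduling code itself does not need to know which factory it received.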

@jonatan-ivanov
Member

No worries, I don't think there were any issues with the tone. :)

they occupy platform threads continuously.

Those two Micrometer threads are the platform threads; they are not separate things. The two threads that Micrometer creates are platform threads.

The kernel, using timers, schedules these threads, momentarily taking one from the ForkJoinPool threads and allocating it to a micrometer thread. Once its task or time quantum is complete, it switches back to the ForkJoinPool, resulting in unnecessary context switches and time wastage.

I think by "ForkJoinPool threads" you mean the JVM common pool but that part should not be the case. The common pool is untouched. We create two new platform threads and use them every once in a while, no other threads or pools are impacted.

Scheduling these threads on virtual threads should prevent context switches in the middle of processing client requests (as far as I understand how non-preemptive threads work). For instance, if a client request thread is blocked on an IO operation, the Java scheduler can allocate work to the micrometer thread at a more suitable time, potentially enhancing efficiency. Furthermore, in this setup, the system scheduler doesn't need to replace the entire thread on the CPU.

The CPU has limited resources, and virtual threads are not a free lunch: somebody needs to schedule the work, switch context, etc. If all of the client request threads are blocked, the JVM can still operate; those threads are doing nothing but waiting. In this scenario the two Micrometer background (platform) threads can happily do whatever they need to without really affecting anything. Also, I don't mind having two OS thread context switches once per minute (otherwise it would be mounting and unmounting two virtual threads on platform threads, and the context switch is still likely).

The latest version of Spring Boot creates a scheduled executor based on virtual threads. I interpret this as a method to minimize the usage of platform threads and allow Java to schedule tasks more efficiently.

spring.threads.virtual.enabled is false by default, so it does not. Also, and this is more important: the use case is very different. In the use case of processing a lot of concurrent requests, using virtual threads can be a good idea. Let me quote again from the official docs:

Use virtual threads in high-throughput concurrent applications, especially those that consist of a great number of concurrent tasks that spend much of their time waiting.
[...]
Virtual threads are not faster threads; they do not run code any faster than platform threads. They exist to provide scale (higher throughput), not speed (lower latency).

I really think this is not applicable to the use case of Micrometer publishing once per minute: there is no high throughput, no high concurrency, and no scalability concern. All of these can apply where Boot uses virtual threads: processing incoming requests.
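For contrast, the kind of workload the quoted docs describe looks roughly like the sketch below: a large number of concurrent tasks that spend most of their time waiting. It assumes Java 21 or later; the class name, task count, and sleep duration are arbitrary choices for illustration, and `Thread.sleep` stands in for blocking I/O.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyWaitingTasksSketch {

    /**
     * Submits `tasks` tasks that each wait for `wait`, one virtual thread per
     * task, and returns how many of them completed.
     */
    public static int runWaitingTasks(int tasks, Duration wait) {
        AtomicInteger completed = new AtomicInteger();
        // close() at the end of the try block waits for all submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(wait); // stand-in for blocking I/O
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runWaitingTasks(10_000, Duration.ofMillis(100)));
    }
}
```

A single publishing task that runs once per minute has none of these characteristics, which is the distinction being drawn above.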

In both cases, we inevitably impact the application's performance by collecting and sending metrics. When using platform threads, the system schedules a thread to collect and send metrics, whereas with virtual threads, Java handles this task.

This is true no matter what you do. There is a CPU that needs to do some work in order for this to happen. Virtual threads are not magical things that can pull CPU cycles out of thin air.

According to JEP-444:

To put it another way, virtual threads can significantly improve application throughput when

  • The number of concurrent tasks is high (more than a few thousand), and
  • The workload is not CPU-bound, since having many more threads than processor cores cannot improve throughput in that case.

Given that these Micrometer threads are not CPU-bound (is this a correct assumption?), it seems reasonable to consider this approach. While there might not be many such threads, it could be beneficial not to interrupt client request threads to handle metrics but rather wait until these threads are blocked on IO operations.

The two Micrometer threads are not CPU-bound, but they also have no need to improve throughput (they run once per minute). Publishing is also not a concurrent task: there can be only one publishing task at a time.

At this point, let me go back to the original issue you raised. I think users should be able to use whatever ThreadFactory they want to and Micrometer should continue to work. I'm not against virtual threads, quite the opposite. But iirc no one has asked for this outside of this issue, and the justification here might not be worth the effort for us to dig deeper and fix it (although that might have taken less time than I spent on writing comments :D). Let me close this issue, but please feel free to create a perf test that shows the application performs better when the two Micrometer threads used for publishing are virtual, or to investigate why the JVM freezes if you use a virtual ThreadFactory. If the fix is simple enough, we would definitely consider accepting a PR.

@jonatan-ivanov closed this as not planned Mar 15, 2024
@jonatan-ivanov added the "question" label and removed the "waiting for feedback" label Mar 15, 2024
This issue was closed.