Use GaugeHistogram for Prometheus LongTaskTimer #4988
Comments
Thanks @snicoll for catching this and for the reproducer. Thanks @mhalbritter for the issue and for further investigating this.
Workaround
It seems the issue is happening when a
Micrometer Reproducer
I was able to come up with a reproducer without Boot:
@Test
void lttWithHistogram() {
    LongTaskTimer longTaskTimer = LongTaskTimer.builder("test")
        .publishPercentileHistogram()
        .register(registry);
    Sample sample = longTaskTimer.start();
    assertThat(registry.scrape()).isNotBlank();
    sample.stop();
    Sample sample2 = longTaskTimer.start();
    assertThat(registry.scrape()).isNotBlank();
    sample2.stop();
}
or (slightly simpler):
@Test
void lttWithHistogram() {
    LongTaskTimer longTaskTimer = LongTaskTimer.builder("test")
        .publishPercentileHistogram()
        .register(registry);
    Sample sample = longTaskTimer.start();
    assertThat(registry.scrape()).isNotBlank();
    assertThat(registry.scrape()).isNotBlank();
    sample.stop();
}
(
Troubleshooting
It seems we have a similar issue with Prometheus 0.x but because we don't need to calculate the value of the
LongTaskTimer longTaskTimer = LongTaskTimer.builder("test")
    .publishPercentileHistogram()
    .register(registry);
Sample sample = longTaskTimer.start();
registry.scrape();
System.out.println(registry.scrape());
sample.stop();
I get this:
The logic is the following:
Also, there seems to be another issue: calling scrape (snapshotting the histogram) multiple times somehow increases the bucket counts of a histogram, see:
There was only one recording, this should not be 2.
I think "fixing" the exception might not be too hard, but I also think that the real fix is using a "gauge histogram" (histogram counters can go up or down) instead of a "classic" cumulative histogram (monotonic counters), which might take a bit more effort.
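To illustrate why a gauge histogram fits a LongTaskTimer, here is a minimal plain-Java sketch. It does not use Micrometer or the Prometheus client; the class name ActiveTaskBuckets, the bucket bounds, and the start/stop methods are all hypothetical. It only shows that cumulative bucket counts over currently active tasks can decrease once a task stops, which a monotonic counter cannot represent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this is not Micrometer or Prometheus client code.
class ActiveTaskBuckets {

    // Hypothetical bucket upper bounds in seconds; the last one plays the role of +Inf.
    static final double[] BOUNDS = {1.0, 5.0, Double.POSITIVE_INFINITY};

    // Elapsed durations (seconds) of the tasks that are still running.
    private final List<Double> activeDurations = new ArrayList<>();

    void start(double elapsedSeconds) {
        activeDurations.add(elapsedSeconds);
    }

    void stop(double elapsedSeconds) {
        // Removes one matching active task, if present.
        activeDurations.remove(Double.valueOf(elapsedSeconds));
    }

    // Cumulative counts: entry i is the number of active tasks with duration <= BOUNDS[i].
    // Taking a snapshot does not modify any state, so repeated calls agree.
    long[] snapshot() {
        long[] counts = new long[BOUNDS.length];
        for (double d : activeDurations) {
            for (int i = 0; i < BOUNDS.length; i++) {
                if (d <= BOUNDS[i]) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ActiveTaskBuckets buckets = new ActiveTaskBuckets();
        buckets.start(0.5);
        buckets.start(3.0);
        System.out.println(Arrays.toString(buckets.snapshot())); // [1, 2, 2]
        buckets.stop(0.5);
        // Every bucket count went down, so these values behave like gauges.
        System.out.println(Arrays.toString(buckets.snapshot())); // [0, 1, 1]
    }
}
```

Because the counts describe tasks in flight rather than completed events, exposing them as monotonic counters would force the values to only ever grow, which is consistent with the inflated bucket counts described above.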
I fixed this by migrating to Prometheus GaugeHistogram; please see the commit message of d7b9d24 for details. Please note that since
this is a breaking change.
@snicoll I tried this with Boot locally, could you please check your demo with the latest snapshot?
Original title:
See the bug reported against Spring Boot: spring-projects/spring-boot#40552
When debugging this, counts in io.micrometer.prometheusmetrics.PrometheusMeterRegistry#addDistributionStatisticSamples, which is used to call ClassicHistogramBuckets.of(buckets, counts), contains a -1 as its last element.
Execution entered this condition:
if (Double.isFinite(histogramCounts[histogramCounts.length - 1].bucket())) {
which sets that last element. count is 1, while histogramCounts[histogramCounts.length - 1].count() is increasing with every request. This leads to a negative number.
Version
micrometer-registry-prometheus, 1.13.0-RC1.
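The negative last element described in this report can be sketched with plain arithmetic. The example below is an assumption about the shape of the computation, not the Micrometer source: it supposes the final bucket is filled with the total count minus the count of the last finite bucket. The class and method names are hypothetical, and the bucket counts of 1 and 2 are illustrative values for a count that grows with each scrape.

```java
// Plain-Java arithmetic sketch; not the Micrometer source.
class LastBucketArithmetic {

    // Assumed computation: whatever the finite buckets did not count
    // is attributed to the final (+Inf) bucket.
    static double remainder(long totalCount, double lastFiniteBucketCount) {
        return totalCount - lastFiniteBucketCount;
    }

    public static void main(String[] args) {
        long count = 1;                  // one recording, as in the report
        double afterFirstScrape = 1.0;   // hypothetical bucket count after one scrape
        double afterSecondScrape = 2.0;  // hypothetical bucket count after two scrapes

        System.out.println(remainder(count, afterFirstScrape));  // 0.0
        System.out.println(remainder(count, afterSecondScrape)); // -1.0
    }
}
```

Under that assumption, a bucket count that keeps growing while count stays at 1 yields exactly the -1 seen in counts, a value that a classic cumulative histogram cannot hold.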