ExemplarReservoir default and performance hits #3952

cijothomas · 2024-03-21T15:44:44Z

This talks about Exemplar defaults overall, this is a related, but different issue, related to Exemplars.

Based on stress tests/benchmarks in OTel .NET, turning on exemplars drops perf by 10-30% (For Histograms, the drop is less, but Counters/other-sync-instruments suffer the worst perf drop). As per the current wording, the default Reservoir to be used for every non Histograms is the SimpleFixedSizeExemplarReservoir.

Consider the following scenario:

When a end users upgrades to the newer SDKs (supporting Exemplars), and turn the feature on (whether its on/off by default could be clarified separately), there is perf drop for all instruments. I think, a common scenario is - users are interested in Exemplars for their latency metric (Histogram), but not so much for other instruments like counters/etc. But all their instruments now pay for the cost of Exemplars.
It is not possible to turn off ExemplarFilter per instrument, as ExemplarFilter is applied at provider level
Users can use Views to change ExemplarReservoir per instrument, however, there is no "no-op" ExemplarReservoir in the spec. This forces users to write one themselves, and then selectively apply that to instruments.

Some thoughts on fixing this:

Not do anything in the spec, and let languages solve it with different defaults (if perf drop is significant, and is a concern for a language)
Change the defaults and make Exemplars opt-in on a instrument-by-instrument basis. So users can selectively enable it for those instruments where they see value.
Introduce No-OpExemplarReservoir part of the spec, and make it the default for non-Histogram instruments, assuming Histograms is where most users are interested in Exemplars, and the fact that perf drop is most significant for non-histograms (.NET example. would love to see if other languages observe similar).
Others/please share!

The text was updated successfully, but these errors were encountered:

cijothomas · 2024-03-21T15:56:47Z

Additional note about non-histograms (counters/gauge/updowncounter):
The aggregation for these instruments are only updating a single value (Sum or LastValue), which can be done using atomic-instructions like Interlock.Add in .NET. But when Exemplars are required, it requires updating more than one value (the Sum + Exemplars itself), making it difficult to rely on atomic-instructions, forcing the fallback to more expensive locking mechanisms. It might be solvable to a certain extent, but the default Reservoir algorithm also needs a Random number generation in the hot path, which is also a non-trivial cost!

jsuereth · 2024-03-27T14:33:03Z

So, 10-30% seems pretty high. I agree that's unacceptable to default-on to users as a surprise release.

Are there any optimizations possible on non-Histogram instruments for .NET? E.g. in Java, the reservoir has size == maximum concurrency possible and we're using a thread-local RNG, which greatly reduced lock contention overall. I wonder if there's a way we can reduce overhead here, as we'd like to keep overhead low in exemplars, generally (regardless of default on).
I think we could entertain only default-on exemplars for Histograms, as this is the metric where exemplars provide the most value. I'd like to hear from other WG/SIGs their own thoughts + concerns here. I think we could default-on Histograms without Sums/Gauges given Views can be used to turn on exemplars later. I like the idea of filtering w/ instrument type, but again, I'd like to hear from other WG/SiGs on this first.
Regarding a noop exemplar reservoir, a Fixed-size of 0 is the current no-op. I'd be happy to add an explicit no-op if needed for optimisation/clarity reasons.

Thanks @cijothomas for this feedback! Let me know your preferences around 2 + 3.

cijothomas · 2024-03-27T16:40:40Z

For 1. @CodeBlanch is the perf expert and is already actively working on that! We have thread-local random generator and other usual perf suspects taken care of.
Based on this, there may not be much we can do for non-histograms. I'll let Blanch share his thoughts.

I'd also love to see the overhead in other languages for non-histograms. (I'll share .NET perf numbers here for easy comparison shortly).

CodeBlanch · 2024-03-27T18:26:10Z

Here are some benchmark numbers for counter in .NET. These have all of the optimizations I have come up with so far.

Method	ExemplarFilterType	Mean	Error	StdDev
CounterHotPath	AlwaysOff	10.77 ns	0.130 ns	0.122 ns
CounterWith1LabelsHotPath	AlwaysOff	40.03 ns	0.250 ns	0.234 ns
CounterWith2LabelsHotPath	AlwaysOff	55.87 ns	0.550 ns	0.459 ns
CounterWith3LabelsHotPath	AlwaysOff	74.86 ns	0.860 ns	0.763 ns
CounterWith4LabelsHotPath	AlwaysOff	96.90 ns	1.025 ns	0.959 ns
CounterWith5LabelsHotPath	AlwaysOff	115.42 ns	1.938 ns	1.718 ns
CounterHotPath	AlwaysOn	16.89 ns	0.117 ns	0.103 ns
CounterWith1LabelsHotPath	AlwaysOn	46.20 ns	0.910 ns	0.851 ns
CounterWith2LabelsHotPath	AlwaysOn	62.59 ns	0.527 ns	0.493 ns
CounterWith3LabelsHotPath	AlwaysOn	81.71 ns	0.949 ns	0.888 ns
CounterWith4LabelsHotPath	AlwaysOn	105.54 ns	1.610 ns	1.506 ns
CounterWith5LabelsHotPath	AlwaysOn	122.92 ns	1.894 ns	1.772 ns

On my box turning on exemplars has a constant ~6ns perf hit. For the "With*Labels" runs it is less significant because the time spent finding the metric point to update for the supplied tags dwarfs the exemplar costs. But for "CounterHotPath" where there are no tags supplied, we optimize/cheat and bypass the lookup using a known metric point index. For that case the 6ns seems really expensive compared to the overall processing time.

cijothomas · 2024-03-28T01:48:23Z

I think we could entertain only default-on exemplars for Histograms, as this is the metric where exemplars provide the most value. I'd like to hear from other WG/SIGs their own thoughts + concerns here. I think we could default-on Histograms without Sums/Gauges given Views can be used to turn on exemplars later. I like the idea of filtering w/ instrument type, but again, I'd like to hear from other WG/SiGs on this first.

Regarding a noop exemplar reservoir, a Fixed-size of 0 is the current no-op. I'd be happy to add an explicit no-op if needed for optimisation/clarity reasons.

I support this! (must be obvious, as I opened the issue 🤣) Specifically, the default for non-histograms should be no-op, and users can change it via Views.
It is not clear to an end-user that a FixedSize=0 will give them the no-op behavior, so I'd like it if spec made it very explicit.

jsuereth · 2024-04-02T13:48:42Z

Default filter for Exemplars is "WithTrace", not AlwaysOn. Do you have a benchmark w/ our default of 10% trace sampling?

I assume it should cut down the overhead by a good bit. Also, we should compare to label-based lookup given we expect labels in otel.

jack-berg · 2024-04-10T16:12:05Z

Discussed at the 4/10/24 TC meeting:

Not do anything in the spec, and let languages solve it with different defaults (if perf drop is significant, and is a concern for a language)

Change the defaults and make Exemplars opt-in on a instrument-by-instrument basis. So users can selectively enable it for those instruments where they see value.

Introduce No-OpExemplarReservoir part of the spec, and make it the default for non-Histogram instruments, assuming Histograms is where most users are interested in Exemplars, and the fact that perf drop is most significant for non-histograms (.NET example. would love to see if other languages observe similar).

This are good recommendations but they should be opened at 3 separate issues to be considered independently. @cijothomas I'm going to go ahead and close this - can you open separate issues for these?

CodeBlanch · 2024-04-19T18:12:55Z

@jsuereth Just in case you are curious, I have more benchmarks available here: open-telemetry/opentelemetry-dotnet#5545

There are perf hits with or without an active trace. With an active trace the hits are bigger. SimpleFixedSizeExemplarReservoir has a lower hit because most measurements after the first one are dropped. AlignedHistogramBucketExemplarReservoir has a much bigger perf hit due to the way it is designed to always record the last exemplar for a given bucket.

cijothomas added the spec:metrics Related to the specification/metrics directory label Mar 21, 2024

github-actions bot assigned reyang Mar 21, 2024

cijothomas mentioned this issue Mar 21, 2024

Mark exemplars as stable. #3870

Merged

5 tasks

austinlparker unassigned reyang Mar 26, 2024

trask added triage:deciding:community-feedback triage:deciding:tc-inbox labels Mar 26, 2024

jack-berg assigned jsuereth Apr 10, 2024

jack-berg removed the triage:deciding:tc-inbox label Apr 10, 2024

jack-berg closed this as completed Apr 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ExemplarReservoir default and performance hits #3952

ExemplarReservoir default and performance hits #3952

cijothomas commented Mar 21, 2024 •

edited

Loading

cijothomas commented Mar 21, 2024 •

edited

Loading

jsuereth commented Mar 27, 2024

cijothomas commented Mar 27, 2024

CodeBlanch commented Mar 27, 2024

cijothomas commented Mar 28, 2024

jsuereth commented Apr 2, 2024

jack-berg commented Apr 10, 2024 •

edited

Loading

CodeBlanch commented Apr 19, 2024

ExemplarReservoir default and performance hits #3952

ExemplarReservoir default and performance hits #3952

Comments

cijothomas commented Mar 21, 2024 • edited Loading

cijothomas commented Mar 21, 2024 • edited Loading

jsuereth commented Mar 27, 2024

cijothomas commented Mar 27, 2024

CodeBlanch commented Mar 27, 2024

cijothomas commented Mar 28, 2024

jsuereth commented Apr 2, 2024

jack-berg commented Apr 10, 2024 • edited Loading

CodeBlanch commented Apr 19, 2024

cijothomas commented Mar 21, 2024 •

edited

Loading

cijothomas commented Mar 21, 2024 •

edited

Loading

jack-berg commented Apr 10, 2024 •

edited

Loading