Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExemplarReservoir default and performance hits #3952

Closed
cijothomas opened this issue Mar 21, 2024 · 8 comments
Closed

ExemplarReservoir default and performance hits #3952

cijothomas opened this issue Mar 21, 2024 · 8 comments
Assignees
Labels
spec:metrics Related to the specification/metrics directory triage:deciding:community-feedback

Comments

@cijothomas
Copy link
Member

cijothomas commented Mar 21, 2024

This talks about Exemplar defaults overall, this is a related, but different issue, related to Exemplars.

Based on stress tests/benchmarks in OTel .NET, turning on exemplars drops perf by 10-30% (For Histograms, the drop is less, but Counters/other-sync-instruments suffer the worst perf drop). As per the current wording, the default Reservoir to be used for every non Histograms is the SimpleFixedSizeExemplarReservoir.

Consider the following scenario:

  1. When a end users upgrades to the newer SDKs (supporting Exemplars), and turn the feature on (whether its on/off by default could be clarified separately), there is perf drop for all instruments. I think, a common scenario is - users are interested in Exemplars for their latency metric (Histogram), but not so much for other instruments like counters/etc. But all their instruments now pay for the cost of Exemplars.
  2. It is not possible to turn off ExemplarFilter per instrument, as ExemplarFilter is applied at provider level
  3. Users can use Views to change ExemplarReservoir per instrument, however, there is no "no-op" ExemplarReservoir in the spec. This forces users to write one themselves, and then selectively apply that to instruments.

Some thoughts on fixing this:

  1. Not do anything in the spec, and let languages solve it with different defaults (if perf drop is significant, and is a concern for a language)
  2. Change the defaults and make Exemplars opt-in on a instrument-by-instrument basis. So users can selectively enable it for those instruments where they see value.
  3. Introduce No-OpExemplarReservoir part of the spec, and make it the default for non-Histogram instruments, assuming Histograms is where most users are interested in Exemplars, and the fact that perf drop is most significant for non-histograms (.NET example. would love to see if other languages observe similar).
  4. Others/please share!
@cijothomas cijothomas added the spec:metrics Related to the specification/metrics directory label Mar 21, 2024
@cijothomas
Copy link
Member Author

cijothomas commented Mar 21, 2024

Additional note about non-histograms (counters/gauge/updowncounter):
The aggregation for these instruments are only updating a single value (Sum or LastValue), which can be done using atomic-instructions like Interlock.Add in .NET. But when Exemplars are required, it requires updating more than one value (the Sum + Exemplars itself), making it difficult to rely on atomic-instructions, forcing the fallback to more expensive locking mechanisms. It might be solvable to a certain extent, but the default Reservoir algorithm also needs a Random number generation in the hot path, which is also a non-trivial cost!

@jsuereth
Copy link
Contributor

So, 10-30% seems pretty high. I agree that's unacceptable to default-on to users as a surprise release.

  1. Are there any optimizations possible on non-Histogram instruments for .NET? E.g. in Java, the reservoir has size == maximum concurrency possible and we're using a thread-local RNG, which greatly reduced lock contention overall. I wonder if there's a way we can reduce overhead here, as we'd like to keep overhead low in exemplars, generally (regardless of default on).
  2. I think we could entertain only default-on exemplars for Histograms, as this is the metric where exemplars provide the most value. I'd like to hear from other WG/SIGs their own thoughts + concerns here. I think we could default-on Histograms without Sums/Gauges given Views can be used to turn on exemplars later. I like the idea of filtering w/ instrument type, but again, I'd like to hear from other WG/SiGs on this first.
  3. Regarding a noop exemplar reservoir, a Fixed-size of 0 is the current no-op. I'd be happy to add an explicit no-op if needed for optimisation/clarity reasons.

Thanks @cijothomas for this feedback! Let me know your preferences around 2 + 3.

@cijothomas
Copy link
Member Author

For 1. @CodeBlanch is the perf expert and is already actively working on that! We have thread-local random generator and other usual perf suspects taken care of.
Based on this, there may not be much we can do for non-histograms. I'll let Blanch share his thoughts.

I'd also love to see the overhead in other languages for non-histograms. (I'll share .NET perf numbers here for easy comparison shortly).

@CodeBlanch
Copy link
Member

Here are some benchmark numbers for counter in .NET. These have all of the optimizations I have come up with so far.

Method ExemplarFilterType Mean Error StdDev
CounterHotPath AlwaysOff 10.77 ns 0.130 ns 0.122 ns
CounterWith1LabelsHotPath AlwaysOff 40.03 ns 0.250 ns 0.234 ns
CounterWith2LabelsHotPath AlwaysOff 55.87 ns 0.550 ns 0.459 ns
CounterWith3LabelsHotPath AlwaysOff 74.86 ns 0.860 ns 0.763 ns
CounterWith4LabelsHotPath AlwaysOff 96.90 ns 1.025 ns 0.959 ns
CounterWith5LabelsHotPath AlwaysOff 115.42 ns 1.938 ns 1.718 ns
CounterHotPath AlwaysOn 16.89 ns 0.117 ns 0.103 ns
CounterWith1LabelsHotPath AlwaysOn 46.20 ns 0.910 ns 0.851 ns
CounterWith2LabelsHotPath AlwaysOn 62.59 ns 0.527 ns 0.493 ns
CounterWith3LabelsHotPath AlwaysOn 81.71 ns 0.949 ns 0.888 ns
CounterWith4LabelsHotPath AlwaysOn 105.54 ns 1.610 ns 1.506 ns
CounterWith5LabelsHotPath AlwaysOn 122.92 ns 1.894 ns 1.772 ns

On my box turning on exemplars has a constant ~6ns perf hit. For the "With*Labels" runs it is less significant because the time spent finding the metric point to update for the supplied tags dwarfs the exemplar costs. But for "CounterHotPath" where there are no tags supplied, we optimize/cheat and bypass the lookup using a known metric point index. For that case the 6ns seems really expensive compared to the overall processing time.

@cijothomas
Copy link
Member Author

  1. I think we could entertain only default-on exemplars for Histograms, as this is the metric where exemplars provide the most value. I'd like to hear from other WG/SIGs their own thoughts + concerns here. I think we could default-on Histograms without Sums/Gauges given Views can be used to turn on exemplars later. I like the idea of filtering w/ instrument type, but again, I'd like to hear from other WG/SiGs on this first.
  2. Regarding a noop exemplar reservoir, a Fixed-size of 0 is the current no-op. I'd be happy to add an explicit no-op if needed for optimisation/clarity reasons.
  1. I support this! (must be obvious, as I opened the issue 🤣) Specifically, the default for non-histograms should be no-op, and users can change it via Views.
  2. It is not clear to an end-user that a FixedSize=0 will give them the no-op behavior, so I'd like it if spec made it very explicit.

@jsuereth
Copy link
Contributor

jsuereth commented Apr 2, 2024

Default filter for Exemplars is "WithTrace", not AlwaysOn. Do you have a benchmark w/ our default of 10% trace sampling?

I assume it should cut down the overhead by a good bit. Also, we should compare to label-based lookup given we expect labels in otel.

@jack-berg
Copy link
Member

jack-berg commented Apr 10, 2024

Discussed at the 4/10/24 TC meeting:

  1. Not do anything in the spec, and let languages solve it with different defaults (if perf drop is significant, and is a concern for a language)
  2. Change the defaults and make Exemplars opt-in on a instrument-by-instrument basis. So users can selectively enable it for those instruments where they see value.
  3. Introduce No-OpExemplarReservoir part of the spec, and make it the default for non-Histogram instruments, assuming Histograms is where most users are interested in Exemplars, and the fact that perf drop is most significant for non-histograms (.NET example. would love to see if other languages observe similar).

This are good recommendations but they should be opened at 3 separate issues to be considered independently. @cijothomas I'm going to go ahead and close this - can you open separate issues for these?

@CodeBlanch
Copy link
Member

@jsuereth Just in case you are curious, I have more benchmarks available here: open-telemetry/opentelemetry-dotnet#5545

There are perf hits with or without an active trace. With an active trace the hits are bigger. SimpleFixedSizeExemplarReservoir has a lower hit because most measurements after the first one are dropped. AlignedHistogramBucketExemplarReservoir has a much bigger perf hit due to the way it is designed to always record the last exemplar for a given bucket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:metrics Related to the specification/metrics directory triage:deciding:community-feedback
Projects
None yet
Development

No branches or pull requests

6 participants