[sdk-metrics] Turn exemplars on by default in prerelease builds #5545

CodeBlanch · 2024-04-17T23:04:19Z

Changes

Set the default ExemplarFilterType to TraceBased in prerelease builds to match spec.

Benchmarks

Counters (SimpleFixedSizeExemplarReservoir)

Using TraceBased WITHOUT an active trace has some cost (we need to check Activity.Current.Recorded in the hot path):

| Method | AggregationTemporality | ExemplarFilterType | Mean | Cost Increased By |
| --- | --- | --- | --- | --- |
| CounterHotPath | Cumulative | AlwaysOff | 10.51 ns | |
| CounterWith1LabelsHotPath | Cumulative | AlwaysOff | 36.37 ns | |
| CounterWith2LabelsHotPath | Cumulative | AlwaysOff | 44.98 ns | |
| CounterWith3LabelsHotPath | Cumulative | AlwaysOff | 61.44 ns | |
| CounterHotPath | Cumulative | TraceBased | 11.29 ns | 7.4% |
| CounterWith1LabelsHotPath | Cumulative | TraceBased | 37.48 ns | 3.1% |
| CounterWith2LabelsHotPath | Cumulative | TraceBased | 46.68 ns | 3.8% |
| CounterWith3LabelsHotPath | Cumulative | TraceBased | 64.76 ns | 5.4% |
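
The hot-path check described above can be illustrated with a minimal, language-neutral sketch in Python. This is not the SDK's C# code: `Activity`, `current_activity`, and `should_sample_exemplar` are hypothetical stand-ins for `Activity.Current` and the filter decision. It only shows why the no-trace case is cheap: one ambient lookup plus a null/flag check per measurement.

```python
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical stand-in for System.Diagnostics.Activity."""
    trace_id: str
    span_id: str
    recorded: bool


# Hypothetical analogue of Activity.Current (ambient, per execution context).
current_activity: ContextVar = ContextVar("current_activity", default=None)


def should_sample_exemplar(filter_type: str) -> bool:
    """Return True when a measurement should be offered to the reservoir."""
    if filter_type == "AlwaysOff":
        return False  # no lookup at all: the baseline rows in the table
    if filter_type == "AlwaysOn":
        return True
    if filter_type == "TraceBased":
        # One ambient lookup plus a flag check on every measurement.
        activity = current_activity.get()
        return activity is not None and activity.recorded
    raise ValueError(f"unknown filter type: {filter_type}")
```

With no activity set, `should_sample_exemplar("TraceBased")` returns `False` after the lookup, which corresponds to the small overhead measured in the table.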

Using TraceBased WITH an active trace has more cost (we need to check Activity.Current.Recorded and do a random-based sample in the hot path):

| Method | AggregationTemporality | ExemplarFilterType | Mean | Cost Increased By |
| --- | --- | --- | --- | --- |
| CounterHotPath | Cumulative | AlwaysOff | 10.44 ns | |
| CounterWith1LabelsHotPath | Cumulative | AlwaysOff | 36.86 ns | |
| CounterWith2LabelsHotPath | Cumulative | AlwaysOff | 45.82 ns | |
| CounterWith3LabelsHotPath | Cumulative | AlwaysOff | 61.02 ns | |
| CounterHotPath | Cumulative | TraceBased | 18.32 ns | 75.4% |
| CounterWith1LabelsHotPath | Cumulative | TraceBased | 46.46 ns | 26.0% |
| CounterWith2LabelsHotPath | Cumulative | TraceBased | 53.52 ns | 14.6% |
| CounterWith3LabelsHotPath | Cumulative | TraceBased | 70.00 ns | 14.7% |
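
To make the "random-based sample" concrete, here is a rough Python sketch of a fixed-size reservoir using uniform reservoir sampling. It assumes the reservoir follows the usual scheme of filling its slots first and then replacing them with decreasing probability; the class and member names are hypothetical and this is not the SDK's implementation.

```python
import random
from typing import List, Optional


class FixedSizeReservoir:
    """Illustrative fixed-size reservoir (uniform reservoir sampling)."""

    def __init__(self, size: int, rng: Optional[random.Random] = None) -> None:
        self._slots: List[Optional[float]] = [None] * size
        self._seen = 0
        self._rng = rng or random.Random()

    def offer(self, value: float) -> None:
        if self._seen < len(self._slots):
            # The first `size` measurements fill the slots in order.
            index = self._seen
        else:
            # Afterwards a random draw is needed on every measurement;
            # this is the extra hot-path work when a trace is active.
            index = self._rng.randint(0, self._seen)  # inclusive bounds
        if index < len(self._slots):
            self._slots[index] = value
        self._seen += 1

    def collect(self) -> List[float]:
        """Return the stored exemplars and reset for the next export."""
        exemplars = [v for v in self._slots if v is not None]
        self._slots = [None] * len(self._slots)
        self._seen = 0
        return exemplars
```

Once the slots are full, most offers end in a draw that discards the measurement, but the draw itself still has to happen for every sampled measurement.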

Histograms (AlignedHistogramBucketExemplarReservoir)

Using TraceBased WITHOUT an active trace is interesting. Sometimes when I run it things show faster, sometimes slower, and sometimes mixed. I take this as statistically no difference. The cost of the check for Activity.Current.Recorded is dwarfed by the other work to find the bucket and do all the updating:

| Method | BoundCount | ExemplarFilterType | Mean | Cost Increased By |
| --- | --- | --- | --- | --- |
| HistogramHotPath | 10 | AlwaysOff | 37.37 ns | |
| HistogramWith1LabelHotPath | 10 | AlwaysOff | 65.99 ns | |
| HistogramWith3LabelsHotPath | 10 | AlwaysOff | 110.73 ns | |
| HistogramHotPath | 10 | TraceBased | 36.99 ns | less than 3% |
| HistogramWith1LabelHotPath | 10 | TraceBased | 66.86 ns | less than 3% |
| HistogramWith3LabelsHotPath | 10 | TraceBased | 113.68 ns | less than 3% |
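
As a quick check of the "less than 3%" figures, the relative change can be recomputed from the means in the table. The helper name `cost_increase_percent` is just for illustration.

```python
def cost_increase_percent(baseline_ns: float, measured_ns: float) -> float:
    """Relative change of measured_ns versus baseline_ns, in percent."""
    return (measured_ns - baseline_ns) / baseline_ns * 100.0


# (AlwaysOff mean, TraceBased mean) in nanoseconds, taken from the table above.
rows = {
    "HistogramHotPath": (37.37, 36.99),
    "HistogramWith1LabelHotPath": (65.99, 66.86),
    "HistogramWith3LabelsHotPath": (110.73, 113.68),
}

for name, (always_off, trace_based) in rows.items():
    print(f"{name}: {cost_increase_percent(always_off, trace_based):+.1f}%")
```

All three differences come out under 3% in magnitude, and the first one is negative, which is consistent with reading this run as noise.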

Using TraceBased WITH an active trace has a lot of cost (we need to check Activity.Current.Recorded and we always update the exemplar for every measurement in the hot path):

| Method | BoundCount | ExemplarFilterType | Mean | Cost Increased By |
| --- | --- | --- | --- | --- |
| HistogramHotPath | 10 | AlwaysOff | 39.93 ns | |
| HistogramWith1LabelHotPath | 10 | AlwaysOff | 71.05 ns | |
| HistogramWith3LabelsHotPath | 10 | AlwaysOff | 109.03 ns | |
| HistogramHotPath | 10 | TraceBased | 68.96 ns | 72.7% |
| HistogramWith1LabelHotPath | 10 | TraceBased | 96.48 ns | 35.8% |
| HistogramWith3LabelsHotPath | 10 | TraceBased | 153.84 ns | 41.1% |
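
A small Python sketch of the "always update" behavior may help explain where the cost comes from. It assumes one exemplar slot per histogram bucket with inclusive upper bounds; the class name and the `writes` counter are hypothetical and only there to make the overwriting visible.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


class LastSeenBucketReservoir:
    """Illustrative bucket-aligned reservoir: last exemplar per bucket wins."""

    def __init__(self, bounds: Sequence[float]) -> None:
        self._bounds = list(bounds)
        # One slot per bucket, plus the overflow bucket above the last bound.
        self._slots: List[Optional[float]] = [None] * (len(self._bounds) + 1)
        self.writes = 0  # counts slot writes, for illustration only

    def offer(self, value: float) -> None:
        # bisect_left maps a value equal to a bound into that bound's bucket,
        # i.e. buckets are (lower, upper].
        index = bisect_left(self._bounds, value)
        self._slots[index] = value  # unconditional write on every measurement
        self.writes += 1

    def slots(self) -> List[Optional[float]]:
        return list(self._slots)
```

Offering 3, 4, and 5 with bounds `[0, 5, 10]` writes the same slot three times and only the 5 survives to export.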

This is an interesting area @cijothomas and I have discussed. The spec says that AlignedHistogramBucketExemplarReservoir should always keep the last exemplar seen for a bucket. There's a lot of overwriting as a result (wasted cycles). A simple thing to do would be to keep only the first exemplar for a given export. Or do something more like SimpleFixedSizeExemplarReservoir, where we always keep the first one and then randomly decide whether or not to keep subsequent exemplars 🤔
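
The two alternatives floated here can be sketched in Python as follows. Both classes are hypothetical illustrations, not proposed SDK code; the `writes` counter exists only to show how many slot updates each strategy performs.

```python
import random
from bisect import bisect_left
from typing import List, Optional, Sequence


class KeepFirstBucketReservoir:
    """Keep only the first exemplar seen per bucket within an export cycle."""

    def __init__(self, bounds: Sequence[float]) -> None:
        self._bounds = list(bounds)
        self._slots: List[Optional[float]] = [None] * (len(self._bounds) + 1)
        self.writes = 0

    def offer(self, value: float) -> None:
        index = bisect_left(self._bounds, value)
        if self._slots[index] is None:  # later measurements are skipped
            self._slots[index] = value
            self.writes += 1

    def collect(self) -> List[Optional[float]]:
        snapshot = self._slots
        self._slots = [None] * len(snapshot)  # reset for the next export
        return snapshot


class FirstThenRandomBucketReservoir:
    """Keep the first exemplar, then replace it with probability 1/n."""

    def __init__(self, bounds: Sequence[float],
                 rng: Optional[random.Random] = None) -> None:
        self._bounds = list(bounds)
        self._slots: List[Optional[float]] = [None] * (len(self._bounds) + 1)
        self._seen = [0] * (len(self._bounds) + 1)
        self._rng = rng or random.Random()
        self.writes = 0

    def offer(self, value: float) -> None:
        index = bisect_left(self._bounds, value)
        self._seen[index] += 1
        # randrange(1) is always 0, so the first measurement is always kept.
        if self._rng.randrange(self._seen[index]) == 0:
            self._slots[index] = value
            self.writes += 1

    def slots(self) -> List[Optional[float]]:
        return list(self._slots)
```

The first variant trades recency for a single write per bucket per export. The second keeps a uniform sample per bucket but still pays for a random draw on every measurement, similar to the counter case above.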

Merge requirement checklist

CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
Unit tests added/updated
Appropriate CHANGELOG.md files updated for non-trivial changes
Changes in public API reviewed (if applicable)

codecov · 2024-04-17T23:10:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.61%. Comparing base (6250307) to head (4fc1510).
Report is 187 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5545      +/-   ##
==========================================
+ Coverage   83.38%   85.61%   +2.23%     
==========================================
  Files         297      289       -8     
  Lines       12531    12493      -38     
==========================================
+ Hits        10449    10696     +247     
+ Misses       2082     1797     -285

| Flag | Coverage Δ |
| --- | --- |
| unittests | `?` |
| unittests-Solution-Experimental | `85.57% <ø> (?)` |
| unittests-Solution-Stable | `85.26% <ø> (?)` |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files | Coverage Δ |
| --- | --- |
| src/OpenTelemetry/Metrics/AggregatorStore.cs | `86.96% <ø> (+6.58%)` ⬆️ |
| .../Metrics/Builder/MeterProviderBuilderExtensions.cs | `98.42% <ø> (-1.58%)` ⬇️ |

... and 76 files with indirect coverage changes

src/OpenTelemetry/CHANGELOG.md

vishweshbankwar

LGTM - left a suggestion for changelog.

cijothomas

I am okay with this change, but I want to see if we can make a NoopExemplarReservoir and make it the default for non-histograms (the spec is flexible enough to allow that) in the first stable release.

reyang · 2024-04-18T16:47:12Z

src/OpenTelemetry/CHANGELOG.md

@@ -12,6 +12,11 @@
  function when configuring a view (applies to individual metrics).
  ([#5542](https://github.com/open-telemetry/opentelemetry-dotnet/pull/5542))

+* **Experimental (pre-release builds only):** The default `ExemplarFilterType`
+  on `MeterProvider` is now `ExemplarFilterType.TraceBased` which will enable


What are the perf implications that users should be aware of?

Let's wait on this. My guess is most users will skip over anything prefixed with **Experimental (pre-release builds only):** in the CHANGELOG. What I think would be more useful: on the final entry, where we make everything public for stable builds, we can add a link to something in the docs. Thinking of something like: Understanding performance implications when sampling Exemplars.

I'm fine to address the changelog/doc later.

I think we still need to know the perf implications of this PR, as they serve as critical input when making decisions.

Updated the description with some benchmarks.

Could you also provide the increase in memory consumption per metric since we would now allocate an ExemplarReservoir instance for each MetricPoint? It could be an issue for histogram users with high bucket counts.

cijothomas · 2024-04-18T22:57:40Z

> I am okay with this change, but I want to see if we can make a NoopExemplarReservoir and make it the default for non-histograms (the spec is flexible enough to allow that) in the first stable release.

After looking at the perf numbers, the overhead is non-trivial. So I recommend keeping it off by default for every metric. Users can opt in for each metric (using views). (Not many backends/vendors are known to support exemplars.)
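
The off-by-default, opt-in-per-metric idea can be sketched in Python like this. Everything here is hypothetical: `NoopReservoir`, `LastValueReservoir`, `reservoir_for`, and the instrument name are illustrative stand-ins, and the dictionary plays the role that views would play in the SDK.

```python
from typing import Callable, Dict, List, Optional


class NoopReservoir:
    """Reservoir that stores nothing, so offer() is effectively free."""

    def offer(self, value: float) -> None:
        pass

    def collect(self) -> List[float]:
        return []


class LastValueReservoir:
    """Minimal opt-in reservoir that remembers the last offered value."""

    def __init__(self) -> None:
        self._last: Optional[float] = None

    def offer(self, value: float) -> None:
        self._last = value

    def collect(self) -> List[float]:
        return [] if self._last is None else [self._last]


def reservoir_for(instrument_name: str,
                  opt_in: Dict[str, Callable[[], object]]):
    """Pick a reservoir: no-op unless the instrument was explicitly opted in."""
    factory = opt_in.get(instrument_name)
    return factory() if factory is not None else NoopReservoir()


# Hypothetical configuration: only one instrument collects exemplars.
opt_in_views = {"http.server.duration": LastValueReservoir}
```

With this shape, instruments that are not listed pay only for a no-op call, and the exemplar cost is confined to the metrics a user has chosen.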

CodeBlanch · 2024-04-18T23:15:25Z

Going to keep exemplars off by default for now based on performance analysis.

CodeBlanch added 3 commits April 17, 2024 11:57

Turn exemplars on by default for prerelease builds to match spec.

307dead

Merge remote-tracking branch 'upstream/main' into sdk-metrics-exempla…

7a6cbb5

…r-on-by-default

Verify default exemplar filter in tests.

e12014f

CodeBlanch added the pkg:OpenTelemetry (Issues related to OpenTelemetry NuGet package) and metrics (Metrics signal related) labels Apr 17, 2024

CodeBlanch requested a review from a team April 17, 2024 23:04

CHANGELOG patch.

dc59a27

CodeBlanch mentioned this pull request Apr 17, 2024

Support exemplars in Metrics #2527

Closed

6 tasks

vishweshbankwar reviewed Apr 18, 2024

vishweshbankwar approved these changes Apr 18, 2024

cijothomas approved these changes Apr 18, 2024

reyang reviewed Apr 18, 2024

CodeBlanch added 5 commits April 18, 2024 10:00

Merge from main.

1333082

Test fix for stable builds.

2e6f141

Merge remote-tracking branch 'upstream/main' into sdk-metrics-exempla…

e722ea2

…r-on-by-default

Tweak CHANGELOG.

0235622

Merge from main.

4fc1510

cijothomas self-requested a review April 18, 2024 22:27

CodeBlanch closed this Apr 18, 2024

cijothomas mentioned this pull request Apr 19, 2024

Mark exemplars as stable. open-telemetry/opentelemetry-specification#3870

Merged

5 tasks

CodeBlanch mentioned this pull request Apr 19, 2024

ExemplarReservoir default and performance hits open-telemetry/opentelemetry-specification#3952

Closed
