Add the random fixed size exemplar reservoir #4852

Merged

MrAlias merged 18 commits into open-telemetry:main from MrAlias:add-rand-fixed-size-res

Jan 29, 2024

Contributor

MrAlias commented Jan 24, 2024

Part of #559

This PR is split from #4455

This adds a Reservoir implementation that will randomly sample a specified number of measurements as exemplars. An explanation of the algorithm is copied from the included comment within the PR:

The following algorithm is "Algorithm L" from Li, Kim-Hung (4 December 1994). "Reservoir-Sampling Algorithms of Time Complexity O(n(1+log(N/n)))". ACM Transactions on Mathematical Software. 20 (4): 481–493 (https://dl.acm.org/doi/10.1145/198429.198435).

A high-level overview of "Algorithm L":

1. Pre-calculate the random count, greater than the storage size, at which an exemplar will be replaced.
2. Accept all measurements offered until the configured storage size is reached.
3. Loop:
   a) When the pre-calculated count is reached, replace a random existing exemplar with the offered measurement.
   b) Calculate the next random count, greater than the existing one, at which another exemplar will be replaced.
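
The steps above can be sketched in Go roughly as follows. This is a minimal illustration that follows the textbook formulation of "Algorithm L", not the code added by this PR: the `reservoir` type, its fields, and the `newReservoir`, `random`, `scheduleNext`, and `offer` names are all hypothetical, measurements are reduced to plain `float64` values, and the capacity `k` is assumed to be at least 1.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// reservoir is a hypothetical fixed-size sampler used only for illustration.
type reservoir struct {
	store []float64  // sampled values; capacity is the reservoir size k
	count int64      // number of measurements offered so far (1-indexed position)
	next  int64      // position of the measurement that triggers the next replacement
	w     float64    // weight: largest of the k smallest random numbers seen
	rng   *rand.Rand // seeded source so runs are repeatable
}

// newReservoir performs step 1: it pre-calculates the first replacement count.
// It assumes k >= 1.
func newReservoir(k int, seed int64) *reservoir {
	r := &reservoir{
		store: make([]float64, 0, k),
		next:  int64(k),
		rng:   rand.New(rand.NewSource(seed)),
	}
	r.w = math.Exp(math.Log(r.random()) / float64(k))
	r.scheduleNext()
	return r
}

// random returns a float64 in the open interval (0, 1) so math.Log never sees 0.
func (r *reservoir) random() float64 {
	for {
		if f := r.rng.Float64(); f > 0 {
			return f
		}
	}
}

// scheduleNext moves next forward by a geometrically distributed gap weighted by w.
func (r *reservoir) scheduleNext() {
	r.next += int64(math.Floor(math.Log(r.random())/math.Log(1-r.w))) + 1
}

// offer implements steps 2 and 3 for a single measurement.
func (r *reservoir) offer(v float64) {
	r.count++
	if len(r.store) < cap(r.store) {
		// Step 2: accept everything until the storage is full.
		r.store = append(r.store, v)
		return
	}
	if r.count == r.next {
		// Step 3a: replace a random existing entry.
		r.store[r.rng.Intn(len(r.store))] = v
		// Step 3b: shrink the weight and compute the next replacement count.
		r.w *= math.Exp(math.Log(r.random()) / float64(cap(r.store)))
		r.scheduleNext()
	}
}

func main() {
	r := newReservoir(4, 42)
	for i := 1; i <= 1000; i++ {
		r.offer(float64(i))
	}
	fmt.Println(len(r.store), r.store)
}
```

Measurements that fall between two replacement counts cost only an increment and a comparison, which is where the algorithm saves its time.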

A "replacement" count is computed by considering n independent random numbers, each corresponding to an offered measurement. The smallest k of these numbers (k being the storage capacity) are kept as a subset. The maximum value in this subset, called w, is used to weight another random number generation that yields the next count to be considered.
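
One way to read the paragraph above in formulas, as a sketch of the standard "Algorithm L" argument rather than a quote from the PR. The symbols $u_1$ and $u_2$ are names introduced here for fresh uniform random numbers on $(0, 1)$, and $s$ is the number of measurements skipped before the next replacement. A new measurement joins the subset only if its random number falls below $w$, which happens with probability $w$, so the gap is geometrically distributed and can be drawn directly:

$$
P(s) = (1 - w)^{s}\, w, \qquad s = \left\lfloor \frac{\ln u_1}{\ln(1 - w)} \right\rfloor
$$

After a replacement, the subset again holds $k$ values that are uniform on $(0, w)$, and their maximum becomes the new weight:

$$
w' = w \cdot u_2^{1/k} = w \cdot \exp\!\left(\frac{\ln u_2}{k}\right)
$$

Because $w$ only shrinks, the expected gap $(1 - w)/w$ grows over time.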

Weighting the next count computation as described yields a uniformly weighted sample over all the measurements the reservoir has seen so far. The sampling "slows down" as more measurements are offered, which reduces bias towards those offered just prior to the end of the collection.

This algorithm is preferred because of its balance of simplicity and performance. It computes three random numbers (the bulk of the computation time) for each item that becomes part of the reservoir, but spends no time on items that do not. In particular, it has an asymptotic runtime of O(k(1 + log(n/k))), where n is the number of measurements offered and k is the reservoir size.
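
A rough way to see where that bound comes from, assuming a uniform sample so that the i-th measurement offered ends up in the reservoir with probability k/i (the symbol $H_m$ for the m-th harmonic number is introduced here, and the logarithm is natural):

$$
\mathbb{E}[\text{items stored}] = k + \sum_{i=k+1}^{n} \frac{k}{i} = k\,(1 + H_n - H_k) \approx k\left(1 + \ln\frac{n}{k}\right)
$$

For example, with k = 4 and n = 1,000,000 this gives about 4(1 + 12.43) ≈ 54 stored items, so random numbers are generated for only a few dozen of the million measurements.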

See https://en.wikipedia.org/wiki/Reservoir_sampling for an overview of this and other reservoir sampling algorithms. See https://github.com/MrAlias/reservoir-sampling for a performance comparison of reservoir sampling algorithms.


          Add the random fixed size exemplar reservoir

1aacb9d

MrAlias requested review from Aneurysm9, evantorrie, XSAM, dashpole, MadVikingGod, pellared, hanyuancheung and dmathieu as code owners

January 24, 2024 16:17

MrAlias added the Skip Changelog label

MrAlias added this to the v1.23.0 milestone


          Rename fixed.go to storage.go

51da2b9

codecov bot commented Jan 24, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (ce3faf1) 82.3% compared to head (bd024cb) 82.5%.

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #4852     +/-   ##
=======================================
+ Coverage   82.3%   82.5%   +0.1%     
=======================================
  Files        228     230      +2     
  Lines      18569   18754    +185     
=======================================
+ Hits       15300   15480    +180     
- Misses      2981    2985      +4     
- Partials     288     289      +1

| Files | Coverage Δ |
|---|---|
| sdk/metric/internal/exemplar/storage.go | `100.0% <100.0%> (ø)` |
| sdk/metric/internal/exemplar/rand.go | `97.7% <97.7%> (ø)` |

... and 1 file with indirect coverage changes

dashpole reviewed

sdk/metric/internal/exemplar/rand.go (outdated, resolved)

sdk/metric/internal/exemplar/rand.go (outdated, resolved)

sdk/metric/internal/exemplar/storage.go (resolved)

sdk/metric/internal/exemplar/rand.go (outdated, resolved)

sdk/metric/internal/exemplar/rand.go (resolved)

sdk/metric/internal/exemplar/rand.go (resolved)

sdk/metric/internal/exemplar/rand.go (resolved)

MrAlias and others added 4 commits

January 24, 2024 13:19


          Update sdk/metric/internal/exemplar/rand.go

d43fee3

Co-authored-by: David Ashpole <dashpole@google.com>


          Remove stale ref to spec recommendation

498f3d7


          Add comments to clarify the reset/advance/Collect methods

b8af1e7


          Apply comment from feedback

1150d3d

MrAlias commented

sdk/metric/internal/exemplar/rand.go (outdated, resolved)


          Add random func to gen rand float64 on (0,1)

81d4492

MrAlias force-pushed the add-rand-fixed-size-res branch from 0edbb7c to 81d4492 Compare

January 25, 2024 16:52

dashpole reviewed

sdk/metric/internal/exemplar/rand.go (resolved)

MrAlias added 3 commits

January 25, 2024 11:56


          Use random in TestFixedSizeSamplingCorrectness

c438683


          Add clarifying algorithm comments

fef8df6

Include a high-level overview of the algorithm implemented and clarify
parameter names to be consistent.


          Fix duplicate word

24b8b0f

MrAlias commented

sdk/metric/internal/exemplar/rand.go (outdated, resolved)

MrAlias added 3 commits

January 25, 2024 12:03


          Update sdk/metric/internal/exemplar/rand.go

efa02aa


          Merge branch 'main' into add-rand-fixed-size-res

b5fb430


          Merge branch 'main' into add-rand-fixed-size-res

d8d279c

pellared reviewed

sdk/metric/internal/exemplar/rand_test.go (outdated, resolved)

sdk/metric/internal/exemplar/rand_test.go (resolved)

sdk/metric/internal/exemplar/rand.go (resolved)

MrAlias added 2 commits

January 26, 2024 11:19


          Comment TestFixedSizeSamplingCorrectness

885e44e


          Update test delta

4af0c25

pellared approved these changes

sdk/metric/internal/exemplar/storage.go (outdated, resolved)

sdk/metric/internal/exemplar/storage.go (resolved)

MrAlias added 2 commits

January 26, 2024 12:54


          Test collect less than cap

001bdc8


          Remove measurement.Valid method

7dfbabe

pellared approved these changes

dmathieu approved these changes


          Merge branch 'main' into add-rand-fixed-size-res

bd024cb

hanyuancheung approved these changes

dashpole approved these changes

MrAlias merged commit dcfec0c into open-telemetry:main

25 checks passed

MrAlias deleted the add-rand-fixed-size-res branch

January 29, 2024 15:26

Reviewers

dmathieu approved these changes

dashpole approved these changes

pellared approved these changes

hanyuancheung approved these changes

Aneurysm9: awaiting requested review

evantorrie: awaiting requested review

XSAM: awaiting requested review (code owner)

MadVikingGod: awaiting requested review (code owner)