Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

receiver/prometheus: add store to track stale metrics #3414

Conversation

odeke-em
Copy link
Member

Implements a simple store that will be used to record stale metrics;
those that are NOT present in the current scrape, but were in the prior
scrape. This logic will then be attached to our receiver logic
to then complete the mechanism and then issue stale markers.

Updates #3413
Updates #2961

@odeke-em odeke-em requested a review from alolita as a code owner June 10, 2021 12:56
@odeke-em odeke-em requested review from a team and owais June 10, 2021 12:56
@odeke-em odeke-em force-pushed the receiver-prometheus-addStalenessmarkersStore branch from 72934eb to 04ac4d6 Compare June 10, 2021 13:00
@odeke-em
Copy link
Member Author

If you'd like the big picture, please see https://github.com/orijtech/opentelemetry-collector/pull/new/receiver-prometheus-add-staleness-tracking+emit-staleness-markers and after it the remotewrite compliance test for Staleness passes with

        --- PASS: TestRemoteWrite/otelcollector/Staleness (10.04s)

Kindly cc-ing related reviewers @dashpole @rakyll @Aneurysm9 @bogdandrutu @tigrannajaryan.

@odeke-em odeke-em force-pushed the receiver-prometheus-addStalenessmarkersStore branch from 04ac4d6 to ef1d34d Compare June 10, 2021 13:14
@odeke-em
Copy link
Member Author

@bogdandrutu @tigrannajaryan the failed check seems to be flake from the CI itself and unrelated to code:

6d54c59f90a1c17832fdec2373b613351ac3046a5aa62f66c0e127895e61e5f7
Installing otel-collector_0.28.0-post_amd64.deb ...
Error response from daemon: Error processing tar file(exit status 1): lchown /tmp/otel-collector_0.28.0-post_amd64.deb: no such file or directory
Error: Process completed with exit code 1.

@odeke-em
Copy link
Member Author

'The PR to finish things up is #3423 but unfortunately Github doesn't stack/link changes so that PR has code from this one, but after this one is merged then it'll get to its normal size.

@odeke-em odeke-em force-pushed the receiver-prometheus-addStalenessmarkersStore branch from ef1d34d to 7c6b172 Compare June 11, 2021 22:00
"github.com/prometheus/prometheus/pkg/labels"
)

type stalenessStore struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment to explain the purpose and the intended usage. (Yes, I know the functions below have descriptions. I am looking for something that is more of an overall description of how the store should be used and why).
Links to previous discussions or issues which describe the problem can be also useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks @tigrannajaryan!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alolita
Copy link
Member

alolita commented Jun 15, 2021

@odeke-em some of the required tests are failing. Pl take a look and fix.

Implements a simple store that will be used to record stale metrics;
those that are NOT present in the current scrape, but were in the prior
scrape. This logic will then be attached to our receiver logic
to then complete the mechanism and then issue stale markers.

Updates open-telemetry#3413
Updates open-telemetry#2961
@odeke-em
Copy link
Member Author

@alolita, the Windows test failure is a flake from an entirely different package that we haven't touched per https://github.com/open-telemetry/opentelemetry-collector/pull/3414/checks?check_run_id=2823559150#step:6:179

ok  	go.opentelemetry.io/collector/config/configauth	0.029s
# go.opentelemetry.io/collector/config/configgrpc [go.opentelemetry.io/collector/config/configgrpc.test]
config\configgrpc\configgrpc_test.go:456:38: cannot use &grpcTraceServer{} (type *grpcTraceServer) as type pdatagrpc.TracesServer in argument to pdatagrpc.RegisterTracesServer:
	*grpcTraceServer does not implement pdatagrpc.TracesServer (wrong type for Export method)
		have Export(context.Context, pdata.Traces) (interface {}, error)
		want Export(context.Context, pdata.Traces) (pdatagrpc.TracesResponse, error)
config\configgrpc\configgrpc_test.go:501:36: cannot use &grpcTraceServer{} (type *grpcTraceServer) as type pdatagrpc.TracesServer in argument to pdatagrpc.RegisterTracesServer:
	*grpcTraceServer does not implement pdatagrpc.TracesServer (wrong type for Export method)
		have Export(context.Context, pdata.Traces) (interface {}, error)
		want Export(context.Context, pdata.Traces) (pdatagrpc.TracesResponse, error)

same for the lint failures being unrelated to this change

make[1]: Leaving directory '/home/runner/work/opentelemetry-collector/opentelemetry-collector'
config/configgrpc/configgrpc_test.go:456:38: cannot use &(grpcTraceServer literal) (value of type *grpcTraceServer) as pdatagrpc.TracesServer value in argument to pdatagrpc.RegisterTracesServer: wrong type for method Export (have func(context.Context, go.opentelemetry.io/collector/consumer/pdata.Traces) (interface{}, error), want func(context.Context, go.opentelemetry.io/collector/consumer/pdata.Traces) (go.opentelemetry.io/collector/internal/pdatagrpc.TracesResponse, error)) (typecheck)
			pdatagrpc.RegisterTracesServer(s, &grpcTraceServer{})
			                                  ^
config/configgrpc/configgrpc_test.go:501:36: cannot use &(grpcTraceServer literal) (value of type *grpcTraceServer) as pdatagrpc.TracesServer value in argument to pdatagrpc.RegisterTracesServer: wrong type for method Export (have func(context.Context, go.opentelemetry.io/collector/consumer/pdata.Traces) (interface{}, error), want func(context.Context, go.opentelemetry.io/collector/consumer/pdata.Traces) (go.opentelemetry.io/collector/internal/pdatagrpc.TracesResponse, error)) (typecheck)
	pdatagrpc.RegisterTracesServer(s, &grpcTraceServer{})
	                                  ^
make[2]: *** [Makefile.Common:26: lint] Error 1

@odeke-em odeke-em force-pushed the receiver-prometheus-addStalenessmarkersStore branch from 0b320e4 to b6fe1c3 Compare June 15, 2021 09:19
@odeke-em
Copy link
Member Author

All tests now passing on their own after the flakes.

@alolita alolita added the ready-to-merge Code review completed; ready to merge by maintainers label Jun 16, 2021
"github.com/prometheus/prometheus/pkg/labels"
)

type stalenessStore struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@tigrannajaryan tigrannajaryan merged commit cdc1634 into open-telemetry:main Jun 16, 2021
tigrannajaryan pushed a commit that referenced this pull request Jul 8, 2021
…ring metrics (#3423)

Adds the staleness store to the metricsBuilder and transaction to track when metrics
disappear between scrapes.

Depends on #3414
Fixes #3413
Fixes #2961
@odeke-em odeke-em deleted the receiver-prometheus-addStalenessmarkersStore branch July 11, 2021 05:46
dashpole added a commit to dashpole/opentelemetry-collector that referenced this pull request Sep 8, 2021
dashpole added a commit to dashpole/opentelemetry-collector that referenced this pull request Sep 8, 2021
dashpole added a commit to dashpole/opentelemetry-collector that referenced this pull request Sep 8, 2021
bogdandrutu pushed a commit that referenced this pull request Sep 9, 2021
…heus, rather than making our own (#3989)

* Revert "receiver/prometheus: glue and complete staleness marking for disappearing metrics (#3423)"

This reverts commit 8b79380.

* Revert "receiver/prometheus: add store to track stale metrics (#3414)"

This reverts commit cdc1634.

* stop dropping staleness markers from prometheus, and fix tests

* add staleness end to end test from #3423

* fix import grouping
bogdandrutu pushed a commit that referenced this pull request Sep 9, 2021
…heus, rather than making our own (#3991)

* Revert "receiver/prometheus: glue and complete staleness marking for disappearing metrics (#3423)"

This reverts commit 8b79380.

* Revert "receiver/prometheus: add store to track stale metrics (#3414)"

This reverts commit cdc1634.

* stop dropping staleness markers from prometheus, and fix tests

* add staleness end to end test from #3423

* fix import grouping
bogdandrutu pushed a commit that referenced this pull request Sep 9, 2021
…heus, rather than making our own (#3990)

* Revert "receiver/prometheus: glue and complete staleness marking for disappearing metrics (#3423)"

This reverts commit 8b79380.

* Revert "receiver/prometheus: add store to track stale metrics (#3414)"

This reverts commit cdc1634.

* stop dropping staleness markers from prometheus, and fix tests

* add staleness end to end test from #3423

* fix import grouping
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:receiver ready-to-merge Code review completed; ready to merge by maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants