Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument MetricsReader with metrics #3667

Merged

Conversation

albertteoh
Copy link
Contributor

@albertteoh albertteoh commented May 9, 2022

Which problem is this PR solving?

Short description of the changes

Testing

  • Add unit tests.
  • Manually tested both jaeger-query and all-in-one. Results below:

jaeger-query

curl localhost:16687/metrics

Success

With Prometheus server running.

requests

jaeger_query_requests_total{operation="get_call_rates",result="err"} 0
jaeger_query_requests_total{operation="get_call_rates",result="ok"} 18
jaeger_query_requests_total{operation="get_error_rates",result="err"} 0
jaeger_query_requests_total{operation="get_error_rates",result="ok"} 18
jaeger_query_requests_total{operation="get_latencies",result="err"} 0
jaeger_query_requests_total{operation="get_latencies",result="ok"} 36
jaeger_query_requests_total{operation="get_min_step_duration",result="err"} 0
jaeger_query_requests_total{operation="get_min_step_duration",result="ok"} 0

latency_buckets

jaeger_query_latency_bucket{operation="get_call_rates",result="ok",le="0.005"} 5
jaeger_query_latency_bucket{operation="get_call_rates",result="ok",le="0.01"} 13
jaeger_query_latency_bucket{operation="get_call_rates",result="ok",le="0.025"} 18

jaeger_query_latency_bucket{operation="get_error_rates",result="ok",le="0.005"} 7
jaeger_query_latency_bucket{operation="get_error_rates",result="ok",le="0.01"} 13
jaeger_query_latency_bucket{operation="get_error_rates",result="ok",le="0.025"} 18

jaeger_query_latency_bucket{operation="get_latencies",result="ok",le="0.005"} 7
jaeger_query_latency_bucket{operation="get_latencies",result="ok",le="0.01"} 25
jaeger_query_latency_bucket{operation="get_latencies",result="ok",le="0.025"} 36

Fail

Prometheus server NOT running.

requests

jaeger_query_requests_total{operation="get_call_rates",result="err"} 4
jaeger_query_requests_total{operation="get_call_rates",result="ok"} 0
jaeger_query_requests_total{operation="get_error_rates",result="err"} 4
jaeger_query_requests_total{operation="get_error_rates",result="ok"} 0
jaeger_query_requests_total{operation="get_latencies",result="err"} 8
jaeger_query_requests_total{operation="get_latencies",result="ok"} 0
jaeger_query_requests_total{operation="get_min_step_duration",result="err"} 0
jaeger_query_requests_total{operation="get_min_step_duration",result="ok"} 0

all-in-one

curl localhost:14269/metrics

Success

With Prometheus server running.

requests

jaeger_requests_total{operation="get_call_rates",result="err"} 0
jaeger_requests_total{operation="get_call_rates",result="ok"} 6
jaeger_requests_total{operation="get_error_rates",result="err"} 0
jaeger_requests_total{operation="get_error_rates",result="ok"} 6
jaeger_requests_total{operation="get_latencies",result="err"} 0
jaeger_requests_total{operation="get_latencies",result="ok"} 12
jaeger_requests_total{operation="get_min_step_duration",result="err"} 0
jaeger_requests_total{operation="get_min_step_duration",result="ok"} 0

latency_buckets

jaeger_latency_bucket{operation="get_call_rates",result="ok",le="0.005"} 0
jaeger_latency_bucket{operation="get_call_rates",result="ok",le="0.01"} 1
jaeger_latency_bucket{operation="get_call_rates",result="ok",le="0.025"} 4
jaeger_latency_bucket{operation="get_call_rates",result="ok",le="0.05"} 6

jaeger_latency_bucket{operation="get_error_rates",result="ok",le="0.005"} 0
jaeger_latency_bucket{operation="get_error_rates",result="ok",le="0.01"} 1
jaeger_latency_bucket{operation="get_error_rates",result="ok",le="0.025"} 4
jaeger_latency_bucket{operation="get_error_rates",result="ok",le="0.05"} 6

jaeger_latency_bucket{operation="get_latencies",result="ok",le="0.005"} 0
jaeger_latency_bucket{operation="get_latencies",result="ok",le="0.01"} 0
jaeger_latency_bucket{operation="get_latencies",result="ok",le="0.025"} 6
jaeger_latency_bucket{operation="get_latencies",result="ok",le="0.05"} 12

Fail

Prometheus server NOT running.

requests

jaeger_requests_total{operation="get_call_rates",result="err"} 6
jaeger_requests_total{operation="get_call_rates",result="ok"} 0
jaeger_requests_total{operation="get_error_rates",result="err"} 6
jaeger_requests_total{operation="get_error_rates",result="ok"} 0
jaeger_requests_total{operation="get_latencies",result="err"} 12
jaeger_requests_total{operation="get_latencies",result="ok"} 0
jaeger_requests_total{operation="get_min_step_duration",result="err"} 0
jaeger_requests_total{operation="get_min_step_duration",result="ok"} 0

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
@codecov
Copy link

codecov bot commented May 9, 2022

Codecov Report

Merging #3667 (857f64a) into main (87d9ca4) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #3667      +/-   ##
==========================================
- Coverage   96.52%   96.49%   -0.03%     
==========================================
  Files         265      266       +1     
  Lines       15581    15622      +41     
==========================================
+ Hits        15040    15075      +35     
- Misses        453      457       +4     
- Partials       88       90       +2     
Impacted Files Coverage Δ
storage/metricsstore/metrics/decorator.go 100.00% <100.00%> (ø)
cmd/query/app/static_handler.go 95.80% <0.00%> (-1.80%) ⬇️
plugin/storage/badger/spanstore/reader.go 95.50% <0.00%> (-0.71%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 87d9ca4...857f64a. Read the comment docs.

if err := factory.Initialize(logger); err != nil {
func createMetricsQueryService(
metricsReaderFactory *metricsPlugin.Factory,
metricsReaderMetricsFactory metrics.Factory,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Group logger and metrics next to each other

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in a41b9be.

type queryMetrics struct {
Errors metrics.Counter `metric:"requests" tags:"result=err"`
Successes metrics.Counter `metric:"requests" tags:"result=ok"`
Responses metrics.Timer `metric:"responses"` // used as a histogram, not necessary for GetTrace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 857f64a.

} else {
q.Successes.Inc(1)
q.OKLatency.Record(latency)
q.Responses.Record(time.Duration(responses))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow, what is 1ns here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I actually don't really understand how it was used in spanstore either.

Either way, I don't think it applies for the metrics reader use case so I've removed it in 857f64a.

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
@yurishkuro yurishkuro enabled auto-merge (squash) May 9, 2022 22:04
@yurishkuro yurishkuro merged commit da861ec into jaegertracing:main May 9, 2022
@albertteoh albertteoh deleted the add-metrics-to-metricsreader branch May 10, 2022 07:44
albertteoh added a commit to albertteoh/jaeger that referenced this pull request Jul 13, 2022
* Add metrics instrumentation for SPM (MetricsReader)

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>

* Add metrics for all-in-one and new prometheus reader

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>

* Remove new prometheus metrics instrumentation

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>

* Add license

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>

* Revert unused metrics for prometheus factory

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>

* Long params to multiline and inline return

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>

* make fmt

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>

* Group logger and metrics next to each other

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>

* Remove copied comment

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>

* Remove responses metric

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metrics instrumentation around MetricsReader
2 participants