native histograms: experimental cardinality api #8008
Conversation
Force-pushed from 8539e11 to e19a6b2
Force-pushed from 220f25c to 73ef049
Force-pushed from 73ef049 to 000c750
Regression benchmark for querier
Force-pushed from 000c750 to b0a0626
The new endpoint gives a list of metric names of active native histogram series with associated statistics about active bucket counts.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Force-pushed from b0a0626 to 815d55d
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
I had a few minor questions.
pkg/distributor/distributor.go
Outdated
```go
// For native histograms we are going to return metrics level information so we don't need to limit the response size.
if !nativeHistograms {
	if limit := d.limits.ActiveSeriesResultsMaxSizeBytes(tenantID); limit > 0 {
		maxResponseSize = limit
	}
```
Why should we apply the limit only selectively? It's there to bound in-use memory per call and per querier; if it's exceeded, the request should be retried with a higher shard count. Maybe I'm missing something, but I don't see why the shape of the response data makes any difference.
Yeah, I'm not sure about this one. My problem is the name of the limit: it's called querier.active-series-results-max-size-bytes, but we'd be limiting the intermediate result size. Something more relevant could be querier.max-fetched-series-per-query, but that's for queries, which have very different semantics than active series requests.
I'll restore it, add a test, and update the help text to try to make it clear how it's used.
Done, CI running now
From a user perspective, I'm not sure the new help text is clearer. What exactly is your issue with "Maximum size of an active series request result shard in bytes"?

> but we'd be limiting the intermediate result size

That's exactly what we want, though: it avoids high in-use memory in queriers when the shard count is too low.
From a user perspective, I'm not sure if the new help text is clearer. What exactly is your issue with
Maximum size of an active series request result shard in bytes?
I think I just missed the word "shard" in it. I'll update it to just add active native histogram metrics to the existing sentence.
```go
// Cannot start streaming until we have merged all results.
err := g.Wait()
```
If we build the full response in memory then we don't need streaming at all, or am I missing something?
True, but on the other hand, streaming JSON writes are more efficient as far as I know. So it made sense to adopt what already worked.
I'm not sure about the efficiency. In general, I think the non-streaming version is more robust: if there's any error while accumulating the response, the client gets a non-200 HTTP status code, whereas with streaming the client always gets a 200, and if the response can't be built without error the stream simply aborts. There would also be less code complexity without streaming. But since we've already got that complexity elsewhere, maybe it's OK to duplicate it.
Hmm, the error handling is more complicated in my case. That convinced me; I'll change to non-streaming.
Reworked it. I also switched from supporting "x-snappy-framed" to supporting "snappy" as encoding, since framing is for streaming and we don't need it here.
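For context: "x-snappy-framed" is snappy's streaming container format, while plain "snappy" is the block format for compressing a single buffer, which matches a response built fully in memory. The negotiation side can be sketched with the standard library alone; this helper is hypothetical, not the PR's code:

```go
package main

import "strings"

// acceptsEncoding reports whether the client's Accept-Encoding header value
// lists the given content encoding, e.g. "snappy" (block format). The framed
// variant ("x-snappy-framed") only pays off when the response is streamed.
func acceptsEncoding(header, encoding string) bool {
	for _, part := range strings.Split(header, ",") {
		// Strip optional quality values like "snappy;q=0.8".
		name := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
		if strings.EqualFold(name, encoding) {
			return true
		}
	}
	return false
}
```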
Co-authored-by: Felix Beuke <felix.j.beuke@gmail.com>
From review comment. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Thanks, lgtm
* query-frontend: add /api/v1/cardinality/active_native_histogram_metrics

The new endpoint gives a list of metric names of active native histogram series with associated statistics about active bucket counts.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Felix Beuke <felix.j.beuke@gmail.com>
What this PR does
Query-frontend, querier: new experimental /cardinality/active_native_histogram_metrics API to get active native histogram metric names with statistics about active native histogram buckets.

Which issue(s) this PR fixes or relates to
Fixes #7981
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
- about-versioning.md updated with experimental features.