-
Notifications
You must be signed in to change notification settings - Fork 512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set shuffle sharding ingester lookback automatically #2110
Conversation
This change deprecates the `querier.shuffle-sharding-ingesters-lookback-period` option, instead setting the lookback period from the value of the `querier.query-ingesters-within` option. Outside of our own integration tests, there's no situation where it's useful to set the lookback period to a different value than the `query-ingesters-within` option. This also adds a new option `querier.shuffle-sharding-ingesters-enabled` that can be used to enable or disable shuffle sharding on ingesters on the read path instead of using the lookback period for this. Fixes #1810 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Edit: I didn't see you added an option to disable this - then there should be nothing to worry about. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Logic changes LGTM. I left few comments here and there that I would like you to consider before merging. I also left a possible idea on the integration test.
pkg/querier/querier.go
Outdated
cfg.StoreGatewayClient.RegisterFlagsWithPrefix("querier.store-gateway-client", f) | ||
f.BoolVar(&cfg.Iterators, "querier.iterators", false, "Use iterators to execute query, as opposed to fully materialising the series in memory.") | ||
f.BoolVar(&cfg.BatchIterators, "querier.batch-iterators", true, "Use batch iterators to execute query, as opposed to fully materialising the series in memory. Takes precedent over the -querier.iterators flag.") | ||
f.DurationVar(&cfg.QueryIngestersWithin, queryIngestersWithinFlag, 13*time.Hour, "Maximum lookback beyond which queries are not sent to ingester. 0 means all queries are sent to ingester.") | ||
f.DurationVar(&cfg.MaxQueryIntoFuture, "querier.max-query-into-future", 10*time.Minute, "Maximum duration into the future you can query. 0 to disable.") | ||
f.DurationVar(&cfg.QueryStoreAfter, queryStoreAfterFlag, 12*time.Hour, "The time after which a metric should be queried from storage and not just ingesters. 0 means all queries are sent to store. If this option is enabled, the time range of the query sent to the store-gateway will be manipulated to ensure the query end is not more recent than 'now - query-store-after'.") | ||
f.DurationVar(&cfg.ShuffleShardingIngestersLookbackPeriod, shuffleShardingIngestersLookbackPeriodFlag, 13*time.Hour, "When this setting is > 0, queriers fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since 'now - lookback period'. The lookback period should be greater or equal than the configured -querier.query-store-after and -querier.query-ingesters-within. If this setting is 0, queriers always query all ingesters (ingesters shuffle sharding on read path is disabled).") | ||
flagext.DeprecatedFlag(f, shuffleShardingIngestersLookbackPeriodFlag, fmt.Sprintf("Deprecated: this setting should always be the same as %s and will now behave as if it is", queryIngestersWithinFlag), logger) | ||
f.BoolVar(&cfg.ShuffleShardingIngestersEnabled, "querier.shuffle-sharding-ingesters-enabled", true, fmt.Sprintf("Fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since %s. If this setting is false or %s is '0', queriers always query all ingesters (ingesters shuffle sharding on read path is disabled).", queryIngestersWithinFlag, queryIngestersWithinFlag)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend to prefix flags with -
to make it more clear they're CLI flags.
f.BoolVar(&cfg.ShuffleShardingIngestersEnabled, "querier.shuffle-sharding-ingesters-enabled", true, fmt.Sprintf("Fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since %s. If this setting is false or %s is '0', queriers always query all ingesters (ingesters shuffle sharding on read path is disabled).", queryIngestersWithinFlag, queryIngestersWithinFlag)) | |
f.BoolVar(&cfg.ShuffleShardingIngestersEnabled, "querier.shuffle-sharding-ingesters-enabled", true, fmt.Sprintf("Fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since -%s. If this setting is false or -%s is '0', queriers always query all ingesters (ingesters shuffle sharding on read path is disabled).", queryIngestersWithinFlag, queryIngestersWithinFlag)) |
docs/sources/operators-guide/configuring/configuring-shuffle-sharding/index.md
Outdated
Show resolved
Hide resolved
docs/sources/operators-guide/configuring/configuring-shuffle-sharding/index.md
Show resolved
Hide resolved
docs/sources/operators-guide/configuring/configuring-shuffle-sharding/index.md
Outdated
Show resolved
Hide resolved
docs/sources/operators-guide/configuring/configuring-shuffle-sharding/index.md
Outdated
Show resolved
Hide resolved
// path). This is because the way which ingesters to query is calculated depends on | ||
// when they were registered compared to the "query ingesters within" time. They'll | ||
// never be registered long enough as part of this test to be queried for the series. | ||
// Instead, all ingesters are queried. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is not very good. We're not really testing shuffle sharding anymore here. What if we set a query-ingesters-within to a low value (e.g. 3s), then we start ingesters, we wait 3s, we push samples and query back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! I tried out using a small value (30s) and a time.Sleep()
locally. I was able to get the tests passing once but they seemed a little flaky. I'll look into this more and see if I can get them working reliably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to set very very low ring heartbeat period, otherwise the registration timestamp is not updated frequently enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a lookback and delay of 5s
. I'm hoping this is enough for these tests to work reliably (haven't seen any failures so far) 🤞
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
10dd18b
to
aaa8cbf
Compare
@@ -2022,16 +2022,15 @@ func TestDistributor_MetricsMetadata(t *testing.T) { | |||
_, err := ds[0].Push(ctx, req) | |||
require.NoError(t, err) | |||
|
|||
// Check how many ingesters are queried as part of the shuffle sharding subring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to add unit tests explicitly for GetIngestersForMetadata
and GetIngestersForQuery
but this test is already doing a similar verification. Swapped it to use this method instead of counting mock calls because we don't have to test both expectedIngesters
and expectedIngesters - 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
This change deprecates the `querier.shuffle-sharding-ingesters-lookback-period` option, instead setting the lookback period from the value of the `querier.query-ingesters-within` option. Outside of our own integration tests, there's no situation where it's useful to set the lookback period to a different value than the `query-ingesters-within` option. This also adds a new option `querier.shuffle-sharding-ingesters-enabled` that can be used to enable or disable shuffle sharding on ingesters on the read path instead of using the lookback period for this. Fixes grafana#1810 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
What this PR does
This change deprecates the
querier.shuffle-sharding-ingesters-lookback-period
option, insteadsetting the lookback period from the value of the
querier.query-ingesters-within
option. Outside of our own integrationtests, there's no situation where it's useful to set the lookback period
to a different value than the
query-ingesters-within
option.This also adds a new option
querier.shuffle-sharding-ingesters-enabled
that can be used to enable or disable shuffle sharding on ingesters on
the read path instead of using the lookback period for this.
Which issue(s) this PR fixes or relates to
Fixes #1810
Signed-off-by: Nick Pillitteri nick.pillitteri@grafana.com
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]