
Add a limit on max remote write request samples #6935

Open
aknuds1 opened this issue Dec 14, 2023 · 5 comments
Comments

aknuds1 (Contributor) commented Dec 14, 2023

Is your feature request related to a problem? Please describe.

We need a limit on the maximum number of samples per remote write request (including OTLP). The motivation is that OTLP write requests are typically batched, and could contain so many samples that Mimir takes too long to process them.

Describe the solution you'd like

A limit on the number of samples a remote write request is permitted to contain. When the limit is hit, the request should be rejected with HTTP 413. A suggested default is 10k.
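As a minimal sketch of the proposed behaviour (names and structure here are hypothetical, not Mimir's actual implementation): count the samples in the decoded request and reject with HTTP 413 when a configured limit is exceeded.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// maxRequestSamples is a hypothetical per-request limit; the issue
// suggests 10k as a default.
const maxRequestSamples = 10000

// exceedsSampleLimit reports whether a write request carrying n samples
// exceeds the limit. A limit of 0 disables the check.
func exceedsSampleLimit(n, limit int) bool {
	return limit > 0 && n > limit
}

// rejectIfTooLarge writes the HTTP 413 response the issue proposes and
// reports whether the request was rejected.
func rejectIfTooLarge(w http.ResponseWriter, n, limit int) bool {
	if !exceedsSampleLimit(n, limit) {
		return false
	}
	http.Error(w,
		fmt.Sprintf("write request contains %d samples, exceeding the limit of %d", n, limit),
		http.StatusRequestEntityTooLarge) // 413
	return true
}

func main() {
	rec := httptest.NewRecorder()
	rejected := rejectIfTooLarge(rec, 15000, maxRequestSamples)
	fmt.Println(rejected, rec.Code) // true 413
}
```

A limit of 0 meaning "disabled" follows the convention of Mimir's other per-tenant limits; the check would run after the request body is decoded, so it complements (rather than replaces) a byte-size limit.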

Describe alternatives you've considered

Additional context

See https://github.com/grafana/mimir-squad/issues/2180, which mentions the following:

Another idea was to validate normal distributor limits prior to translating OTLP to Mimir requests, but this is much more difficult since the validation middlewares currently expect a normal Mimir write request.

aknuds1 changed the title from "Add a limit on max remote writte request samples" to "Add a limit on max remote write request samples" Dec 14, 2023
ying-jeanne (Contributor) commented Jan 10, 2024

Before enabling the 10k default value, we intend to introduce a new metric, potentially in the form of a histogram, to monitor the number of samples per request, so as to give users time to adjust their collector configuration. This approach ensures that no requests are rejected due to the updated default value, preventing potential data loss.
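A stdlib-only sketch of the kind of distribution such a metric would capture (all names are hypothetical; in practice this would be a Prometheus histogram registered through Mimir's existing instrumentation):

```go
package main

import "fmt"

// exponentialBuckets mirrors the shape of Prometheus' exponential bucket
// helper: count buckets starting at start, each factor times the previous.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

// observe increments the cumulative bucket counters for one request's
// sample count, the way a histogram metric would.
func observe(counts []int, buckets []float64, samples float64) {
	for i, ub := range buckets {
		if samples <= ub {
			counts[i]++
		}
	}
}

func main() {
	// Buckets from 100 up to 51200 samples per request; the proposed
	// 10k default would fall between the 6400 and 12800 bounds.
	buckets := exponentialBuckets(100, 2, 10)
	counts := make([]int, len(buckets))
	for _, n := range []float64{80, 950, 12000} {
		observe(counts, buckets, n)
	}
	fmt.Println(buckets[len(buckets)-1]) // 51200
	fmt.Println(counts)
}
```

Watching the upper buckets of such a histogram shows how many requests would have been rejected under a candidate limit, before the limit is actually enforced.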

ying-jeanne (Contributor) commented

Client-side histograms do not yield an accurate total sample count: a classic histogram actually produces one series per bucket plus two more (the _sum and _count series), but the client side counts it as a single sample. This makes it challenging to configure batch sizes appropriately on the client side.
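The arithmetic behind this mismatch can be made concrete (function name is hypothetical): a classic histogram with n buckets contributes n + 2 samples per scrape, so a batch the client believes holds 1000 samples may actually carry far more.

```go
package main

import "fmt"

// classicHistogramSamples returns how many samples one classic histogram
// contributes per scrape: one per exposed bucket series, plus the
// _sum and _count series.
func classicHistogramSamples(numBuckets int) int {
	return numBuckets + 2
}

func main() {
	// A batch of 1000 histograms with 10 buckets each is 12000 samples,
	// already over a 10k per-request limit, even though the client may
	// count each histogram as a single sample.
	fmt.Println(1000 * classicHistogramSamples(10)) // 12000
}
```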

aknuds1 (Contributor, Author) commented Jan 23, 2024

According to Ying, these are classic histograms. It is uncertain whether the same behaviour holds for native histograms.

ying-jeanne (Contributor) commented

Related ticket: #8260

ying-jeanne (Contributor) commented

> Before enabling the 10k default value, we intend to introduce a new metric, potentially in the form of a histogram, to monitor the number of samples per request, so as to give users time to adjust their collector configuration.

This is now implemented.
