query: Staleness problem #2608

Closed
IKSIN opened this issue May 14, 2020 · 22 comments · Fixed by #3277

Comments

@IKSIN
Contributor

IKSIN commented May 14, 2020

Thanos, Prometheus and Golang version used:
thanos: v0.12.0

Object Storage Provider:
private CEPH (S3)

What happened:
Compare the end time input and the resolution in the two screenshots:
[Screenshot 2020-05-14 at 16:34:52]
[Screenshot 2020-05-14 at 16:35:02]
The staleness handling in the Prometheus library drops some of the points returned by the Thanos stores.

What you expected to happen:
All data from the store is returned for any time range.

How to reproduce it (as minimally and precisely as possible):
See the screenshots above.

Full logs to relevant components:

Anything else we need to know:
I think we have a few ways to resolve the problem:

  1. Update the Prometheus library and set the LookbackDelta parameter to more than 5 min (needs checking).
  2. Update the querier to move or duplicate points to the needed timestamps (interpolate data for PromQL).
  3. Update the Prometheus library to return all points from the stores.
@daixiang0
Member

What is the Ceph version?

@IKSIN
Contributor Author

IKSIN commented May 15, 2020

The Ceph version does not matter: the Thanos stores return all data to the querier, and the points are only treated as stale inside the Prometheus library.

@bwplotka
Member

Nice, thanks for this. Funnily enough we just talked about this exact problem with @juliusv (:

We need different lookbackDelta for different resolution I think, right? @juliusv

@juliusv
Contributor

juliusv commented May 15, 2020

At a minimum, it would be good to add the --query.lookback-delta flag that we have in Prometheus to Thanos as well. However, since it's a global setting, it would apply to all time series, even the ones that are scraped at intervals <5m. Normally you wouldn't want to set this lookback delta higher than needed for everything, as that will result in old samples being returned for quite a long time (although explicit staleness markers already help with that).

@IKSIN
Contributor Author

IKSIN commented May 15, 2020

I think we need a dynamic lookback-delta that depends on the resolution, for example resolution/2.

@bwplotka
Member

Well. The main problem is that we can use different resolutions in a single PromQL evaluation (:

So it can be [1h of raw data, 2w of 1h resolution, and 5h of 5m resolution] combined.

So I think we might need to think of something in PromQL itself. @brian-brazil do you know how hard that would be?

Also we can temporarily add lookback delta per query as well 🤔

@brian-brazil
Contributor

Varying resolution within one query is unlikely to work. What I'd do is present PromQL with something that looks real, derived from the downsampled data - e.g. here you might provide interpolated samples every 1m.

It kinda depends on what the query is though.

@IKSIN
Contributor Author

IKSIN commented May 18, 2020

@bwplotka @brian-brazil
Can we choose a solution as soon as possible? I'm working on this problem now and can implement both solutions...

@bwplotka
Member

Can you elaborate more @brian-brazil ? So essentially you would actually for each downsampled data, actually expand it to have samples every 1m, fake interval? 🤔

What would be the corner cases? Why it depends on query?

Alternatively we could have 3 PromQL engines in the Querier and choose which one to use based on the returned data. Then we can evaluate for the given periods and concatenate the results. However, for large steps and intervals it would most likely be bad....

@brian-brazil
Contributor

So essentially you would actually for each downsampled data, actually expand it to have samples every 1m, fake interval

Yes, something like that.
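
As a rough illustration of that idea, the Go sketch below expands sparse samples onto a denser fixed interval using linear interpolation. The `sample` type and `expand` function are hypothetical, and linear interpolation is just one possible choice; as noted in this thread, the right data to synthesize depends on the query being evaluated.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp in ms, value) pair.
type sample struct {
	t int64
	v float64
}

// expand returns samples every stepMs milliseconds between consecutive input
// samples, linearly interpolating the value. Input must be sorted by time.
func expand(in []sample, stepMs int64) []sample {
	if len(in) == 0 || stepMs <= 0 {
		return nil
	}
	out := make([]sample, 0, len(in))
	for i := 0; i+1 < len(in); i++ {
		a, b := in[i], in[i+1]
		for t := a.t; t < b.t; t += stepMs {
			frac := float64(t-a.t) / float64(b.t-a.t)
			out = append(out, sample{t: t, v: a.v + frac*(b.v-a.v)})
		}
	}
	// The final input sample is always kept as-is.
	return append(out, in[len(in)-1])
}

func main() {
	// Two samples 4 minutes apart, expanded to a 1m interval.
	sparse := []sample{{t: 0, v: 0}, {t: 240000, v: 8}}
	for _, s := range expand(sparse, 60000) {
		fmt.Printf("t=%dms v=%g\n", s.t, s.v)
	}
}
```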

Why it depends on query?

For e.g. sum_over_time you need different data than count_over_time to produce the desired result.
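
A small Go sketch of why the function matters, assuming each downsampled sample carries per-window aggregates (count, sum, min, max). The `aggregate` type and `overTime` function are hypothetical and only illustrate which aggregate each `*_over_time` function would need.

```go
package main

import (
	"fmt"
	"math"
)

// aggregate holds hypothetical per-window statistics of the raw samples that
// were collapsed into one downsampled point.
type aggregate struct {
	count, sum, min, max float64
}

// overTime combines the windows in a range using the aggregate that matches
// the requested function.
func overTime(fn string, windows []aggregate) (float64, error) {
	if len(windows) == 0 {
		return 0, fmt.Errorf("no samples in range")
	}
	var count, sum float64
	lo, hi := math.Inf(1), math.Inf(-1)
	for _, w := range windows {
		count += w.count
		sum += w.sum
		lo = math.Min(lo, w.min)
		hi = math.Max(hi, w.max)
	}
	switch fn {
	case "count_over_time":
		return count, nil
	case "sum_over_time":
		return sum, nil
	case "min_over_time":
		return lo, nil
	case "max_over_time":
		return hi, nil
	case "avg_over_time":
		return sum / count, nil // assumes at least one raw sample per window
	default:
		return 0, fmt.Errorf("unsupported function %q", fn)
	}
}

func main() {
	windows := []aggregate{
		{count: 20, sum: 100, min: 1, max: 9},
		{count: 20, sum: 140, min: 3, max: 12},
	}
	for _, fn := range []string{"count_over_time", "sum_over_time", "avg_over_time"} {
		v, err := overTime(fn, windows)
		fmt.Println(fn, v, err)
	}
}
```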

@bwplotka
Member

Looks like @IKSIN we could try that in querier.go

@IKSIN
Contributor Author

IKSIN commented May 18, 2020

OK! I'll try to do it :)

@bwplotka
Member

I am pretty sure we need a special iterator for downsampled chunks.

@bwplotka
Member

For e.g. sum_over_time you need different data than count_over_time to produce the desired result.

This is already well handled.

@stale

stale bot commented Jun 17, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 17, 2020
@stale

stale bot commented Jun 24, 2020

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jun 24, 2020
@d-ulyanov
Contributor

we still work on this, let's reopen

krya-kryak added a commit to krya-kryak/thanos that referenced this issue Oct 1, 2020
Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>
@bwplotka bwplotka reopened this Oct 2, 2020
@stale stale bot removed the stale label Oct 2, 2020
@bwplotka
Member

bwplotka commented Oct 2, 2020

BTW do you know we can now configure the staleness lookback delta?

However we might want to adjust it for different resolutions indeed

@d-ulyanov
Contributor

@bwplotka As far as I remember, the staleness lookback delta is not something new. Or was it changed recently somehow?

@bwplotka
Member

bwplotka commented Oct 2, 2020

We just allow users to configure it on the Querier via a flag, that's it.

krya-kryak added a commit to krya-kryak/thanos that referenced this issue Oct 5, 2020
Closes thanos-io#2608
This allows queries with large step to make use of downsampled data.

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>
@krya-kryak
Contributor

BTW do you know we can now configure the staleness lookback delta?

However we might want to adjust it for different resolutions indeed

Well, here's my attempt at it: #3277

bwplotka added a commit that referenced this issue Nov 6, 2020
* query: introduce dynamic lookback interval

Closes #2608
This allows queries with large step to make use of downsampled data.

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

* Fix minor checks

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

* Append changelog

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

* Add missing copyright

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

* Use pre-defined downsampling resolution constatns

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

* Use dynamic lookback delta by default

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

* Rename defaultEngine to rawEngine

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

* Merge engineFunc and newEngine into single engineFactory

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

* Remove query.dynamic-lookback-delta from docs

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

* Rename test

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

* Review fixes

Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
7 participants