query: 30 seconds lag of adding store to query storeset on start #830

Closed
R4scal opened this issue Feb 10, 2019 · 7 comments · Fixed by #933
Comments

@R4scal

R4scal commented Feb 10, 2019

Hi,
After the Thanos query service starts and begins listening for queries, approximately 30 seconds pass before it reads targets from the store.sd-files config; until then the logs show "No store matched for this query".

Maybe the query service should not listen for queries before the store backends are initialized?

thanos, version 0.3.0 (branch: HEAD, revision: 837e9671737698bf1778a4a9abfebbf96117a0be)
  build user:       root@986454de7a63
  build date:       20190208-15:23:51
  go version:       go1.11.5

I run thanos query with these options:

thanos query --http-address 0.0.0.0:9090 \
 --query.replica-label replica \
 --store.sd-files=/etc/thanos/file_sd/thanos_sidecar.yml \
 --store.sd-files=/etc/thanos/file_sd/thanos_store.yml \
 --store.sd-interval=15s \
 --cluster.disable
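
For reference, the sd files follow the Prometheus file_sd format; simplified, they look roughly like this (addresses match the log output below):

# /etc/thanos/file_sd/thanos_sidecar.yml
- targets:
    - "prometheus01:10901"

# /etc/thanos/file_sd/thanos_store.yml
- targets:
    - "thanos-store01:10901"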

Logs:

Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.547430546Z caller=flags.go:75 msg="gossip is disabled"
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.549568879Z caller=main.go:256 component=query msg="disabled TLS, key and cert must be set to enable"
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.549606049Z caller=query.go:468 msg="starting query node"
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.549721854Z caller=query.go:437 msg="Listening for query and metrics" address=0.0.0.0:9090
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.54980556Z caller=query.go:460 component=query msg="Listening for StoreAPI gRPC" address=0.0.0.0:10901
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.68789894Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.688084505Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.689230174Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.68932082Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:33 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:33.316676018Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:35 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:35.002749295Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:35 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:35.008645702Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:38 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:38.890962012Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:38 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:38.891006167Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:43 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:43.447676471Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:43 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:43.448482479Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:44 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:44.016477579Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:44 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:44.01648745Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:44 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:44.952327456Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:45 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:45.850134356Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:46 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:46.311129224Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:53 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:53.296178654Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:55 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:55.854480378Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:55 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:55.85469596Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:58 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:58.967238319Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:58 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:58.967775992Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:59 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:59.561699715Z caller=storeset.go:250 component=storeset msg="adding new store to query storeset" address=prometheus01:10901
Feb 10 12:18:59 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:59.561759194Z caller=storeset.go:250 component=storeset msg="adding new store to query storeset" address=thanos-store01:10901
@GiedriusS
Member

GiedriusS commented Feb 10, 2019

Indeed you are right - it takes a long time for Thanos Store to start. Did you check the logs of thanos-store01:10901 to see if it had successfully initialized before 2019-02-10T12:18:58.967775992Z?

@R4scal
Author

R4scal commented Feb 10, 2019

prometheus01 (sidecar) and thanos-store01 (store) are running the whole time while I restart the query service. If I use the cluster, stores are added faster, as soon as probing the cluster address succeeds.
If I use store.sd-files with the cluster disabled, 30s are needed after every restart to connect to the store targets. The store.sd-interval parameter has no influence on this delay.

@jojohappy
Member

@R4scal @GiedriusS I found that you can set the --store.sd-dns-interval flag to reduce the interval at which the storeset list is refreshed.

Please look at this code:
https://github.com/improbable-eng/thanos/blob/9a33c996e6b25350b93adad7df0e328cdc9ff99b/cmd/thanos/query.go#L85-L86

https://github.com/improbable-eng/thanos/blob/9a33c996e6b25350b93adad7df0e328cdc9ff99b/cmd/thanos/query.go#L384-L395

The dnsProvider updates the store addresses from file-sd every --store.sd-dns-interval; the default value is 30s, which explains the lag. I'm not sure it is a bug, but it looks like the file-sd update happens only after the dnsProvider has already run when Query starts, so the new addresses are not resolved until the next tick.
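
To illustrate (this is not Thanos code, just a minimal Go sketch of the timing): the file-sd cache is filled right after startup, but a ticker-driven resolver only picks it up on its next tick, one full interval later. The interval is shortened to 3s here so the effect is visible quickly:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	// Stand-in for --store.sd-dns-interval (30s by default in Thanos).
	const dnsInterval = 3 * time.Second

	var (
		mu        sync.Mutex
		addresses []string // stand-in for fileSDCache
	)

	// Simulated file-sd: targets are discovered almost immediately after start.
	go func() {
		time.Sleep(100 * time.Millisecond)
		mu.Lock()
		addresses = []string{"prometheus01:10901", "thanos-store01:10901"}
		mu.Unlock()
		fmt.Println("file-sd cache updated right after start")
	}()

	// Simulated dnsProvider: it only resolves the cached addresses on each tick,
	// so the first useful resolution happens one full interval after start.
	start := time.Now()
	ticker := time.NewTicker(dnsInterval)
	defer ticker.Stop()
	for range ticker.C {
		mu.Lock()
		addrs := append([]string(nil), addresses...)
		mu.Unlock()
		fmt.Printf("resolved %v after %.0fs\n", addrs, time.Since(start).Seconds())
		if len(addrs) > 0 {
			return // only now would the storeset see the stores
		}
	}
}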

Could we add one line of code right after this line?
https://github.com/improbable-eng/thanos/blob/9a33c996e6b25350b93adad7df0e328cdc9ff99b/cmd/thanos/query.go#L358

  fileSDCache.Update(update)
+ dnsProvider.Resolve(ctx, append(fileSDCache.Addresses(), storeAddrs...))
  stores.Update(ctxUpdate)
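
To make the intent concrete, here is a self-contained approximation of that update loop with the extra call (stub types stand in for fileSDCache, dnsProvider and the query storeset; this is not the actual Thanos source):

package main

import (
	"context"
	"fmt"
	"time"
)

// Minimal stand-ins for fileSDCache, dnsProvider and the query storeset.
type cache struct{ addrs []string }

func (c *cache) Update(update []string) { c.addrs = update }
func (c *cache) Addresses() []string    { return c.addrs }

type provider struct{ resolved []string }

// Resolve would normally do the DNS lookups; here it only records the addresses.
func (p *provider) Resolve(_ context.Context, addrs []string) { p.resolved = addrs }
func (p *provider) Addresses() []string                       { return p.resolved }

type storeset struct{ dns *provider }

func (s *storeset) Update(_ context.Context) {
	fmt.Println("storeset now knows about:", s.dns.Addresses())
}

func main() {
	var (
		fileSDCache = &cache{}
		dnsProvider = &provider{}
		stores      = &storeset{dns: dnsProvider}
		storeAddrs  []string // static --store flags; empty in this setup
	)

	// Simulated file-sd delivering one update right after start.
	fileSDUpdates := make(chan []string, 1)
	fileSDUpdates <- []string{"prometheus01:10901", "thanos-store01:10901"}

	ctxUpdate, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	for {
		select {
		case update := <-fileSDUpdates:
			fileSDCache.Update(update)
			// The proposed extra line: resolve immediately instead of waiting
			// for the next --store.sd-dns-interval tick.
			dnsProvider.Resolve(ctxUpdate, append(fileSDCache.Addresses(), storeAddrs...))
			stores.Update(ctxUpdate)
		case <-ctxUpdate.Done():
			return
		}
	}
}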

What do you think? @R4scal @GiedriusS

@bwplotka
Member

@jojohappy sounds like a potentially reasonable change.

@R4scal
Author

R4scal commented Feb 11, 2019

@jojohappy quite right

@GiedriusS
Member

@jojohappy care to convert it into a PR? :P If not, I'll do that

@jojohappy
Member

@GiedriusS Sorry for the delay. I'll write a PR as soon as possible. Thanks!
