receive: Add liveness and readiness probe #1537

Merged: 5 commits, Sep 20, 2019

Conversation

kakkoyun

This PR adds liveness and readiness probes to thanos receive.

Changes

  • Adds /-/healthy endpoint for liveness checks.
  • Adds /-/ready endpoint for readiness checks.
  • Uses prober.Prober for readiness and liveness endpoints.

Verification

  1. make test

  2. Started thanos receive and made requests to the new endpoints.

curl http://0.0.0.0:10902/-/healthy
thanos receive is healthy%
curl http://0.0.0.0:10902/-/ready
thanos receive is not ready. Reason: thanos receive is initializing
curl http://0.0.0.0:10902/-/ready
thanos receive is ready%

@kakkoyun

cc @FUSAKLA

@bwplotka bwplotka left a comment

LGTM, Thanks!

@@ -278,6 +284,7 @@ func runReceive(
s := newStoreGRPCServer(logger, reg, tracer, tsdbStore, opts)

level.Info(logger).Log("msg", "listening for StoreAPI gRPC", "address", grpcBindAddr)
statusProber.SetReady()

I think the receiver should probably not report ready until the TSDB is ready. I'm also not sure about the hashring, and the receive interface is not guaranteed to be up at this point either.

Maybe this will require some more complex condition for the ready state 🤔

@FUSAKLA The TSDB is ready at this stage; see line 270. This code runs after the TSDB is open.
For the receive interface, I thought that if something goes wrong it will change the liveness state, so a readiness check won't be needed.

I guess I need to double-check the hashring readiness.

I'll have another look at it.
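
The "more complex condition" raised in this thread can be pictured as a tracker that reports ready only after every required component has checked in. The following is a minimal sketch, not the logic the PR ended up with: the `readiness` type and the component names used with it (`tsdb`, `hashring`, `receive-http`) are hypothetical.

```go
package main

import (
	"sort"
	"sync"
)

// readiness tracks named components that must all report in before the
// process is considered ready. Hypothetical helper, not part of Thanos.
type readiness struct {
	mu      sync.Mutex
	pending map[string]struct{}
}

// newReadiness registers every component that has to start up.
func newReadiness(components ...string) *readiness {
	pending := make(map[string]struct{}, len(components))
	for _, c := range components {
		pending[c] = struct{}{}
	}
	return &readiness{pending: pending}
}

// MarkReady records that one component has finished starting.
func (r *readiness) MarkReady(component string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.pending, component)
}

// Pending returns the sorted names of components still starting up.
func (r *readiness) Pending() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	names := make([]string, 0, len(r.pending))
	for name := range r.pending {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// Ready reports whether every registered component has reported in.
func (r *readiness) Ready() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.pending) == 0
}
```

Each component would call `MarkReady` when it has started; a `/-/ready` handler would succeed only once `Ready()` returns true, and `Pending()` could feed the "Reason" text of a not-ready response.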

@kakkoyun

@FUSAKLA @bwplotka I've updated the logic that sets receive to ready. Please have another look at it.

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
@brancz brancz merged commit 3a6f8e1 into thanos-io:master Sep 20, 2019
@@ -277,6 +290,8 @@ func runReceive(
}
s := newStoreGRPCServer(logger, reg, tracer, tsdbStore, opts)

// Wait hashring to be ready before start serving metrics
<-hashringReady

Why are we waiting for the hashring to be ready before serving metrics from the store? These things are entirely independent IMO

ivan-kiselev pushed a commit to ivan-kiselev/thanos that referenced this pull request Sep 26, 2019
* Add prober to receive

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entries

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update README

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
brancz pushed a commit that referenced this pull request Sep 26, 2019
* Some updates to compact docs

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* some formatting

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Update docs/components/compact.md

accept PR suggestions

Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add metalmatze to list of maintainers (#1547)

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve comments

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve last comment

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* receive: Add liveness and readiness probe (#1537)

* Add prober to receive

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entries

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update README

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* downsample: Add liveness and readiness probe (#1540)

* Add readiness and liveness probes for downsampler

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entry

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Set ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Clean CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Document the dnssrvnoa option (#1551)

Signed-off-by: Antonio Santos <antonio@santosvelasco.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* feat store: added readiness and livenes prober (#1460)

Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add Hotstar to adopters. (#1553)

It's the largest streaming service in India that does cricket and GoT
for India. They have insane scale and are using Thanos to scale their
Prometheus.

Spoke to them offline about adding the logo and will get a signoff here
too.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix hotstar logo in the adoptor's list (#1558)

Signed-off-by: Karthik Vijayaraju <karthik@hotstar.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix typos, including 'fomrat' -> 'format' in tracing.config-file help text. (#1552)

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Compactor: Fix for #844 - Ignore object if it is the current directory (#1544)

* Ignore object if it is the current directory

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>

* Add full-stop

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Adding doc explaining the importance of groups for compactor (#1555)

Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add blank line for list (#1566)

The format of these files is wrong in the web.

Signed-off-by: dongwenjuan <dong.wenjuan@zte.com.cn>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Refactor compactor constants, fix bucket column (#1561)

* compact: unify different time constants

Use downsample.* constants where possible. Move the downsampling time
ranges into constants and use them as well.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* bucket: refactor column calculation into compact

Fix the column's name and name it UNTIL-DOWN because that is what it
actually shows - time until the next downsampling.

Move out the calculation into a separate function into the compact
package. Ideally we could use the retention policies in this calculation
as well but the `bucket` subcommand knows nothing about them :-(

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* compact: fix issues with naming

Reorder the constants and fix mistakes.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* remove duplicate

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
GiedriusS pushed a commit that referenced this pull request Oct 28, 2019
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
GiedriusS pushed a commit that referenced this pull request Oct 28, 2019
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>