Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify prober #1694

Merged
merged 3 commits into from
Nov 26, 2019
Merged

Simplify prober #1694

merged 3 commits into from
Nov 26, 2019

Conversation

kakkoyun
Copy link
Member

@kakkoyun kakkoyun commented Oct 29, 2019

This PR aims to simplify prober implementation by keeping its existing behaviour.
It also tries to simplify API and tries to use more idiomatic go naming for methods.
It also moves prober API touchpoints to server/http call sites to make the usage more explicit.
These changes are purely cosmetic and just to increase the maintainability of the package, you can ignore this PR, if you think the changes don't provide any value.

Signed-off-by: Kemal Akkoyun kakkoyun@gmail.com

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Refactor prober and its usages for sake of simplicity.

Verification

Runs local test to ensure nothing changed, make test-local.

@kakkoyun kakkoyun force-pushed the simplify_probe branch 2 times, most recently from 750afc7 to 2aee205 Compare November 4, 2019 12:44
@kakkoyun kakkoyun marked this pull request as ready for review November 4, 2019 13:10
@kakkoyun
Copy link
Member Author

kakkoyun commented Nov 4, 2019

cc @bwplotka @squat @FUSAKLA

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, Good work. Some comments, but overall LGTM (:

cc @FUSAKLA as you initially wrote this part.

CHANGELOG.md Outdated Show resolved Hide resolved
cmd/thanos/query.go Show resolved Hide resolved
cmd/thanos/query.go Show resolved Hide resolved
cmd/thanos/receive.go Show resolved Hide resolved
cmd/thanos/rule.go Show resolved Hide resolved
cmd/thanos/sidecar.go Show resolved Hide resolved
cmd/thanos/store.go Show resolved Hide resolved
@kakkoyun
Copy link
Member Author

kakkoyun commented Nov 5, 2019

Fixed the issues, now we only set Ready and not NotReady in a single rungroup actor unless we have special logic for that component. If that component only exposes http endpoint we do it around httpserver start. If the component also exposes grpc endpoint, we utilize grpcserver actor.

@kakkoyun
Copy link
Member Author

ping @FUSAKLA

Copy link
Member

@FUSAKLA FUSAKLA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! 🎉
Sorry for the lag

I like it, makes it simpler. Just few questions about the changes of setting the ready status because it's not 100% clear to me how the run package handles the errors.

Otherwise 👍

@@ -404,10 +413,9 @@ func runQuery(
)

g.Add(func() error {
statusProber.SetReady()
statusProber.Ready()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's feels bit confusing to set ready on gRPC start but not ready on HTTP stop 🤔
What is the motivation for this? Sorry I'm maybe missing something out :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each component actually has a different life-cycle, as you already know, they set their ready state with a different flow. Some of the components have more complicated logic, e.g receive.

For the rest, just to set a convention I decided to set ready just before the component ready to serve gRPC traffic if they serve gRPC because http starts to serve slightly faster than gRPC and usually there are other steps before gRPC server initialized. Otherwise, I stick to the http being served since all of them serve HTTP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, questions concerning oklog/run package, AFAIK whenever an error happens in a rungroup all of the error handlers get called sequentially with registration order, and under normal circumstances, this is a very fast process. However, we recently introduced the graceful shutdown of gRPC and HTTP servers. With those changes they can wait for connections to drain, up to 2m (which is the default query timeout). Since we always register the HTTP server first, I decided to set the probe to not ready, when HTTP server gets shut down, and stop receiving traffic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clearing this up! LGTM 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus it's worth to note that if HTTP server is drained, we are not ready anyway.

component: component,
logger: logger,
status: prometheus.NewGaugeVec(prometheus.GaugeOpts{
Name: "status",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metric name change should be probably mentioned in changelog?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'll update to change log.

cmd/thanos/query.go Show resolved Hide resolved
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM (:

@@ -404,10 +413,9 @@ func runQuery(
)

g.Add(func() error {
statusProber.SetReady()
statusProber.Ready()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus it's worth to note that if HTTP server is drained, we are not ready anyway.

@bwplotka bwplotka merged commit 825f119 into thanos-io:master Nov 26, 2019
@kakkoyun kakkoyun deleted the simplify_probe branch November 26, 2019 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants