Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the ring package from cortex to dskit #45

Merged
merged 258 commits into from
Sep 29, 2021

Conversation

treid314
Copy link
Contributor

@treid314 treid314 commented Sep 23, 2021

What this PR does: Adds the ring package from cortex to dskit. Also included is the proto package with ParseProtoReader function.

Which issue(s) this PR fixes:

Checklist

  • Tests updated
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

bboreham and others added 30 commits September 1, 2021 11:36
instead of spinning round the loop as fast as possible
This makes the codebase work better together with the Prometheus components
which expect an injected go-kit logger, and also makes things more consistent
overall.

I'm introducing a Cortex-wide shared logger for now so that this change doesn't
become even bigger. Eventually, it would be good to change all Cortex
components to take injected loggers.
* Replace four backoff implementations with common one
* Remove unused WatchPrefix()

* Break out of backoff loop if Context is terminated

Previously we would only stop if the Context signalled cancellation
while we were in a sleep. Now, loops that check `backoff.Ongoing()`
will end if the Context is terminated any time.

A new `backoff.Err()` function is added to report back the reason for termination
* Adjust log levels

Things that are normal occurences are just debugs.

* Don't close stream until we're sure we've got the chunks

This fixes a race condition in `TestIngesterTransfer`.

I don't think it makes the actually-running-in-production situation any worse, and might even make it better.

* Log result of `TransferChunks`
If we don't emit anything then Prometheus takes the last value until
it is stale, i.e. for 5 minutes.
* Unify the replication calculation between query and push.

* Replace 'circle' with 'ring'.

* Deal with ingesters coming / going better.
* Move ingester_lifecycle.go to ring pkg, remove ingester_ prefix from other files in pkg/ingester.

* Decouples the lifecycle of the ingester from the ingester itself, and put it into its own struct.

Laying groundwork for the incremental transfer of token, to unify rolling upgrades and elastic scaling.

* s/Transfer/TransferOut/

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
* Add client-side request metrics for consul.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* s/Put/List/

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* Update github.com/hashicorp/consul

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* Add contexts to the consul api so we can get traces of heartbeats.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* Moar context.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* Missed one.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* Make consul consistent reads optional.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
* Remove bazel.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* Review feedback.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
* allow specifying ingester port for registration in case it gets transformed

* use "advertise" for clarity of description

* default the advertised port to grpc-listen-port instead of 9095 so that overrides of grpc port work
Should reduce CPU load on ingesters & network load as chunks as compressed.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
* Also make sure the percentage is exposed for ingesters
  that are not having any tokens

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
* Delay more than the minimum ingester queries

If one of the initial queries fails, or takes a long
time, then the extra queries will be executed.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
As it says in the comment on `time.After`:
// The underlying Timer is not recovered by the garbage collector
// until the timer fires. If efficiency is a concern, use NewTimer
// instead and call Timer.Stop if the timer is no longer needed.
* Remove leftover build.BAZEL files
* Add test-exporter to .gitignore
* Rename weaveworks/cortex to cortexproject/cortex everywhere

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
This should remove the repetition of the ingester ID - a ~25 character string, repeated 128 times (3kb) for each ingester, allocated every 5 seconds.  For a 10 ingester cluster, thats 60KB/s of allocations (1/5s * 10 ingesters * 128 tokens * 10 ingester).  Yes, its N^2.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
This allows us to start Cortex on the Mac.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
…entation as the ring does.

Also, remove the two instances of KVClient mock hooks.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
New version should be slightly more efficient as it avoids creating an
object on every call.

Signed-off-by: Bryan Boreham <bryan@weave.works>
This is to ensure we can set it to 0 in single node mode

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Tyler Reid added 12 commits September 10, 2021 15:24
Signed-off-by: Tyler Reid <tyler.reid@grafana.com>
Signed-off-by: Tyler Reid <tyler.reid@grafana.com>
Signed-off-by: Tyler Reid <tyler.reid@grafana.com>
Signed-off-by: Tyler Reid <tyler.reid@grafana.com>
Signed-off-by: Tyler Reid <tyler.reid@grafana.com>
Signed-off-by: Tyler Reid <tyler.reid@grafana.com>
Signed-off-by: Tyler Reid <tyler.reid@grafana.com>
@treid314
Copy link
Contributor Author

There's still a couple of things that need tidying up regarding package visibility and some other odds and ends. So I'm leaving this as a draft for the time being.


hostname, err := os.Hostname()
if err != nil {
level.Error(log.NewNopLogger()).Log("msg", "failed to get hostname", "err", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally using the (util_logger) global logger, but now we don't wish to do that. I could add a logger to the config, but I'm not sure if that's the best path here. Right now I just create a new no-op logger which preserves the line, but doesn't log. Is there a system logger or a generic logger I could use in go? Or could I return an error and have that be uncaught since we send an os exit after this log?

Copy link
Contributor

@56quarters 56quarters Sep 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, we definitely don't want os.Exit in library code. A panic feels fine in the case we can't look up the hostname of the machine this code is on. That seems like something that Should Never Fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to deal with this, but using the log.NewNopLogger() is useless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took @56quarters suggestion and used a panic here.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did a good job! I diffed all files with their Cortex counterparts and left few comments here and there. I didn't comment on metrics with cortex_ prefix because I would suggest to get this PR merged as soon as possible (with the least changes) and then work on removing the cortex_ prefix in a follow up PR.

However, we need a CI step to make sure compiled protos are updated (see check-protos target in Cortex Makefile for reference).

ring/http.go Outdated Show resolved Hide resolved
ring/http.go Outdated Show resolved Hide resolved
if req.Method == http.MethodPost {
ingesterID := req.FormValue("forget")
if err := r.forget(req.Context(), ingesterID); err != nil {
level.Error(r.logger).Log("msg", "error forgetting instance", "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
level.Error(r.logger).Log("msg", "error forgetting instance", "err", err)
level.Error(log.WithContext(req.Context(), log.Logger)).Log("msg", "error forgetting instance", "err", err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ring/http.go Outdated Show resolved Hide resolved
ring/lifecycler.go Outdated Show resolved Hide resolved
ring/ring.proto Outdated Show resolved Hide resolved
ring/tokens.go Outdated Show resolved Hide resolved
ring/shard/yolo.go Outdated Show resolved Hide resolved
ring/testutils/testutils.go Outdated Show resolved Hide resolved
proto/http.go Outdated Show resolved Hide resolved
Tyler Reid added 5 commits September 28, 2021 15:56
Signed-off-by: Tyler Reid <tyler.reid@grafana.com>
called after all workers finish, which is what we want.
(It is now also called on early error exit.)
Signed-off-by: Tyler Reid <tyler.reid@grafana.com>
@treid314 treid314 marked this pull request as ready for review September 28, 2021 23:11
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's merge it! We can add the check-protos linter in a follow up PR (but please do it 🙏 )

@treid314 treid314 merged commit d49be0b into grafana:main Sep 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.