Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingester: split push and read circuit breakers #8315

Merged
merged 12 commits into from
Jun 11, 2024

Conversation

duricanikolic
Copy link
Contributor

@duricanikolic duricanikolic commented Jun 8, 2024

What this PR does

In #8180 and #8285 we introduced an ingester server-side circuit breaker for both push and read requests together. This PR splits the introduced circuit breaker into 2 different circuit breakers: one for push and the other for read requests only.

They can be enabled by -ingester.push-circuit-breaker.enabled and -ingester.read-circuit-breaker.enabled configuration options. Other configuration options are:
- -ingester.push-circuit-breaker.failure-threshold-percentage
- -ingester.push-circuit-breaker.failure-execution-threshold
- -ingester.push-circuit-breaker.thresholding-period
- -ingester.push-circuit-breaker.cooldown-period
- -ingester.push-circuit-breaker.initial-delay
- -ingester.push-circuit-breaker.request-timeout
- -ingester.read-circuit-breaker.failure-threshold-percentage
- -ingester.read-circuit-breaker.failure-execution-threshold
- -ingester.read-circuit-breaker.thresholding-period
- -ingester.read-circuit-breaker.cooldown-period
- -ingester.read-circuit-breaker.initial-delay
- -ingester.read-circuit-breaker.request-timeout

The following experimental configuration options introduced in #8180 and #8285 have been removed:
- -ingester.push-circuit-breaker.enabled
- -ingester.push-circuit-breaker.failure-threshold-percentage
- -ingester.push-circuit-breaker.failure-execution-threshold
- -ingester.push-circuit-breaker.thresholding-period
- -ingester.push-circuit-breaker.cooldown-period
- -ingester.push-circuit-breaker.initial-delay
- -ingester.push-circuit-breaker.push-timeout
- -ingester.push-circuit-breaker.read-timeout

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
@duricanikolic duricanikolic self-assigned this Jun 8, 2024
@duricanikolic duricanikolic requested review from jdbaldry and a team as code owners June 8, 2024 17:34
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
…it()

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see much value for adding *ingesterCircuitBreaker, it makes everything more cumbersome.

I’d suggest to drop ingesterCircuitBreaker type, and simply use two *circuitBreaker implementations directly from ingester. If you want to use callback to release permit, then do it at (cb *circuitBreaker) tryAcquirePermit() method, not in (cb *ingesterCircuitBreaker) tryPushAcquirePermit().

I’d also make newCircuitBreakerMetrics work with single circuit breaker only (by adding ConstLabel request_type to all metrics — we can register multiple metrics with same name/help/labels, but different const label. example.

WDYT?

pkg/ingester/circuitbreaker.go Outdated Show resolved Hide resolved
pkg/ingester/circuitbreaker.go Outdated Show resolved Hide resolved
)

type circuitBreakerMetrics struct {
circuitBreakerTransitions *prometheus.CounterVec
circuitBreakerResults *prometheus.CounterVec
}

func newCircuitBreakerMetrics(r prometheus.Registerer, currentStateFn func() circuitbreaker.State) *circuitBreakerMetrics {
func newCircuitBreakerMetrics(r prometheus.Registerer, currentStateFn func(string) circuitbreaker.State, requestTypes []string) *circuitBreakerMetrics {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
func newCircuitBreakerMetrics(r prometheus.Registerer, currentStateFn func(string) circuitbreaker.State, requestTypes []string) *circuitBreakerMetrics {
func newCircuitBreakerMetrics(r prometheus.Registerer, currentState func(string) circuitbreaker.State, requestTypes []string) *circuitBreakerMetrics {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed all the occurrences of Fn in this source

}

circuitBreakerTransitionsCounterFn := func(metrics *circuitBreakerMetrics, state circuitbreaker.State) prometheus.Counter {
return metrics.circuitBreakerTransitions.WithLabelValues(state.String())
circuitBreakerTransitionsCounterFn := func(metrics *circuitBreakerMetrics, requestType string, state circuitbreaker.State) prometheus.Counter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
circuitBreakerTransitionsCounterFn := func(metrics *circuitBreakerMetrics, requestType string, state circuitbreaker.State) prometheus.Counter {
circuitBreakerTransitionsCounter := func(metrics *circuitBreakerMetrics, requestType string, state circuitbreaker.State) prometheus.Counter {

@duricanikolic
Copy link
Contributor Author

I don't see much value for adding *ingesterCircuitBreaker, it makes everything more cumbersome.

I’d suggest to drop ingesterCircuitBreaker type, and simply use two *circuitBreaker implementations directly from ingester. If you want to use callback to release permit, then do it at (cb *circuitBreaker) tryAcquirePermit() method, not in (cb *ingesterCircuitBreaker) tryPushAcquirePermit().

I’d also make newCircuitBreakerMetrics work with single circuit breaker only (by adding ConstLabel request_type to all metrics — we can register multiple metrics with same name/help/labels, but different const label. example.

WDYT?

In a slack conversation @pstibrany and I agreed to keep the ingesterCircuitBreaker struct and to get rid of some checks.

}
circuitBreakerCurrentStateGaugeFn := func(state circuitbreaker.State) prometheus.GaugeFunc {
circuitBreakerCurrentStateGaugeFn := func(requestType string, state circuitbreaker.State) prometheus.GaugeFunc {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
circuitBreakerCurrentStateGaugeFn := func(requestType string, state circuitbreaker.State) prometheus.GaugeFunc {
circuitBreakerCurrentStateGauge := func(requestType string, state circuitbreaker.State) prometheus.GaugeFunc {

or

Suggested change
circuitBreakerCurrentStateGaugeFn := func(requestType string, state circuitbreaker.State) prometheus.GaugeFunc {
circuitBreakerCurrentStateGaugeFunc := func(requestType string, state circuitbreaker.State) prometheus.GaugeFunc {

If we want to be consistent with the type name.

}
return func(duration time.Duration, err error) {
if pushAcquiredPermit {
_ = cb.push.finishRequest(duration, cb.push.cfg.RequestTimeout, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to pass to cb.push.foo() a config from cb.push.cfg?
I think we can refactor finishRequest to read it's own config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good catch. Thank you

if err != nil {
return nil, err
}

finishReadRequest := func(err error) {
if acquiredCircuitBreakerPermit {
i.circuitBreaker.finishReadRequest(time.Since(start), err)
if finish != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finish can't be nil, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right

@@ -1004,8 +1006,8 @@ func (i *Ingester) FinishPushRequest(ctx context.Context) {
if st.requestSize > 0 {
i.inflightPushRequestsBytes.Sub(st.requestSize)
}
if st.acquiredCircuitBreakerPermit {
i.circuitBreaker.finishPushRequest(st.requestDuration, st.pushErr)
if st.requestFinish != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are, again, right. It cannot be nil.

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
pkg/ingester/circuitbreaker.go Outdated Show resolved Hide resolved
pkg/ingester/circuitbreaker.go Outdated Show resolved Hide resolved
pkg/ingester/circuitbreaker.go Outdated Show resolved Hide resolved
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't checked tests yet, but code changes look good to me.

// tryPushAcquirePermit tries to acquire a permit to use the push circuit breaker and returns whether a permit was acquired.
// If it was possible, tryPushAcquirePermit returns a function that should be called to release the acquired permit.
// If it was not possible, the causing error is returned.
func (cb *ingesterCircuitBreaker) tryPushAcquirePermit() (func(time.Duration, error), error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:do we need the comment here?

// If it was not possible, the causing error is returned.
func (cb *ingesterCircuitBreaker) tryReadAcquirePermit() (func(time.Duration, error), error) {
// If the read circuit breaker is not active, we don't try to acquire a permit.
if !cb.read.isActive() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is now checked twice, right? once here, second time in tryAcquirePermit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true

// tryReadAcquirePermit tries to acquire a permit to use the read circuit breaker and returns whether a permit was acquired.
// If it was possible, tryReadAcquirePermit returns a function that should be called to release the acquired permit.
// If it was not possible, the causing error is returned.
func (cb *ingesterCircuitBreaker) tryReadAcquirePermit() (func(time.Duration, error), error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we simplify comment and just say that it's also checking push CB (but doesn't get permit)

Comment on lines 276 to 286
// tryPushAcquirePermit tries to acquire a permit to use the push circuit breaker and returns whether a permit was acquired.
// If it was possible, tryPushAcquirePermit returns a function that should be called to release the acquired permit.
// If it was not possible, the causing error is returned.
func (cb *ingesterCircuitBreaker) tryPushAcquirePermit() (func(time.Duration, error), error) {
return cb.push.tryAcquirePermit()
}

// tryReadAcquirePermit tries to acquire a permit to use the read circuit breaker and returns whether a permit was acquired.
// If it was possible, tryReadAcquirePermit returns a function that should be called to release the acquired permit.
// If it was not possible, the causing error is returned.
func (cb *ingesterCircuitBreaker) tryReadAcquirePermit() (func(time.Duration, error), error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// tryPushAcquirePermit tries to acquire a permit to use the push circuit breaker and returns whether a permit was acquired.
// If it was possible, tryPushAcquirePermit returns a function that should be called to release the acquired permit.
// If it was not possible, the causing error is returned.
func (cb *ingesterCircuitBreaker) tryPushAcquirePermit() (func(time.Duration, error), error) {
return cb.push.tryAcquirePermit()
}
// tryReadAcquirePermit tries to acquire a permit to use the read circuit breaker and returns whether a permit was acquired.
// If it was possible, tryReadAcquirePermit returns a function that should be called to release the acquired permit.
// If it was not possible, the causing error is returned.
func (cb *ingesterCircuitBreaker) tryReadAcquirePermit() (func(time.Duration, error), error) {
// tryAcquirePushPermit tries to acquire a permit to use the push circuit breaker and returns whether a permit was acquired.
// If it was possible, tryAcquirePushPermit returns a function that should be called to release the acquired permit.
// If it was not possible, the causing error is returned.
func (cb *ingesterCircuitBreaker) tryAcquirePushPermit() (func(time.Duration, error), error) {
return cb.push.tryAcquirePermit()
}
// tryAcquireReadPermit tries to acquire a permit to use the read circuit breaker and returns whether a permit was acquired.
// If it was possible, tryAcquireReadPermit returns a function that should be called to release the acquired permit.
// If it was not possible, the causing error is returned.
func (cb *ingesterCircuitBreaker) tryAcquireReadPermit() (func(time.Duration, error), error) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the method names. They actually sound better.

pkg/ingester/errors_test.go Outdated Show resolved Hide resolved
@@ -356,7 +358,7 @@ type Ingester struct {
ingestPartitionID int32
ingestPartitionLifecycler *ring.PartitionInstanceLifecycler

circuitBreaker *circuitBreaker
circuitBreaker *ingesterCircuitBreaker
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for this to be a pointer, and no need to return a pointer from newIngesterCircuitBreaker: if we just hold a ingesterCircuitBreaker struct here, we'll also handle the zero-valued case gracefully (as nil cb's inside of ingesterCircuitBreaker are already valid)

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
@duricanikolic duricanikolic merged commit ee069d2 into main Jun 11, 2024
29 checks passed
@duricanikolic duricanikolic deleted the yuri/ingester-server-cb branch June 11, 2024 15:15
duricanikolic added a commit that referenced this pull request Jun 11, 2024
* Ingester: splitting push and read circuit breakers

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Improving TestIngester_StartReadRequest

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Updating documentation

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing lint issues

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing documentation issues

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Do not call tryAcquirePermit() on the push cb from tryReadAcquirePermit()

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Rename prCircuitBreaker into ingesterCircuitBreaker

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Rename label name path to request_type and label value write to push

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Do not allow acquiring permit if cbs are not active

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

---------

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
grafanabot pushed a commit that referenced this pull request Jun 12, 2024
* Ingester: splitting push and read circuit breakers

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Improving TestIngester_StartReadRequest

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Updating documentation

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing lint issues

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing documentation issues

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Do not call tryAcquirePermit() on the push cb from tryReadAcquirePermit()

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Rename prCircuitBreaker into ingesterCircuitBreaker

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Rename label name path to request_type and label value write to push

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Do not allow acquiring permit if cbs are not active

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

---------

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
(cherry picked from commit ee069d2)
@grafanabot
Copy link
Contributor

The backport to r294 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-8315-to-r294 origin/r294
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x ee069d247c7c22848ee668027c432a97e120cd9f
# Push it to GitHub
git push --set-upstream origin backport-8315-to-r294
git switch main
# Remove the local backport branch
git branch -D backport-8315-to-r294

Then, create a pull request where the base branch is r294 and the compare/head branch is backport-8315-to-r294.

grafanabot pushed a commit that referenced this pull request Jun 12, 2024
* Ingester: splitting push and read circuit breakers

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Improving TestIngester_StartReadRequest

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Updating documentation

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing lint issues

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing documentation issues

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Do not call tryAcquirePermit() on the push cb from tryReadAcquirePermit()

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Rename prCircuitBreaker into ingesterCircuitBreaker

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Rename label name path to request_type and label value write to push

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Do not allow acquiring permit if cbs are not active

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

---------

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
(cherry picked from commit ee069d2)
duricanikolic added a commit that referenced this pull request Jun 12, 2024
* Ingester: splitting push and read circuit breakers

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Improving TestIngester_StartReadRequest

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Updating documentation

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing lint issues

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing documentation issues

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Do not call tryAcquirePermit() on the push cb from tryReadAcquirePermit()

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Rename prCircuitBreaker into ingesterCircuitBreaker

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Rename label name path to request_type and label value write to push

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Do not allow acquiring permit if cbs are not active

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

* Fixing review findings

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>

---------

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
(cherry picked from commit ee069d2)

Co-authored-by: Đurica Yuri Nikolić <durica.nikolic@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants