Backoff on reestablishing watches when Unavailable errors are encountered #9840

Merged · 1 commit · Jun 14, 2018
14 changes: 14 additions & 0 deletions clientv3/client.go
```diff
@@ -527,6 +527,20 @@ func isHaltErr(ctx context.Context, err error) bool {
 	return ev.Code() != codes.Unavailable && ev.Code() != codes.Internal
 }
 
+// isUnavailableErr returns true if the given error is an unavailable error
+func isUnavailableErr(ctx context.Context, err error) bool {
+	if ctx != nil && ctx.Err() != nil {
+		return false
+	}
+	if err == nil {
+		return false
+	}
+	ev, _ := status.FromError(err)
+	// Unavailable codes mean the system will be right back.
+	// (e.g., can't connect, lost leader)
+	return ev.Code() == codes.Unavailable
+}
+
 func toErr(ctx context.Context, err error) error {
 	if err == nil {
 		return nil
```
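For context, here is how the new helper classifies errors — a minimal, self-contained sketch (the helper is unexported in clientv3, so it is re-declared verbatim; the error messages are illustrative, and only the standard google.golang.org/grpc status and codes packages are assumed):

```go
package main

import (
	"context"
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Re-declared from the diff above so this sketch compiles standalone.
func isUnavailableErr(ctx context.Context, err error) bool {
	if ctx != nil && ctx.Err() != nil {
		return false
	}
	if err == nil {
		return false
	}
	ev, _ := status.FromError(err)
	return ev.Code() == codes.Unavailable
}

func main() {
	ctx := context.Background()

	// Transient transport failure: the watch loop should back off and retry.
	fmt.Println(isUnavailableErr(ctx, status.Error(codes.Unavailable, "transport is closing"))) // true

	// Not an Unavailable error: no backoff applies (isHaltErr covers halting cases).
	fmt.Println(isUnavailableErr(ctx, status.Error(codes.InvalidArgument, "bad request"))) // false

	// A finished caller context suppresses the retry classification entirely.
	canceled, cancel := context.WithCancel(ctx)
	cancel()
	fmt.Println(isUnavailableErr(canceled, status.Error(codes.Unavailable, "transport is closing"))) // false
}
```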
14 changes: 14 additions & 0 deletions clientv3/watch.go
```diff
@@ -830,10 +830,13 @@ func (w *watchGrpcStream) joinSubstreams() {
 	}
 }
 
+var maxBackoff = 100 * time.Millisecond
+
 // openWatchClient retries opening a watch client until success or halt.
 // manually retry in case "ws==nil && err==nil"
 // TODO: remove FailFast=false
 func (w *watchGrpcStream) openWatchClient() (ws pb.Watch_WatchClient, err error) {
+	backoff := time.Millisecond
 	for {
 		select {
 		case <-w.ctx.Done():
@@ -849,6 +852,17 @@ func (w *watchGrpcStream) openWatchClient() (ws pb.Watch_WatchClient, err error)
 		if isHaltErr(w.ctx, err) {
 			return nil, v3rpc.Error(err)
 		}
+		if isUnavailableErr(w.ctx, err) {
+			// retry, but backoff
+			if backoff < maxBackoff {
```
Inline review comment on the `if backoff < maxBackoff {` line:

Contributor: abstract this out as a retry func with test?

Contributor Author: I'd probably save abstracting it for if we wanted to use this elsewhere... it sounded like there were more systemic retry improvements in progress, so it's possible this will get dropped in the future. I was mostly looking for something minimal we could pick to the 3.2.x and 3.3.x streams to alleviate this specific hotloop.

(A sketch of such a retry func follows the diff.)

The diff resumes:

```diff
+				// 25% backoff factor
+				backoff = backoff + backoff/4
+				if backoff > maxBackoff {
+					backoff = maxBackoff
+				}
+			}
+			time.Sleep(backoff)
+		}
 	}
 	return ws, nil
 }
```
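To make the schedule concrete: the sleep starts at 1ms, grows 25% per consecutive Unavailable failure, and is clamped at maxBackoff (100ms), so the cap is reached after roughly 21 failures (1ms × 1.25²¹ ≈ 108ms), about half a second of cumulative sleep. A standalone sketch of the same arithmetic (note that time.Duration is an integer nanosecond count, so backoff/4 is integer division):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Mirror the schedule from openWatchClient: start at 1ms,
	// grow 25% per failed attempt, clamp at 100ms.
	const maxBackoff = 100 * time.Millisecond
	backoff := time.Millisecond
	for attempt := 1; backoff < maxBackoff; attempt++ {
		backoff = backoff + backoff/4
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
		fmt.Printf("attempt %2d: sleep %v\n", attempt, backoff)
	}
}
```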
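Purely to illustrate the abstraction floated in the review thread (it was deliberately left out of this PR), a retry func wrapping the same schedule might look like the sketch below, written as if it lived in clientv3 alongside isUnavailableErr. retryWithBackoff is a hypothetical name, and unlike the inline loop it returns non-Unavailable errors to the caller instead of retrying them immediately:

```go
// retryWithBackoff is a hypothetical helper, not part of this PR: it retries
// op with the same 25%-growth, capped backoff whenever op fails with an
// Unavailable error, and stops when op succeeds, fails with any other error,
// or the context is done.
func retryWithBackoff(ctx context.Context, max time.Duration, op func() error) error {
	backoff := time.Millisecond
	for {
		err := op()
		if err == nil {
			return nil
		}
		if !isUnavailableErr(ctx, err) {
			// Unlike openWatchClient, surface every non-Unavailable error.
			return err
		}
		// 25% backoff factor, clamped at max, as in the inline loop above.
		if backoff < max {
			backoff += backoff / 4
			if backoff > max {
				backoff = max
			}
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
	}
}
```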