transport.ErrConnDrain to client is confusing #1673

Closed
gyuho opened this issue Nov 14, 2017 · 2 comments

gyuho commented Nov 14, 2017

What version of gRPC are you using?

v1.7.2

What version of Go are you using (go version)?

Go 1.9.2

What did you do?

Ran the etcd client integration tests on the gRPC dial path
(TestSetEndpointAndPut, https://github.com/coreos/etcd/blob/6f077bd74c5714dfa9221d8f3cb84f856e36ea42/clientv3/integration/dial_test.go#L193-L205)

Here's the code path that's being tested (a repro sketch follows the list):

  1. notify address A to the gRPC balancer
  2. gRPC connects to A
  3. etcd client switches endpoints to B
    • notify B to the gRPC balancer
    • tearDown(errConnDrain) on A

      grpc-go/clientconn.go, lines 1119 to 1130 in cda8a00:

          func (ac *addrConn) tearDown(err error) {
          	ac.cancel()
          	ac.mu.Lock()
          	ac.curAddr = resolver.Address{}
          	defer ac.mu.Unlock()
          	if err == errConnDrain && ac.transport != nil {
          		// GracefulClose(...) may be executed multiple times when
          		// i) receiving multiple GoAway frames from the server; or
          		// ii) there are concurrent name resolver/Balancer triggered
          		// address removal and GoAway.
          		ac.transport.GracefulClose()
          	}

    • *http2Client returns transport.ErrStreamDrain in

          if t.state == draining {
          	t.mu.Unlock()
          	return nil, ErrStreamDrain
          }
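
A minimal, self-contained sketch of that repro path, assuming the clientv3 API of that era (the endpoint values below are placeholders; the linked test uses an in-process cluster instead):

    package main

    import (
    	"context"
    	"fmt"
    	"time"

    	"github.com/coreos/etcd/clientv3"
    )

    func main() {
    	// Placeholder endpoints standing in for members A and B.
    	epA, epB := "127.0.0.1:2379", "127.0.0.1:22379"

    	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{epA}})
    	if err != nil {
    		panic(err)
    	}
    	defer cli.Close()

    	// Switching endpoints notifies the balancer of B and tears down
    	// the addrConn to A with errConnDrain, so A's transport drains.
    	cli.SetEndpoints(epB)

    	// An RPC racing with that teardown can be assigned to the
    	// draining transport and fail with the ErrStreamDrain message.
    	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
    	_, err = cli.Put(ctx, "foo", "bar")
    	cancel()
    	fmt.Println(err)
    }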

What did you expect to see?

transport.ErrConnClosing

What did you see instead?

transport.ErrConnDrain (most of the time we get transport.ErrConnClosing, but the drain error still reaches our client intermittently)

This is confusing, because I thought that with the latest balancer changes (in v1.7.x) gRPC would not return connection-drain errors to clients (see #1649). Yet our clients are now getting transport.ErrConnDrain.

And the error message "the server stops accepting new RPCs" is confusing, since this error is returned from *http2Client, not from the server.
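
If I read the transport package correctly, the error is defined along these lines (paraphrased from the v1.7.x-era http2 client; the exact file may differ):

    // in package transport (paraphrased):
    var ErrStreamDrain = streamErrorf(codes.Unavailable,
    	"the server stops accepting new RPCs")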

What's more confusing is that we now have 8ff8683#diff-3081f1aaaf958d7d5e00ccec7a62ea6fR413, which might have fixed this issue.

Are transport.ErrConnDrain errors to clients expected?

cf. etcd-io/etcd#8859 and etcd-io/etcd#8869

Thanks.

/cc @xiang90

xiang90 commented Dec 14, 2017

@dfawley @menghanl

any update on this one?

menghanl commented Feb 1, 2018

The draining message was added to support graceful server shutdown, and the error message was valid in that context. It was later reused when an address is removed by a balancer, and in that case the message became invalid and confusing.
I filed #1844 to fix the error message. That would be a behavior change, though, and since you are doing string matching in your tests, you might also need to update those tests accordingly. Let me know what you think on the PR.

The fix (8ff8683#diff-3081f1aaaf958d7d5e00ccec7a62ea6fR413) you mentioned was to implement transparent retry.

With transparent retry, when an RPC fails because the picked connection is draining, the RPC is retried one more time. Since ClientConn always tries to pick a healthy connection for an RPC, the retry will usually be assigned to another connection, and the RPC succeeds if that second connection is working. So even though the problem may look "fixed", it isn't; it has just been made less likely to happen.
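
To illustrate, here is a small self-contained model of that retry behavior (hypothetical names; this is not the actual grpc-go code path):

    package main

    import (
    	"errors"
    	"fmt"
    )

    // errStreamDrain stands in for transport.ErrStreamDrain.
    var errStreamDrain = errors.New("the server stops accepting new RPCs")

    // conn models a picked connection; a draining one rejects new streams.
    type conn struct{ draining bool }

    func (c *conn) newStream() error {
    	if c.draining {
    		return errStreamDrain
    	}
    	return nil
    }

    // invoke models transparent retry: if the first picked connection is
    // draining, re-pick once and try again; nothing was written to the
    // wire, so the retry is safe.
    func invoke(pick func() *conn) error {
    	if err := pick().newStream(); err != errStreamDrain {
    		return err
    	}
    	return pick().newStream()
    }

    func main() {
    	conns := []*conn{{draining: true}, {draining: false}}
    	i := 0
    	pick := func() *conn { c := conns[i%len(conns)]; i++; return c }
    	fmt.Println(invoke(pick)) // <nil>: the retry landed on the healthy conn
    }

If both picks land on draining connections, the error still reaches the caller, which matches the intermittent failures described above.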

menghanl closed this as completed on Feb 6, 2018
lock bot locked as resolved and limited conversation to collaborators on Sep 26, 2018