grpcproxy: fix memberlist results not update when proxy node down #15835
Conversation
case endpoints.Delete:
	delete(cp.umap, up.Endpoint.Addr)
When the op is Delete, up.Endpoint.Addr is not specified, so the endpoint cannot be deleted from the map; use up.Key instead.
Looks good. I think using Key is OK because it's already used in resolver.watch: https://github.com/etcd-io/etcd/blob/249c0d71d4e106e0d2f5576b9638f2e8bd75ca47/client/v3/naming/resolver/resolver.go#LL97C6-L97C12
Not sure why Endpoint.Addr was used initially.
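For context, a minimal sketch of the change under discussion, assuming the endpoint-watch loop in the cluster proxy and the Update type from client/v3/naming/endpoints; applyUpdates is a hypothetical helper for illustration, not the literal patch:

```go
package grpcproxy

import "go.etcd.io/etcd/client/v3/naming/endpoints"

// applyUpdates keys the proxy's member map by the registration key (up.Key)
// instead of up.Endpoint.Addr. Delete events carry only the key, so entries
// keyed by Endpoint.Addr were never removed when a proxy went down.
func applyUpdates(umap map[string]endpoints.Endpoint, updates []*endpoints.Update) {
	for _, up := range updates {
		switch up.Op {
		case endpoints.Add:
			umap[up.Key] = up.Endpoint
		case endpoints.Delete:
			// On Delete, up.Endpoint is zero-valued; only up.Key identifies the entry.
			delete(umap, up.Key)
		}
	}
}
```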
LGTM, but needs a test. case endpoints.Add: is covered in TestClusterProxyMemberList, but case endpoints.Delete: isn't covered at all.
Force-pushed 87355e9 to eb54d18
tests added. PTAL, it's ready for review @lavacat
Please clean up the logic and add comments to any non-obvious test setup and checks.
There is also the TestRegister test, which uses NewWatchChannel to listen for updates; it might be easier to update. Up to you.
@@ -90,6 +129,27 @@ func (cts *clusterproxyTestServer) close(t *testing.T) {
	}
}

func registDelayDeleteMember(lg *zap.Logger, endpoints []string, addr string, delay time.Duration, ttl int, t *testing.T) chan struct{} {
nit: registerMemberWithTTL
Removed this func in the new code; using DeleteEndpoint is better.
time.Sleep(time.Second)
donec := make(chan struct{})
go func() {
	time.Sleep(delay)
What is the point of this goroutine if it's just waiting? I think it can be removed.
This goroutine was used to delay closing the etcd client: once the client is closed, the lease expires soon after, and the key the lease was attached to gets deleted. That is complex; using DeleteEndpoint is more convenient.
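For reference, a sketch of the DeleteEndpoint approach, assuming the Manager API from client/v3/naming/endpoints; the prefix, key, and addresses below are placeholder values:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/naming/endpoints"
)

func main() {
	// Placeholder endpoint; in the test this client would come from the harness.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	em, err := endpoints.NewManager(cli, "___grpc_proxy_endpoint")
	if err != nil {
		log.Fatal(err)
	}

	key := "___grpc_proxy_endpoint/127.0.0.2:6789"
	if err := em.AddEndpoint(context.TODO(), key, endpoints.Endpoint{Addr: "127.0.0.2:6789"}); err != nil {
		log.Fatal(err)
	}

	// Deleting the key directly produces an endpoints.Delete watch event
	// immediately; no client shutdown or lease expiry is involved.
	if err := em.DeleteEndpoint(context.TODO(), key); err != nil {
		log.Fatal(err)
	}
}
```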
}

//check del member succ
time.Sleep(wait)
wait > delay. What is the point of delay?
delay is used so the test case can check that the proxy member list has been updated successfully.
Endpoints:   endpoints,
DialTimeout: 5 * time.Second,
}
client, err := integration2.NewClient(t, cfg)
Is it possible to pass the client from the test without creating a new one?
yes, it is
Force-pushed eb54d18 to 2fd92e4
TestRegister only covers the watch channel logic; TestClusterProxyMemberList covers both the member list API and the watch channel.
if len(mresp.Members) != 2 {
	t.Fatalf("len(mresp.Members) expected 2, got %d (%+v)", len(mresp.Members), mresp.Members)
}
if mresp.Members[1].ClientURLs[0] != newMemberAddr {
nit: I think Members is populated from a map, so order isn't guaranteed. Might cause test flakiness. Can we add a check for Members[0] and add a comment about order?
A map cannot guarantee order, so we should check whether the new member is in mresp.Members. Code has been updated, PTAL.
I wonder if anything can be done with those
Force-pushed 2fd92e4 to c50e7bd
The proxy watches a specific prefix for member updates. When members are registered or deleted by putting/deleting a key, the proxy updates its member list by processing the watch response. This is asynchronous, so the test waits before checking.
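Since the update is asynchronous, a fixed sleep is always a guess; a short polling loop with a deadline is one common alternative. A sketch, assuming the usual testing and time imports; waitForMemberCount and its list callback are hypothetical helpers, not code from this PR:

```go
// waitForMemberCount polls until list() reports the expected member count or
// the deadline passes, keeping waits short without making the test flaky.
func waitForMemberCount(t *testing.T, list func() int, want int, timeout time.Duration) {
	t.Helper()
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if list() == want {
			return
		}
		time.Sleep(50 * time.Millisecond) // short poll interval instead of one long sleep
	}
	t.Fatalf("member count did not reach %d within %v", want, timeout)
}
```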
@@ -67,6 +71,47 @@ func TestClusterProxyMemberList(t *testing.T) {
	if mresp.Members[0].ClientURLs[0] != cts.caddr {
		t.Fatalf("mresp.Members[0].ClientURLs[0] expected %q, got %q", cts.caddr, mresp.Members[0].ClientURLs[0])
	}

	//test proxy member up/down
nit: test proxy member add
}

succ := false
for _, member := range mresp.Members {
nit: replace with
hostname, _ := os.Hostname()
...
assert.Contains(t, mresp.Members, &pb.Member{Name: hostname, ClientURLs: []string{newMemberAddr}})
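Spelled out, the suggested order-independent check looks roughly like this; a sketch assuming the testify assert package and the etcdserverpb types the test already uses, with checkMember as a hypothetical helper:

```go
import (
	"os"
	"testing"

	"github.com/stretchr/testify/assert"

	pb "go.etcd.io/etcd/api/v3/etcdserverpb"
)

// checkMember asserts membership without relying on slice order, since the
// proxy builds its member list from a map.
func checkMember(t *testing.T, members []*pb.Member, addr string) {
	// The expected member name is the local hostname, matching what the
	// proxy reports for members registered in this in-process test setup.
	hostname, _ := os.Hostname()
	assert.Contains(t, members, &pb.Member{Name: hostname, ClientURLs: []string{addr}})
}
```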
@@ -67,6 +71,47 @@ func TestClusterProxyMemberList(t *testing.T) {
	if mresp.Members[0].ClientURLs[0] != cts.caddr {
nit: assert.Contains(t, mresp.Members, &pb.Member{Name: hostname, ClientURLs: []string{cts.caddr}})
if len(mresp.Members) != 1 {
	t.Fatalf("len(mresp.Members) expected 1, got %d (%+v)", len(mresp.Members), mresp.Members)
}
if mresp.Members[0].ClientURLs[0] != cts.caddr {
nit: assert.Contains(t, mresp.Members, &pb.Member{Name: hostname, ClientURLs: []string{cts.caddr}})
LGTM, please use
newMemberAddr := "127.0.0.2:6789"
grpcproxy.Register(lg, cts.c, prefix, newMemberAddr, 7)
// wait some time for proxy update members
time.Sleep(time.Second)
nit: I think we can switch to time.Sleep(200 * time.Millisecond) for all waits in this test. 1s seems too long.
I do not get the real issue. Please provide the detailed steps to reproduce the issue.
If a grpc proxy is started with --resolver-prefix, memberlist returns all alive proxy nodes. When one grpc proxy node is down, it is expected to no longer be returned, but it still is.
Signed-off-by: yellowzf <zzhf3311@163.com>
Force-pushed c50e7bd to ca22120
step1: start a grpc-proxy cluster containing 3 members, member1/member2/member3, using the --resolver-prefix flag when each member starts.
3.4 uses PrevKv to assign addr, so no backport is needed: https://github.com/etcd-io/etcd/blob/release-3.4/clientv3/naming/grpc.go#L91-L100. The 3.5 backport is at #15907.