Question: promote learning member on a 1 size cluster #11633

Closed
ereslibre opened this issue Feb 16, 2020 · 6 comments · Fixed by #11640

@ereslibre
Contributor

ereslibre commented Feb 16, 2020

etcd version: 3.4.3

Given a cluster setup with a single member,

# etcdctl member list
ab75c79a2c3090aa, started, controlplane, http://172.17.0.2:30002, http://172.17.0.2:30003, false

I added a new learner member, and started it.

Member f9d1e8f6c55eb887 added to cluster a9c95578e770c6db

ETCD_NAME="controlplane2"
ETCD_INITIAL_CLUSTER="controlplane=http://172.17.0.2:30002,controlplane2=http://172.17.0.2:30004"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.17.0.2:30004"
ETCD_INITIAL_CLUSTER_STATE="existing"

The member list after adding the learner member and starting it:

# etcdctl member list
ab75c79a2c3090aa, started, controlplane, http://172.17.0.2:30002, http://172.17.0.2:30003, false
f9d1e8f6c55eb887, started, controlplane2, http://172.17.0.2:30004, http://172.17.0.2:30005, true

I see that if I now try to promote it to a voting member, I get the following error:

# etcdctl member promote f9d1e8f6c55eb887 
{"level":"warn","ts":"2020-02-16T18:41:05.652Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-0d90d825-371b-4a77-b1e1-bcf99ab3fab2/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
Error: etcdserver: re-configuration failed due to not enough started members

This error looks like it's coming from IsReadyToPromoteMember, called when the strict reconfig check is enabled.

This is the error present on the only etcd instance with voting rights during that time:

2020-02-16 18:41:05.651445 W | etcdserver/membership: Reject promote member request: the number of started member (1) will be less than the quorum number of the cluster (2)
2020-02-16 18:41:05.651501 W | etcdserver: not enough started members, rejecting promote member f9d1e8f6c55eb887

I can disable the strict reconfig check, but I'm wondering whether, even with strict reconfiguration enabled, this is meant to be impossible by design, or whether it's a bug. I'm following the recommendation of joining only one learner node at a time, but it seems impossible to succeed if you start with a single-instance cluster.

Thank you!

@ereslibre
Contributor Author

ereslibre commented Feb 16, 2020

At first sight, it looks like this case would be solved by changing

to 0, because right now IsReadyToPromoteMember thinks it has two voting members, when there is only one in reality.

I might be missing many important nuances here, though.
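
To make the arithmetic concrete, here is a minimal sketch of the quorum check as it is described in this thread. It is a simplified model, not etcd's actual `IsReadyToPromoteMember`: the function name `canPromote` and the `initialMembers` parameter are assumptions introduced only for illustration. Starting the member counter at 1 reproduces the numbers in the log above (1 started member, quorum of 2); starting it at 0 is the change suggested in this comment.

```go
package main

import "fmt"

// canPromote is a hypothetical, simplified model of the quorum check
// discussed in this issue; it is not etcd's real implementation.
//
// votersStarted has one entry per existing voting member (true = started).
// initialMembers is the value the member counter starts from before the
// existing voting members are counted.
func canPromote(votersStarted []bool, initialMembers int) bool {
	nmembers := initialMembers
	nstarted := 0
	for _, started := range votersStarted {
		if started {
			nstarted++
		}
		nmembers++
	}
	// Majority quorum for the counted member total.
	nquorum := nmembers/2 + 1
	return nstarted >= nquorum
}

func main() {
	single := []bool{true} // one started voting member, as in this issue

	// Counter starts at 1: 2 members counted, quorum 2, 1 started.
	fmt.Println(canPromote(single, 1)) // false

	// Counter starts at 0: 1 member counted, quorum 1, 1 started.
	fmt.Println(canPromote(single, 0)) // true
}
```

With the counter starting at 1, a single-voter cluster can never satisfy the check, which matches the "started member (1) will be less than the quorum number of the cluster (2)" warning.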

@jingyih
Contributor

jingyih commented Feb 19, 2020

Looks like a bug to me. A cluster with 1 started voting member and 1 started learner member should allow promoting of that learner member.

@ereslibre
Contributor Author

I'll open a PR.

@jingyih
Contributor

jingyih commented Feb 19, 2020

cc @WIZARD-CXY

@jingyih
Contributor

jingyih commented Feb 19, 2020

> I'll open a PR.

@ereslibre Sounds great! I am thinking it will be helpful if we could have a unit test for that function, something similar to

func TestIsReadyToAddVotingMember(t *testing.T) {

@ereslibre
Contributor Author

@jingyih I created #11640 to fix this issue, but as I commented on the PR, I would be eager to increase coverage for the promotion logic in other parts of the code as well.

ereslibre added a commit to oneinfra/etcd that referenced this issue Feb 21, 2020
When promoting a learner member we should not count already a voting
member, but take only into account the number of existing voting
members and their current status (started, unstarted) when taking the
decision whether a learner member can be promoted.

Before this change, it was impossible to grow from a quorum N to a N+1
through promoting a learning member.

Fixes: etcd-io#11633
jingyih pushed a commit to jingyih/etcd that referenced this issue Feb 22, 2020