Properly handle node_id and seed_servers #91
This isn't quite about leadership or election.

Redpanda <= 22.2

The special thing about nodes with seed_servers=[] is how they behave if you start them on an empty drive. Nodes with seed_servers=[] (call it a "founding node") will respond to an empty drive by creating a new cluster of one node, and waiting for other nodes to join it. Nodes with a populated seed_servers have the opposite behavior on an empty drive: they will try to join an existing cluster, and not do anything until they succeed in doing so.

This becomes important in some cloud environments, where the disks aren't really persistent, and orchestrators may be quite casual about just blowing away a node's drive, expecting the cluster to autonomously cope. This doesn't work, because if you blow away the disk from the founding node (the one with seed_servers=[]) then it won't try to rejoin the cluster: it'll start a new cluster of its own.

The way the latest operator code copes with this is to only briefly have seed_servers=[] when starting the cluster for the first time: start 1 node like that, let it come up, and then immediately change its seed_servers to point to its peers, so that if the node is ever restarted with an empty disk, it will rejoin rather than trying to found a new cluster.

Redpanda >= 22.3

The changes planned for 22.3 will provide a simpler way for things like the operator to initialize a cluster: there will be a mode where there is no auto-founding of clusters, and all nodes have seed_servers populated from time zero. Then the operator calls an admin API endpoint on one of the nodes (whichever, but one of them), and that node starts the cluster: it starts a controller log and allows its peers to join. On subsequent disk wipes, there's no risk of anyone founding a new cluster, because there's no admin API call asking it to.
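The two behaviors described above can be illustrated with hedged redpanda.yaml fragments. This is a sketch only: the hostname and port values are borrowed from the chart output quoted later in this thread, not taken from any specific release.

```yaml
# Founding node (pre-22.3 bootstrap): an empty seed_servers list means
# "on an empty drive, create a new single-node cluster and wait for
# peers to join".
redpanda:
  seed_servers: []
---
# Any other node: a populated seed_servers list means "on an empty
# drive, try to join an existing cluster and do nothing until that
# succeeds".
redpanda:
  seed_servers:
    - host:
        address: "redpanda-0.redpanda.redpanda.svc.cluster.local."
        port: 33145
```

The asymmetry between these two fragments is exactly why wiping the founding node's disk is dangerous before 22.3: only the first form can ever create a new cluster.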
Thanks for these details @jcsp, this will help when this ticket gets pulled in.
Closing this as no chart changes will be needed for the solution in 22.3.
Really? I thought the 22.3 core code was going to be broadly backwards compatible, so without changes to the chart I'd have thought you'd still have an issue (i.e. nodes with seed_servers=[] would come up and form a cluster of 1 if you wiped their drive).
We don't set the seed servers to an empty list:

```yaml
seed_servers:
  - host:
      address: "redpanda-0.redpanda.redpanda.svc.cluster.local."
      port: 33145
  - host:
      address: "redpanda-1.redpanda.redpanda.svc.cluster.local."
      port: 33145
  - host:
      address: "redpanda-2.redpanda.redpanda.svc.cluster.local."
      port: 33145
```
Thanks, @jcsp, for asking the right questions. We do need to remove the
The helm chart will always set `redpanda.seed_servers` to be `[]` where `redpanda.node_id` is 0 (broker 0). I believe the issue is that broker 0 may not always be the leader after restarts or if there is a leader election, but the existing statefulset code still assumes broker 0 will be the leader (and sets `seed_servers` to `[]`). See notes below for an explanation of why this is an issue and what we can do (both now and in versions >= 22.3) to resolve it.

There could also have been some issue with broker 0 that caused it to lose leadership, and so it could be in a state where it doesn't have a complete copy of all partitions. In this scenario, setting broker 0 to be the leader would result in data loss. Investigation is needed into what happens once a new leader is elected and then a helm upgrade is applied.

We should also determine the leader prior to restarting the cluster for whatever reason and set `seed_servers` to `[]` for the appropriate broker. Cluster restart should not impact `seed_servers` values, as they will be correctly set on all nodes, including the founding node (after the founding node is started).

Work is being done to allow setting the same `seed_servers` value across all brokers in a cluster; relevant ticket here: redpanda-data/redpanda#333

Also related to this, the following ticket tracks making `node_id` automatically assigned (and no longer set within `redpanda.yaml`): redpanda-data/redpanda#2793

Once the above tickets have associated PRs merged, we wouldn't have to worry about handling either `node_id` or `seed_servers` in the helm chart. See notes below for how `seed_servers` will be handled in the future. For now we ensure the leader (or founding node) initially has it set to `[]` and then populate with other brokers after startup. After 22.3 we can set `seed_servers` for each node in the same way from the beginning.