
Properly handle node_id and seed_servers #91

Closed
vuldin opened this issue Jun 21, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@vuldin
Member

vuldin commented Jun 21, 2022

The helm chart will always set redpanda.seed_servers to [] where redpanda.node_id is 0 (broker 0). I believe the issue is that broker 0 may not always be the leader after restarts or after a leader election, but the existing statefulset code still assumes broker 0 will be the leader (and sets its seed_servers to []). Broker 0 could also have lost leadership because of some problem, leaving it without a complete copy of all partitions; in that scenario, forcing broker 0 back into being the leader would result in data loss. See the notes below for an explanation of why this is an issue and what we can do (both now and in versions >= 22.3) to resolve it.
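
(As a rough illustration of that behavior, not the actual chart output, the rendered per-broker config ends up looking something like the sketch below; the hostnames and internal RPC port 33145 are taken from the example config later in this thread.)

    # broker 0 (node_id 0): seed_servers is always rendered as an empty list
    redpanda:
      node_id: 0
      seed_servers: []
    ---
    # every other broker: seed_servers is populated (shown pointing at broker 0
    # purely for illustration; the exact contents are up to the chart)
    redpanda:
      node_id: 1
      seed_servers:
        - host:
            address: "redpanda-0.redpanda.redpanda.svc.cluster.local."
            port: 33145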

Investigation is needed into what happens once a new leader is elected and then a helm upgrade is applied. We should also determine the leader prior to restarting the cluster for whatever reason, and set seed_servers to [] for the appropriate broker. A cluster restart should not impact seed_servers values, as they will be correctly set on all nodes, including the founding node (after the founding node is started).

Work is being done to allow setting the same seed_servers value across all brokers in a cluster, relevant ticket here: redpanda-data/redpanda#333

Also related to this, the following ticket tracks making node_id automatically assigned (and no longer set within redpanda.yaml): redpanda-data/redpanda#2793

Once the above tickets have their associated PRs merged, we won't have to worry about handling either node_id or seed_servers in the helm chart. See the notes below for how seed_servers will be handled in the future. For now we ensure the leader (or founding node) initially has seed_servers set to [] and then populate it with the other brokers after startup. After 22.3 we can set seed_servers identically on every node from the beginning.

@vuldin vuldin added the enhancement New feature or request label Jul 5, 2022
@jcsp

jcsp commented Aug 12, 2022

This isn't quite about leadership or election.

Redpanda <= 22.2

The special thing about nodes with seed_servers=[] is how they behave if you start them on an empty drive. A node with seed_servers=[] (call it a "founding node") will respond to an empty drive by creating a new cluster of one node and waiting for other nodes to join it. Nodes with a populated seed_servers have the opposite behavior on an empty drive: they will try to join an existing cluster, and will not do anything until they succeed in doing so.

This becomes important in some cloud environments, where the disks aren't really persistent and orchestrators may be quite casual about just blowing away a node's drive, expecting the cluster to cope autonomously. This doesn't work, because if you blow away the disk of the founding node (the one with seed_servers=[]) then it won't try to rejoin the cluster: it'll start a new cluster of its own.

The way the latest operator code copes with this is to only briefly have seed_servers=[] when starting the cluster for the first time: start one node like that, let it come up, and then immediately change its seed_servers to point to its peers, so that if the node is ever restarted with an empty disk it will rejoin rather than try to found a new cluster.
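
(A minimal sketch of that sequence for the founding node's redpanda.yaml, reusing the hostnames and internal RPC port 33145 from the config shown later in this thread; this illustrates the idea, it is not the operator's actual output:)

    # Step 1: first-ever start -- empty seed list, so an empty drive founds the cluster
    redpanda:
      seed_servers: []
    ---
    # Step 2: once the node is up, rewrite seed_servers to point at its peers,
    # so a later restart on an empty drive joins the existing cluster instead
    redpanda:
      seed_servers:
        - host:
            address: "redpanda-1.redpanda.redpanda.svc.cluster.local."
            port: 33145
        - host:
            address: "redpanda-2.redpanda.redpanda.svc.cluster.local."
            port: 33145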

Redpanda >= 22.3

The changes planned for 22.3 will provide a simpler way for things like the operator to initialize a cluster: there will be a mode with no auto-founding of clusters, where all nodes have seed_servers populated from time zero. Then the operator calls an admin API endpoint on one of the nodes (any one of them), and that node starts the cluster: it creates a controller log and allows its peers to join. On subsequent disk wipes there's no risk of anyone founding a new cluster, because there's no admin API call asking it to.
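
(Under that model, every broker's on-disk config could be identical from the start, roughly as sketched below; the one-time cluster-start call happens out of band and isn't part of the config. This is an illustration of the idea, not the final 22.3 format:)

    # identical seed_servers on redpanda-0, redpanda-1, and redpanda-2:
    # no empty seed list anywhere, so no broker auto-founds a cluster after a disk wipe
    redpanda:
      seed_servers:
        - host:
            address: "redpanda-0.redpanda.redpanda.svc.cluster.local."
            port: 33145
        - host:
            address: "redpanda-1.redpanda.redpanda.svc.cluster.local."
            port: 33145
        - host:
            address: "redpanda-2.redpanda.redpanda.svc.cluster.local."
            port: 33145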

@vuldin
Member Author

vuldin commented Aug 12, 2022

Thanks for these details, @jcsp; this will help when this ticket gets pulled in.

@joejulian
Contributor

Closing this as no chart changes will be needed for the solution in 22.3.

@jcsp

jcsp commented Oct 11, 2022

> Closing this as no chart changes will be needed for the solution in 22.3.

Really? I thought the 22.3 core code was going to be broadly backwards compatible, so without changes to the chart I'd have thought you'd still have an issue (i.e. nodes with seed_servers=[] would come up and form a cluster of 1 if you wiped their drive)

@joejulian
Contributor

We don't set the seed servers to an empty list:

        seed_servers:
          - host:
              address: "redpanda-0.redpanda.redpanda.svc.cluster.local."
              port: 33145
          - host:
              address: "redpanda-1.redpanda.redpanda.svc.cluster.local."
              port: 33145
          - host:
              address: "redpanda-2.redpanda.redpanda.svc.cluster.local."
              port: 33145

@joejulian
Contributor

Thanks, @jcsp, for asking the right questions. We do need to remove the config set redpanda.seed_servers step from the init container when 22.3 comes out.
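
(Purely as a hypothetical sketch of what that removal refers to, not the chart's real init container; the container name, image, and exact rpk arguments below are placeholders:)

    # Hypothetical init-container step that would be dropped once 22.3 lands.
    initContainers:
      - name: configure-seed-servers                    # placeholder name
        image: docker.redpanda.com/vectorized/redpanda  # placeholder image
        command:
          - rpk
          - redpanda
          - config
          - set
          - redpanda.seed_servers
          - "[]"   # illustrative value; the real step writes whatever the chart decides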

@joejulian joejulian reopened this Oct 12, 2022