Seeds Driven Cluster Bootstrap #6744

Merged
merged 21 commits into from
Oct 20, 2022

@dlex dlex commented Oct 12, 2022

Cover letter

This PR addresses the second part of the scope of the "Cluster Bootstrap" project. The first part was addressed by #6659.

This PR introduces a new node configuration parameter, empty_seed_starts_cluster, that can disable the Empty Seed Cluster Bootstrap (legacy) mode of bootstrapping a cluster, also known as Root Driven Cluster Bootstrap. With the legacy mode disabled, the set of servers listed as seeds in each node configuration start a cluster together as soon as they form a raft group and elect a leader. The new bootstrapping mode is referred to as Seeds Driven Cluster Bootstrap.

In either mode, the cluster now gets a cluster UUID, reported by a new controller log message that lands second in the controller log, right after the initial raft configuration message. The cluster UUID is also stored in the kvstore of shard0 on every node.

In the new Seeds Driven bootstrap mode, all seed servers must be available, with identical node configurations, for a cluster to be created. Afterwards, none of the seed servers should try to form another new cluster ("split-brain") if its local storage is wiped, unless all seed nodes are wiped at the same time.
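
To illustrate the founder decision described above, here is a minimal C++ sketch. It is not the code from this PR: node_state, its fields, and this free-function form of is_cluster_founder are hypothetical simplifications, and the rule that a wiped seed learns about an existing cluster from its peers is an assumption drawn from the description rather than from the implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of what a node knows at startup.
struct node_state {
    std::string self_address;                       // this node's own RPC address
    std::vector<std::string> seed_servers;          // seed_servers from the node config
    bool empty_seed_starts_cluster = true;          // the new node config parameter
    std::optional<std::string> local_cluster_uuid;  // cluster UUID found in local storage, if any
    bool peer_reports_cluster = false;              // assumption: some peer seed already has a cluster UUID
};

// Returns true if this node should take part in creating a brand-new cluster.
inline bool is_cluster_founder(const node_state& n) {
    assert(!n.self_address.empty());

    if (n.empty_seed_starts_cluster) {
        // Legacy mode: only the node with an empty seed list (the "root") founds a cluster.
        return n.seed_servers.empty();
    }

    // Seeds Driven mode: the node must be listed as a seed...
    const bool listed = std::find(n.seed_servers.begin(),
                                  n.seed_servers.end(),
                                  n.self_address) != n.seed_servers.end();

    // ...and neither it nor any peer seed may already belong to a cluster.
    // The peer check is what keeps a single wiped seed from starting a second cluster.
    return listed && !n.local_cluster_uuid.has_value() && !n.peer_reports_cluster;
}
```

In this sketch, a seed whose storage was wiped has no local cluster UUID but sees peer_reports_cluster set, so it rejoins instead of founding. If every seed is wiped at once, no peer reports a cluster and all of them become founders again, which matches the exception noted above.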

Fixes #333

Backport Required

  • not a bug fix
  • issue does not exist in previous branches
  • papercut/not impactful enough to backport
  • v22.2.x
  • v22.1.x
  • v21.11.x

UX changes

Node Configuration in redpanda.yaml

  • empty_seed_starts_cluster (default: true) switches between the Empty Seed Cluster Bootstrap (legacy) mode and the Seeds Driven Cluster Bootstrap mode. If it is disabled, it must be disabled on all seed nodes.
  • seed_servers must be identical on every seed node when in the Seeds Driven Cluster Bootstrap mode.

Release notes

Features

  • Seeds Driven Cluster Bootstrap mode. Disable empty_seed_starts_cluster to use it. This allows the set of servers listed as seeds to start a cluster together. All seed servers must be available, with identical node configurations, for a cluster to be created. Afterwards, none of the seed servers will try to form another new cluster if its local storage is wiped, unless all seed nodes are wiped at the same time. The cluster now gets a cluster UUID, which is reported by a new controller log message and stored in the kvstore.

Improvements

  • Configurations of all nodes across the cluster can be identical
  • Cluster is bootstrapped with all its seed servers
  • Wiped out seed cluster members will not start their own new cluster

@dlex dlex self-assigned this Oct 12, 2022
@dlex dlex added the kind/enhance New feature or request label Oct 12, 2022
@dlex
Contributor Author

dlex commented Oct 13, 2022

Force-push update:

src/v/cluster/types.h (review thread: outdated, resolved)
@mmaslankaprv
Member

Can you explain what SDCB and ESCB mean? Maybe adding this to the cover letter would help.

@dlex
Contributor Author

dlex commented Oct 17, 2022

force-push: rebased onto dev, addressed review feedback, made node_id assignment aware of results of cluster_bootstrap_info discovery, added support for the new command to log_viewer.

cluster_test_fixture now supports the empty_seed_starts_cluster parameter
@dlex
Contributor Author

dlex commented Oct 19, 2022

force push: review comments addressed, cluster_uuid handling refactored (stored in storage::api, I/O done only once where needed). Fixed a bug that prevented a wiped seed from rejoining.

dlex and others added 9 commits October 19, 2022 03:51
bootstrap_backend operates on credentail_storage for that.
In security_frontend, maybe_create_bootstrap_user() is changed into
get_bootstrap_user_creds_from_env(), and the actual bootstrap user
creation is not done anymore.
Client: parallel querying of all seed_servers w/o timeout, until
results are obtained from all peer seed servers.
Verify that both versions and configurations match
Server: supply data
initial_seed_brokers() now returns a future
cluster_discovery caches if cluster_uuid is present
is_cluster_founder() to determine if the node should be starting a cluster
Cluster founders need to be able to elect a leader so they can decide
which node replicates the cluster_bootstrap_cmd, and do that before
controller starts.
This commit adds options to the RedpandaService class to:
- change the set of seed servers
- change whether or not node idx 1 is deemed the root node
Peer seed nodes discovered through cluster_discovery before the cluster
is bootstrapped are required to be at the same latest logical versions
as the local. Therefore as the cluster is initialized, versions of seed
nodes can safely be assumed to be at the latest. That enables the auto
node_id feature that is essential for cluster bootstrap.
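
The discovery step these commit messages describe (query every peer seed, then verify that versions and configurations match) could look roughly like the following C++ sketch. All names here (bootstrap_info, discovery_result, evaluate_peers) are hypothetical and do not reflect the actual RPC types in this PR. The ordering of the checks, where an existing cluster on any peer takes precedence over mismatch detection, is also an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical reply a seed server gives to a bootstrap-info query.
struct bootstrap_info {
    std::int64_t logical_version = 0;          // latest logical version the node supports
    std::vector<std::string> seed_servers;     // its configured seed_servers
    bool empty_seed_starts_cluster = true;     // its configured bootstrap mode
    std::optional<std::string> cluster_uuid;   // set if the node already belongs to a cluster
};

enum class discovery_result {
    can_found,       // all peers agree and no cluster exists yet
    cluster_exists,  // at least one peer already belongs to a cluster: join it
    mismatch         // versions or configurations differ: refuse to bootstrap
};

// Evaluates replies collected from ALL peer seed servers.
// Precondition: the local node has no cluster UUID of its own.
inline discovery_result evaluate_peers(const bootstrap_info& local,
                                       const std::vector<bootstrap_info>& peers) {
    assert(!local.cluster_uuid.has_value());

    // Assumption: an existing cluster on any peer means "join", not "found".
    for (const auto& p : peers) {
        if (p.cluster_uuid.has_value()) {
            return discovery_result::cluster_exists;
        }
    }
    // Founders must agree on the version and on the bootstrap-relevant configuration.
    for (const auto& p : peers) {
        if (p.logical_version != local.logical_version
            || p.seed_servers != local.seed_servers
            || p.empty_seed_starts_cluster != local.empty_seed_starts_cluster) {
            return discovery_result::mismatch;
        }
    }
    return discovery_result::can_found;
}
```

Because every founder is checked to be at the same logical version, later code can assume the newest feature set is available at cluster creation, which is the property the last commit message relies on for automatic node_id assignment.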
@dlex
Contributor Author

dlex commented Oct 19, 2022

force push: fix a linter error

 cluster_discovery::get_cluster_founder_node_id() {
     if (config::node().empty_seed_starts_cluster()) {
         if (config::node().seed_servers().empty()) {
-            return node_id{0};
+            co_return node_id{0};
Member

Do we validate that, in this case, the node_id configured in redpanda.yaml matches the one returned from here?

Contributor Author

nope, I agree that it should be added

@andrwng andrwng mentioned this pull request Oct 20, 2022
6 tasks
@dlex dlex merged commit 355adde into redpanda-data:dev Oct 20, 2022
andrwng added a commit to andrwng/redpanda that referenced this pull request Oct 21, 2022
The original implementation of seeds-driven bootstrap had pieces of the
various validation required for bootstrap strewn about startup. Because
of this, I found it difficult to reason about what validations are done
when, and what work may be duplicated, while reading through the code.

This commit puts all validations up front so we determine whether we are
a founder immediately and use a cached value thereafter.

This is mostly addressing review comments on redpanda-data#6744.
@@ -73,7 +73,8 @@ feature_manager::feature_manager(

) {}

-ss::future<> feature_manager::start() {
+ss::future<>
+feature_manager::start(std::vector<model::node_id>&& cluster_founder_nodes) {
Member

please take by value
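
For context, a small C++ sketch of the by-value "sink parameter" idiom this comment asks for. The class feature_manager_sketch and the node_id alias are stand-ins invented for illustration, and the method returns void rather than ss::future<> to keep the example free of Seastar. One general C++ fact that may be relevant: in a coroutine, by-value parameters are copied into the coroutine frame, while reference parameters stay references and can dangle. Whether that was the reviewer's motivation is not stated.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using node_id = int;  // stand-in for model::node_id (assumption)

class feature_manager_sketch {
public:
    // Sink parameter taken by value: an lvalue argument is copied,
    // an rvalue argument is moved, and the callee owns the result either way.
    void start(std::vector<node_id> cluster_founder_nodes) {
        assert(_founders.empty());  // this sketch expects start() to be called once
        _founders = std::move(cluster_founder_nodes);
    }

    const std::vector<node_id>& founders() const { return _founders; }

private:
    std::vector<node_id> _founders;
};
```

A caller that no longer needs its vector writes fm.start(std::move(ids)), which costs two moves and no copy. A caller that wants to keep it writes fm.start(ids) and pays for one copy. With the && signature, the second form would not compile.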

Labels
area/redpanda kind/enhance New feature or request
Development

Successfully merging this pull request may close these issues.

redpanda: cluster will not form without a node with an empty seed server list
5 participants