
Init Cluster UUID in upgraded clusters #7079

Merged: 9 commits, merged Nov 9, 2022

dlex (Contributor) commented Nov 3, 2022

Cover letter

When an existing cluster is upgraded to 22.3, we want it to have the bootstrap_cluster_cmd in the controller log and a cluster UUID assigned. Going forward, this lets us rely on the cluster UUID as a mandatory property of every existing cluster.

This PR also adds an admin API endpoint, v1/cluster/uuid, for querying the UUID of the cluster a node belongs to.

Fixes #333

Backport Required

  • not a bug fix
  • issue does not exist in previous branches
  • papercut/not impactful enough to backport
  • v22.2.x
  • v22.1.x
  • v21.11.x

UX changes

Admin API v1/cluster/uuid returns {"cluster_uuid": "<UUID>"}, or an empty response if the node is not yet part of a cluster.

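A minimal client-side sketch of consuming this endpoint, in Python since the PR's tests live in rptest. It assumes the admin API is served over plain HTTP on port 9644 (Redpanda's usual default), and that "empty" means an empty body or an object without a `cluster_uuid` key. `parse_cluster_uuid` and `fetch_cluster_uuid` are hypothetical helpers, not part of rptest.

```python
import json
import urllib.request
import uuid
from typing import Optional


def parse_cluster_uuid(body: str) -> Optional[str]:
    """Return the cluster UUID from a v1/cluster/uuid response body, or None.

    Assumption: a node that has not joined a cluster yet answers with an
    empty body, or with an object that has no "cluster_uuid" key.
    """
    if not body.strip():
        return None
    payload = json.loads(body)
    if not isinstance(payload, dict) or "cluster_uuid" not in payload:
        return None
    value = payload["cluster_uuid"]
    if not isinstance(value, str):
        raise ValueError(f"cluster_uuid is not a string: {value!r}")
    # uuid.UUID raises ValueError if the string is not a well-formed UUID.
    uuid.UUID(value)
    return value


def fetch_cluster_uuid(host: str, port: int = 9644) -> Optional[str]:
    """Query one node's admin API (port 9644 is an assumed default)."""
    url = f"http://{host}:{port}/v1/cluster/uuid"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_cluster_uuid(resp.read().decode("utf-8"))
```

Returning `None` keeps "not yet part of a cluster" distinct from a malformed value, which still raises `ValueError`.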

Release notes

  • none

Review threads (resolved) on src/v/redpanda/admin_server.cc, src/v/redpanda/admin_server.h, tests/rptest/tests/cluster_bootstrap_test.py, src/v/features/feature_table.h and src/v/cluster/controller.cc
dlex force-pushed the 333_cluster-uuid-after-upgrade branch from 1f01d0c to 7cdaad1 on November 4, 2022 22:13
dlex marked this pull request as ready for review on November 4, 2022 22:15
dlex (Contributor, Author) commented Nov 4, 2022

Force-push: rebased onto a fresh dev, addressed comments

Review threads (resolved) on tests/rptest/tests/cluster_bootstrap_test.py, src/v/cluster/controller.h and src/v/cluster/controller.cc
dlex force-pushed the 333_cluster-uuid-after-upgrade branch from 7cdaad1 to f93e0c3 on November 5, 2022 01:21
dlex (Contributor, Author) commented Nov 5, 2022

Force-push: cluster_creation_hook() and create_cluster() have been refactored, and the comments have been addressed.

andrwng (Contributor) left a comment

Almost good to go, I think. Need to do something about the broker assert, though.

Review threads (resolved) on src/v/cluster/controller.cc
piyushredpanda added this to the v22.3.1-rc5 milestone on Nov 5, 2022
A `register_route` overload to register synchronous handlers

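Python has no function overloading, so a rough analogue of a synchronous-handler overload is a single `register_route` that wraps plain functions into coroutines. The `Router` class and its methods are hypothetical and only illustrate the idea; the actual admin server is C++ built on Seastar.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Dict


class Router:
    """Hypothetical route table mapping a path to an awaitable handler."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[[dict], Awaitable[Any]]] = {}

    def register_route(self, path: str, handler: Callable[[dict], Any]) -> None:
        if inspect.iscoroutinefunction(handler):
            # Asynchronous handler: store it as-is.
            self._routes[path] = handler
        else:
            # Synchronous handler: wrap it so every route is awaited uniformly.
            async def wrapped(request: dict) -> Any:
                return handler(request)

            self._routes[path] = wrapped

    async def dispatch(self, path: str, request: dict) -> Any:
        return await self._routes[path](request)
```

The wrapper adds one coroutine frame per request, which is negligible for handlers that only read state already held in memory.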
Storage, used to get cluster_uuid, is accessed via the controller.

Check that cluster_uuid is not available until the entire cluster is upgraded, and that cluster_uuid is present after the upgrade is complete.

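The check this commit describes might look roughly like the following. It assumes a hypothetical `poll` callable that returns one cluster UUID (or `None`) per node, for example built on a per-node admin API query. Because the UUID is issued asynchronously after the upgrade, the second half polls with a timeout instead of asserting immediately.

```python
import time
from typing import Callable, List, Optional


def assert_uuid_absent(uuids: List[Optional[str]]) -> None:
    """While any node still runs the old version, no node may report a UUID."""
    present = [u for u in uuids if u is not None]
    if present:
        raise AssertionError(f"cluster UUID reported before upgrade finished: {present}")


def wait_for_common_uuid(poll: Callable[[], List[Optional[str]]],
                         timeout_s: float = 30.0,
                         interval_s: float = 0.5) -> str:
    """Poll until every node reports the same non-empty cluster UUID."""
    deadline = time.monotonic() + timeout_s
    while True:
        uuids = poll()
        if uuids and all(u is not None for u in uuids) and len(set(uuids)) == 1:
            return uuids[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nodes did not converge on a cluster UUID: {uuids}")
        time.sleep(interval_s)
```

Requiring a single distinct value catches both a missing UUID and nodes that disagree about it.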
New feature `seeds_driven_bootstrap_capable` to control the cluster upgrade.

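Upgrade-gating features of this kind are commonly modeled as becoming available only once every node reports a logical version at or above the one that introduced the feature. The sketch below assumes that model; `feature_available` and the version numbers in the usage are hypothetical and do not reflect Redpanda's actual feature table values.

```python
from typing import Mapping


def feature_available(node_versions: Mapping[int, int], required_version: int) -> bool:
    """True only when every known node runs at least `required_version`.

    node_versions maps a hypothetical node id to that node's logical version.
    An empty membership is treated as "not available".
    """
    return bool(node_versions) and min(node_versions.values()) >= required_version
```

A mixed-version cluster therefore keeps the feature off, which is the window in which no cluster UUID should appear.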
A non-founding raft0 leader issues `bootstrap_cluster_cmd` if no cluster_uuid is present. This is done in a detached fiber that waits for the new feature to be enabled.

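An asyncio analogue of that detached fiber, assuming hypothetical callables for the raft0 leadership check, the locally known cluster UUID, and issuing the command. `FeatureGate` stands in for the feature-table entry. The real code is a Seastar fiber in C++, so this only mirrors the control flow.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class FeatureGate:
    """Hypothetical stand-in for a feature such as seeds_driven_bootstrap_capable."""

    def __init__(self) -> None:
        self._active = asyncio.Event()

    def activate(self) -> None:
        self._active.set()

    async def wait_active(self) -> None:
        await self._active.wait()


async def bootstrap_after_upgrade(
    gate: FeatureGate,
    is_raft0_leader: Callable[[], bool],
    cluster_uuid: Callable[[], Optional[str]],
    issue_bootstrap_cmd: Callable[[], Awaitable[None]],
) -> bool:
    """Wait for the feature, then issue bootstrap_cluster_cmd if still needed.

    Returns True if this call issued the command.
    """
    await gate.wait_active()
    # Checked only after the wait: leadership or the UUID may have changed meanwhile.
    if not is_raft0_leader() or cluster_uuid() is not None:
        return False
    await issue_bootstrap_cmd()
    return True


async def demo() -> List[str]:
    gate = FeatureGate()
    issued: List[str] = []

    async def issue() -> None:
        issued.append("bootstrap_cluster_cmd")

    # Detached: the task is started and runs independently of the caller.
    task = asyncio.create_task(
        bootstrap_after_upgrade(gate, lambda: True, lambda: None, issue))
    await asyncio.sleep(0)
    assert not issued  # still blocked on the feature gate
    gate.activate()
    assert await task
    return issued
```

Deferring both checks until the gate opens avoids acting on state observed before the whole cluster has been upgraded.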
Controller log existence checks based on the raft visible index have been removed because they are prone to a race condition.

A node that was already a member of a cluster failed to apply additional properties of bootstrap_cluster_cmd, such as the bootstrap user, upon restart, because the idempotency criterion was the presence of the cluster_uuid value restored from kvstore. Now bootstrap_backend has its own copy, which reflects only whether the command has been applied. Reconciliation between the two copies has also been added.

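The fix can be pictured with a toy model. The class, its field names, and the reconciliation policy (adopt the applied value when nothing is stored, fail on a mismatch) are assumptions for illustration, not Redpanda's bootstrap_backend.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class BootstrapBackendModel:
    """Toy model of the idempotency fix; names are hypothetical."""

    # Restored from kvstore at startup, so it can be set before the command's
    # other effects (such as the bootstrap user) have been re-applied.
    stored_cluster_uuid: Optional[str] = None
    # The backend's own copy: set only when this instance applies the command.
    applied_cluster_uuid: Optional[str] = None
    users: Set[str] = field(default_factory=set)

    def apply_bootstrap_cmd(self, cmd_uuid: str, bootstrap_user: Optional[str]) -> bool:
        """Apply the command once per instance; return False on a repeat."""
        if self.applied_cluster_uuid is not None:
            return False
        if bootstrap_user:
            self.users.add(bootstrap_user)
        self.applied_cluster_uuid = cmd_uuid
        self._reconcile()
        return True

    def _reconcile(self) -> None:
        # Assumed policy: persist the applied value if nothing was stored,
        # and treat a disagreement as an unrecoverable inconsistency.
        if self.stored_cluster_uuid is None:
            self.stored_cluster_uuid = self.applied_cluster_uuid
        elif self.stored_cluster_uuid != self.applied_cluster_uuid:
            raise RuntimeError("kvstore cluster UUID disagrees with controller log")
```

Under the old criterion, a restarted instance with `stored_cluster_uuid` already set would have skipped the command, and the bootstrap user would never have been recreated.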
Successful bootstrap user creation caused bootstrap_cluster_cmd to fail early and never apply cluster_uuid. Now the user_exists condition is treated as a warning, and in that case cluster_uuid creation completes just as it does on success. All other (fallback) user creation errors are fatal.

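The resulting control flow, sketched with a hypothetical result enum and a plain dict for state. "Fatal" is modeled here as an exception, which may differ from how the real code aborts.

```python
import enum
import logging
from typing import Dict

log = logging.getLogger("bootstrap_sketch")


class CreateUserResult(enum.Enum):
    SUCCESS = "success"
    USER_EXISTS = "user_exists"
    OTHER_FAILURE = "other_failure"  # placeholder for any other error code


class FatalBootstrapError(RuntimeError):
    pass


def finish_bootstrap(result: CreateUserResult, cluster_uuid: str,
                     state: Dict[str, str]) -> None:
    if result is CreateUserResult.USER_EXISTS:
        # The user was created earlier: not an error, just worth noting.
        log.warning("bootstrap user already exists; continuing with cluster UUID %s",
                    cluster_uuid)
    elif result is not CreateUserResult.SUCCESS:
        raise FatalBootstrapError(f"bootstrap user creation failed: {result.value}")
    # Reached on success and on user_exists alike.
    state["cluster_uuid"] = cluster_uuid
```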
to agree on using "cluster UUID" as the common label, and on the
levels to produce reasonable output volume at INFO.
to provide a label for them distinct from cluster UUID's one
so that it's empty if and only if we are not a cluster founder
dlex force-pushed the 333_cluster-uuid-after-upgrade branch from f93e0c3 to 903ad6f on November 8, 2022 19:44
dlex (Contributor, Author) commented Nov 8, 2022

Force-push: fixed issues caused by a race condition between the controller and the raft0 consensus, which led to nondeterministic results on whether the controller log exists; fixed issues around bootstrap user creation; and did some log refactoring.

dlex requested a review from andrwng on November 8, 2022 19:49

andrwng (Contributor) previously approved these changes on Nov 8, 2022

Just a non-blocking nit

Review threads (resolved) on tests/rptest/tests/cluster_bootstrap_test.py and src/v/cluster/members_manager.cc
dlex (Contributor, Author) commented Nov 8, 2022

dlex merged commit efeff4f into redpanda-data:dev on Nov 9, 2022
dlex deleted the 333_cluster-uuid-after-upgrade branch on November 29, 2022 20:25
Linked issue: redpanda: cluster will not form without a node with an empty seed server list

3 participants