
CORE-7000: Node ID/UUID Override #22972

Open

oleiman wants to merge 8 commits into base: dev from nodeuuid/core-7000/poc

Conversation

@oleiman (Member) commented Aug 20, 2024

Implements on-startup node UUID and ID override via CLI options or node config, as described in 2024-08-14 - RFC - Override Node UUID on Startup.

This PR was originally meant as a fully functional proof of concept, but given the mechanical simplicity of the approach (most of the code here is tests and serdes for the new config types), it has been promoted to a full PR.
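
For illustration, a hypothetical invocation using the uuid:uuid:id CLI format documented in a commit message below (all values here are placeholders; I read each triple as current_uuid:new_uuid:new_id, matching the adopt-only-on-current-UUID behavior discussed in review):

redpanda [other startup flags] \
  --node-id-overrides "550e8400-e29b-41d4-a716-446655440000:1b4e28ba-2fa1-11d2-883f-0016d3cca427:3"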

Closes CORE-7000
Closes CORE-6830

TODO:

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Improvements

  • Adds the ability to configure Node UUID and ID overrides at broker startup.

@oleiman self-assigned this Aug 20, 2024
@oleiman (Member Author) commented Aug 21, 2024

/ci-repeat 1

@oleiman force-pushed the nodeuuid/core-7000/poc branch 4 times, most recently from 6b16c80 to 9d48a3f on August 24, 2024 04:33
@oleiman (Member Author) commented Aug 24, 2024

/ci-repeat 1

@oleiman (Member Author) commented Aug 24, 2024

@michael-redpanda - leaving this in draft pending final RFC signoff, but this is probably worth a look whenever you're ready 🙏

@oleiman (Member Author) commented Aug 24, 2024

/ci-repeat 1

@michael-redpanda (Contributor) left a comment:
looks really nice so far!

Resolved review threads: src/v/config/node_overrides.cc (×7), src/v/config/node_overrides.h (×2), src/v/redpanda/application.cc (×1)
@oleiman (Member Author) commented Aug 28, 2024

force push contents:

  • various CR cleanups
  • add a CLI options test to the multi-node dt case

@oleiman (Member Author) commented Aug 28, 2024

/ci-repeat 1

@oleiman marked this pull request as ready for review August 30, 2024 19:36
@oleiman requested a review from a team as a code owner August 30, 2024 19:36
@oleiman changed the title from CORE-7000: Node ID/UUID Override Proof of Concept to CORE-7000: Node ID/UUID Override Aug 30, 2024
micheleRP previously approved these changes Aug 30, 2024

@micheleRP left a comment:

LGTM from docs

@oleiman (Member Author) commented Sep 8, 2024

force push to improve integration tests somewhat:

  • use RedpandaService.healthy
  • remove unnecessary leadership transfer

@pgellert (Contributor) left a comment:

The core logic looks good to me. I've added some comments for code improvement.

Resolved review threads: src/v/config/node_overrides.cc, src/v/redpanda/application.cc
Comment on lines 2644 to 2645
// NOTE(oren): what happens later if there's a node ID in config but we
// didn't take it? in the nullopt check down below...
Contributor:
Based on the code this seems fine. This will hit the else branch below. Then inside cluster_discovery::dispatch_node_uuid_registration_to_seeds and members_manager::handle_join_request we will try to register the node with the node id config::node().node_id() and either succeed or fail to get that specific node id but we will never get any other node id. So there's no need to set the node_id again in the conditional below with the nullopt check.

Member Author:
Yeah, that matches my reading, but I'm curious whether there's a reason for keeping it around if we know we're starting up clean. If that node ID had previously been registered against a different UUID, I think the (re)join request will always fail.

Feels like we're trading a small amount of one-shot work on startup for cyclomatic complexity downstream; possibly I'm overlooking some other bit of state that gives the distinction meaning.

Member Author:
At any rate, the comment is just noise!

Resolved review threads: tests/rptest/tests/admin_uuid_operations_test.py (×4), src/v/redpanda/application.cc (×1)
@oleiman (Member Author) commented Sep 9, 2024

force push contents:

  • Use a regex for CLI parsing (see the sketch after this list)
  • remove some comments
  • ducktape cleanup
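
A minimal sketch of such a parser (illustrative only, not this PR's code), assuming canonical 36-character UUIDs and the uuid:uuid:id token format shown in a later commit message:

#include <optional>
#include <regex>
#include <string>
#include <tuple>

// Parse one "current_uuid:new_uuid:new_id" token; nullopt on mismatch.
std::optional<std::tuple<std::string, std::string, int>>
parse_override(const std::string& token) {
    static const std::regex re{
      R"(^([0-9a-fA-F-]{36}):([0-9a-fA-F-]{36}):(\d+)$)"};
    std::smatch m;
    if (!std::regex_match(token, m, re)) {
        return std::nullopt;
    }
    // stoi is safe here: the regex guarantees the third field is all digits.
    return std::make_tuple(m[1].str(), m[2].str(), std::stoi(m[3].str()));
}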

Resolved review thread: src/v/utils/uuid.h
}
static bool decode(const Node& node, type& rhs) {
    auto value = node.as<std::string>();
    auto out = [&value]() -> std::optional<model::node_uuid> {
Contributor:
nit: any reason for the "construct lambda then immediately call" pattern?

@oleiman (Member Author) Sep 10, 2024:
personal preference. "initialize this optional with the result of from_string or, if it throws, nullopt" feels a bit neater than "initialize this optional to nullopt, then assign the result of from_string unless it throws". The ID in the outer scope receives a value at exactly one code point.
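
For readers unfamiliar with the pattern, a minimal standalone sketch (illustrative names, not the PR's actual code):

#include <optional>
#include <string>

// The optional receives its value at exactly one point: the return of the
// immediately-invoked lambda. A throwing parse maps to nullopt rather than
// requiring a later assignment.
std::optional<int> parse_or_nullopt(const std::string& s) {
    auto out = [&s]() -> std::optional<int> {
        try {
            return std::stoi(s);
        } catch (...) {
            return std::nullopt;
        }
    }();
    return out;
}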

@@ -2622,6 +2663,18 @@ void application::wire_up_and_start(::stop_signal& app_signal, bool test_mode) {
          "Running with already-established node ID {}",
          config::node().node_id());
        node_id = config::node().node_id().value();
    } else if (auto id = _node_overrides.node_id(); id.has_value()) {
Contributor:
I think it makes sense to try to register with the cluster (next conditional branch) in this case as well. For two reasons:

  • it will get us fresh features and cluster config snapshots from the controller leader
  • it will allow us to reject erroneous configurations (e.g. if the UUID is already registered with a different id).

Not sure if all the code for this is there, but looks like at least the join RPC supports passing existing node ids.

Member Author:

Makes sense. I'll refactor the conditionals a little bit.

Contributor:

Won't we hit the problem of needing a controller leader if we try to register here? Or do we also want to change members_manager::handle_join_request to not require controller leadership when the node is trying to register with a known (node uuid, node id) pair?

members_manager::handle_join_request(join_node_request const req) {
    using ret_t = result<join_node_reply>;
    using status_t = join_node_reply::status_code;
    bool node_id_assignment_supported = _feature_table.local().is_active(
      features::feature::node_id_assignment);
    bool req_has_node_uuid = !req.node_uuid.empty();
    if (node_id_assignment_supported && !req_has_node_uuid) {
        vlog(
          clusterlog.warn,
          "Invalid join request for node ID {}, node UUID is required",
          req.node.id());
        co_return errc::invalid_request;
    }
    std::optional<model::node_id> req_node_id = std::nullopt;
    if (req.node.id() >= 0) {
        req_node_id = req.node.id();
    }
    if (!node_id_assignment_supported && !req_node_id) {
        vlog(
          clusterlog.warn,
          "Got request to assign node ID, but feature not active",
          req.node.id());
        co_return errc::invalid_request;
    }
    if (
      req_has_node_uuid
      && req.node_uuid.size() != model::node_uuid::type::length) {
        vlog(
          clusterlog.warn,
          "Invalid join request, expected node UUID or empty; got {}-byte "
          "value",
          req.node_uuid.size());
        co_return errc::invalid_request;
    }
    model::node_uuid node_uuid;
    if (!req_node_id && !req_has_node_uuid) {
        vlog(clusterlog.warn, "Node ID assignment attempt had no node UUID");
        co_return errc::invalid_request;
    }
    ss::sstring node_uuid_str = "no node_uuid";
    if (req_has_node_uuid) {
        node_uuid = model::node_uuid(uuid_t(req.node_uuid));
        node_uuid_str = ssx::sformat("{}", node_uuid);
    }
    vlog(
      clusterlog.info,
      "Processing node '{} ({})' join request (version {}-{})",
      req.node.id(),
      node_uuid_str,
      req.earliest_logical_version,
      req.latest_logical_version);
    if (!_raft0->is_elected_leader()) {
        vlog(clusterlog.debug, "Not the leader; dispatching to leader node");
        // Current node is not the leader have to send an RPC to leader
        // controller
        co_return co_await dispatch_rpc_to_leader(

Contributor:
Hmm good point, maybe it won't work then.

@oleiman (Member Author) Sep 10, 2024:

Yeah, the exact requirement we're trying to circumvent. I was wondering whether the cluster layer might short-circuit somewhere (or be made to short-circuit) on a previously known <ID,UUID>, but I suppose all the request routing is controller-leadership based 🤷

self.logger.debug(
    f"...and decommission ghost node [{ghost_node_id}]...")

self.admin.decommission_broker(ghost_node_id, node=to_stop[0])
Contributor:
nit: this won't actually wait for the decom to be successful.


if mode == TestMode.CFG_OVERRIDE:
    self.redpanda.restart_nodes(
        to_stop,
Contributor:
Nit: we are already kind of testing that the nodes will only adopt overrides that match the current uuid, but maybe it will be more realistic to do what k8s will do and perform a full rolling restart.

Member Author:
Fair point. Will do

Member Author:
I'm beginning to think that "rolling restart" (I used this language in the RFC and the runbook) is not quite accurate. In DT, for example, we have a RollingRestarter that a) uses maintenance mode by default b) requires the cluster to be healthy before cycling each node. Presumably a k8s rolling restart looks somewhat similar.

Of course we can't meet either of those in this usage case. In DT we can fudge it with an unsafe param or something, but we may want to reconsider this framing with respect to support-facing docs.

@oleiman (Member Author) Sep 10, 2024:

It's worse than that, I think - in this test case, I believe a rolling restart is straightforwardly impossible, since restarting one node (with overrides) gives us a total of 2 live brokers - not enough to form a cluster due to the presence of ghost nodes.

My intuition is that we can start the nodes concurrently (as written) as long as the number of nodes we're restarting is <= (n_nodes + n_ghosts) / 2. What matters is that the empty nodes don't form a cluster among themselves, independent of any nodes containing a complete controller log, right?
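
To make the arithmetic concrete (hypothetical numbers, not necessarily this test's topology): if raft0 has 6 members in total, live plus ghost entries, a majority needs 4 votes. Restarting 3 empty nodes satisfies 3 <= 6 / 2, and those 3 nodes can never reach 4 votes among themselves, so their only way forward is to join a node that holds the complete controller log.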

Contributor:

Yeah, that makes sense to me. We want to restart the minimum number of ghost nodes to form a healthy cluster and wait for them to form a healthy controller quorum before continuing to restart the remaining nodes.

It makes sense to call out in the support docs not to use decommissioning or maintenance mode during any of this.

Perhaps the most important thing here is to not restart the healthy nodes to make sure they are available throughout the whole process.

@oleiman (Member Author) Sep 11, 2024:

> most important thing here is to not restart the healthy nodes

Yes, exactly. In fact, if we restart those nodes, I think they will get stuck just the same.

> call out in the support docs not to use decommissioning or maintenance mode

Yup, and that they are unavailable.

Generally, I think we're back to the point where we have a stack of assumptions that look right but would benefit from a ✅ from Alexey or @mmaslankaprv when he becomes available

To enable boost::lexical_cast for program_options parsing and UUIDs in configs.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
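
A hedged sketch of the mechanism behind that commit (my_uuid is an illustrative stand-in, not this PR's uuid_t): boost::lexical_cast and boost::program_options both drive parsing through stream extraction, so providing operator>> (plus operator<< for formatting) is what "enables" them for a type.

#include <boost/lexical_cast.hpp>
#include <istream>
#include <ostream>
#include <string>

struct my_uuid {
    std::string repr;
};

// Stream operators are the hook lexical_cast and program_options rely on.
std::istream& operator>>(std::istream& is, my_uuid& u) {
    return is >> u.repr;  // real code would validate the UUID format
}
std::ostream& operator<<(std::ostream& os, const my_uuid& u) {
    return os << u.repr;
}

// usage: auto u = boost::lexical_cast<my_uuid>("550e8400-e29b-41d4-a716-446655440000");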
- config::node_id_override
- config::node_override_store

Includes json/yaml SerDes and unit tests

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
- node_id_overrides: std::vector<config::node_id_override>

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
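
A hedged sketch of what such an entry might look like in the node config (the per-entry key names here are assumptions for illustration; only the top-level property name comes from this commit):

node_id_overrides:
  - current_uuid: <UUID the broker currently holds>
    new_uuid: <UUID to adopt on startup>
    new_id: <node ID to adopt on startup>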
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman (Member Author) commented Sep 11, 2024

force push CR changes, mostly hardening tests

Resolved review thread: tests/rptest/tests/admin_uuid_operations_test.py

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
And wire them into the corresponding node configs.

"--node-id-overrides uuid:uuid:id [uuid:uuid:id ...]"

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
and 'restart_nodes'

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@pgellert (Contributor) left a comment:

lgtm
