
CORE-7000: Node ID/UUID Override #22972

Open

oleiman wants to merge 8 commits into base: dev from nodeuuid/core-7000/poc

Conversation

@oleiman (Member) commented Aug 20, 2024

Implements on-startup node UUID and ID override via CLI options or node config, as described in 2024-08-14 - RFC - Override Node UUID on Startup.

This PR was originally meant as a fully functional proof of concept, but given the mechanical simplicity of the approach (most of the code here is tests and serdes for the new config types), it has been promoted to a full PR.
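
For illustration, a hypothetical invocation using the uuid:uuid:id CLI format documented in a commit message below (all values here are placeholders; I read each triple as current_uuid:new_uuid:new_id, matching the adopt-only-on-current-UUID behavior discussed in review):

redpanda [other startup flags] \
  --node-id-overrides "550e8400-e29b-41d4-a716-446655440000:1b4e28ba-2fa1-11d2-883f-0016d3cca427:3"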

Closes CORE-7000
Closes CORE-6830

TODO:

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Improvements

  • Adds the ability to configure Node UUID and ID overrides at broker startup.

@oleiman self-assigned this Aug 20, 2024
@oleiman (Member Author) commented Aug 21, 2024

/ci-repeat 1

@oleiman force-pushed the nodeuuid/core-7000/poc branch 4 times, most recently from 6b16c80 to 9d48a3f on August 24, 2024 04:33
@oleiman (Member Author) commented Aug 24, 2024

/ci-repeat 1

@oleiman (Member Author) commented Aug 24, 2024

@michael-redpanda - leaving this in draft pending final RFC signoff, but this is probably worth a look whenever you're ready 🙏

@oleiman (Member Author) commented Aug 24, 2024

/ci-repeat 1

@michael-redpanda (Contributor) left a comment:
looks really nice so far!

Resolved review threads: src/v/config/node_overrides.cc (×7), src/v/config/node_overrides.h (×2), src/v/redpanda/application.cc (×1)
@oleiman (Member Author) commented Aug 28, 2024

force push contents:

  • various CR cleanups
  • add a CLI options test to the multi-node dt case

@oleiman (Member Author) commented Aug 28, 2024

/ci-repeat 1

@oleiman marked this pull request as ready for review August 30, 2024 19:36
@oleiman requested a review from a team as a code owner August 30, 2024 19:36
@oleiman changed the title from CORE-7000: Node ID/UUID Override Proof of Concept to CORE-7000: Node ID/UUID Override Aug 30, 2024
micheleRP previously approved these changes Aug 30, 2024

@micheleRP left a comment:

LGTM from docs

@oleiman (Member Author) commented Sep 8, 2024

force push to improve integration tests somewhat:

  • use RedpandaService.healthy
  • remove unnecessary leadership transfer

@pgellert (Contributor) left a comment:

The core logic looks good to me. I've added some comments for code improvement.

Resolved review threads: src/v/config/node_overrides.cc, src/v/redpanda/application.cc
Comment on lines 2644 to 2645
// NOTE(oren): what happens later if there's a node ID in config but we
// didn't take it? in the nullopt check down below...
Contributor:
Based on the code this seems fine. This will hit the else branch below. Then inside cluster_discovery::dispatch_node_uuid_registration_to_seeds and members_manager::handle_join_request we will try to register the node with the node id config::node().node_id() and either succeed or fail to get that specific node id but we will never get any other node id. So there's no need to set the node_id again in the conditional below with the nullopt check.

Member Author:
Yeah, that matches my reading, but I'm curious whether there's a reason for keeping it around if we know we're starting up clean. If that node ID had previously been registered against a different UUID, I think the (re)join request will always fail.

Feels like we're trading a small amount of one-shot work on startup for cyclomatic complexity downstream; possibly I'm overlooking some other bit of state that gives the distinction meaning.

Member Author:
At any rate, the comment is just noise!

Resolved review threads: tests/rptest/tests/admin_uuid_operations_test.py (×4), src/v/redpanda/application.cc (×1)
@oleiman (Member Author) commented Sep 9, 2024

force push contents:

  • Use a regex for CLI parsing (see the sketch after this list)
  • remove some comments
  • ducktape cleanup
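
A minimal sketch of such a parser (illustrative only, not this PR's code), assuming canonical 36-character UUIDs and the uuid:uuid:id token format shown in a later commit message:

#include <optional>
#include <regex>
#include <string>
#include <tuple>

// Parse one "current_uuid:new_uuid:new_id" token; nullopt on mismatch.
std::optional<std::tuple<std::string, std::string, int>>
parse_override(const std::string& token) {
    static const std::regex re{
      R"(^([0-9a-fA-F-]{36}):([0-9a-fA-F-]{36}):(\d+)$)"};
    std::smatch m;
    if (!std::regex_match(token, m, re)) {
        return std::nullopt;
    }
    // stoi is safe here: the regex guarantees the third field is all digits.
    return std::make_tuple(m[1].str(), m[2].str(), std::stoi(m[3].str()));
}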

Resolved review thread: src/v/utils/uuid.h
}
static bool decode(const Node& node, type& rhs) {
    auto value = node.as<std::string>();
    auto out = [&value]() -> std::optional<model::node_uuid> {
Contributor:
nit: any reason for the "construct lambda then immediately call" pattern?

@oleiman (Member Author) Sep 10, 2024:
personal preference. "initialize this optional with the result of from_string or, if it throws, nullopt" feels a bit neater than "initialize this optional to nullopt, then assign the result of from_string unless it throws". The ID in the outer scope receives a value at exactly one code point.
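
For readers unfamiliar with the pattern, a minimal standalone sketch (illustrative names, not the PR's actual code):

#include <optional>
#include <string>

// The optional receives its value at exactly one point: the return of the
// immediately-invoked lambda. A throwing parse maps to nullopt rather than
// requiring a later assignment.
std::optional<int> parse_or_nullopt(const std::string& s) {
    auto out = [&s]() -> std::optional<int> {
        try {
            return std::stoi(s);
        } catch (...) {
            return std::nullopt;
        }
    }();
    return out;
}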

@@ -2622,6 +2663,18 @@ void application::wire_up_and_start(::stop_signal& app_signal, bool test_mode) {
          "Running with already-established node ID {}",
          config::node().node_id());
        node_id = config::node().node_id().value();
    } else if (auto id = _node_overrides.node_id(); id.has_value()) {
Contributor:
I think it makes sense to try to register with the cluster (next conditional branch) in this case as well. For two reasons:

  • it will get us fresh features and cluster config snapshots from the controller leader
  • it will allow us to reject erroneous configurations (e.g. if the UUID is already registered with a different id).

Not sure if all the code for this is there, but looks like at least the join RPC supports passing existing node ids.

Member Author:

Makes sense. I'll refactor the conditionals a little bit.

Contributor:

Won't we hit the problem of needing a controller leader if we try to register here? Or do we also want to change members_manager::handle_join_request to not require controller leadership when the node is trying to register with a known (node uuid, node id) pair?

members_manager::handle_join_request(join_node_request const req) {
    using ret_t = result<join_node_reply>;
    using status_t = join_node_reply::status_code;
    bool node_id_assignment_supported = _feature_table.local().is_active(
      features::feature::node_id_assignment);
    bool req_has_node_uuid = !req.node_uuid.empty();
    if (node_id_assignment_supported && !req_has_node_uuid) {
        vlog(
          clusterlog.warn,
          "Invalid join request for node ID {}, node UUID is required",
          req.node.id());
        co_return errc::invalid_request;
    }
    std::optional<model::node_id> req_node_id = std::nullopt;
    if (req.node.id() >= 0) {
        req_node_id = req.node.id();
    }
    if (!node_id_assignment_supported && !req_node_id) {
        vlog(
          clusterlog.warn,
          "Got request to assign node ID, but feature not active",
          req.node.id());
        co_return errc::invalid_request;
    }
    if (
      req_has_node_uuid
      && req.node_uuid.size() != model::node_uuid::type::length) {
        vlog(
          clusterlog.warn,
          "Invalid join request, expected node UUID or empty; got {}-byte "
          "value",
          req.node_uuid.size());
        co_return errc::invalid_request;
    }
    model::node_uuid node_uuid;
    if (!req_node_id && !req_has_node_uuid) {
        vlog(clusterlog.warn, "Node ID assignment attempt had no node UUID");
        co_return errc::invalid_request;
    }
    ss::sstring node_uuid_str = "no node_uuid";
    if (req_has_node_uuid) {
        node_uuid = model::node_uuid(uuid_t(req.node_uuid));
        node_uuid_str = ssx::sformat("{}", node_uuid);
    }
    vlog(
      clusterlog.info,
      "Processing node '{} ({})' join request (version {}-{})",
      req.node.id(),
      node_uuid_str,
      req.earliest_logical_version,
      req.latest_logical_version);
    if (!_raft0->is_elected_leader()) {
        vlog(clusterlog.debug, "Not the leader; dispatching to leader node");
        // Current node is not the leader have to send an RPC to leader
        // controller
        co_return co_await dispatch_rpc_to_leader(

Contributor:
Hmm good point, maybe it won't work then.

@oleiman (Member Author) Sep 10, 2024:

Yeah, the exact requirement we're trying to circumvent. I was wondering whether the cluster layer might short-circuit somewhere (or be made to short-circuit) on a previously known <ID,UUID>, but I suppose all the request routing is controller-leadership based 🤷

self.logger.debug(
    f"...and decommission ghost node [{ghost_node_id}]...")

self.admin.decommission_broker(ghost_node_id, node=to_stop[0])
Contributor:
nit: this won't actually wait for the decom to be successful.


if mode == TestMode.CFG_OVERRIDE:
    self.redpanda.restart_nodes(
        to_stop,
Contributor:
Nit: we are already kind of testing that the nodes will only adopt overrides that match the current uuid, but maybe it will be more realistic to do what k8s will do and perform a full rolling restart.

Member Author:
Fair point. Will do

Member Author:
I'm beginning to think that "rolling restart" (I used this language in the RFC and the runbook) is not quite accurate. In DT, for example, we have a RollingRestarter that a) uses maintenance mode by default b) requires the cluster to be healthy before cycling each node. Presumably a k8s rolling restart looks somewhat similar.

Of course we can't meet either of those in this usage case. In DT we can fudge it with an unsafe param or something, but we may want to reconsider this framing with respect to support-facing docs.

@oleiman (Member Author) Sep 10, 2024:

It's worse than that, I think - in this test case, I believe a rolling restart is straightforwardly impossible, since restarting one node (with overrides) gives us a total of 2 live brokers - not enough to form a cluster due to the presence of ghost nodes.

My intuition is that we can start the nodes concurrently (as written) as long as the number of nodes we're restarting is <= (n_nodes + n_ghosts) / 2. What matters is that the empty nodes don't form a cluster among themselves, independent of any nodes containing a complete controller log, right?
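
To make the arithmetic concrete (hypothetical numbers, not necessarily this test's topology): if raft0 has 6 members in total, live plus ghost entries, a majority needs 4 votes. Restarting 3 empty nodes satisfies 3 <= 6 / 2, and those 3 nodes can never reach 4 votes among themselves, so their only way forward is to join a node that holds the complete controller log.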

Contributor:

Yeah, that makes sense to me. We want to restart the minimum number of ghost nodes to form a healthy cluster and wait for them to form a healthy controller quorum before continuing to restart the remaining nodes.

It makes sense to call out in the support docs not to use decommissioning or maintenance mode during any of this.

Perhaps the most important thing here is to not restart the healthy nodes to make sure they are available throughout the whole process.

@oleiman (Member Author) Sep 11, 2024:

> most important thing here is to not restart the healthy nodes

Yes, exactly. In fact, if we restart those nodes, I think they will get stuck just the same.

> call out in the support docs not to use decommissioning or maintenance mode

Yup, and that they are unavailable.

Generally, I think we're back to the point where we have a stack of assumptions that look right but would benefit from a ✅ from Alexey or @mmaslankaprv when he becomes available

To enable boost::lexical_cast for program_options parsing and UUIDs in configs.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
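
A hedged sketch of the mechanism behind that commit (my_uuid is an illustrative stand-in, not this PR's uuid_t): boost::lexical_cast and boost::program_options both drive parsing through stream extraction, so providing operator>> (plus operator<< for formatting) is what "enables" them for a type.

#include <boost/lexical_cast.hpp>
#include <istream>
#include <ostream>
#include <string>

struct my_uuid {
    std::string repr;
};

// Stream operators are the hook lexical_cast and program_options rely on.
std::istream& operator>>(std::istream& is, my_uuid& u) {
    return is >> u.repr;  // real code would validate the UUID format
}
std::ostream& operator<<(std::ostream& os, const my_uuid& u) {
    return os << u.repr;
}

// usage: auto u = boost::lexical_cast<my_uuid>("550e8400-e29b-41d4-a716-446655440000");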
- config::node_id_override
- config::node_override_store

Includes json/yaml SerDes and unit tests

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
- node_id_overrides: std::vector<config::node_id_override>

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
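
A hedged sketch of what such an entry might look like in the node config (the per-entry key names here are assumptions for illustration; only the top-level property name comes from this commit):

node_id_overrides:
  - current_uuid: <UUID the broker currently holds>
    new_uuid: <UUID to adopt on startup>
    new_id: <node ID to adopt on startup>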
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman (Member Author) commented Sep 11, 2024

force push CR changes, mostly hardening tests

Resolved review thread: tests/rptest/tests/admin_uuid_operations_test.py

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
And wire them into the corresponding node configs.

"--node-id-overrides uuid:uuid:id [uuid:uuid:id ...]"

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
and 'restart_nodes'

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@pgellert (Contributor) left a comment:

lgtm
