cluster: implement "Feature manager" #2938
@mmaslankaprv would appreciate your thoughts on the structure of the controller bits. I've ended up with a table/backend/frontend separation which feels a bit heavyweight for what this is actually doing, but maybe it's the cost of doing business.
just wanted to say this is awesome.
// Bitmask only used at runtime: if we run out of bits for features
// just use a bigger one. Do not serialize this as a bitmask anywhere.
uint64_t _active_features_mask{0};
std::bitset is reasonable too if you want to avoid the manual masking effort
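To make that suggestion concrete, here is a minimal sketch of a runtime-only feature set built on std::bitset. The `feature` enum, its members, and the `feature_set` class are hypothetical names chosen for illustration and are not taken from the PR.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical feature identifiers, for illustration only.
enum class feature : std::size_t {
    central_config = 0,
    example_feature = 1,
    count_ // sentinel: number of features, sizes the bitset
};

// Runtime-only set of active features. As with the uint64_t mask it would
// replace, this is not intended to be serialized as a bitmask.
class feature_set {
public:
    void activate(feature f) { _active.set(index(f)); }
    void deactivate(feature f) { _active.reset(index(f)); }
    bool is_active(feature f) const { return _active.test(index(f)); }

private:
    static std::size_t index(feature f) {
        return static_cast<std::size_t>(f);
    }

    std::bitset<static_cast<std::size_t>(feature::count_)> _active;
};
```

Adding a feature then only means adding an enum member before `count_`; the bitset grows with it and no shift or mask arithmetic needs to be written by hand.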
Tests use wait_for_controller_leadership as a utility to wait for all the cluster setup to be done before proceeding. New infrastructure in config_manager and feature_manager leads to some async execution of config writes to the controller log in the background, which confuses some tests that expect all controller writes to be done before they start. Extend wait_for_controller_leadership to wait for these writes to raft0 before proceeding.
Signed-off-by: John Spray <jcs@vectorized.io>
For services (like feature manager) that would like to peek at node health reports as they come in. Signed-off-by: John Spray <jcs@vectorized.io>
This replaces `join`, provides clean encoding versioning, and carries a cluster_version to enable servers to refuse join requests from incompatible versions. Signed-off-by: John Spray <jcs@vectorized.io>
The only use of the old one is now in handling incoming RPCs from old versions. This means that new-versioned redpanda will only be able to join new-versioned clusters. That would only impact someone who tried to join an old version to a newer cluster, or someone trying to join an old version to a cluster in the middle of a rolling upgrade. Signed-off-by: John Spray <jcs@vectorized.io>
When a subsystem wants to check for a feature during startup, it is convenient to do so via a future, to avoid awkward races between initialization of the feature table via raft0 replay, and initialization of other subsystems.
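A rough sketch of the idea follows. Redpanda itself is built on Seastar futures; to stay self-contained this sketch substitutes a plain single-threaded continuation list, so `feature_table_sketch`, `when_active`, and `mark_active` are all hypothetical names and not the PR's API.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a feature table that lets subsystems defer work
// until a feature becomes active. Single-threaded by design.
class feature_table_sketch {
public:
    // Run `fn` once the feature is active: immediately if it already is,
    // otherwise when mark_active() is eventually called.
    void when_active(std::function<void()> fn) {
        if (_active) {
            fn();
        } else {
            _waiters.push_back(std::move(fn));
        }
    }

    // Called when the feature is activated, e.g. once log replay has
    // reached the point where the feature was enabled. Idempotent.
    void mark_active() {
        if (_active) {
            return;
        }
        _active = true;
        auto waiters = std::move(_waiters);
        _waiters.clear();
        for (auto& fn : waiters) {
            fn();
        }
    }

    bool is_active() const { return _active; }

private:
    bool _active{false};
    std::vector<std::function<void()>> _waiters;
};
```

A subsystem that registers before replay finishes and one that registers afterwards both run their continuation exactly once, which is the race the commit message is describing.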
...to only enable central config if the feature is active. Signed-off-by: John Spray <jcs@vectorized.io>
Where the feature table specifies a cluster-wide logical version, do not permit older nodes to join. Where it does not, do not permit nodes older than the current node to join.
Signed-off-by: John Spray <jcs@vectorized.io>
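The join rule described above can be sketched as a single predicate. This is an illustration under assumptions: `logical_version`, `permit_join`, and the parameter names are hypothetical, and the real check in the PR may take other inputs into account.

```cpp
#include <cstdint>
#include <optional>

using logical_version = std::int64_t; // hypothetical alias

// Decide whether a joining node should be admitted.
//   cluster_version: cluster-wide logical version from the feature table,
//                    or std::nullopt if the table does not specify one yet.
//   local_version:   logical version of the node handling the request.
//   joiner_version:  logical version reported by the joining node.
inline bool permit_join(
  std::optional<logical_version> cluster_version,
  logical_version local_version,
  logical_version joiner_version) {
    // With a cluster-wide version, that is the minimum. Without one, fall
    // back to the version of the node handling the request.
    const logical_version minimum = cluster_version.value_or(local_version);
    return joiner_version >= minimum;
}
```

Expressing both cases as one minimum keeps the two branches of the rule from drifting apart.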
This will happen before we are in a position to check features, but that's okay. If the cache of cluster configuration settings doesn't exist, we fall back to redpanda.yml. Signed-off-by: John Spray <jcs@vectorized.io>
This is an integration testing hook. It is more invasive than I would like, but it is pretty simple, and hopefully it is obvious to anyone who encounters it what is going on. This is NOT for use in the field, and is intentionally undocumented.
This is used for driving the __REDPANDA_LOGICAL_VERSION testing hook for the feature manager.
Signed-off-by: John Spray <jcs@vectorized.io>
Old clusters use encoding version 0, new clusters use encoding version 1 and include the logical version.
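As an illustration of this kind of versioned encoding, the sketch below writes a leading version byte and includes the logical version only when encoding as version 1. The struct, field names, and byte layout are assumptions made for the example; they are not the wire format used by the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Hypothetical message: version 0 carries only node_id, version 1 also
// carries the logical version.
struct versioned_msg_sketch {
    std::uint32_t node_id{0};
    std::optional<std::uint32_t> logical_version;
};

// Append a 32-bit value in little-endian byte order.
inline void put_u32(bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xffu));
    }
}

// Read a 32-bit little-endian value at `pos`, advancing `pos`.
inline std::uint32_t get_u32(const bytes& in, std::size_t& pos) {
    if (pos + 4 > in.size()) {
        throw std::out_of_range("truncated buffer");
    }
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(in[pos++]) << (8 * i);
    }
    return v;
}

inline bytes encode(const versioned_msg_sketch& m) {
    bytes out;
    // Leading byte is the encoding version: 1 if a logical version is set.
    out.push_back(static_cast<std::uint8_t>(m.logical_version ? 1 : 0));
    put_u32(out, m.node_id);
    if (m.logical_version) {
        put_u32(out, *m.logical_version);
    }
    return out;
}

inline versioned_msg_sketch decode(const bytes& in) {
    if (in.empty()) {
        throw std::out_of_range("empty buffer");
    }
    std::size_t pos = 0;
    const std::uint8_t version = in[pos++];
    if (version > 1) {
        throw std::invalid_argument("unknown encoding version");
    }
    versioned_msg_sketch m;
    m.node_id = get_u32(in, pos);
    if (version == 1) {
        m.logical_version = get_u32(in, pos);
    }
    return m;
}
```

A decoder written this way accepts messages from both old and new senders, and an absent logical version tells the receiver it is talking to an old one.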
This was trying to log current_exception() as if we were in a catch{} block, but it's a future handler.
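The underlying C++ behaviour can be shown without Seastar: std::current_exception() returns a null pointer when no exception is being handled, so a handler that is handed an exception has to describe the pointer it was given. The helper name `describe` is hypothetical.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Produce a loggable description of an exception pointer, such as the one
// carried by a failed future. A null pointer means there is no exception.
inline std::string describe(const std::exception_ptr& e) {
    if (!e) {
        return "<no active exception>";
    }
    try {
        std::rethrow_exception(e);
    } catch (const std::exception& ex) {
        return ex.what();
    } catch (...) {
        return "<unknown exception>";
    }
}
```

Inside a catch block, `describe(std::current_exception())` yields the message of the exception being handled. In a continuation that runs outside any catch block the same call yields the placeholder string, which is the symptom this commit addresses.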
Retrying CI on a failure of nodes_decommissioning_test (#3878)
Cover letter
This is a subset of the feature manager design here https://docs.google.com/document/d/1QvHcyIK-aQLILLVAlOE0S1qA1s68ufkZxw3PmIJtGYg/edit# -- not enabling manual toggling of features or storing those individual feature states, but just storing+updating the overall cluster logical version and using an internal mapping of version to available features.
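The "internal mapping of version to available features" could look roughly like the sketch below. The table contents, the version numbers, and the names `feature_spec`, `feature_schema`, and `available_features` are all hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <string_view>
#include <vector>

// One row of the mapping: a feature and the lowest cluster logical version
// at which it becomes available.
struct feature_spec {
    std::string_view name;
    std::int64_t required_version;
};

// Hypothetical table; names and version numbers are for illustration only.
inline constexpr std::array<feature_spec, 2> feature_schema{{
  {"central_config", 1},
  {"example_feature", 2},
}};

// Derive the available features from the cluster logical version alone,
// without storing any per-feature state.
inline std::vector<std::string_view>
available_features(std::int64_t cluster_version) {
    std::vector<std::string_view> out;
    for (const auto& spec : feature_schema) {
        if (cluster_version >= spec.required_version) {
            out.push_back(spec.name);
        }
    }
    return out;
}
```

Because the feature set is a pure function of the version, only the single cluster version needs to be stored and updated.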
There are broadly 3 pieces to this:
cluster_version
Fixes: #3704
Features
v1/features admin API endpoint is added, which can be used by automation scripts to query an internal logical cluster version, and feature flags for newly added functionality.
Improvements