Operator: add support for downscaling #5019
Commits on Jun 16, 2022
operator: split readyReplicas from replicas and add currentReplicas
Now status fields have the following behaviour:
- replicas: reflects StatefulSet status replicas (no longer readyReplicas)
- readyReplicas: reflects StatefulSet status readyReplicas
- currentReplicas: managed by the operator to dynamically change the current number of replicas, to gradually match user expectations
Commit 2711976
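The split between the three status fields described in commit 2711976 can be illustrated with a minimal Go sketch. The type and function names below (`ClusterStatus`, `StatefulSetStatus`, `syncFromStatefulSet`) are hypothetical and chosen for illustration only; they are not taken from the operator's code.

```go
package main

import "fmt"

// ClusterStatus is a hypothetical, trimmed-down status type with the three
// fields named in the commit message.
type ClusterStatus struct {
	Replicas        int32 // mirrors the StatefulSet status replicas
	ReadyReplicas   int32 // mirrors the StatefulSet status readyReplicas
	CurrentReplicas int32 // owned by the operator, moved step by step
}

// StatefulSetStatus holds the two observed counts used in this sketch.
type StatefulSetStatus struct {
	Replicas      int32
	ReadyReplicas int32
}

// syncFromStatefulSet copies the observed counts into the cluster status.
// CurrentReplicas is deliberately left untouched: in this sketch only the
// operator's scaling logic is allowed to change it.
func syncFromStatefulSet(cs ClusterStatus, sts StatefulSetStatus) ClusterStatus {
	cs.Replicas = sts.Replicas
	cs.ReadyReplicas = sts.ReadyReplicas
	return cs
}

func main() {
	cs := ClusterStatus{CurrentReplicas: 3}
	cs = syncFromStatefulSet(cs, StatefulSetStatus{Replicas: 3, ReadyReplicas: 2})
	fmt.Printf("%+v\n", cs)
}
```

Keeping `CurrentReplicas` out of the sync step reflects the idea that observed state and operator-managed state are tracked separately.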
operator: add decommissioningNode status field
When decommissioning a node, the field is populated with the ordinal number of the node being decommissioned. In case of recommission, it also indicates the node currently being recommissioned.
Commit 8b3559d
operator: change webhook to allow decommissioning
The replicas field can freely change and the controller will make sure that nodes are properly decommissioned. The only remaining restriction is that replicas cannot be 0 or nil.
Commit 0131182
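The remaining webhook restriction (replicas cannot be 0 or nil) could look roughly like the following Go sketch. The function name `validateReplicas` and the error value are hypothetical, and rejecting negative values is an added assumption that the commit message does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidReplicas is a hypothetical error used only in this sketch.
var errInvalidReplicas = errors.New("replicas must be set and greater than 0")

// validateReplicas rejects a nil or zero replica count. Negative values are
// rejected too, which is an assumption beyond the commit message.
func validateReplicas(replicas *int32) error {
	if replicas == nil || *replicas <= 0 {
		return errInvalidReplicas
	}
	return nil
}

func main() {
	three := int32(3)
	zero := int32(0)
	fmt.Println(validateReplicas(&three)) // valid count
	fmt.Println(validateReplicas(&zero))  // rejected: zero
	fmt.Println(validateReplicas(nil))    // rejected: nil
}
```

Any other change to the value, including a decrease, passes this check, which matches the statement that the field can change freely.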
operator: move types to their own package to avoid dependency loop
This allows the pkg/resources package to use the admin API internal interface.
Commit d6dac46
Commit e9ecea7
operator: allow scoping internal admin API to specific nodes
This allows getting local information from brokers, such as the local configuration.
Commit f8a7bcd
operator: remove stack trace from logs when delay is requested
Requesting a delay previously produced a stack trace in the logs while waiting for a condition.
Commit 1913822
Commit 33c1034
operator: add scale handler to properly decommission and recommission nodes
This adds a handler that correctly manages upscaling and downscaling the cluster, decommissioning nodes when needed. The handler uses `status.currentReplicas` to signal the number of replicas that all subcontrollers should materialize.
When a cluster is downscaled, the handler first tries to decommission the last node via the admin API, then decreases the value of `status.currentReplicas`, so that the node is removed only when the cluster allows it.
If the cluster refuses to decommission a node (e.g. min replicas on a topic higher than the desired number of nodes), the user can increase `spec.replicas` to trigger a recommission of the node.
Commit 4d74d4f
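The decision flow of the scale handler from commit 4d74d4f can be approximated with a small Go sketch. It is a simplified model under stated assumptions: the `State` type, the `Action` values and `nextAction` are hypothetical names, and the real handler also talks to the admin API and the StatefulSet, which is omitted here.

```go
package main

import "fmt"

// Action is a hypothetical label for what the handler would do next.
type Action string

const (
	ActionNone         Action = "none"
	ActionUpscale      Action = "upscale"
	ActionDecommission Action = "decommission"
	ActionRecommission Action = "recommission"
)

// State is a simplified view of spec.replicas, status.currentReplicas and
// status.decommissioningNode (nil when no node is being decommissioned).
type State struct {
	SpecReplicas        int32
	CurrentReplicas     int32
	DecommissioningNode *int32
}

// nextAction returns the next step and the node ordinal it applies to
// (-1 when no specific node is involved).
func nextAction(s State) (Action, int32) {
	if s.DecommissioningNode != nil {
		ordinal := *s.DecommissioningNode
		// The user raised spec.replicas so that this ordinal is wanted again.
		if s.SpecReplicas > ordinal {
			return ActionRecommission, ordinal
		}
		// Otherwise keep waiting for the decommission to complete.
		return ActionDecommission, ordinal
	}
	switch {
	case s.SpecReplicas > s.CurrentReplicas:
		return ActionUpscale, -1
	case s.SpecReplicas < s.CurrentReplicas:
		// Always start with the last node (highest ordinal).
		return ActionDecommission, s.CurrentReplicas - 1
	default:
		return ActionNone, -1
	}
}

func main() {
	action, ordinal := nextAction(State{SpecReplicas: 2, CurrentReplicas: 3})
	fmt.Println(action, ordinal)
}
```

In this model `status.currentReplicas` would only be decreased after the decommission of the returned ordinal has been confirmed, so the pod is never removed before the cluster allows it.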
operator: implement progressive initialization to let node 0 create initial raft group
This tries to solve the problem with empty seed_servers on node 0. With this change, all fresh clusters will be initially set to 1 replica (via `status.currentReplicas`), until a cluster is created and the operator can verify it via the admin API. Then the cluster is scaled to the number of instances desired by the user.
After the cluster is initialized, and for the entire lifetime of the cluster, the `seed_servers` property will be populated with the full list of available servers, in every node of the cluster. This overcomes redpanda-data#333.
Previously, node 0 was always forced to have an empty seed_servers property, but this caused problems when it lost the data dir, as it tried to create a brand-new cluster. With this change, even if node 0 loses the data dir, the seed_servers property will always point to other nodes, so it will try to join the existing cluster.
Commit 5d62c76
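The progressive initialization from commit 5d62c76 can be sketched in Go as two small functions. The names `targetReplicas` and `seedServers` are hypothetical, and the assumption that `seed_servers` stays empty until the cluster is formed is inferred from the commit message rather than stated by it.

```go
package main

import "fmt"

// targetReplicas returns the value status.currentReplicas would take in this
// sketch: a fresh cluster starts with a single node and only grows to the
// desired size once the cluster is verified as formed.
func targetReplicas(desired int32, clusterFormed bool) int32 {
	if !clusterFormed {
		return 1
	}
	return desired
}

// seedServers returns the seed_servers list for any node. Before the cluster
// is formed the list is empty (assumption), so that node 0 can create the
// initial raft group; afterwards every node gets the full list.
func seedServers(all []string, clusterFormed bool) []string {
	if !clusterFormed {
		return nil
	}
	// Copy so that callers cannot modify the shared slice.
	return append([]string(nil), all...)
}

func main() {
	nodes := []string{"node-0", "node-1", "node-2"}
	fmt.Println(targetReplicas(3, false), seedServers(nodes, false))
	fmt.Println(targetReplicas(3, true), seedServers(nodes, true))
}
```

Because node 0 receives the full list after initialization, a node 0 that loses its data dir would try to join the existing cluster instead of creating a new one.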
operator: consider draining field when checking maintenance mode status
Since nodes auto-drain as part of their shutdown hooks, activating maintenance mode for a decommissioned node may not actually start any draining process. We just exit if that is the case.
Commit d48e957
Commit 2b02d34
Commit 3b7b77e
operator: disable maintenance mode hook on node 0 when starting up a fresh cluster
This allows predictable initial cluster formation. When the cluster is first created, it is composed of a single node. On single-node clusters, we should not activate maintenance mode, because otherwise a restart of the node will make it drain leadership and the cluster will not form.
On the other hand, enabling maintenance mode when the cluster scales to multiple instances currently causes a restart of node 0. This will be solved when implementing dynamic hooks.
Commit e9bba61
Commit 90812c1
operator: fix maintenance mode activation on decommissioning node (workaround for redpanda-data#4999)
When a node is shut down after decommission, the maintenance mode hooks will trigger. While the process has no visible effect on partitions, it leaves the cluster in an inconsistent state, so that other nodes cannot enter maintenance mode. We force reset the flag with this change.
Commit a9efa0a
Commit b1e3fac
operator: mark downscaling as alpha feature and add a startup flag
We should enable downscaling as a feature gate once the issue with reusable node IDs is fixed.
Commit 5bd42df
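A startup flag that keeps an alpha feature off by default could be wired up as in the Go sketch below, using the standard `flag` package. The flag name `allow-downscaling` and the helper `parseAllowDownscaling` are assumptions made for illustration; the commit message does not name the flag.

```go
package main

import (
	"flag"
	"fmt"
)

// parseAllowDownscaling parses the operator arguments and reports whether the
// alpha downscaling feature was explicitly enabled. The flag name is a
// hypothetical choice for this sketch.
func parseAllowDownscaling(args []string) (bool, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	allow := fs.Bool("allow-downscaling", false,
		"allow reducing the number of replicas of existing clusters (alpha)")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *allow, nil
}

func main() {
	enabled, err := parseAllowDownscaling([]string{"--allow-downscaling"})
	if err != nil {
		fmt.Println("invalid arguments:", err)
		return
	}
	fmt.Println("downscaling enabled:", enabled)
}
```

Defaulting the flag to false means that downscaling stays disabled unless an administrator opts in at operator startup.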