Operator: add support for downscaling #5019

Merged on Jun 21, 2022 (18 commits)

Commits (18)
2711976
operator: split readyReplicas from replicas and add currentReplicas
nicolaferraro Jun 13, 2022
8b3559d
operator: add decommissioningNode status field
nicolaferraro Jun 13, 2022
0131182
operator: change webhook to allow decommissioning
nicolaferraro May 27, 2022
d6dac46
operator: move types to their own package to avoid dependency loop
nicolaferraro May 26, 2022
e9ecea7
rpk: add enum for membership status
nicolaferraro Jun 8, 2022
f8a7bcd
operator: allow scoping internal admin API to specific nodes
nicolaferraro Jun 8, 2022
1913822
operator: remove stack trace from logs when delay is requested
nicolaferraro May 27, 2022
33c1034
operator: enable decommission API functions in internal admin API
nicolaferraro Jun 14, 2022
4d74d4f
operator: add scale handler to properly decommission and recommission…
nicolaferraro Jun 15, 2022
5d62c76
operator: implement progressive initialization to let node 0 create i…
nicolaferraro Jun 13, 2022
d48e957
operator: consider draining field when checking maintenance mode status
nicolaferraro Jun 3, 2022
2b02d34
operator: add controller tests for scaling
nicolaferraro Jun 10, 2022
3b7b77e
operator: add kuttl test for decommission
nicolaferraro Jun 3, 2022
e9bba61
operator: disable maintenance mode hook on node 0 when starting up a …
nicolaferraro Jun 9, 2022
90812c1
operator: add documentation to the scale handler
nicolaferraro Jun 9, 2022
a9efa0a
operator: fix maintenance mode activation on decommissioning node (wo…
nicolaferraro Jun 10, 2022
b1e3fac
operator: make decommission wait interval configurable and add jitter
nicolaferraro Jun 15, 2022
5bd42df
operator: mark downscaling as alpha feature and add a startup flag
nicolaferraro Jun 16, 2022
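
Taken together, these commits implement downscaling by decommissioning nodes, tracked through new status fields (currentReplicas, readyReplicas, decommissioningNode) and gated behind an alpha flag. A rough sketch of what a downscale might look like on the Cluster resource, assuming the field names from the commit messages and the test assertions below; the values and the exact placement of decommissioningNode are illustrative, not taken from the PR:

  apiVersion: redpanda.vectorized.io/v1alpha1
  kind: Cluster
  metadata:
    name: decommissioning
  spec:
    replicas: 2                # reduced from 3 to request a downscale
  status:
    replicas: 2                # desired replica count
    currentReplicas: 3         # pods still present while a node drains (illustrative)
    decommissioningNode: 2     # assumption: ID of the node currently being decommissioned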
18 changes: 18 additions & 0 deletions src/go/k8s/apis/redpanda/v1alpha1/cluster_webhook.go
@@ -43,6 +43,11 @@ const (
defaultSchemaRegistryPort = 8081
)

// AllowDownscalingInWebhook controls the downscaling alpha feature in the Cluster custom resource.
// Downscaling is not stable since nodeIDs are currently not reusable, so adding to a cluster a node
// that has previously been decommissioned can cause issues.
Comment on lines +46 to +48
Contributor:
It would be good to describe, in the commit message or in this comment, what consequences might arise if someone downscales the cluster while Kafka clients are still connected.

cc @jcsp @mmaslankaprv

var AllowDownscalingInWebhook = false

type resourceField struct {
resources *corev1.ResourceRequirements
path *field.Path
@@ -177,6 +182,8 @@ func (r *Cluster) ValidateUpdate(old runtime.Object) error {

allErrs = append(allErrs, r.validateScaling()...)

allErrs = append(allErrs, r.validateDownscaling(oldCluster)...)

allErrs = append(allErrs, r.validateKafkaListeners()...)

allErrs = append(allErrs, r.validateAdminListeners()...)
@@ -227,6 +234,17 @@ func (r *Cluster) validateScaling() field.ErrorList {
return allErrs
}

func (r *Cluster) validateDownscaling(old *Cluster) field.ErrorList {
var allErrs field.ErrorList
if !AllowDownscalingInWebhook && old.Spec.Replicas != nil && r.Spec.Replicas != nil && *r.Spec.Replicas < *old.Spec.Replicas {
allErrs = append(allErrs,
field.Invalid(field.NewPath("spec").Child("replicas"),
r.Spec.Replicas,
"downscaling is an alpha feature: set --allow-downscaling in the controller parameters to enable it"))
}
return allErrs
}

func (r *Cluster) validateAdminListeners() field.ErrorList {
var allErrs field.ErrorList
externalAdmin := r.AdminAPIExternal()
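
For context, a minimal sketch of the kind of update the new validateDownscaling check rejects while --allow-downscaling is left at its default; the cluster name is illustrative, and applying such a change would return the "downscaling is an alpha feature" error on spec.replicas:

  # Illustrative only: the existing cluster was created with spec.replicas: 3.
  apiVersion: redpanda.vectorized.io/v1alpha1
  kind: Cluster
  metadata:
    name: sample-cluster        # hypothetical name
  spec:
    replicas: 2                 # lower than the stored value, so the webhook rejects it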
1 change: 1 addition & 0 deletions src/go/k8s/main.go
@@ -70,6 +70,7 @@ func main() {
flag.StringVar(&configuratorTag, "configurator-tag", "latest", "Set the configurator tag")
flag.StringVar(&configuratorImagePullPolicy, "configurator-image-pull-policy", "Always", "Set the configurator image pull policy")
flag.DurationVar(&decommissionWaitInterval, "decommission-wait-interval", 8*time.Second, "Set the time to wait for a node decommission to happen in the cluster")
flag.BoolVar(&redpandav1alpha1.AllowDownscalingInWebhook, "allow-downscaling", false, "Allow to reduce the number of replicas in existing clusters (alpha feature)")

opts := zap.Options{
Development: true,
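
The flag is read into AllowDownscalingInWebhook above. A sketch of how it might be enabled on the controller Deployment, mirroring the kuttl patch used in the tests below; the container name and the neighbouring argument are assumptions:

  # Illustrative fragment of the redpanda-controller-manager Deployment.
  spec:
    template:
      spec:
        containers:
          - name: manager                          # assumption: the operator container
            args:
              - --decommission-wait-interval=8s    # existing flag, default value
              - --allow-downscaling=true           # new alpha flag from this PR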
12 changes: 5 additions & 7 deletions src/go/k8s/tests/e2e/decommission/00-assert.yaml
@@ -1,10 +1,8 @@
apiVersion: redpanda.vectorized.io/v1alpha1
kind: Cluster
metadata:
name: decommissioning
status:
replicas: 3
currentReplicas: 3
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
- command: kubectl rollout status deployment redpanda-controller-manager -n redpanda-system
- command: hack/wait-for-webhook-ready.sh
---

apiVersion: kuttl.dev/v1beta1
@@ -0,0 +1,6 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
# Set the --allow-downscaling=true parameter in the controller manager
- command: kubectl patch deployment redpanda-controller-manager -n redpanda-system --type='json' -p='[{"op":"add", "path":"/spec/template/spec/containers/1/args/-", "value":"--allow-downscaling=true"}]'
- command: kubectl get deployment redpanda-controller-manager -n redpanda-system -o json
16 changes: 5 additions & 11 deletions src/go/k8s/tests/e2e/decommission/01-assert.yaml
@@ -1,16 +1,10 @@
apiVersion: v1
kind: Pod
apiVersion: redpanda.vectorized.io/v1alpha1
kind: Cluster
metadata:
labels:
job-name: wait-for-3-brokers
name: decommissioning
status:
containerStatuses:
- name: curl
state:
terminated:
message: |
3
phase: Succeeded
replicas: 3
currentReplicas: 3
---

apiVersion: kuttl.dev/v1beta1
15 changes: 11 additions & 4 deletions src/go/k8s/tests/e2e/decommission/02-assert.yaml
@@ -1,9 +1,16 @@
apiVersion: redpanda.vectorized.io/v1alpha1
kind: Cluster
apiVersion: v1
kind: Pod
metadata:
name: decommissioning
labels:
job-name: wait-for-3-brokers
status:
replicas: 2
containerStatuses:
- name: curl
state:
terminated:
message: |
3
phase: Succeeded
---

apiVersion: kuttl.dev/v1beta1
15 changes: 4 additions & 11 deletions src/go/k8s/tests/e2e/decommission/03-assert.yaml
@@ -1,16 +1,9 @@
apiVersion: v1
kind: Pod
apiVersion: redpanda.vectorized.io/v1alpha1
kind: Cluster
metadata:
labels:
job-name: wait-for-2-brokers
name: decommissioning
status:
containerStatuses:
- name: curl
state:
terminated:
message: |
2
phase: Succeeded
replicas: 2
---

apiVersion: kuttl.dev/v1beta1
29 changes: 29 additions & 0 deletions src/go/k8s/tests/e2e/decommission/04-assert.yaml
@@ -0,0 +1,29 @@
apiVersion: v1
kind: Pod
metadata:
labels:
job-name: wait-for-2-brokers
status:
containerStatuses:
- name: curl
state:
terminated:
message: |
2
phase: Succeeded
---

apiVersion: kuttl.dev/v1beta1
kind: TestAssert
collectors:
- type: pod
selector: app.kubernetes.io/name=redpanda
tail: -1
- type: pod
namespace: redpanda-system
selector: control-plane=controller-manager
tail: -1
- type: command
command: kubectl get clusters -o jsonpath={@} -n $NAMESPACE
- type: command
command: kubectl get pods -o jsonpath={@} -n $NAMESPACE
@@ -0,0 +1,5 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
# Downscaling will be set to the default value
- command: kubectl patch deployment redpanda-controller-manager -n redpanda-system --type='json' -p='[{"op":"remove", "path":"/spec/template/spec/containers/1/args/4"}]'
22 changes: 22 additions & 0 deletions src/go/k8s/tests/e2e/decommission/06-assert.yaml
@@ -0,0 +1,22 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
- command: kubectl rollout status deployment redpanda-controller-manager -n redpanda-system
- command: hack/wait-for-webhook-ready.sh

---

apiVersion: kuttl.dev/v1beta1
kind: TestAssert
collectors:
- type: pod
selector: app.kubernetes.io/name=redpanda
tail: -1
- type: pod
namespace: redpanda-system
selector: control-plane=controller-manager
tail: -1
- type: command
command: kubectl get clusters -o jsonpath={@} -n $NAMESPACE
- type: command
command: kubectl get pods -o jsonpath={@} -n $NAMESPACE