
Operator: Scaling up a cluster triggers rolling restart #7313

Closed
0x5d opened this issue Nov 16, 2022 · 2 comments

Labels
area/k8s kind/bug Something isn't working

Comments

0x5d (Contributor) commented Nov 16, 2022

Version & Environment

Redpanda version (from rpk version): v22.3.1-rc4

What went wrong?

Increasing the cluster's replica count triggers a rolling restart, during which new Redpanda pods get scheduled on existing pods' nodes.
E.g. in an N-node cluster scaled up to M nodes:

  • The operator triggers a rolling restart
  • Pod 0 is deleted (restarted)
  • The new pod (pod N+1) is scheduled on Node 0
  • Pod 0, which has an affinity for Node 0 via its persistent volume, becomes unschedulable (see the sketch after this list)
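For context, here is a minimal sketch of why pod 0 stays pinned to Node 0: with node-local (or otherwise topology-constrained) storage, pod 0's PersistentVolume carries a required node affinity, so once another pod has taken that node's resources, pod 0 cannot be rescheduled anywhere else. The volume name, capacity, path, and node name below are hypothetical, not taken from this cluster.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: datadir-rp-juan-1111-0        # hypothetical PV backing pod 0's data directory
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /var/lib/redpanda/data      # local disk path on the node
  nodeAffinity:                       # this is what ties pod 0 to a single node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-0              # hypothetical node name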

What should have happened instead?

Existing pods shouldn't be restarted, and new pods should be scheduled on available nodes.

How to reproduce the issue?

  1. Deploy an N-broker Redpanda cluster on an M-node k8s cluster (N < M) using the operator.
  2. Edit the cluster CR, increasing the replicas from N to M (see the example commands after this list).
  3. Monitor the pods in the redpanda namespace (kubectl get pods -n redpanda -w).
  4. Watch a rolling restart be attempted, with a new pod being scheduled on node 0 and pod 0 then becoming unschedulable due to a persistent volume conflict.
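For example (the resource kind, CR name, and replica counts below are assumptions for illustration, not taken from this report), scaling a 3-broker cluster to 4 replicas might look like:

# Edit the cluster CR and change spec.replicas from N (e.g. 3) to M (e.g. 4);
# the CR name "rp-juan-1111" is inferred from the pod names in the log below -
# adjust it to your deployment.
kubectl -n redpanda edit cluster rp-juan-1111

# Or patch the replica count directly:
kubectl -n redpanda patch cluster rp-juan-1111 --type merge -p '{"spec":{"replicas":4}}'

# Then watch the pods in the redpanda namespace:
kubectl -n redpanda get pods -w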

Additional information

Deleting the pod scheduled on pod 0's node allows the rolling restart to continue, but of course a pod inevitably becomes unschedulable in the end:

redpanda@ip-172-16-1-162:~$ kubectl get po -n redpanda -w
NAME                                       READY   STATUS        RESTARTS   AGE
rp-juan-1111-0                             0/1     Pending       0          22m
rp-juan-1111-1                             1/1     Running       0          93m
rp-juan-1111-2                             1/1     Running       0          93m
rp-juan-1111-3                             1/1     Terminating   0          79m
sasl-user-creation-first-superuser-pvwv7   0/1     Completed     0          93m
rp-juan-1111-3                             0/1     Terminating   0          79m
rp-juan-1111-3                             0/1     Terminating   0          79m
rp-juan-1111-3                             0/1     Terminating   0          79m
rp-juan-1111-3                             0/1     Pending       0          0s
rp-juan-1111-3                             0/1     Pending       0          0s
rp-juan-1111-0                             0/1     Pending       0          22m
rp-juan-1111-0                             0/1     Init:0/1      0          22m
rp-juan-1111-0                             0/1     PodInitializing   0          22m
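The workaround mentioned above is, in terms of the pod names in this log (which pod landed on pod 0's node is an assumption based on the sequence above), roughly:

# Delete the new pod that was scheduled onto pod 0's node so pod 0 can bind
# its node-local volume and start; here that appears to be rp-juan-1111-3.
kubectl -n redpanda delete pod rp-juan-1111-3

This only moves the problem along: whichever pod is left without a free node at the end of the roll still becomes unschedulable.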
0x5d added the kind/bug and area/k8s labels on Nov 16, 2022
0x5d (Contributor, Author) commented Nov 16, 2022

There's the beginning of a fix here: #4964

joejulian (Contributor) commented:

This doesn't happen now.
