Pod gets stuck in an infinite loop for any config change in a 3 node redpanda cluster on K8s (EKS) setup #6648

jeevanks · 2022-10-06T11:42:16Z

Version & Environment

Environment setup:

Redpanda version: v21.11.11 (rev ace82c7)
EKS with 3 nodes
Using redpanda operator (helm install procedure) + K8s (redpanda-cluster.yaml similar to one-node-cluster.yaml example)
Kubernetes version (from kubectl version): v1.22.13

What went wrong?

I have a 3-node Redpanda cluster with 3 pod replicas and a topic replication factor of 3. Whenever I change some config in redpanda-cluster.yaml (e.g., the CPU spec) and run kubectl apply -f redpanda-cluster.yaml -n redpanda, one of the pods, redpanda-cluster-0, goes into an infinite loop of PodInitializing -> Waiting -> Terminating. The other pods are fine. I have also tried replication factors 2 and 4, but the result is the same.

$ kubectl get pods -n redpanda
NAME                 READY   STATUS        RESTARTS   AGE
redpanda-cluster-0   0/2     Terminating   0          7s
redpanda-cluster-1   2/2     Running       0          14m
redpanda-cluster-2   2/2     Running       0          48m

P.S. Running kubectl delete followed by kubectl apply works as expected with this version (though not with v22.2.x), but it is not a desirable workaround.

What should have happened instead?

redpanda-cluster-0 should reach status 'Running' within a few minutes.

How to reproduce the issue?

Create an EKS cluster with three nodes
Create a redpanda-cluster.yaml file with the below values:

apiVersion: redpanda.vectorized.io/v1alpha1
kind: Cluster
metadata:
  name: redpanda-cluster
spec:
  image: "vectorized/redpanda"
  version: "v21.11.11"
  annotations: 
    managed: "true"
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 2
      memory: 2Gi
  enableSasl: true
  superUsers:
  - username: admin
  configuration:
    rpcServer:
      port: 33145
    kafkaApi:
    - port: 9092
    pandaproxyApi:
    - port: 8082
    schemaRegistry:
      port: 8081
    adminApi:
    - port: 9644
    developerMode: false
  additionalConfiguration:
    default_topic_replications: "3"

export VERSION="v21.11.11"
kubectl apply -k https://github.com/redpanda-data/redpanda/src/go/k8s/config/crd?ref=$VERSION
helm repo update
helm install \
  redpanda-operator \
  redpanda/redpanda-operator \
  --namespace redpanda-system \
  --create-namespace \
  --version $VERSION
kubectl create namespace redpanda
kubectl apply -n redpanda -f redpanda-cluster.yaml

Change a value in redpanda-cluster.yaml, e.g. lower the CPU limit, and apply the file again:

  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 1  # changed from 2
      memory: 2Gi
…
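The re-apply step can be sketched as a small shell helper. This is only an illustration: the function name `reapply_and_watch` is hypothetical, while the namespace and file name are the ones used in this report. `kubectl diff` prints what would change on the server (it exits with status 1 when there are differences), which helps confirm that only the intended field is being modified before the apply.

```shell
# Hypothetical helper: preview, apply, then watch the pods.
# Arguments: namespace, manifest file.
reapply_and_watch() {
  ns="$1"
  manifest="$2"

  # Preview the server-side difference. Exit status 1 means "differences
  # found", so it is not treated as a failure here.
  kubectl diff -n "$ns" -f "$manifest" || true

  # Apply the edited manifest, as described in the report.
  kubectl apply -n "$ns" -f "$manifest"

  # Watch pod status transitions; interrupt with Ctrl-C.
  kubectl get pods -n "$ns" --watch
}

# Usage (not run automatically):
#   reapply_and_watch redpanda redpanda-cluster.yaml
```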

Additional information

Logs from the redpanda-cluster-0 pod in the redpanda container:

Failed to load logs: container "redpanda" in pod "redpanda-cluster-0" is waiting to start: PodInitializing
Reason: BadRequest (400)
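Because the redpanda container never leaves PodInitializing, its logs are not available, but the pod's init containers and events may still show something. The following is a minimal sketch for collecting that information; the helper name `pod_init_diagnostics` is hypothetical, and the init container names are looked up rather than assumed. Since the pod is deleted within seconds in this loop, some of these calls may return NotFound and need to be retried.

```shell
# Hypothetical helper: gather what is available for a pod stuck in init.
# Arguments: namespace, pod name.
pod_init_diagnostics() {
  ns="$1"
  pod="$2"

  # Full pod description, including init container state and recent events.
  kubectl describe pod "$pod" -n "$ns"

  # Print logs of every init container defined in the pod spec.
  for c in $(kubectl get pod "$pod" -n "$ns" \
      -o jsonpath='{.spec.initContainers[*].name}'); do
    echo "--- init container: $c"
    kubectl logs "$pod" -n "$ns" -c "$c"
  done

  # Events that reference this pod, oldest first.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}

# Usage (not run automatically):
#   pod_init_diagnostics redpanda redpanda-cluster-0
```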

Logs from the redpanda-operator pod in the manager container:

2022-10-06T11:02:09.871Z	INFO	controllers.redpanda.Cluster	Finished reconcile loop for redpanda/redpanda-cluster	{"redpandacluster": "redpanda/redpanda-cluster"}
2022-10-06T11:02:15.963Z	INFO	controllers.redpanda.Cluster	Starting reconcile loop for redpanda/redpanda-cluster	{"redpandacluster": "redpanda/redpanda-cluster"}
2022-10-06T11:02:16.005Z	INFO	controllers.redpanda.Cluster	Resource redpanda-cluster (PodDisruptionBudget) changed, updating. Diff: {"spec":{"selector":{"matchLabels":{"app.kubernetes.io/component":"redpanda","app.kubernetes.io/instance":"redpanda-cluster","app.kubernetes.io/name":"redpanda"}}}}	{"redpandacluster": "redpanda/redpanda-cluster", "Kind": ""}
2022-10-06T11:02:16.194Z	INFO	controllers.redpanda.Cluster	Running update	{"redpandacluster": "redpanda/redpanda-cluster", "Kind": "", "resource name": "redpanda-cluster"}
2022-10-06T11:02:16.212Z	INFO	controllers.redpanda.Cluster	Changes in Pod definition other than activeDeadlineSeconds, configurator and Redpanda container name. Deleting pod	{"redpandacluster": "redpanda/redpanda-cluster", "Kind": "", "pod-name": "redpanda-cluster-0", "patch": "eyJzcGVjIjp7InNlY3VyaXR5Q29udGV4dCI6eyJmc0dyb3VwIjoxMDF9fX0="}
2022-10-06T11:02:16.584Z	ERROR	controllers.redpanda.Cluster	wait for pod restart	{"redpandacluster": "redpanda/redpanda-cluster", "error": "RequeueAfterError wait for pod restart"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.7/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.7/pkg/internal/controller/controller.go:214
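The "patch" field in the "Deleting pod" log line above is base64-encoded JSON. Decoding it shows which part of the pod definition the operator considers changed. A minimal sketch, using only the string copied from that log line:

```shell
# Patch value copied verbatim from the operator log line above.
patch='eyJzcGVjIjp7InNlY3VyaXR5Q29udGV4dCI6eyJmc0dyb3VwIjoxMDF9fX0='

# Decode the base64 payload into the JSON patch it carries.
decoded=$(printf '%s' "$patch" | base64 --decode)
printf '%s\n' "$decoded"
# prints {"spec":{"securityContext":{"fsGroup":101}}}
```

The decoded patch refers to spec.securityContext.fsGroup rather than to the CPU limit that was edited, which may be worth comparing against the running pod's securityContext.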

Something orthogonal

I am trying to upgrade Redpanda from v21.11.11 to a newer version, and that is when I stumbled upon this issue. I have tried the approach in #3150 (comment), yet one of the pods still enters this infinite loop.

joejulian · 2023-10-25T22:03:14Z

This is fixed with the Redpanda resource. The Cluster resource is deprecated.

jeevanks added the kind/bug Something isn't working label Oct 6, 2022

mmedenjak added area/k8s community labels Oct 18, 2022

joejulian closed this as completed Oct 25, 2023