Pod gets stuck in an infinite loop for any config change in a 3 node redpanda cluster on K8s (EKS) setup #6648

Closed
jeevanks opened this issue Oct 6, 2022 · 1 comment
Labels: area/k8s, community, kind/bug (Something isn't working)

jeevanks commented Oct 6, 2022

Version & Environment

Environment setup:

  • Redpanda version: v21.11.11 (rev ace82c7)
  • EKS with 3 nodes
  • Using redpanda operator (helm install procedure) + K8s (redpanda-cluster.yaml similar to one-node-cluster.yaml example)
  • Kubernetes (use kubectl version): v1.22.13

What went wrong?

I have a 3-node Redpanda cluster with 3 pod replicas and a topic replication factor of 3. Whenever I change some config in redpanda-cluster.yaml (e.g., the CPU spec) and run kubectl apply -f redpanda-cluster.yaml -n redpanda, one of the pods, redpanda-cluster-0, goes into an infinite loop of PodInitializing -> Waiting -> Terminating. The other pods are fine. I have also tried replication factors 2 and 4, but the result is the same.

$ kubectl get pods -n redpanda
NAME                 READY   STATUS        RESTARTS   AGE
redpanda-cluster-0   0/2     Terminating   0          7s
redpanda-cluster-1   2/2     Running       0          14m
redpanda-cluster-2   2/2     Running       0          48m

P.S. Running kubectl delete followed by kubectl apply works as expected with this version (but not with v22.2.x), though that is not a desirable workaround.

What should have happened instead?

redpanda-cluster-0 should be green with status 'Running' within a few minutes.

How to reproduce the issue?

  1. Create an EKS cluster with three nodes

  2. Create a redpanda-cluster.yaml file with the below values:

apiVersion: redpanda.vectorized.io/v1alpha1
kind: Cluster
metadata:
  name: redpanda-cluster
spec:
  image: "vectorized/redpanda"
  version: "v21.11.11"
  annotations: 
    managed: "true"
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 2
      memory: 2Gi
  enableSasl: true
  superUsers:
  - username: admin
  configuration:
    rpcServer:
      port: 33145
    kafkaApi:
    - port: 9092
    pandaproxyApi:
    - port: 8082
    schemaRegistry:
      port: 8081
    adminApi:
    - port: 9644
    developerMode: false
  additionalConfiguration:
    default_topic_replications: "3"

  3. Install the CRDs and the operator, then apply the cluster manifest:

export VERSION="v21.11.11"
kubectl apply -k "https://github.com/redpanda-data/redpanda/src/go/k8s/config/crd?ref=$VERSION"
helm repo update
helm install \
  redpanda-operator \
  redpanda/redpanda-operator \
  --namespace redpanda-system \
  --create-namespace \
  --version $VERSION
kubectl create namespace redpanda
kubectl apply -n redpanda -f redpanda-cluster.yaml

  4. Change the values in redpanda-cluster.yaml, e.g.,

  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 1  # changed from 2
      memory: 2Gi
…

Additional information

Logs from the redpanda-cluster-0 pod in the redpanda container:

Failed to load logs: container "redpanda" in pod "redpanda-cluster-0" is waiting to start: PodInitializing
Reason: BadRequest (400)

Logs from the redpanda-operator pod in the manager container:

2022-10-06T11:02:09.871Z	INFO	controllers.redpanda.Cluster	Finished reconcile loop for redpanda/redpanda-cluster	{"redpandacluster": "redpanda/redpanda-cluster"}
2022-10-06T11:02:15.963Z	INFO	controllers.redpanda.Cluster	Starting reconcile loop for redpanda/redpanda-cluster	{"redpandacluster": "redpanda/redpanda-cluster"}
2022-10-06T11:02:16.005Z	INFO	controllers.redpanda.Cluster	Resource redpanda-cluster (PodDisruptionBudget) changed, updating. Diff: {"spec":{"selector":{"matchLabels":{"app.kubernetes.io/component":"redpanda","app.kubernetes.io/instance":"redpanda-cluster","app.kubernetes.io/name":"redpanda"}}}}	{"redpandacluster": "redpanda/redpanda-cluster", "Kind": ""}
2022-10-06T11:02:16.194Z	INFO	controllers.redpanda.Cluster	Running update	{"redpandacluster": "redpanda/redpanda-cluster", "Kind": "", "resource name": "redpanda-cluster"}
2022-10-06T11:02:16.212Z	INFO	controllers.redpanda.Cluster	Changes in Pod definition other than activeDeadlineSeconds, configurator and Redpanda container name. Deleting pod	{"redpandacluster": "redpanda/redpanda-cluster", "Kind": "", "pod-name": "redpanda-cluster-0", "patch": "eyJzcGVjIjp7InNlY3VyaXR5Q29udGV4dCI6eyJmc0dyb3VwIjoxMDF9fX0="}
2022-10-06T11:02:16.584Z	ERROR	controllers.redpanda.Cluster	wait for pod restart	{"redpandacluster": "redpanda/redpanda-cluster", "error": "RequeueAfterError wait for pod restart"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.7/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.7/pkg/internal/controller/controller.go:214
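
A note on reading the operator log above: the "patch" field in the "Deleting pod" line is base64-encoded JSON, and decoding it shows which part of the pod spec the operator considers changed. A minimal sketch, assuming a base64 binary that accepts --decode; decode_patch is only a hypothetical helper name, and the string is copied verbatim from the log line:

```shell
#!/usr/bin/env bash
# decode_patch is a hypothetical helper, not part of any Redpanda tooling.
# It base64-decodes the string passed as its first argument.
decode_patch() {
  printf '%s' "$1" | base64 --decode
}

# Patch value copied verbatim from the "Deleting pod" log line.
PATCH='eyJzcGVjIjp7InNlY3VyaXR5Q29udGV4dCI6eyJmc0dyb3VwIjoxMDF9fX0='

decode_patch "$PATCH"
echo
# prints {"spec":{"securityContext":{"fsGroup":101}}}
```

The decoded patch touches only spec.securityContext.fsGroup, not the CPU limit that was edited. This is an inference from the log rather than something confirmed in this thread, but it suggests the operator may detect the same fsGroup difference on every reconcile and delete the pod each time, which would match the loop described above.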

Something orthogonal

I am trying to upgrade Redpanda from v21.11.11 to a newer version, and that is when I stumbled upon this issue. I have tried the approach in #3150 (comment), yet one of the pods still enters this infinite loop.

jeevanks added the kind/bug (Something isn't working) label Oct 6, 2022

joejulian (Contributor) commented

This is fixed with the Redpanda resource. The Cluster resource is deprecated.

This issue was closed.