
updating the redpanda operator shouldn't restart/upgrade statefulsets for clusters with pinned versions #3150

Closed
flokli opened this issue Dec 3, 2021 · 7 comments
Labels: area/k8s, community, kind/bug

flokli (Contributor) commented Dec 3, 2021

Version & Environment

redpanda-operator v21.9.6 upgraded to v21.10.2.

What went wrong?

I upgraded the redpanda-operator (with the helm chart).

My redpanda cluster CRs explicitly set a pinned redpanda version (via spec.version).
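
As a concrete illustration (not from the original report), here is a minimal sketch of such a pinned Cluster written against the operator's Go API types; the import path and the Image/Version field names are assumed to mirror the CRD's spec.image and spec.version, and most users would express the same thing as a YAML manifest.

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	// Assumed import path for the operator's Cluster API types; confirm against the repo layout.
	v1alpha1 "github.com/vectorizedio/redpanda/src/go/k8s/apis/redpanda/v1alpha1"
)

// pinnedCluster builds a minimal Cluster object with spec.version pinned.
// With a pinned version, the pods are expected to stay on that image tag even
// when the operator itself is upgraded.
func pinnedCluster() *v1alpha1.Cluster {
	return &v1alpha1.Cluster{
		ObjectMeta: metav1.ObjectMeta{Name: "redpanda", Namespace: "redpanda"},
		Spec: v1alpha1.ClusterSpec{
			Image:   "vectorized/redpanda",
			Version: "v21.9.6", // pinned redpanda version (spec.version)
		},
	}
}
```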

After upgrading redpanda-operator, I noticed the redpanda pods being restarted. Upon further inspection, I found the pods were restarted because the vectorized/configurator image had been bumped to v21.10.2.

What should have happened instead?

I'd expect redpanda pods not to get upgraded on a bump of the operator if a specific redpanda version is specified. I would have expected vectorized/configurator to stay at v21.9.6, like the main image.

How to reproduce the issue?

  1. Deploy an older version of redpanda-operator
  2. Create a Cluster resource with that version explicitly set
  3. Upgrade redpanda-operator
  4. Observe pods getting restarted

Additional information

Maybe related: #3023

flokli added the kind/bug label on Dec 3, 2021
rkruze (Contributor) commented Dec 6, 2021

Thank you for this. Updating the operator can cause restarts, which we need to document. If a config change occurs for Redpanda, it will force a restart of the cluster.

flokli (Contributor, Author) commented Dec 8, 2021

Yes, indeed this should be documented.

However, the current behaviour means /every/ upgrade of the operator will cause restarts, as it updates the pod template to use the operator version for vectorized/configurator, even though an explicit redpanda version is specified (via spec.version).

I doubt that's intended behaviour. Is there even a guarantee that old redpanda images work with a more recent configurator?

I'd expect the operator to keep vectorized/configurator in the same version as the vectorized/redpanda image…

alenkacz (Contributor) commented Dec 9, 2021

@flokli currently this is the case; we can do a better job at documenting which version change triggers a restart and which does not. Can I ask what your use case is for updating the operator without updating the redpanda version?

Internally we always try to update both at the same time, which is also what we currently test for: that those two versions are aligned. To prevent multiple restarts we leverage the managed annotation, so before you update your operator you can remove all redpandas from active management with this annotation; then the operator won't touch them. https://github.com/vectorizedio/redpanda/blob/91ae813c79e5754808489a30769b0187efc75e86/src/go/k8s/controllers/redpanda/cluster_controller.go#L100 After you adjust all the versions, you just remove that annotation again.
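
As a sketch of that workaround (not part of the original comment), the snippet below uses the Kubernetes dynamic client to patch a managed annotation onto every Cluster in a namespace before an operator upgrade. The annotation key redpanda.vectorized.io/managed, the value "false", and the clusters GroupVersionResource are assumptions; the authoritative names are in the controller code linked above. The same effect can be achieved interactively with kubectl annotate.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// Assumed annotation key; the authoritative definition is near the check in
// cluster_controller.go linked above.
const managedAnnotation = "redpanda.vectorized.io/managed"

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed GroupVersionResource of the redpanda Cluster CRD.
	gvr := schema.GroupVersionResource{
		Group:    "redpanda.vectorized.io",
		Version:  "v1alpha1",
		Resource: "clusters",
	}

	// Mark every Cluster in the "redpanda" namespace as unmanaged before the
	// operator upgrade; drop the annotation again once all versions are aligned.
	clusters, err := dyn.Resource(gvr).Namespace("redpanda").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	patch := []byte(fmt.Sprintf(`{"metadata":{"annotations":{%q:"false"}}}`, managedAnnotation))
	for _, c := range clusters.Items {
		if _, err := dyn.Resource(gvr).Namespace(c.GetNamespace()).Patch(
			context.TODO(), c.GetName(), types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
			panic(err)
		}
		fmt.Printf("marked %s/%s as unmanaged\n", c.GetNamespace(), c.GetName())
	}
}
```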

flokli (Contributor, Author) commented Dec 14, 2021

Imagine having a Kubernetes cluster with multiple redpanda clusters in different namespaces, and updating the redpanda version there in a controlled fashion.

As soon as we update the operator, it'll restart all clusters (as it updates the configurator image). I'd expect the operator not to upgrade (parts of) these clusters but to stay at the old configurator, and then to update them individually (both configurator and main payload) once I bump the spec.version field in the redpanda CR.

alenkacz (Contributor) commented
@flokli yep, I understand that problem; we have the same in our cloud environment. Unfortunately, with an operator that's still under active development it's very hard to prevent these restarts, because of the way the reconcile loop works in Kubernetes. Other operators have this problem as well.

The way we do it in cloud is that we:

flokli (Contributor, Author) commented Jan 2, 2022

Understood that you have the same problem. But what does the configurator image actually do, and does it need to be aligned with the version of the operator rather than the desired cluster version?

Simply using spec.Version for the configurator image too should be an easy way to prevent these kinds of restarts, as well as the possibly incompatible halfway upgrades they produce.
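
A minimal sketch of that suggestion (the helper and parameter names are hypothetical, not the operator's actual code): derive the configurator tag from the pinned spec.version and fall back to the operator's default tag only when no version is pinned.

```go
package main

import "fmt"

// configuratorImage prefers the Cluster's pinned spec.version for the
// configurator tag and falls back to the operator's own default tag only
// when nothing is pinned.
func configuratorImage(pinnedVersion, operatorDefaultTag string) string {
	tag := pinnedVersion
	if tag == "" {
		tag = operatorDefaultTag
	}
	return "vectorized/configurator:" + tag
}

func main() {
	// With spec.version pinned to v21.9.6, an operator shipping v21.10.2 would
	// leave the pod template unchanged, so the StatefulSet would not be rolled.
	fmt.Println(configuratorImage("v21.9.6", "v21.10.2")) // vectorized/configurator:v21.9.6
	fmt.Println(configuratorImage("", "v21.10.2"))        // vectorized/configurator:v21.10.2
}
```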

alenkacz (Contributor) commented Jan 5, 2022

If I understand you correctly, you're saying that with the configurator tag being set here 2848b59#diff-8074ab08efa0940c749c32af395271faf34280c16da67864c5334d6c0f7f1596R64, you always trigger a restart of the cluster, because redeploying the operator with the new tag specified changes the pod template and that rolls out redpanda. Always using spec.Version would certainly help with that, you're right.

On the other hand, you have to keep in mind that updating the operator can trigger restarts for other reasons as well (as I said, this is due to how the reconciliation loop works in Kubernetes: if we change something in the statefulset or some other resource, it might trigger a restart of all clusters anyway, and that's not really easy to prevent, though we'll try to minimize it). So for truly production clusters, just because of the chance that a restart CAN happen, I would always follow the upgrade procedure I outlined above.
