
updating the redpanda operator shouldn't restart/upgrade statefulsets for clusters with pinned versions #3150

Closed
flokli opened this issue Dec 3, 2021 · 7 comments
Labels: area/k8s, community, kind/bug

flokli (Contributor) commented Dec 3, 2021

Version & Environment

redpanda-operator v21.9.6 upgraded to v21.10.2.

What went wrong?

I upgraded the redpanda-operator (with the helm chart).

My redpanda cluster CRs explicitly set a pinned redpanda version (via spec.version).
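
As a concrete illustration (not from the original report), here is a minimal sketch of such a pinned Cluster written against the operator's Go API types; the import path and the Image/Version field names are assumed to mirror the CRD's spec.image and spec.version, and most users would express the same thing as a YAML manifest.

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	// Assumed import path for the operator's Cluster API types; confirm against the repo layout.
	v1alpha1 "github.com/vectorizedio/redpanda/src/go/k8s/apis/redpanda/v1alpha1"
)

// pinnedCluster builds a minimal Cluster object with spec.version pinned.
// With a pinned version, the pods are expected to stay on that image tag even
// when the operator itself is upgraded.
func pinnedCluster() *v1alpha1.Cluster {
	return &v1alpha1.Cluster{
		ObjectMeta: metav1.ObjectMeta{Name: "redpanda", Namespace: "redpanda"},
		Spec: v1alpha1.ClusterSpec{
			Image:   "vectorized/redpanda",
			Version: "v21.9.6", // pinned redpanda version (spec.version)
		},
	}
}
```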

After upgrading redpanda-operator, I noticed the redpanda pods being restarted. Upon further inspection, I found the pods were restarted because the vectorized/configurator image had been bumped to v21.10.2.

What should have happened instead?

I'd expect redpanda pods not to get upgraded on a bump of the operator if a specific redpanda version is specified. I would have expected vectorized/configurator to stay at v21.9.6, like the main image.

How to reproduce the issue?

  1. Deploy an older version of redpanda-operator
  2. Create a Cluster resource with that version explicitly set
  3. Upgrade redpanda-operator
  4. Observe pods getting restarted

Additional information

Maybe related: #3023

flokli added the kind/bug label on Dec 3, 2021
rkruze (Contributor) commented Dec 6, 2021

Thank you for this. Updating the operator can cause restarts, which we need to document. If a config change occurs for Redpanda, it will force a restart of the cluster.

flokli (Contributor, Author) commented Dec 8, 2021

Yes, indeed this should be documented.

However, the current behaviour means /every/ upgrade of the operator will cause restarts, as it updates the pod template to use the operator version for vectorized/configurator, even though an explicit redpanda version is specified (via spec.version).

I doubt that's intended behaviour. Is there even a guarantee that old redpanda images work with a more recent configurator?

I'd expect the operator to keep vectorized/configurator in the same version as the vectorized/redpanda image…

alenkacz (Contributor) commented Dec 9, 2021

@flokli currently this is the case; we can do a better job at documenting which version change triggers a restart and which does not. Can I ask what your use case is for updating the operator without updating the redpanda version?

Internally we always try to update both at the same time, which is also what we currently test for: that those two versions are aligned. To prevent multiple restarts we leverage the managed annotation, so before you update your operator you can remove all redpandas from active management with this annotation; then the operator won't touch them. https://github.com/vectorizedio/redpanda/blob/91ae813c79e5754808489a30769b0187efc75e86/src/go/k8s/controllers/redpanda/cluster_controller.go#L100 After you adjust all the versions, you just remove that annotation again.
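
As a sketch of that workaround (not part of the original comment), the snippet below uses the Kubernetes dynamic client to patch a managed annotation onto every Cluster in a namespace before an operator upgrade. The annotation key redpanda.vectorized.io/managed, the value "false", and the clusters GroupVersionResource are assumptions; the authoritative names are in the controller code linked above. The same effect can be achieved interactively with kubectl annotate.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// Assumed annotation key; the authoritative definition is near the check in
// cluster_controller.go linked above.
const managedAnnotation = "redpanda.vectorized.io/managed"

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed GroupVersionResource of the redpanda Cluster CRD.
	gvr := schema.GroupVersionResource{
		Group:    "redpanda.vectorized.io",
		Version:  "v1alpha1",
		Resource: "clusters",
	}

	// Mark every Cluster in the "redpanda" namespace as unmanaged before the
	// operator upgrade; drop the annotation again once all versions are aligned.
	clusters, err := dyn.Resource(gvr).Namespace("redpanda").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	patch := []byte(fmt.Sprintf(`{"metadata":{"annotations":{%q:"false"}}}`, managedAnnotation))
	for _, c := range clusters.Items {
		if _, err := dyn.Resource(gvr).Namespace(c.GetNamespace()).Patch(
			context.TODO(), c.GetName(), types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
			panic(err)
		}
		fmt.Printf("marked %s/%s as unmanaged\n", c.GetNamespace(), c.GetName())
	}
}
```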

flokli (Contributor, Author) commented Dec 14, 2021

Imagine having a Kubernetes cluster with multiple redpanda clusters in different namespaces, and updating the redpanda version there in a controlled fashion.

As soon as we update the operator, it'll restart all clusters (as it updates the configurator image). I'd expect the operator not to upgrade (parts of) these clusters but to stay at the old configurator, and then to update them individually (both configurator and main payload) once I bump the spec.version field in the redpanda CR.

alenkacz (Contributor) commented
@flokli yep, I understand that problem; we have the same in our cloud environment. Unfortunately, with an operator that's still under active development it's very hard to prevent these restarts, because of the way the reconcile loop works in Kubernetes. Other operators have this problem as well.

The way we do it in cloud is that we:

flokli (Contributor, Author) commented Jan 2, 2022

Understood that you have the same problem. But what does the configurator image actually do, and does it need to be aligned with the version of the operator rather than the desired cluster version?

Simply using spec.Version for the configurator image too should be an easy way to prevent these kinds of restarts, as well as the possibly incompatible halfway upgrades they produce.
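
A minimal sketch of that suggestion (the helper and parameter names are hypothetical, not the operator's actual code): derive the configurator tag from the pinned spec.version and fall back to the operator's default tag only when no version is pinned.

```go
package main

import "fmt"

// configuratorImage prefers the Cluster's pinned spec.version for the
// configurator tag and falls back to the operator's own default tag only
// when nothing is pinned.
func configuratorImage(pinnedVersion, operatorDefaultTag string) string {
	tag := pinnedVersion
	if tag == "" {
		tag = operatorDefaultTag
	}
	return "vectorized/configurator:" + tag
}

func main() {
	// With spec.version pinned to v21.9.6, an operator shipping v21.10.2 would
	// leave the pod template unchanged, so the StatefulSet would not be rolled.
	fmt.Println(configuratorImage("v21.9.6", "v21.10.2")) // vectorized/configurator:v21.9.6
	fmt.Println(configuratorImage("", "v21.10.2"))        // vectorized/configurator:v21.10.2
}
```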

alenkacz (Contributor) commented Jan 5, 2022

If I understand you correctly, you're saying that with the configurator tag being set here 2848b59#diff-8074ab08efa0940c749c32af395271faf34280c16da67864c5334d6c0f7f1596R64, you always trigger a restart of the cluster, because redeploying the operator with the new tag specified changes the pod template and that rolls out redpanda. Always using spec.Version would certainly help with that, you're right.

On the other hand, you have to keep in mind that updating the operator can trigger restarts for other reasons as well (as I said, this is due to how the reconciliation loop works in Kubernetes: if we change something in the statefulset or some other resource, it might trigger a restart of all clusters anyway, and that's not really easy to prevent, though we'll try to minimize it). So for truly production clusters, just because of the chance that a restart CAN happen, I would always follow the upgrade procedure I outlined above.
