Upgrading to v2.0.1 safely #4160
-
Hello, we are currently working on upgrading our fluxcd version from v0.37.0 to v2.0.1. We have 2 development clusters. The process went smoothly on one of them, but on the other cluster all of our workloads were deleted during the upgrade. We have typically upgraded our clusters using the following command:
We then commit and push these updated manifests to the repo and allow flux to update itself. However, having migrated to GitHub recently we wanted to migrate to the
As mentioned above, this seemed to work without issue on our dev cluster, but after running this process on our pre-prod cluster, all resources built by kustomization files in clusters/pre-prod were deleted. Thanks to Flux and GitOps we were able to recover pretty quickly, but this is obviously something we cannot risk when we upgrade our production cluster. Please can you advise how we should approach this? Thankfully, we do have other non-prod clusters to test approaches on, so we would appreciate any suggestions. Kind regards
Replies: 2 comments 3 replies
-
If you still have the commit that deleted everything in pre-prod, it should be pretty trivial to inspect it with
I'd start looking here though. The commands above definitely have potential to delete your whole cluster, this isn't a procedure I would have advised you on in this way. If you had
This bit of trivia might explain what was the difference: in a Flux Kustomization, if your
When the Flux kustomize controller finds you pointed it at a path and the path is missing, it assumes this is a mistake. It won't just start deleting everything. But if you point it at an empty directory, it doesn't assume it's a mistake. It will go ahead and delete everything.
I would strongly advise putting pre-prod back the way it was when you started, then role-playing and running back through exactly what steps you did, until you understand just how this happens (in order to ensure you can avoid the same problem in production!)
See this guide also, from the FAQ on safely moving resources: https://fluxcd.io/flux/faq/#how-can-i-safely-move-resources-from-one-dir-to-another
In case you know why you need to move those resources around, and you don't want to delve into the reasons why it's needed, it could simply be that "prune" is enabled in your pre-prod environment and it's not enabled in the dev environment. I would recommend having
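To make the missing-path versus empty-directory distinction concrete, here is a minimal sketch of a pre-flight check you could run in the repo before pushing. The function name `check_sync_path` is made up for illustration and is not part of Flux. It only looks at the local checkout, and it assumes the directory you pass is the one your Kustomization's path points at.

```shell
# Hypothetical helper, not part of Flux: refuse to continue when a sync
# path is missing or contains no files at all.
check_sync_path() {
  dir="$1"
  # A missing path: per the reply above, Flux treats this as a mistake.
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  # An existing but empty path: the case described above as leading to
  # everything being deleted when prune is enabled.
  if [ -z "$(find "$dir" -type f | head -n 1)" ]; then
    echo "empty: $dir" >&2
    return 2
  fi
  echo "ok: $dir"
}
```

For example, `check_sync_path clusters/pre-prod || exit 1` at the top of whatever script regenerates your manifests would stop before a commit that leaves that directory empty. It does not check that the files are valid manifests, only that something is there.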
-
Thanks again for the reply.
I made sure that the namespace was fully terminated before continuing (we have a script to force termination).
And nope, we don't have multiple versions of flux running in the cluster.
I am going to look into setting prune to false for our remaining tests. I agree this isn't a long term solution but we will set this back to true once we have upgraded the versions and migrated to using `flux bootstrap github` going forward.
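Since a difference in prune settings between dev and pre-prod was raised as a possible explanation, it may be worth listing what each cluster currently has before changing anything. This is a rough sketch: the function name `list_prune_settings` and the kubeconfig context names in the usage note are assumptions for illustration, and it needs `kubectl` access to each cluster.

```shell
# Hypothetical helper: print namespace, name, prune flag and path for
# every Flux Kustomization in the cluster behind the given kubeconfig context.
list_prune_settings() {
  kubectl --context "$1" get kustomizations.kustomize.toolkit.fluxcd.io \
    --all-namespaces \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,PRUNE:.spec.prune,PATH:.spec.path'
}
```

Running `list_prune_settings dev` and `list_prune_settings pre-prod` (substitute your own context names) and comparing the PRUNE and PATH columns should show whether the two clusters were actually configured the same way.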