
prune should respect dependsOn in a reverse order #301

Open
paschdan opened this issue Mar 22, 2021 · 14 comments

Comments

@paschdan

When two Kustomizations exist, where one dependsOn the other, pruning should happen in reverse order:

Use case:

kustomization operator contains a CRD and an operator for this CRD
kustomization usage contains a CR of that CRD, which carries a finalizer handled by the operator

# operator.kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: operator
  namespace: flux-system
spec:
  interval: 10m0s
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./apps/operator
  prune: true
  validation: client
# usage.kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: usage
  namespace: flux-system
spec:
  interval: 10m0s
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./apps/usage
  dependsOn:
    - name: operator
  prune: true
  validation: client

When a commit then deletes both usage.kustomization.yaml and operator.kustomization.yaml, garbage collection should respect dependsOn in reverse order:

first, usage.kustomization.yaml should be deleted (containing e.g. a namespace which, when pruned, deletes all custom resources; since operator.kustomization.yaml is not deleted at this point in time, finalizers can still be applied)
then operator.kustomization.yaml can be deleted, as no depending Kustomization exists anymore.

Currently, operator.kustomization.yaml and usage.kustomization.yaml are deleted together, which may leave resources from usage.kustomization.yaml with finalizers that can never complete because the operator may already be gone.

@stealthybox
Member

Succinctly, I think the logic here is:
On delete, in the Kustomization finalizer, check the in-controller dependency graph to see if I have any children.
If so, block/requeue indefinitely until there are no children, and maybe add a status condition showing that we are waiting for dependent children to be deleted.

This specifically needs to be done in the finalizer of the Kustomization that is being depended on.
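The check described above can be sketched as pure logic, assuming a simple in-memory dependency graph; function and variable names here are hypothetical, not kustomize-controller code:

```python
# Hypothetical sketch of the finalizer check: block deletion of a
# Kustomization while any other Kustomization still lists it in dependsOn.

def has_dependents(name: str, depends_on: dict[str, list[str]]) -> bool:
    """Return True if any other Kustomization lists `name` in dependsOn."""
    return any(name in deps for other, deps in depends_on.items() if other != name)

def finalize(name: str, depends_on: dict[str, list[str]]) -> str:
    # A real controller would requeue the reconcile request and update a
    # status condition; here we just report the decision.
    if has_dependents(name, depends_on):
        return "requeue"  # wait for dependent children to be deleted first
    return "prune"

graph = {"operator": [], "usage": ["operator"]}
print(finalize("operator", graph))  # usage still depends on it -> requeue
print(finalize("usage", graph))     # nothing depends on usage -> prune
```

With the operator/usage example from this issue, usage is always prunable, while the operator blocks until usage is gone.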

@stefanprodan
Member

stefanprodan commented Mar 26, 2021

@stealthybox we could update the Ready condition to make the deadlocks visible to users, as the Ready condition is shown by flux/kubectl get.

@stealthybox
Member

stealthybox commented Mar 27, 2021

Since this Finalizer behavior depends on the lifecycle of other objects, it could create a state where an administrator's or parent team's repository becomes undesirably unreconcilable, simply because of another team's repo.

This would be improved by the fluxcd/flux2#582 proposal, however, which would provide an RBAC solution for revoking access to dependsOn!

We should consider what the workflow or options could be for a team to forcefully indicate that their Kustomization should be deleted regardless of the consequences on dependent Kustomizations.

Perhaps we don't want to support that.


Option B

When a Kustomization contains a CRD, make the Finalizer block until there are no more dependent CRs in the entire cluster.

This seems reasonable but might be hard to query for across all Namespaces. Maybe there's a performant mechanism?

Then a simple Boolean could opt out, specifically for CRDs.

This doesn't cover other types of dependencies.


Technically, it's possible (but complicated) for both of these behaviors to be implemented.
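Option B's check could be sketched as pure logic, assuming the controller already has the Kustomization's manifests and a cluster-wide object listing in hand (both hypothetical inputs; producing that cluster-wide listing is the performance concern noted above):

```python
# Hypothetical sketch of Option B: block finalization while any CR of a
# CRD owned by this Kustomization still exists anywhere in the cluster.

def owned_crd_kinds(manifests: list[dict]) -> set[str]:
    """Kinds defined by CRDs that this Kustomization manages."""
    return {
        m["spec"]["names"]["kind"]
        for m in manifests
        if m.get("kind") == "CustomResourceDefinition"
    }

def can_finalize(manifests: list[dict], cluster_objects: list[dict]) -> bool:
    """True only when no CR of an owned CRD remains in the cluster."""
    kinds = owned_crd_kinds(manifests)
    return not any(obj.get("kind") in kinds for obj in cluster_objects)

crd = {"kind": "CustomResourceDefinition",
       "spec": {"names": {"kind": "IstioOperator"}}}
print(can_finalize([crd], [{"kind": "IstioOperator"}]))  # False: a CR remains
print(can_finalize([crd], []))                           # True: safe to prune
```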

@stealthybox
Member

Regarding general Kubernetes objects:
It's possible to describe objects that depend on each other in Kubernetes atomically, because reconcile loops will eventually converge as dependencies are realized.

However, it is very difficult to atomically delete these resources, because deleting in the wrong order can cause issues.

fluxcd/helm-controller#270 explores some strategies to manage these needs using annotations, Finalizers, and Admission Control.

I suspect that this is a limitation of Kubernetes API design, since dependencies are not required information.
Inferring every kind of dependency sounds intractable.
We will likely need mechanisms for users to annotate objects and opt into this behavior if they want atomic deletes that respect some sort of order.

For Kustomizations and HelmReleases, we have dependsOn which is quality information.
There may be some heuristic things we can do (such as GVK-delete-order or simple lookups) for common cases regarding sub-resources.

Perhaps there is an existing community component that can enforce ordered deletes between arbitrary objects in a generic, non-Flux-specific way.
If we need to build a solution to this, it could become independently valuable.

@stealthybox
Member

Client-side CLI removal using the user's privileged credentials could help users reconcile undesired removal situations.

@artem-nefedov

This caused problems for me as well. I had to write custom logic in a wrapper script to handle deletion correctly (which somewhat contradicts the GitOps deployment principle).

@stefanprodan
Member

This caused problems for me as well. I had to write custom logic in a wrapper script to handle deletion correctly (which somewhat contradicts the GitOps deployment principle).

You can do this via Git: delete the dependants first and commit the changes, and Flux will finalise them accordingly. Then delete the top-level Kustomization and commit that.

@artem-nefedov

@stefanprodan My case is: we do blue/green Kubernetes cluster upgrades. Instead of upgrading the cluster in place, we bring up a second cluster with the same apps deployed, switch traffic to it, then delete the original cluster, ensuring safe upgrades with zero downtime and the ability to instantly roll back or abort the upgrade in case of problems. The GitOps controller (Flux) provides a way to ensure that new clusters have the same apps (both clusters monitor the same Git repository), so deleting via Git commits is not possible.

What actually causes the problem: we install the Istio operator controller with one Kustomization, then install the IstioOperator resource with another Kustomization that depends on the first one. Creation works fine, but deletion ignores the order and deletes the operator controller before the Istio control plane/ingress is deleted, which causes the deletion of Istio to hang indefinitely. And we cannot just ignore Istio and delete the cluster anyway, because Istio creates a Service of type LoadBalancer, which creates an actual load balancer that will be orphaned if not deleted correctly from within the cluster.

Current workaround: patch the entrypoint Kustomization in the old cluster to exclude the IstioOperator resource Kustomization, wait until the Istio operator controller actually deletes Istio, then delete the entrypoint Kustomization.

@artem-nefedov

Are there any plans to get this worked on?
Weave tf-controller implements dependency handling in a very similar fashion, and it does in fact handle deletion in the correct order, which makes me think it should be possible in kustomize-controller too.

@sgerace

sgerace commented Oct 6, 2023

Has anyone developed any workarounds for this behavior? We're running into the same issue when trying to clean up resources deployed within a vcluster. vcluster creates a secret in the cluster that is used in subsequent HelmReleases to deploy things into the vcluster, similar to the following:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: example-release
  namespace: example-namespace
spec:
  interval: 5m0s
  dependsOn:
    - name: vcluster
  targetNamespace: default
  storageNamespace: default
  kubeConfig:
    secretRef:
      name: vc-vcluster
      key: config
  chart:
    spec:
      ...

Notice the kubeConfig.secretRef that references the secret created by vcluster. Creation works just fine; the issue is that when the Flux configuration is destroyed, the vc-vcluster secret is often destroyed before example-release is. Once the secret is gone, the reconciler is unable to connect to the vcluster (because it no longer exists), and thus any attempt to determine whether example-release exists will fail. This places the example-release HelmRelease in an unrecoverable state and requires someone to manually go into the HelmRelease and remove the finalizer added by Flux.

It is worth noting that in our case, first removing all HelmReleases except for the vcluster and then removing the Flux configuration (as proposed by @stefanprodan) is really inconvenient due to the amount of automation that we have in place, so we'd prefer to have the resources cleaned up properly by simply deleting the Flux configuration.

@artem-nefedov

@sgerace We made a workaround, but it's not really suitable for the general case. In our setup, Kustomizations are installed from a HelmRelease (for reasons unrelated to this issue), and we made a custom pre-delete helm hook to handle deletion in the proper order.

@sgerace

sgerace commented Oct 6, 2023

@artem-nefedov, that's interesting, I'm wondering if you'd be willing to share any of the details around the pre-delete helm hook that you've developed? That's the direction we've started investigating, so any information you might be able to provide would help us come up with a suitable workaround for our situation.

@artem-nefedov

artem-nefedov commented Oct 6, 2023

@sgerace We have multiple Kustomizations installed from a single helm chart via a HelmRelease. The chart has a pre-delete hook Job (meaning it runs before any actual object is deleted by helm). The logic inside the Job is pretty simple: it deletes the Kustomizations that should be deleted first, based on a label selector, then sleeps for some time and exits. After that, the remaining Kustomizations are deleted by helm itself.
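The ordering logic of such a hook can be sketched as a two-phase split by label; the label key/value and data shapes below are assumptions for illustration, not the commenter's actual selector:

```python
# Hypothetical sketch of the pre-delete hook's ordering: Kustomizations
# carrying a "delete first" label are removed by the hook Job, the rest
# are left for helm to delete once the hook completes.

DELETE_FIRST_LABEL = ("deletion-phase", "first")  # assumed label, not real

def deletion_phases(kustomizations: list[dict]) -> tuple[list[str], list[str]]:
    key, value = DELETE_FIRST_LABEL
    first = [k["name"] for k in kustomizations
             if k.get("labels", {}).get(key) == value]
    later = [k["name"] for k in kustomizations if k["name"] not in first]
    return first, later

apps = [
    {"name": "usage", "labels": {"deletion-phase": "first"}},
    {"name": "operator", "labels": {}},
]
print(deletion_phases(apps))  # (['usage'], ['operator'])
```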

Unrelated to the topic, but also worth mentioning: we also have post-install and post-upgrade hooks that wait for successful reconciliation of the Kustomizations, because helm-controller can't perform health checks on custom resources (even ones that are part of the Flux suite). This can be circumvented if you instead define healthChecks in the Kustomization that creates the HelmRelease.

@beasteers

beasteers commented Apr 1, 2024

I wonder if a temporary workaround would be to support a manual depends-on annotation for child resources, so the reconciler could do a quick topological sort before deleting. Even an (opt-in?) topological sort at the level of Kustomizations could be enough to fix this bug, at least assuming the garbage collector could accumulate all of the resources across Kustomizations before pruning.
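The topological-sort idea sketches neatly with the standard library: dependsOn edges point from a Kustomization to what it depends on, so a valid creation order is the topological order and a safe deletion order is its reverse (dependents first, dependencies last). This is an illustration of the suggestion, not controller code:

```python
from graphlib import TopologicalSorter  # Python 3.9+

def deletion_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Given name -> list of dependsOn names, return a safe deletion order."""
    # static_order() yields dependencies before dependents (creation order);
    # reversing it deletes dependents before what they depend on.
    creation_order = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(creation_order))

print(deletion_order({"operator": [], "usage": ["operator"]}))
# ['usage', 'operator'] -- usage is pruned before the operator it depends on
```

`TopologicalSorter` also raises `CycleError` on circular dependsOn declarations, which the controller would have to surface as a status condition rather than a crash.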
