Cope when syncing a CRD and resources using the CRD #1825
Comments
How about making the namespacer smarter about CRDs and extracting the scope from the …
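For what it's worth, a CRD manifest already declares its own scope in `spec.scope`, so the extraction suggested above could be sketched roughly like this (a minimal sketch assuming manifests have been parsed into dicts; the field names follow the apiextensions.k8s.io CustomResourceDefinition schema):

```python
def crd_scope(manifest):
    """Return the scope declared by a CRD manifest, or None for non-CRDs.

    `manifest` is a Kubernetes object parsed into a dict. In
    apiextensions.k8s.io/v1beta1, `spec.scope` defaults to "Namespaced"
    when omitted.
    """
    if manifest.get("kind") != "CustomResourceDefinition":
        return None
    return manifest.get("spec", {}).get("scope", "Namespaced")

# Illustrative manifest (names are made up for the example).
crd = {
    "apiVersion": "apiextensions.k8s.io/v1beta1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "widgets.example.com"},
    "spec": {"group": "example.com", "scope": "Cluster",
             "names": {"kind": "Widget"}},
}
```

With something like this, the namespacer could learn the scope of a custom resource from the CRD sitting next to it in git, instead of asking the cluster.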
Also, in case we don't do this already, …
We are really blocked by this. Is there any plan at the moment about this issue? Thanks.
I think this could be worked around by having a separate flux instance that manages special things, like CRDs and potentially other non-namespaced things, like actual Namespaces and ClusterRoles/ClusterRoleBindings.
I suppose this is kind of "prioritising a special class of resources", albeit not an implicit one. So I'm wondering if an implicit priority sync loop would be a feature to consider? Users could configure it via annotation.
Seems also a plausible alternative to explore. It may be beneficial to have regardless of whether we add a priority sync loop or whatnot.
We don't know at that point if the CRD will be accepted by the API server.
https://github.com/weaveworks/flux/blob/master/cluster/kubernetes/sync.go#L374 and ff.
But we know what scope it will have if accepted, which is the only way that resources belonging to the CRD are going to be created anyway. In other words (honest question): why is it important to know in advance whether the CRD will be accepted?
Does the sequence of "git-path" options define an order in which flux will parse and apply manifests? I think I'd rather have a sequence of paths inside a single repo / single fluxd instance than 2 separate instances (or more).
So you can make it correct, I think, but it does start to get complicated. Are there any problems with just omitting resources for which you can't determine the namespacedness? It will take two goes, starting from scratch, for custom resources to be created -- I'd count that as a disadvantage.
No; it parses all the files, then sorts them according to static dependencies among kinds of resource (namespaces go before things that have a namespace, for example).
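That kind of static ordering can be sketched as a stable sort over a rank table keyed by kind (the ranks below are illustrative, not Flux's actual table):

```python
# Apply-order ranks; lower ranks are applied first. Illustrative only --
# the real table and kinds live in Flux's sync code.
RANK = {
    "Namespace": 0,
    "CustomResourceDefinition": 1,
    "ClusterRole": 2,
    "ClusterRoleBinding": 2,
}
DEFAULT_RANK = 10  # everything else, e.g. namespaced workloads

def sort_for_apply(resources):
    """Stable-sort parsed manifests so that dependencies come first.

    Python's sort is stable, so resources of the same kind keep their
    original relative order.
    """
    return sorted(resources, key=lambda r: RANK.get(r.get("kind"), DEFAULT_RANK))

objs = [{"kind": "Deployment"}, {"kind": "Namespace"},
        {"kind": "CustomResourceDefinition"}]
```

Giving CRDs a rank just after Namespaces, as above, is one way the "priority" idea discussed earlier could fall out of the existing sort.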
Not really, since we should use the cluster's definition if the repo doesn't contain one.
It shouldn't, since the cluster CRD should be updated according to what's in the repo anyway. Regardless, I don't see rejecting as a bad thing, since we should use what's in the repo as the source of truth.
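The "git first, cluster as fallback" rule being argued for here could be sketched as a small resolver (a sketch only; `repo_crds` and `cluster_scopes` are hypothetical inputs standing in for the parsed repo and the API server's answer):

```python
def resolve_scope(kind, repo_crds, cluster_scopes):
    """Prefer the scope declared by a CRD in git; fall back to the cluster.

    `repo_crds` maps a custom-resource kind to its parsed CRD manifest
    from the repo; `cluster_scopes` maps a kind to the scope the API
    server reports. Returns None when neither side knows the kind.
    """
    crd = repo_crds.get(kind)
    if crd is not None:
        # spec.scope defaults to "Namespaced" in v1beta1 when omitted.
        return crd.get("spec", {}).get("scope", "Namespaced")
    return cluster_scopes.get(kind)
```

The counter-argument that follows is, in effect, that `cluster_scopes` should win, because the cluster is what will actually process the apply.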
Yep, that's a disadvantage. Another disadvantage (and for me that's the main one) is that this approach uses what's in the cluster as the source of truth. In the same way as you suggest above, it could be that the user is changing the scope of the CRD in git and you trust what's in the cluster instead. In general I am against prioritizing what's in the cluster since, after all, GitOps is about treating what's in git as the source of truth. (The opposite already seems to be the case in some parts of Flux, but let's not get into that now :) ).
I mean if the CRD isn't in the cluster, and it fails to be applied, then the custom resources that refer to it will also fail to be applied. (i.e., this does not cause an incorrect state).
Try it.
The rejecting is fine -- what's bad is that we'll now assume the wrong thing about the custom resources, since we'll be looking at a CRD that will fail to be applied, and custom resources that may not fail to be applied -- and may be given the wrong namespace.
We're implicitly using the cluster as the source of truth for API endpoints every time we apply manifests. It looks different with CRDs because they are both resources and definitions. But it's not really -- we still need to consult the cluster API endpoints, the same as for other resources. When we ask about the scope of API resources, the reason we do it is because we want to know what will happen when we apply the manifests. It's the cluster that knows what will happen, not the files in git. The specific situation where the user has changed the scope of a CRD and all the resources that refer to it in git is problematic either way: the CRD will fail to apply, then the custom resources will (probably) succeed. Neither solution fixes that; I'm not sure what to do about it other than advise people to avoid it.
In general, so am I. But here I think it is most prudent to check with the API server about what is the case, since that's what is going to be processing the manifests, rather than consulting git about what might be the case.
I agree. In fact I think that's correct. What's the alternative?
Touché. I haven't tried it but, from how sure you are, I guess that applying changes to a CRD doesn't work if resources from the CRD already exist? (Maybe this is another case for …)
I agree, but by only asking the cluster, we will only know what will happen in its current state, which IMO should be just one of the inputs to successfully bringing the cluster to the state desired by the user as per the git repo. That said, let's be practical. Since this is blocking @primeroz and you have way more experience with Flux than me, I am happy to rest my case if we reach a practical solution which works. Maybe we are already worrying too much about CRD scope changes but, imagine a single commit which changes the scope of a CRD and a few resources of that CRD at once. From our conversation above, I am not sure your solution (or mine) handles that. It would be good to come up with a design which solves that too, but maybe we can treat it separately.
I think we've ended up determining that …
In case it's not obvious to others: downgrading flux from 1.11.0 to 1.10.1 is a valid workaround.
If you add a CRD as well as resources that use the CRD, in one commit (or in commits since the last sync), fluxd will stop being able to sync.
What I think is happening is that the namespacer attempts to look up the scope of the custom resources and, since the definition has not yet been created, the lookup fails. At present this stops the sync entirely, and fluxd cannot make progress.
One solution that suggests itself is to treat a failed scope lookup as a failure for the resource in question only, and otherwise proceed. That way, the CRD will still be created, and in the next sync, the lookup will succeed.
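Sketched, that "fail only the affected resource" behaviour might look like the following (`lookup_scope` is a hypothetical stand-in for the namespacer's scope query; a failed lookup skips just that resource instead of aborting the sync):

```python
def partition_by_scope(resources, lookup_scope):
    """Split parsed manifests into (appliable, skipped).

    `lookup_scope(kind)` is a hypothetical namespacer call that returns
    "Namespaced" or "Cluster", or raises KeyError when the kind is
    unknown -- e.g. because its CRD hasn't been applied yet.
    """
    appliable, skipped = [], []
    for r in resources:
        try:
            lookup_scope(r["kind"])
            appliable.append(r)
        except KeyError:
            # Skip only this resource; it will be retried on the next
            # sync, by which time the CRD should exist in the cluster.
            skipped.append(r)
    return appliable, skipped
```

On the first sync the CRD is applied and the custom resource is skipped; on the next sync the kind is known and the custom resource goes through -- the "two goes" disadvantage noted earlier in the thread.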