
Cope when syncing a CRD and resources using the CRD #1825

Closed

squaremo opened this issue Mar 14, 2019 · 15 comments

Comments

@squaremo
Member

If you add a CRD as well as resources that use the CRD, in one commit (or in commits since the last sync), fluxd will stop being able to sync.

What I think is happening is that the namespacer will attempt to look up the scope of the custom resources, and since the definition has not yet been created, it will fail. At present this will stop the sync, and fluxd will not be able to make progress.

One solution that suggests itself is to treat a failed namespace lookup as a failure for the resource in question only, and otherwise proceed. That way, the CRD will still be created, and in the next sync, the lookup will succeed.
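
To make the idea concrete, here is a minimal sketch (illustrative Go only, not Flux's actual namespacer; the types and names are made up): a failed scope lookup marks just that resource as errored, everything else (including the CRD) still gets applied, and the skipped resources are picked up on the next sync.

```go
package main

import (
	"errors"
	"fmt"
)

// Resource stands in for a parsed manifest; the fields are illustrative.
type Resource struct {
	ID   string
	Kind string
}

var errUnknownScope = errors.New("unknown scope for kind")

// lookupScope is a stand-in for the namespacer's scope lookup: it fails
// when the cluster does not (yet) know the kind, e.g. before a CRD exists.
func lookupScope(kind string, known map[string]bool) (namespaced bool, err error) {
	namespaced, ok := known[kind]
	if !ok {
		return false, errUnknownScope
	}
	return namespaced, nil
}

func main() {
	// What the cluster currently knows about: no "Foo" kind yet.
	known := map[string]bool{"Deployment": true, "CustomResourceDefinition": false}

	resources := []Resource{
		{ID: "crd/foos.example.com", Kind: "CustomResourceDefinition"},
		{ID: "default:foo/bar", Kind: "Foo"}, // its CRD is in the same commit
		{ID: "default:deployment/app", Kind: "Deployment"},
	}

	var applyList []Resource
	skipped := map[string]error{}
	for _, r := range resources {
		if _, err := lookupScope(r.Kind, known); err != nil {
			// Per-resource failure: skip it for this sync instead of
			// aborting; the CRD still gets applied, and the next sync
			// will find the kind and apply the skipped resource.
			skipped[r.ID] = err
			continue
		}
		applyList = append(applyList, r)
	}

	fmt.Println("apply:  ", applyList)
	fmt.Println("skipped:", skipped)
}
```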

@2opremio
Contributor

2opremio commented Mar 14, 2019

> What I think is happening is that the namespacer will attempt to look up the scope of the custom resources

> One solution that suggests itself is to treat a failed namespace lookup as a failure for the resource in question only, and otherwise proceed

How about making the namespacer smarter about CRDs and extracting the scope from the spec in the CRD definition if the CRD doesn't yet exist in the cluster?
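
As a rough sketch of what that could look like (not Flux's code; just the standard CRD fields parsed with a YAML library), the namespacer would fall back to the spec.scope of the CRD manifest in the repo when the cluster doesn't know the kind yet:

```go
package main

import (
	"fmt"

	yaml "gopkg.in/yaml.v2"
)

// crdManifest captures only the CRD fields this sketch needs.
type crdManifest struct {
	Spec struct {
		Scope string `yaml:"scope"` // "Namespaced" or "Cluster"
		Names struct {
			Kind string `yaml:"kind"`
		} `yaml:"names"`
	} `yaml:"spec"`
}

// scopeFromRepo reads the CRD manifest from git and reports whether the
// kind it defines is namespaced, without consulting the cluster.
func scopeFromRepo(manifest []byte) (kind string, namespaced bool, err error) {
	var crd crdManifest
	if err := yaml.Unmarshal(manifest, &crd); err != nil {
		return "", false, err
	}
	return crd.Spec.Names.Kind, crd.Spec.Scope == "Namespaced", nil
}

func main() {
	doc := []byte(`
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: foos.example.com
spec:
  group: example.com
  version: v1alpha1
  scope: Namespaced
  names:
    kind: Foo
    plural: foos
`)
	kind, namespaced, err := scopeFromRepo(doc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("kind %s: namespaced=%v (according to the repo)\n", kind, namespaced)
}
```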

@2opremio
Contributor

Also, in case we don't do this already, CustomResourceDefinitions should be created first and deleted last.

@primeroz

We are really blocked by this. Is there any plan for this issue at the moment?

thanks

@errordeveloper
Contributor

errordeveloper commented Mar 21, 2019

I think this could be worked around by having a separate flux instance that manages special things, like CRDs and potentially other non-namespaced things, like actual Namespaces and ClusterRoles/ClusterRoleBindings.
Generally, it seems like there is a class of resources that may be worth prioritising. Just a thought.

@errordeveloper
Contributor

> One solution that suggests itself is to treat a failed namespace lookup as a failure for the resource in question only, and otherwise proceed. That way, the CRD will still be created, and in the next sync, the lookup will succeed.

I suppose this is kind of "prioritising of a special class of resources", albeit it's not an implicit one. So I'm wondering if an implicit priority sync loop would be a feature to consider? Users could configure it via an annotation.

> How about making the namespacer smarter about CRDs and extracting the scope from the spec in the CRD definition if the CRD doesn't yet exist in the cluster?

That also seems a plausible alternative to explore. It may be beneficial to have regardless of whether we add a priority sync loop or whatnot.

@squaremo
Member Author

> How about making the namespacer smarter about CRDs and extracting the scope from the spec in the CRD definition if the CRD doesn't yet exist in the cluster?

We don't know at that point if the CRD will be accepted by the API server.

> in case we don't do this already, CustomResourceDefinitions should be created first and deleted last.

https://github.com/weaveworks/flux/blob/master/cluster/kubernetes/sync.go#L374 and ff.

@2opremio
Contributor

2opremio commented Mar 21, 2019

> We don't know at that point if the CRD will be accepted by the API server.

But we know what scope it will have if accepted, which is the only way that resources belonging to the CRD are going to be created anyways. In other words (honest question) why is it important to know whether the CRD will be accepted in advance?

@primeroz

> I think this could be worked around by having a separate flux instance that manages special things, like CRDs and potentially other non-namespaced things, like actual Namespaces and ClusterRoles/ClusterRoleBindings.
> Generally, it seems like there is a class of resources that may be worth prioritising. Just a thought.

Does the sequence of "git-path" options define an order in which flux will parse and apply manifests?

I think I'd rather have a sequence of paths inside a single repo / single fluxd instance than 2 (or more) separate instances.

@squaremo
Member Author

  • If the CRD doesn't exist, and it doesn't get created, then the custom resources based on it will also fail to apply.
  • If it does exist, and the namespacedness is different to that given in the manifest file, then the API server will reject the change; in the meantime, the custom resources will have been assigned (or not assigned) a namespace incorrectly. You would have to make looking for the CRD in a file contingent on failing to find it in the API server first.

So you can make it correct, I think, but it does start to get complicated.

Are there any problems with just omitting resources for which you can't determine the namespacedness? It will take two goes, starting from scratch, for custom resources to be created -- I'd count that as a disadvantage.

@squaremo
Member Author

> Does the sequence of "git-path" options define an order in which flux will parse and apply manifests?

No; it parses all the files, then sorts them according to static dependencies among kinds of resource (namespaces go before things that have a namespace, for example).
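
As an illustration of that kind of ordering (a sketch only, not the actual sync code; the ranks are made up), each kind gets a rank so that CRDs and Namespaces are applied first, and the order is reversed for deletes so CRDs go last:

```go
package main

import (
	"fmt"
	"sort"
)

// applyRank is a hypothetical ranking; lower ranks are applied first.
var applyRank = map[string]int{
	"CustomResourceDefinition": 0,
	"Namespace":                1,
	// everything else defaults to 2
}

func rank(kind string) int {
	if r, ok := applyRank[kind]; ok {
		return r
	}
	return 2
}

func main() {
	kinds := []string{"Deployment", "Foo", "Namespace", "CustomResourceDefinition", "Service"}

	// Order for apply: definitions and namespaces before the things that need them.
	sort.SliceStable(kinds, func(i, j int) bool { return rank(kinds[i]) < rank(kinds[j]) })
	fmt.Println("apply order: ", kinds)

	// Order for delete: the reverse, so CRDs are deleted last.
	deletes := make([]string, len(kinds))
	copy(deletes, kinds)
	sort.SliceStable(deletes, func(i, j int) bool { return rank(deletes[i]) > rank(deletes[j]) })
	fmt.Println("delete order:", deletes)
}
```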

@2opremio
Contributor

2opremio commented Mar 21, 2019

> If the CRD doesn't exist, and it doesn't get created, then the custom resources based on it will also fail to apply.

Not really, since we should use the cluster's definition if the repo doesn't contain one.

> If it does exist, and the namespacedness is different to that given in the manifest file, then the API server will reject the change

It shouldn't, since the cluster CRD should be updated according to what's in the repo anyways.

Regardless, I don't see rejecting as a bad thing, since we should use what's in the repo as the source of truth.

> Are there any problems with just omitting resources for which you can't determine the namespacedness? It will take two goes, starting from scratch, for custom resources to be created -- I'd count that as a disadvantage.

Yep, that's a disadvantage. Another disadvantage (and for me that's the main one) is that this approach uses what's in the cluster as the source of truth.

In the same way as you suggest above, it could be that the user is changing the scope of the CRD in git and you trust what's in the cluster instead.

In general I am against prioritizing what's in the cluster since, after all, GitOps is about treating what's in git as the source of truth. (The opposite already seems to be the case in some parts of Flux, but let's not get into that now :) ).

@squaremo
Member Author

> > If the CRD doesn't exist, and it doesn't get created, then the custom resources based on it will also fail to apply.
>
> Not really, since we should use the cluster's definition if the repo doesn't contain one.

I mean if the CRD isn't in the cluster, and it fails to be applied, then the custom resources that refer to it will also fail to be applied. (i.e., this does not cause an incorrect state).

> > If it does exist, and the namespacedness is different to that given in the manifest file, then the API server will reject the change
>
> It shouldn't, since the cluster CRD should be updated according to what's in the repo anyways.

Try it.

> Regardless, I don't see rejecting as a bad thing, since we should use what's in the repo as the source of truth.

The rejecting is fine -- what's bad is that we'll now assume the wrong thing about the custom resources, since we'll be looking at a CRD that will fail to be applied, and custom resources that may not fail to be applied -- and may be given the wrong namespace.

> > Are there any problems with just omitting resources for which you can't determine the namespacedness? It will take two goes, starting from scratch, for custom resources to be created -- I'd count that as a disadvantage.
>
> Yep, that's a disadvantage. Another disadvantage (and for me that's the main one) is that this approach uses what's in the cluster as the source of truth.
>
> In the same way as you suggest above, it could be that the user is changing the scope of the CRD in git and you trust what's in the cluster instead.

We're implicitly using the cluster as the source of truth for API endpoints every time we apply manifests. It looks different with CRDs because they are both resources and definitions. But it's not really -- we still need to consult the cluster API endpoints, the same as for other resources.

When we ask about the scope of API resources, the reason we do it is because we want to know what will happen when we apply the manifests. It's the cluster that knows what will happen, not the files in git.
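
For example, asking the cluster rather than git could be done with client-go's discovery API, roughly like the sketch below (not Flux's code; the kubeconfig path and the "Foo" kind are placeholders for illustration):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

// isNamespaced consults the API server's discovery information: only the
// cluster can say how it will treat a kind when the manifests are applied.
func isNamespaced(dc discovery.DiscoveryInterface, kind string) (bool, error) {
	lists, err := dc.ServerPreferredResources()
	if err != nil {
		return false, err
	}
	for _, list := range lists {
		for _, r := range list.APIResources {
			if r.Kind == kind {
				return r.Namespaced, nil
			}
		}
	}
	return false, fmt.Errorf("kind %q not known to the cluster (yet)", kind)
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	namespaced, err := isNamespaced(dc, "Foo")
	fmt.Println(namespaced, err)
}
```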

The specific situation where the user has changed the scope of a CRD and all the resources that refer to it in git is problematic either way: the CRD will fail to apply, then the custom resources will (probably) succeed. Neither solution fixes that; I'm not sure what to do about it other than advise people to avoid it.

In general I am against prioritizing what's in the cluster since, after all, GitOps is about treating what's in git as the source of truth. (The opposite already seems to be the case in some parts of Flux, but let's not get into that now :) ).

In general, so am I. But here I think it is most prudent to check with the API server about what is the case, since that's what is going to be processing the manifests, rather than consulting git about what might be the case.

@2opremio
Contributor

2opremio commented Mar 21, 2019

> I mean if the CRD isn't in the cluster, and it fails to be applied, then the custom resources that refer to it will also fail to be applied. (i.e., this does not cause an incorrect state).

I agree. In fact I think that's correct. What's the alternative?

> Try it.

Touché. I haven't tried it but, from how sure you are, I guess that applying changes to a CRD doesn't work if resources from the CRD already exist? (Maybe this is another case for apply --force, CC @hiddeco )

> When we ask about the scope of API resources, the reason we do it is because we want to know what will happen when we apply the manifests. It's the cluster that knows what will happen, not the files in git.

I agree, but by only asking the cluster, we will only know what will happen in its current state, which IMO should be just one of the inputs to successfully bringing the cluster to the state desired by the user as per the git repo.

That said, let's just be practical. Since this is blocking @primeroz and you have way more experience with Flux than me, I am happy to rest my case if we reach a practical solution which works.

Maybe we are already worrying about CRD scope changes too much but, imagine a single commit which changes the scope of the CRD and a few resources in that CRD at once. From our conversation above, I am not sure your solution (or mine) works. It would be good to come up with a design which solves that too, but maybe we can treat it separately.

@squaremo
Member Author

I think we've ended up determining that

  1. waiting for the next time around (my mooted solution), and looking in the repo for a CRD if it's not in the API (@2opremio's solution) are morally the same
  2. neither covers the situation in which someone changes the scope of a CRD and the custom resources, in the same commit

@ianmiell

ianmiell commented Apr 1, 2019

In case it's not obvious to others: downgrading flux from 1.11.0 to 1.10.1 is a valid workaround.
