Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Helm chart] Should CRDs be placed in the special crds/ directory? #3665

Closed
antoninbas opened this issue Apr 19, 2022 · 9 comments · Fixed by #3935
Closed

[Helm chart] Should CRDs be placed in the special crds/ directory? #3665

antoninbas opened this issue Apr 19, 2022 · 9 comments · Fixed by #3935
Assignees

Comments

@antoninbas
Copy link
Contributor

This is related to #2641.

We now have a chart for Antrea, which is currently used to generate the Antrea deployment manifests. In this chart, CRDs are currently treated as regular templates. Helm 3 introduces "special treatment" for CRDs, with the ability to place CRD definitions (as plain YAML, not templated) in a special crds/ directory. When CRDs are defined this way, they will be installed before other resources (in case these other resources include CRs corresponding to these CRDs). CRDs defined this way will also never be deleted (to avoid accidental deletion of user-defined CRs) and will also never be upgraded (in case the chart author didn't ensure that the upgrade was backwards-compatible). The rationale for all of this is described in details in this Helm community document: https://github.com/helm/community/blob/main/hips/hip-0011.md

Some projects using Helm charts as their main installation mechanism (e.g. Istio) are following "best practices" for Helm 3 and placing CRDs into the crds/ directory. This means that when upgrading Istio, CRDs must be upgraded first and separately, by applying a CRDs manifest: https://istio.io/latest/docs/setup/upgrade/helm/. IMO, this defeats a bit the purpose of using Helm, since you have to apply a separate YAML...

cert-manager does not use the crds/ directory at all. Users have 2 options:

  • install & upgrade CRDs manually
  • use the installCRDs=true value when installing / upgrading the Helm chart, which will treat CRDs as regular templated resources

Other projects have a more customized installation mechanism: Cilium CRDs are not installed by Helm, but by Cilium at runtime.

I am having trouble deciding what the "best" approach is here, so I would like some feedback. It seems we have the following options:

  1. keep treating CRDs as regular templated resources: this matches the current manifest-based installation mechanism (CRDs are upgraded when applying a new manifest, CRDs are deleted when deleting the Antrea manifest), but may be surprising for Helm users and lead to unfortunate events when we remove a deprecated API
  2. use the Helm 3 "special treatment" for CRDs and use the crds/ directory (same as Istio): this has caveats as described above, but could be considered the safest approach
  3. use a more custom / hybrid approach like cert-manager

We may want to select the best approach possible upfront and stick to it, as switching approaches across versions could lead to other issues. No matter which approach we choose, it will need to be documented.

The second approach is possibly the most common approach for Helm charts with CRDs and is well-documented by Helm itself: https://helm.sh/docs/chart_best_practices/custom_resource_definitions/#method-1-let-helm-do-it-for-you. However, I am having trouble accepting that this is the "best" approach, given the extra step for upgrading.

@antoninbas
Copy link
Contributor Author

@hangyan @tnqn for opinions

@antoninbas antoninbas changed the title [Helm chart] Should CRDs be placed in the special crds/ directory. [Helm chart] Should CRDs be placed in the special crds/ directory? Apr 19, 2022
@hangyan
Copy link
Member

hangyan commented Apr 20, 2022

Well i think I'm against solution 1, because it's problematic,some other reasons i can recall:

  1. The normal creation step for resources in templates wont wait for the discovery api to be refreshed. This is critical to the crd resource, because we need that. crds/ solution can handle this.
  2. I'm not sure how the current helm' upgrade method works, it's kind of complicated, i think it's tree-way patch (consider new, old, existing resources). Maybe we need to do a few round of test (including compatible and non-compatible crd changes, if there is existing crds) then we can actually confirm this solution works.

solution 2 is not ideal, but it's what the community recommended, and it's the safest option. I don't have strong opinion to follow solution 2, maybe some extra yaml/script is needed, i'm okay with that. solution 2 with extra docs should works too.

I think the benefit comes from using helm to do the template is pretty good already, not sure how many customers will actually prefer helm charts instead single yaml, so i'm okay with don't' spend too much time on the charts release, it's kind of best-effort.

Also, i have found this repo: https://github.com/banzaicloud/crd-updater, it also contains some details explain for this topic and provided a interesting solution, use helm chart dependence.

@tnqn
Copy link
Member

tnqn commented Apr 20, 2022

The information I get from https://helm.sh/docs/chart_best_practices/custom_resource_definitions is that Helm adds special handling for CRDs because some charts have CRDs and CRs at the same time, and separating CRDs from chart can ensure CRDs are always installed first? If I get it correctly, I think we don't have such issue, right? We have some pre-defined CRs (the static Tiers) but we chose to install them at runtime, just like Cilium chose to install CRDs at runtime. If we stick to this strategy, will keeping CRDs as regular templated resources be a problem?

And there is method 2 in the best practice: https://helm.sh/docs/chart_best_practices/custom_resource_definitions/#method-2-separate-charts, which just keep CRDs and CRs in different charts. If we don't even have CRs in chart, does it mean keeping CRDs and all other resources in single chart meets the best practice method 2?

(CRDs are upgraded when applying a new manifest, CRDs are deleted when deleting the Antrea manifest), but may be surprising for Helm users and lead to unfortunate events when we remove a deprecated API

I'm not sure if I get it. When a Helm user deletes a chart, you mean they normally don't want CRDs and CRs installed by the chart to be removed? don't charts using method 2 have same problem?

@antoninbas
Copy link
Contributor Author

@hangyan

The normal creation step for resources in templates wont wait for the discovery api to be refreshed. This is critical to the crd resource, because we need that. crds/ solution can handle this.

Can you clarify this? I don't think it matters when the chart doesn't include CRs in addition top CRDs. Quan gives more details in his comment above.

I think the benefit comes from using helm to do the template is pretty good already

I agree with that. Bu I think we should still try to provide a decent chart.

@tnqn

If we add CRs as part of our chart, then it would be a no brainer and we would go with approach 1. But I think just because we do not have CRs doesn't mean that we should eliminate approach 1 automatically. There are other issues that approach 1 tries to take into account, such as resource deletion (more on that below). If you look at Istio, they still go with approach 1 even though they don't have CRs in their charts either.

I'm not sure if I get it. When a Helm user deletes a chart, you mean they normally don't want CRDs and CRs installed by the chart to be removed? don't charts using method 2 have same problem?

I think the issue they are trying to address is the following: 1) User is running Antrea version X, 2) user upgrades to Antrea version Y but is not aware that an API (CRD, but could also be a specific CRD version which is no longer supported) was removed in version Y, 3) all the CRs created by the user for that API are automatically deleted when Helm deletes the CRD, 4) user freaks out and attempts a rollback with Helm but all his CRs have been lost (since of course re-creating the CRD doesn't magically re-create the lost CRs...)

Basically they are trying to be super conservative and want to protect the user from any possibility of data loss. I think this is more about chart upgrade and less about chart deletion. BTW, an upgrade with Helm will delete stale resources (unlike applying a manifest with kubectl).

And there is method 2 in the best practice:

Yes, I didn't even include this as a possible approach because it seems like an unpractical option to me. BTW that approach seems that it was intended specifically to resolve creation order issues between CRDs and CRs, and possibly dates back to Helm v2.

Maybe the hybrid approach used by cert-manager is not bad for us. They have been using Helm for a long time and spent some time thinking about this issue: helm/helm#7735

@hangyan
Copy link
Member

hangyan commented Apr 21, 2022

If we follow a strict policy to not add related CR to the templates, the CRD discovery api refresh is not needed in that case. Although i still won't recommend to put the crds in the templates dir, as it's a unique resource compare to others. remove crds from the normal helm upgrade phase is a safe approach. I can think of anther corner case will cause data loss: If there was an incompatible crd api change (X->Y), the normal upgrade will failed. But some users may have default settings or choose to use the --force flag to do a force update, the old crd will be deleted before the new is created, thus cr data is damaged.

I think both the hybird approach or the singlecrds dir approach are good, . The end users should already be familiar with them , And i still tend to choose the single crds solution with clear upgrade instruction in extra document, as there is no perfect choice for now, and i would like to follow the helm's community's roadmap and design.

@tnqn
Copy link
Member

tnqn commented Apr 21, 2022

If there was an incompatible crd api change (X->Y), the normal upgrade will failed.

@hangyan could you explain this more? You mean helm will check whether CRD has incompatible change and fail the upgrade?

I think we are following the recommended API practice and have never made incompatible change to any API. If there is an incompatible change, we should have a new version, and would there be a problem with helm?

Is the case that CRD is removed accidently because of an incompatible change something we need to worry? It's easy to prevent by adding a upgrade test to CI, right?

When installCRDs is false in cert-manager, how does it make old CRDs not removed if they don't put them in crds dir? Or it requires consistent install/upgrade approach so manually installed CRDs will never be managed by helm?

@antoninbas
Copy link
Contributor Author

When installCRDs is false in cert-manager, how does it make old CRDs not removed if they don't put them in crds dir? Or it requires consistent install/upgrade approach so manually installed CRDs will never be managed by helm?

The second one. Users needs to run kubectl apply first to apply a YAML manifest with CRDs only, then they can run helm upgrade. That means CRDs (and therefore CRs) won't be deleted during upgrade.

@tnqn do you know what happens is a CRD manifest is updated to remove a version (i.e. the version is removed from the spec.versions list) while there are still some CRs with that version stored in the apiserver. Are these deleted automatically? I guess we don't really do this in Antrea, instead we set served and storage to false for the version, but we keep it in the CRD.

@antoninbas
Copy link
Contributor Author

For the sake of making progress on this, I have decided to go with option 2 (use the crds/ directory, with an approach similar to the Istio project). In the end, I based the decision on the following factors:

  • helm install is still straightforward (single-step process), as CRDs are installed from the chart.
  • helm upgrade requires an extra step (manual CRD upgrade from standalone YAML manifest), but an upgrade should never be a trivial process anyway (API changes, etc.) and this is the safest approach. This will be documented carefully of course.
  • most Helm users should be familiar with this pattern and what it implies.

@antoninbas
Copy link
Contributor Author

I wanted to add that I also considered sticking with option 1 and adding the "helm.sh/resource-policy": keep annotation to all CRDs. This would have ensured that CRDs were never deleted during upgrade. However:

  • A CRD update can also be dangerous (e.g. removing a version from a CRD)
  • It may be better to follow the standard Helm pattern of using crds/

antoninbas added a commit to antoninbas/antrea that referenced this issue Jun 23, 2022
For each Antrea release, we generate the correct Helm chart archive and
upload it as a release asset. The appropriate index.yaml file (Helm repo
index) will be updated appropriately (in the antrea-io/website
repository) and will be accessible through charts.antrea.io and / or
antrea.io/charts.

For antrea-io#2641

We are also moving all CRD resources from the generic templates/
directory to the special crds/ directory, in the Chart definition. This
will cause Helm to treat CRDs as special resources, and in particular
they will never be upgraded / deleted automatically by Helm. In
particular, for upgrade, users will need to apply a separate YAML (which
is being added to release assets) including all the CRD resource
definitions. This represents an extra step, but ensures that users are
aware that CRDs are being upgraded which may require some actions from
them (e.g., migrate to a new version) and needs to be done with caution.

Fixes antrea-io#3665

Signed-off-by: Antonin Bas <abas@vmware.com>
antoninbas added a commit to antoninbas/antrea that referenced this issue Jun 24, 2022
For each Antrea release, we generate the correct Helm chart archive and
upload it as a release asset. The appropriate index.yaml file (Helm repo
index) will be updated appropriately (in the antrea-io/website
repository) and will be accessible through charts.antrea.io and / or
antrea.io/charts.

For antrea-io#2641

We are also moving all CRD resources from the generic templates/
directory to the special crds/ directory, in the Chart definition. This
will cause Helm to treat CRDs as special resources, and in particular
they will never be upgraded / deleted automatically by Helm. In
particular, for upgrade, users will need to apply a separate YAML (which
is being added to release assets) including all the CRD resource
definitions. This represents an extra step, but ensures that users are
aware that CRDs are being upgraded which may require some actions from
them (e.g., migrate to a new version) and needs to be done with caution.

Fixes antrea-io#3665

Signed-off-by: Antonin Bas <abas@vmware.com>
antoninbas added a commit that referenced this issue Jun 27, 2022
* Release Antrea Helm chart archive for each Antrea release

For each Antrea release, we generate the correct Helm chart archive and
upload it as a release asset. The appropriate index.yaml file (Helm repo
index) will be updated appropriately (in the antrea-io/website
repository) and will be accessible through charts.antrea.io and / or
antrea.io/charts.

For #2641

We are also moving all CRD resources from the generic templates/
directory to the special crds/ directory, in the Chart definition. This
will cause Helm to treat CRDs as special resources, and in particular
they will never be upgraded / deleted automatically by Helm. In
particular, for upgrade, users will need to apply a separate YAML (which
is being added to release assets) including all the CRD resource
definitions. This represents an extra step, but ensures that users are
aware that CRDs are being upgraded which may require some actions from
them (e.g., migrate to a new version) and needs to be done with caution.

Fixes #3665

Signed-off-by: Antonin Bas <abas@vmware.com>

* Address review comments

Signed-off-by: Antonin Bas <abas@vmware.com>

* Fix build/charts/antrea/README.md

Signed-off-by: Antonin Bas <abas@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants