generator command: 'kong-2.4.0' generator command version: 2b9dc2a
MicroK8s CI also seems to be broken at the cluster setup step; need to figure out what's broken there.
So it seems to me that we have two ways to proceed:
- release the chart that's compatible with 1.22 BUT requires that the user reapply all their Kong CRs using the new API group (this is an important breaking change and, I'd say, therefore requires a major version bump),
- try to figure out if we can achieve 1.22 compatibility in a way that allows the user to upgrade transparently (which I'm not particularly optimistic about).

Unless we achieve a breakthrough with the latter, we'll probably need to go for the former (and I strongly prefer that we bump the major version in that case).
Release 0.9.0 and change the kongs CRD API group from charts.helm.k8s.io to charts.konghq.com. The new group is not one of the protected groups established by kubernetes/enhancements#1111; this operator CRD should not use a protected group, as it is not a core part of the Kubernetes project. This change makes the CRD compatible with Kubernetes >=1.22. However, it breaks compatibility with previous versions of the operator. As such, 0.9.0 has no `replaces` version: it requires a fresh operator install and a fresh set of Kong CRs.
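In practice this means existing Kong CRs must be recreated under the new group. A minimal sketch of what that looks like on a CR (the version suffix, resource name, and spec contents here are illustrative, not taken from the repo):

```yaml
# Old CR (pre-0.9.0), rejected once the old CRD is gone:
# apiVersion: charts.helm.k8s.io/v1alpha1
# kind: Kong

# New CR (0.9.0+): same spec, new group.
apiVersion: charts.konghq.com/v1alpha1
kind: Kong
metadata:
  name: example-kong
spec:
  # chart values pass through unchanged
  ingressController:
    enabled: true
```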
The first option (new CRD group) appears to be what others have done (see #65 (comment)). I'm still unsure about the version, since a 1.x release would sorta imply more production readiness, and we've technically never moved this out of the alpha phase (though in practice it's a wrapper around the chart, and we do consider the chart production-ready). The changelog has a draft of a migration strategy; I need to test that it actually works.

Tests are now failing because Ingresses aren't getting status updates, which I can replicate locally. Not sure why, since there are no controller errors. Maybe there's a behavior change where we previously set ClusterIP proxy Service IPs as Ingress load balancer status IPs, but don't any longer in 2.0?

Edit: Apparently sort of. On 2.x, `status.loadBalancer` is left empty (`{}`); on 1.x it contains a single empty ingress entry (`ingress: [{}]`). The latter passes the regex check.
Remove the Ingress status waits and add retry configuration to curl when validating the Ingress configuration.

KIC 2.0+ handles status updates for non-LoadBalancer Services differently than earlier versions. Previously, KIC would set a status with a 0-length list of ingresses if the proxy Service was not type LoadBalancer, e.g.

```yaml
status:
  loadBalancer:
    ingress:
    - {}
```

As of KIC 2.0, no status is set if the Service is not type LoadBalancer, e.g.

```yaml
status:
  loadBalancer: {}
```

This change to the operator tests confirms that Ingress configuration was successfully applied to the proxy using requests through the proxy only. These now run immediately after the upstream Deployment becomes available, however, so the curl 200 checks now retry to account for lag before the controller ingests that configuration. `--retry-all-errors` retries on either 404s (Ingress not yet ingested) or 503s (Endpoints not yet added). The 404 checks are unchanged, since they should succeed immediately once the 200 checks do.
This reverts commit 8a0d10f.
Reworking the test to use curl retries did not work: `--retry-all-errors` (https://curl.se/docs/manpage.html#--retry-all-errors) is not available in the version of curl shipped with Ubuntu 20.04. We'd probably need to either use https://github.com/marketplace/actions/retry-step, rework how 2.x does no-IP status to match the 1.x behavior, or rework the curl tests to work with wait_for.
Remove the Ingress status waits and add retry configuration to curl when validating the Ingress configuration.

KIC 2.0+ handles status updates for non-LoadBalancer Services differently than earlier versions. Previously, KIC would set a status with a 0-length list of ingresses if the proxy Service was not type LoadBalancer, e.g.

```yaml
status:
  loadBalancer:
    ingress:
    - {}
```

As of KIC 2.0, no status is set if the Service is not type LoadBalancer, e.g.

```yaml
status:
  loadBalancer: {}
```

This change to the operator tests confirms that Ingress configuration was successfully applied to the proxy using requests through the proxy only. These now run immediately after the upstream Deployment becomes available, however, so they may run before the controller has ingested Ingress configuration or observed Endpoint updates. To account for this, the curl checks are now wrapped in wait_for to allow a reasonable amount of time for the controller to update configuration.
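The wait_for-wrapped curl check described above can be sketched as a small POSIX shell helper. This is illustrative only: the function name, timings, and attempt counts are made up here and are not the repo's actual `wait_for` implementation; it simply polls until curl reports the expected HTTP status, standing in for the `--retry-all-errors` flag missing from Ubuntu 20.04's curl.

```shell
#!/bin/sh
# wait_for_http: poll a URL until curl reports the expected HTTP status
# code, or give up after a fixed number of attempts. Covers both the
# 404 (route not yet ingested) and 503 (Endpoints not yet added) windows.
wait_for_http() {
  url="$1"
  want="$2"
  attempts="${3:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -o /dev/null discards the body; -w '%{http_code}' prints only the status.
    # "|| true" keeps connection-refused errors from aborting the loop.
    got="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    [ "$got" = "$want" ] && return 0
    i=$((i + 1))
    sleep 2
  done
  echo "timed out waiting for $want from $url (last: $got)" >&2
  return 1
}
```

Used as, e.g., `wait_for_http http://proxy.example/echo 200`, it retries through the transient errors that appear while the controller catches up, then fails loudly if the configuration never converges.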
Quasi-related PR for something I found during testing; we should include it in the release: #68
I retract my strong preference for a major version bump - after further consideration, this breaking change (changing the API group) is OK to make between 0.x releases, and is probably a better tradeoff.
Let's include #68 and go with v0.9.0.
Co-authored-by: Michał Flendrich <michal@flendrich.pro>
Manual testing of the upgrade steps is documented below. I wasn't able to test a complete OLM install of 0.9.0 with the SDK version we use, so that's simulated to confirm that the resulting Kong CRs result in what we expect. That shouldn't be an issue, since the image has no problem reading the current CRD API (else it would presumably be unable to act on the CRs) and because we don't upload any OLM content other than the YAML CSV and CRD--any OLM machinery parsing is presumably handled by the actual OLM install, i.e. OperatorHub.io in the real world, and that is up to date.

I briefly looked at migrating to 1.0, but it's involved enough that it'd take a while to verify and would make the diff look like garbage, so in the interest of time and PR clarity, I think we should defer that.
0.9.0 updates the chart to 2.4.0. This includes the KIC 2.0 release and CRD/admission/etc. API version updates for Kubernetes 1.22 compatibility.
It updates the operator-specific Kong CRD to apiextensions.k8s.io/v1. It still lacks a structural schema (it'd need one for the entirety of values.yaml, which would be quite an undertaking), so the entire spec has `x-kubernetes-preserve-unknown-fields: true`. I used the same strategy as we used for the KIC CRDs to upgrade it: I uploaded it to a 1.21 cluster, let Kubernetes auto-upgrade it, pulled down the upgraded version, and stripped instance-specific metadata. AFAIK we've never generated this automatically from anything beyond the initial stub version created by operator-sdk. We did use an older version of the SDK to create it, though a version created with something more recent isn't hugely different (mostly it just breaks out some metadata in the schema--the main spec part remains empty and just preserves unknown fields).

This remains a draft because the Kong CRD is still not fully 1.22 compatible. We originally used `charts.helm.k8s.io` for the CRD group, and the `k8s.io` suffix now has special requirements: protected groups must carry an api-approved.kubernetes.io approval annotation, so attempting to apply the CRD as-is is rejected. We presumably have no need for an approved API hostname, so it makes sense to move that elsewhere. There isn't a whole lot of guidance about how to choose a group (Operator SDK docs and Kubernetes docs just use `example.com` without additional guidance; another operator I reviewed uses the operatorhub.io domain), so I guess we'd probably use either `charts.configuration.konghq.com` or `charts.konghq.com`.

Changing the group does mean that prior-version CRs would not automatically upgrade. Not sure if we want to automate this or just configure the metadata and write changelog instructions to indicate that you must manually create a new release using the updated CR and copy in your spec.
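As a rough sketch, the apiextensions.k8s.io/v1 CRD with the pass-through schema described above would look something like this. The version name `v1alpha1` and the names block are guesses for illustration; the real CRD was produced by letting a 1.21 cluster auto-upgrade the old version, not written by hand:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: kongs.charts.konghq.com
spec:
  group: charts.konghq.com   # vendor-owned group; no api-approved annotation needed
  names:
    kind: Kong
    listKind: KongList
    plural: kongs
    singular: kong
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            # No structural schema for the full values.yaml; accept anything.
            type: object
            x-kubernetes-preserve-unknown-fields: true
```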
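The manual migration path (recreate each Kong CR under the new group, copying the spec) could be sketched as below. This is a hypothetical helper, not anything in the repo; it assumes the only required change is the apiVersion group, and that the exported manifest keeps the standard `apiVersion:` line at column zero:

```shell
#!/bin/sh
# Sketch: rewrite a saved Kong CR manifest from the old group to the new
# one, preserving the version suffix and everything else in the file.
# Pair with, e.g.:
#   kubectl get kong my-kong -o yaml > old.yaml
#   migrate_kong_cr old.yaml new.yaml
#   kubectl apply -f new.yaml   # after installing the 0.9.0 operator/CRD
migrate_kong_cr() {
  in="$1"   # manifest exported from the old operator install
  out="$2"  # manifest to apply against the new CRD
  sed 's|^apiVersion: charts\.helm\.k8s\.io/|apiVersion: charts.konghq.com/|' "$in" > "$out"
}
```

Instance-specific metadata (resourceVersion, uid, status) would still need to be stripped from the export before applying, same as any `kubectl get -o yaml` round-trip.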
Lastly, similar to #64, we lack automation to push the image, and will need to build and push that manually.