Release v0.9.0 #66

Merged 15 commits into main on Oct 13, 2021

Conversation


rainest commented Oct 8, 2021

0.9.0 updates the chart to 2.4.0. This includes the KIC 2.0 release and CRD/admission/etc. API version updates for Kubernetes 1.22 compatibility.

It updates the operator-specific Kong CRD to apiextensions.k8s.io/v1. It still lacks a structural schema (it'd need one for the entirety of values.yaml, which would be quite an undertaking), so the entire spec has x-kubernetes-preserve-unknown-fields: true. I used the same strategy as we used for the KIC CRDs to upgrade it: I uploaded it to a 1.21 cluster, let Kubernetes auto-upgrade it, pulled down the upgraded version, and stripped instance-specific metadata. AFAIK we've never generated this automatically from anything beyond the initial stub version created by operator-sdk. We did use an older version of the SDK to create it, though a version created with something more recent isn't hugely different (mostly it just breaks out some metadata in the schema--the main spec part remains empty and just preserves unknown fields).
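For reference, the upgraded CRD has roughly this shape. This is a sketch rather than the exact manifest; the v1alpha1 version name and the names stanza here are illustrative:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: kongs.charts.helm.k8s.io
spec:
  group: charts.helm.k8s.io
  names:
    kind: Kong
    listKind: KongList
    plural: kongs
    singular: kong
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            # No structural schema for all of values.yaml; accept anything.
            x-kubernetes-preserve-unknown-fields: true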

This remains a draft because the Kong CRD is still not fully 1.22 compatible. We originally used charts.helm.k8s.io for the CRD group, and the k8s.io suffix now has special requirements. Attempting to apply it as-is will result in this error:

$ kubectl create -f /tmp/crd.yaml 
The CustomResourceDefinition "kongs.charts.helm.k8s.io" is invalid: 
* metadata.annotations[api-approved.kubernetes.io]: Required value: protected groups must have approval annotation "api-approved.kubernetes.io", see https://github.com/kubernetes/enhancements/pull/1111

We presumably have no need for an approved API hostname, so it makes sense to move the CRD to another group. There isn't a whole lot of guidance about how to choose one: the Operator SDK and Kubernetes docs just use example.com without further advice, and another operator I reviewed uses the operatorhub.io domain. I guess we should probably use either charts.configuration.konghq.com or charts.konghq.com.
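If we go with charts.konghq.com, the CRD change is essentially just the group and the metadata.name derived from it (a sketch, not the final manifest):

metadata:
  name: kongs.charts.konghq.com
spec:
  group: charts.konghq.com

Existing CRs would then use apiVersion: charts.konghq.com/v1alpha1 instead of charts.helm.k8s.io/v1alpha1 (assuming the version name stays v1alpha1).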

Changing the group does mean that CRs from prior versions would not automatically upgrade. I'm not sure whether we want to automate this or just configure the metadata and write changelog instructions stating that you must manually create a new release using the updated CR and copy in your spec.
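As a rough sketch of the manual path (the my-kong resource name and kong namespace are placeholders, not real names):

$ kubectl get kongs.charts.helm.k8s.io my-kong -n kong -o yaml > old-kong.yaml
# copy the spec from old-kong.yaml into a new CR with
# apiVersion: charts.konghq.com/v1alpha1, then:
$ kubectl apply -f new-kong.yaml
$ kubectl delete kongs.charts.helm.k8s.io my-kong -n kong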

Lastly, similar to #64, we lack automation to push the image, and will need to build and push that manually.

generator command: 'kong-2.4.0'

generator command version: 2b9dc2a

rainest commented Oct 8, 2021

MicroK8s CI also seems to be broken at the cluster setup step; I need to figure out what's broken there.

mflendrich left a comment

So it seems to me that we have 2 ways to proceed:

  • release the chart that's compatible with 1.22 BUT requires that the user reapplies all their Kong CRs using the new API group (this is an important breaking change and, I'd say, therefore requires a major version bump),
  • try to figure out whether we can achieve 1.22 compatibility in a way that allows the user to upgrade transparently (which I'm not particularly optimistic about).

Unless we achieve a breakthrough on the latter, we'll probably need to go with the former (and I strongly prefer that we bump the major version in that case).

Travis Raines added 4 commits October 11, 2021 14:31
Release 0.9.0 and change the kongs CRD API group from charts.helm.k8s.io
to charts.konghq.com.

The new group is not one of the protected groups established by
kubernetes/enhancements#1111. This operator CRD should not use a
protected group, as it is not a core part of the Kubernetes project.

This change makes the CRD compatible with Kubernetes >=1.22. However, it
breaks compatibility with previous versions of the operator. As such,
0.9.0 has no replace version: it requires a fresh operator install and a
fresh set of Kong CRs.

rainest commented Oct 11, 2021

The first option (new CRD) appears to be what others have done (see #65 (comment)). I'm still unsure on the version since a 1.x release would sorta imply more production readiness, and we've technically never moved this out of the alpha phase (though in practice it's a wrapper around the chart, and we do consider the chart production-ready).

The changelog has a draft of a migration strategy. I need to test that this actually works.

Tests are now failing because Ingresses aren't getting status updates, which I can replicate locally. Not sure why, since there are no controller errors. Maybe there's a behavior change where we previously set ClusterIP proxy Service IPs as Ingress load balancer status IPs, but don't any longer in 2.0?

Edit:

Apparently sort of. On 2.x:

status:
  loadBalancer: {}

On 1.x:

status:
  loadBalancer:
    ingress:
    - {}

The latter passes the regex check because the ingress object is set, albeit to nothing. I'm not really sure which is more correct; neither really indicates anything of interest.
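For anyone checking locally, the status blocks above come straight from the Ingress object, e.g. (the httpbin Ingress name is a placeholder):

$ kubectl get ingress httpbin -o yaml | grep -A 3 '^status:'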

Travis Raines added 2 commits October 12, 2021 13:31
Remove the Ingress status waits and add retry configuration to curl when
validating the Ingress configuration.

KIC 2.0+ handles status updates for non-LoadBalancer Services
differently than earlier versions. Previously, KIC would set a status
with a 0-length list of ingresses if the proxy Service was not type
LoadBalancer, e.g.

status:
  loadBalancer:
    ingress:
    - {}

As of KIC 2.0, no status is set if the Service is not type LoadBalancer,
e.g.

status:
  loadBalancer: {}

This change to the operator tests confirms that Ingress configuration
was successfully applied to the proxy using requests through the proxy
only. These now run immediately after the upstream Deployment becomes
available, however, so the curl 200 checks now retry to account for lag
before the controller ingests that configuration. --retry-all-errors
retries on either 404s (Ingress not yet ingested) or 503s (Endpoints not
yet added).

The 404 checks are unchanged, since they should succeed immediately once
the 200 checks do.
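The retrying check looks roughly like this; the URL, retry counts, and PROXY_IP variable are placeholders rather than the exact test command:

$ curl --silent --fail --output /dev/null \
    --retry 10 --retry-delay 5 --retry-all-errors \
    http://${PROXY_IP}/httpbin

(--fail is what turns 404/503 responses into errors that --retry-all-errors will retry on.)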

rainest commented Oct 12, 2021

Reworking the test to use plain curl retries did not work; we'd need --retry-all-errors to make curl retry on 404s.

--retry-all-errors (https://curl.se/docs/manpage.html#--retry-all-errors) is not available in the curl version shipped with Ubuntu 20.04:
https://packages.ubuntu.com/search?keywords=curl

We'd probably need to either use https://github.com/marketplace/actions/retry-step, rework how 2.x handles the no-IP status to match the 1.x behavior, or rework the curl tests to use wait_for.

Travis Raines added 3 commits October 13, 2021 10:21
Remove the Ingress status waits and add retry configuration to curl when
validating the Ingress configuration.

KIC 2.0+ handles status updates for non-LoadBalancer Services
differently than earlier versions. Previously, KIC would set a status
with a 0-length list of ingresses if the proxy Service was not type
LoadBalancer, e.g.

status:
  loadBalancer:
    ingress:
    - {}

As of KIC 2.0, no status is set if the Service is not type LoadBalancer,
e.g.

status:
  loadBalancer: {}

This change to the operator tests confirms that Ingress configuration
was successfully applied to the proxy using requests through the proxy
only. These now run immediately after the upstream Deployment becomes
available, however, so they may run before the controller has ingested
Ingress configuration or observed Endpoint updates. To account for this,
the curl checks are now wrapped in wait_for to allow a reasonable amount
of time for the controller to update configuration.
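The wait_for helper itself isn't shown in this message; as a rough sketch of the shape (placeholder values, not the actual test code):

wait_for() {
  # Retry the given command until it succeeds or attempts run out.
  local tries=30
  until "$@"; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then return 1; fi
    sleep 5
  done
}

check_200() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' "http://${PROXY_IP}/httpbin")" = "200" ]
}

wait_for check_200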
shaneutt commented

Quasi-related PR for something I found during testing; we should include it in the release: #68

mflendrich left a comment

I retract my strong preference for a major version bump. After further consideration, this breaking change (changing the API group) is OK to make between 0.x releases, and is probably the better tradeoff.

Let's include #68 and go with v0.9.0.

rainest marked this pull request as ready for review October 13, 2021 21:07
rainest requested a review from a team as a code owner October 13, 2021 21:07

rainest commented Oct 13, 2021

Manual testing of the upgrade steps is documented below:

log.txt
opertest.tar.gz

I wasn't able to test a complete OLM install of 0.9.0 with the SDK version we use, so that part is simulated to confirm that the resulting Kong CRs produce what we expect. That shouldn't be an issue, since the image has no problem reading the current CRD API (else it would presumably be unable to act on the CRs), and because we don't upload any OLM content other than the CSV and CRD YAML; any parsing by OLM machinery is presumably handled by the actual OLM install, i.e. OperatorHub.io in the real world, which is up to date.

I briefly looked at migrating to 1.0, but it's involved enough that it'd take a while to verify and would make the diff look like garbage, so in the interest of time and PR clarity, I think we should defer that.

rainest merged commit f4b6ba6 into main on Oct 13, 2021