diff --git a/enhancements/machine-config/manage-boot-images.md b/enhancements/machine-config/manage-boot-images.md
index 9017f56bc8e..c83bc0193ee 100644
--- a/enhancements/machine-config/manage-boot-images.md
+++ b/enhancements/machine-config/manage-boot-images.md
@@ -13,7 +13,7 @@ approvers:
 api-approvers:
   - "@joelspeed"
 creation-date: 2023-10-16
-last-updated: 2022-12-11
+last-updated: 2024-01-23
 tracking-link:
   - https://issues.redhat.com/browse/MCO-589
 see-also:
@@ -31,7 +31,7 @@ This is a proposal to manage bootimages via the `Machine Config Operator`(MCO),
 
 For `MachineSet` managed clusters, the end goal is to create an automated mechanism that can:
 - update the boot images references in `MachineSets` to the latest in the payload image
-- ensure stub ignition referenced in each `Machinesets` is in spec 3 format
+- ensure the stub Ignition config referenced in each `MachineSet` is in spec 3 format
 
 For clusters that are not managed by `MachineSets`, the end goal is to create a document(KB or otherwise) that a cluster admin would follow to update their boot images.
 
@@ -43,30 +43,30 @@ Currently, bootimage references are [stored](https://github.com/openshift/instal
 - podman [[1](https://issues.redhat.com/browse/OCPBUGS-9969)]
 - skopeo [[1](https://issues.redhat.com/browse/OCPBUGS-3621)]
 
-Additionally, the stub secret [referenced](https://github.com/openshift/installer/blob/1ca0848f0f8b2ca9758493afa26bf43ebcd70410/pkg/asset/machines/gcp/machines.go#L197) in the `MachineSet` is also not managed. This stub is used by the ignition binary in firstboot to auth and consume content from the `machine-config-server`(MCS). The content served includes the actual ignition configuration and the target OCI format RHCOS image. The ignition binary now does first boot provisioning based on this, then hands off to the `machine-config-daemon`(MCD) first boot service to do the reboot into the target OCI format RHCOS image.
+Additionally, the stub Ignition config [referenced](https://github.com/openshift/installer/blob/1ca0848f0f8b2ca9758493afa26bf43ebcd70410/pkg/asset/machines/gcp/machines.go#L197) in the `MachineSet` is also not managed. Ignition uses this stub during first boot to authenticate to, and consume content from, the `machine-config-server`(MCS). The content served includes the actual Ignition configuration and the target OCI format RHCOS image. Ignition then performs first boot provisioning based on this content and hands off to the `machine-config-daemon`(MCD) first boot service, which reboots the node into the target OCI format RHCOS image.
 
-There has been [a previous effort](https://github.com/openshift/machine-config-operator/pull/1792) to manage the stub secret. It was [reverted](https://github.com/openshift/machine-config-operator/pull/2126) and then [brought back](https://github.com/openshift/machine-config-operator/pull/2827#issuecomment-996156872) just for bare metal clusters. For other platforms, the `*-managed` stub secrets still get generated by the MCO, but are not injected into the `MachineSet`. The proposal plans to utilize these unused `*-managed` stub secrets, but it is important to note that this stub secret is generated(and synced) by the MCO and will ignore/override any user customizations to the stub secret. This limitation will be mentioned in the documentation, and a later release will provide support for user customization of the stub secret, either via API or a workaround thorugh additional documentation. This should not be an issue for the majority of users as they very rarely customize the stub secret.
+There has been [a previous effort](https://github.com/openshift/machine-config-operator/pull/1792) to manage the stub Ignition config. It was [reverted](https://github.com/openshift/machine-config-operator/pull/2126) and then [brought back](https://github.com/openshift/machine-config-operator/pull/2827#issuecomment-996156872) just for bare metal clusters. For other platforms, the MCO still generates the `*-managed` stubs but does not inject them into the `MachineSet`. The proposal plans to utilize these unused `*-managed` stubs. Note that the managed stub is generated (and synced) by the MCO, so it ignores/overrides any user customizations made to the original stub Ignition config. This limitation will be mentioned in the documentation, and a later release will support user customization of the stub, either via an API or through a documented workaround. This should not be an issue for the majority of users, as they very rarely customize the original stub Ignition config.
 
-In certain long lived clusters, the MCS TLS cert contained within the above ignition configuration may be out of date. Example issue [here](https://issues.redhat.com/browse/OCPBUGS-1817). While this has been partly solved [MCO-642](https://issues.redhat.com/browse/MCO-642) (which allows the user to manually rotate the cert) it would be very beneficial for the MCO to actively manage this TLS cert and take this concern away from the user.
+In certain long-lived clusters, the MCS TLS cert contained within the above Ignition configuration may be out of date. Example issue [here](https://issues.redhat.com/browse/OCPBUGS-1817). While this has been partly solved by [MCO-642](https://issues.redhat.com/browse/MCO-642) (which allows the user to manually rotate the cert), it would be very beneficial for the MCO to actively manage this TLS cert and take this concern away from the user.
 
 ### User Stories
 
-* As an Openshift engineer, having nodes boot up on an unsupported OCP version is a security liability. By having nodes directly boot on the release payload image, it helps me avoid tracking incompatibilities across OCP release versions and shore up technical debt(see issues linked above).
+* As an OpenShift engineer, having nodes boot up on an unsupported OCP version is a security liability. By having nodes boot on the latest supported boot image for a given OCP release, there will be less skew with the release payload image. This helps me avoid tracking incompatibilities across OCP release versions and reduce technical debt (see issues linked above).
 * As a cluster administrator, having to keep track of a "boot" vs "live" image for a given cluster is not intuitive or user friendly. In the worst case scenario, I will have to reset a cluster(or do a lot of manual steps with rh-support in recovering the node) simply to be able to scale up nodes after an upgrade. If I'm managing a `MachineSet` managed cluster, once opted in, this feature will be a "switch on and forget" mechanism for me. If I'm managing a non `Machineset` managed cluster, this would provide me with documentation that I could follow after an upgrade to ensure my cluster has the latest bootimages.
 
 ### Goals
 
-The MCO will take over management of the boot image references and the stub ignition. The installer is still responsible for creating the `MachineSet` at cluster bring-up of course, but once cluster installation is complete the MCO will ensure that boot images are in sync with the latest payload. From the user standpoint, this should cause less compatibility issues as nodes will no longer need to pivot to a different version of rhcos during node scaleup.
+The MCO will take over management of the boot image references and the stub Ignition configuration. The installer is still responsible for creating the `MachineSet` at cluster bring-up, but once cluster installation is complete the MCO will ensure that boot images are in sync with the latest payload. From the user standpoint, this should cause fewer compatibility issues, as nodes will no longer need to pivot to a different version of RHCOS during node scaleup.
 
-This should not interfere with existing workflows such as Hive and ArgoCD. As this is an opt-in mechanism, the cluster admin will be protected against such scenarios of accidental "reconciliation".
+This should not interfere with existing workflows such as Hive and ArgoCD. Because this is an opt-in mechanism, the cluster admin is protected against accidental "reconciliation" in such scenarios. For additional safety, the MSBIC will also exclude MachineSets that have a valid OwnerReference from boot image updates.
 
 ### Non-Goals
 
 - The new subcontroller is only intended to support clusters that use MachineSet backed node scaling. This is meant to be a user opt-in feature, and if the user wishes to keep their boot images static it will let them do so.
 - This does not intend to solve [booting into custom pools](https://issues.redhat.com/browse/MCO-773).
 - This does not target Hypershift, as [it does not use machinesets](https://github.com/openshift/hypershift/blob/32309b12ae6c5d4952357f4ad17519cf2424805a/hypershift-operator/controllers/nodepool/nodepool_controller.go#L2168).
-- This does not target [ControlPlaneMachineSets](https://docs.openshift.com/container-platform/4.14/machine_management/control_plane_machine_management/cpmso-about.html).
+- This does not target [ControlPlaneMachineSets](https://docs.openshift.com/container-platform/4.14/machine_management/control_plane_machine_management/cpmso-about.html). This is considered future work and will be tracked by [MCO-1007](https://issues.redhat.com/browse/MCO-1007).
 
 ## Proposal
 
@@ -76,6 +76,7 @@ __Overview__
 - Before processing a MachineSet, the MSBIC will check if the following conditions are satisfied:
   - `ManagedBootImages` feature gate is active
   - The cluster and/or the machineset is opted-in to boot image updates.
+  - The MachineSet does not have a valid owner reference (e.g. Hive and other managed MachineSet workflows).
  - The golden configmap is verified to be in sync with the current version of the MCO. The MCO will "stamp"(annotate) the golden configmap with the new version of the MCO after atleast 1 node has succesfully completed an update to the new OCP image. This helps prevent `machinesets` being updated too soon at the end of a cluster upgrade, before the MCO itself has updated and has had a chance to roll out the new OCP image to the cluster.
 
 If any of the above checks fail, the MSBIC will exit out of the sync.
@@ -189,8 +190,8 @@ It is important to note that InfrastructureMachineTemplate is different per plat
 Based on the observation above, here is a rough outline of what CAPI support would require:
 - CAPI backed MachineSet detection, so the MSBIC knows when to invoke the CAPI path
 - If a boot image update is required, create a new `InfrastructureMachineTemplate` by cloning the existing and updating the boot image reference within. The name of the new `InfrastructureMachineTemplate` object will be generated by hashing the template content. This is consistent with the current CAPI approach to naming new objects.
-- Updating the ignition stub in `bootstrap.dataSecretName` to the managed stub secret(`*-managed`) if needed.
-- CAPI backed MachineSet patching
+- Updating the Ignition stub in `bootstrap.dataSecretName` to the managed stub secret (`*-managed`) if needed.
+- CAPI backed MachineSet patching. Once patching has completed successfully, the original `InfrastructureMachineTemplate` can be garbage collected.
 
 Much of the existing design regarding architecture & platform detection, opt-in, degradation and storing boot image history can remain the same.
 
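The following is a minimal Go sketch of how the MSBIC might gate a single MachineSet using the pre-sync checks listed under __Overview__, combined with the `All`/`None`/`Partial` selector modes proposed for the opt-in API. It is a sketch under stated assumptions, not the MCO implementation. All types are simplified, hypothetical stand-ins. The selector models only `matchLabels`, not `matchExpressions`. Any owner reference is treated as "valid". The golden configmap stamp is modeled as a plain version string. The label key in `main` is also hypothetical.

```go
package main

import "fmt"

// SelectorMode mirrors the proposed MachineManagerSelectorMode values.
type SelectorMode string

const (
	ModeAll     SelectorMode = "All"
	ModeNone    SelectorMode = "None"
	ModePartial SelectorMode = "Partial"
)

// Selector is a hypothetical, simplified MachineManagerSelector.
// Only matchLabels is modeled; matchExpressions is omitted.
type Selector struct {
	Mode        SelectorMode
	MatchLabels map[string]string // consulted only when Mode == ModePartial
}

// MachineSet keeps only the fields the checks below need.
type MachineSet struct {
	Name            string
	Labels          map[string]string
	OwnerReferences []string // assumption: any entry counts as a valid owner
}

// ClusterState is hypothetical cluster-wide input to the checks.
type ClusterState struct {
	FeatureGateEnabled bool     // ManagedBootImages feature gate
	Selector           Selector // opt-in configuration for MAPI MachineSets
	StampedMCOVersion  string   // version annotated on the golden configmap
	RunningMCOVersion  string   // version of the running MCO
}

// matchesLabels follows matchLabels semantics: every key/value pair must be
// present, so an empty selector matches every MachineSet.
func matchesLabels(labels, want map[string]string) bool {
	for k, v := range want {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// optedIn evaluates the selector's union discriminator.
func optedIn(sel Selector, ms MachineSet) bool {
	switch sel.Mode {
	case ModeAll:
		return true
	case ModePartial:
		return matchesLabels(ms.Labels, sel.MatchLabels)
	default: // ModeNone or an unset mode enrolls nothing
		return false
	}
}

// shouldProcess returns true if the MachineSet passes every check; otherwise
// it returns false together with the reason for skipping it.
func shouldProcess(cs ClusterState, ms MachineSet) (bool, string) {
	switch {
	case !cs.FeatureGateEnabled:
		return false, "ManagedBootImages feature gate is not active"
	case !optedIn(cs.Selector, ms):
		return false, "MachineSet is not opted in"
	case len(ms.OwnerReferences) > 0:
		return false, "MachineSet has an owner reference"
	case cs.StampedMCOVersion != cs.RunningMCOVersion:
		return false, "golden configmap is not stamped with the running MCO version"
	}
	return true, ""
}

func main() {
	cs := ClusterState{
		FeatureGateEnabled: true,
		Selector: Selector{
			Mode:        ModePartial,
			MatchLabels: map[string]string{"example.com/boot-images": "managed"}, // hypothetical label
		},
		StampedMCOVersion: "v1",
		RunningMCOVersion: "v1",
	}
	ms := MachineSet{Name: "worker-a", Labels: map[string]string{"example.com/boot-images": "managed"}}
	ok, reason := shouldProcess(cs, ms)
	fmt.Println(ms.Name, ok, reason)
}
```

The feature gate and golden configmap checks are cluster-wide, so a real controller would likely evaluate them once per sync and exit early, as the proposal describes. The sketch folds them into one function only to keep the ordering of the checks visible.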
@@ -199,56 +200,125 @@ When [MachineDeployments](https://cluster-api.sigs.k8s.io/developer/architecture
 ### API Extensions
 
 #### Opt-in Mechanism
+This proposal introduces a new type in the MCO operator API, `ManagedBootImages`, which encloses an array of `MachineManager` objects. A `MachineManager` object contains the resource type of the machine management object being opted in, the API group of that object, and a discriminated union of type `MachineManagerSelector`, which encloses:
 
-This proposal will introduce a discriminated union in [operator types](https://github.com/openshift/api/blob/master/operator/v1/types_machineconfiguration.go) for the MCO, `ManagedBootImageConfig` which has two fields:
+- The union discriminator, `Mode`, which can be set to one of three values: `All`, `Partial` and `None`.
+- `Partial`: a label selector that users can use to opt in a custom selection of machine resources. When `Mode` is set to `Partial`, only the MachineSets matched by the selector are enrolled for updates. For all other values of `Mode`, this selector must not be set.
 
-- `Mode` This is a string enum which can have three values:
-  - `Enabled` - All `Machinesets` will be enrolled for boot image updates.
-  - `CustomConfig` - `Machinesets` matched with the label selector will be enrolled for boot image updates.
-  - `Disabled` - No `Machinesets` will be enrolled for boot image updates.
-- `CustomConfig` This is struct which encloses a label selector that will be used by machineset objects to opt-in.
-
-Here are some YAML examples that describes operators in each of these modes:
-##### Enabled
-```
-apiVersion: operator.openshift.io/v1
-kind: MachineConfiguration
-metadata:
-  name: default
-  labels:
-spec:
-  managedBootImageConfig:
-    mode: Enabled
-```
-##### Disabled
 ```
-apiVersion: operator.openshift.io/v1
-kind: MachineConfiguration
-metadata:
-  name: default
-  labels:
-spec:
-  managedBootImageConfig:
-    mode: Disabled
+type ManagedBootImages struct {
+	// machineManagers is an array of machineManager objects.
+	// The MCO will watch for changes to this list and register/de-register machine management resources from boot image updates.
+	// An entry in this list consists of the resource type, the API group that the resource belongs to and a selection filter
+	// on the resources.
+	//
+	// Warning: Only one entry is permitted per unique pair of resource/API group. The label selector provided within MachineManager
+	// can be used for further customization if required.
+	//
+	// +optional
+	// +listType=map
+	// +listMapKey=resource
+	// +listMapKey=apiGroup
+	MachineManagers []MachineManager `json:"machineManagers"`
+}
+
+// MachineManager contains identifying information of a machine management resource (e.g. a MachineSet) that will be
+// registered for boot image updates. This is likely to evolve as support for more machine management resources is added.
+type MachineManager struct {
+	// resource is the machine management resource's type.
+	//
+	// The following values are accepted:
+	// - machinesets: The machine manager will only register resources of the type MachineSet, which may belong to MachineAPI or ClusterAPI.
+	//
+	// +kubebuilder:validation:Required
+	Resource MachineManagerMachineSetsResourceType `json:"resource"`
+	// apiGroup is the name of the APIGroup that the machine management resource belongs to.
+	//
+	// The following values are accepted:
+	// - machine.openshift.io: The machine manager will only register resources that belong to the MachineAPI APIGroup.
+	//
+	// +kubebuilder:validation:Required
+	APIGroup MachineManagerMachineSetsAPIGroupType `json:"apiGroup"`
+	// selection allows granular control of the machine management resources that will be registered for boot image updates.
+	//
+	// +kubebuilder:validation:Required
+	Selection MachineManagerSelector `json:"selection"`
+}
+
+// +kubebuilder:validation:XValidation:rule="has(self.mode) && self.mode == 'Partial' ? has(self.partial) : !has(self.partial)",message="Partial is required when mode is Partial, and forbidden otherwise"
+// +union
+type MachineManagerSelector struct {
+	// mode is a union discriminator for MachineManagerSelector and can have three possible values.
+	// - All: All resources specified by the parent MachineManager are registered for boot image updates.
+	// - None: No resources specified by the parent MachineManager are registered for boot image updates.
+	// - Partial: Resources specified by the parent MachineManager are registered for boot image updates only if they match the label selector.
+	// +unionDiscriminator
+	// +kubebuilder:validation:Required
+	Mode MachineManagerSelectorMode `json:"mode"`
+
+	// partial provides a label selector that can be used to match machine management resources.
+	// Only permitted when mode is set to "Partial".
+	// +optional
+	Partial *metav1.LabelSelector `json:"partial,omitempty"`
+}
+
+// MachineManagerSelectorMode is a string enum used in the MachineManagerSelector union discriminator.
+// +kubebuilder:validation:Enum:="All";"None";"Partial"
+type MachineManagerSelectorMode string
+
+const (
+	// All represents a configuration mode that registers all resources specified by the parent MachineManager for boot image updates.
+	All MachineManagerSelectorMode = "All"
+
+	// None represents a configuration mode that will not register any resource specified by the parent MachineManager
+	// for boot image updates.
+	None MachineManagerSelectorMode = "None"
+
+	// Partial represents a configuration mode that will register resources specified by the parent MachineManager only
+	// if they match the label selector.
+	Partial MachineManagerSelectorMode = "Partial"
+)
+
+// MachineManagerMachineSetsResourceType is a string enum used in the MachineManager type to describe the resource
+// type to be registered.
+// +kubebuilder:validation:Enum:="machinesets"
+type MachineManagerMachineSetsResourceType string
+
+const (
+	// MachineSets represents the MachineSet resource type, which manages a group of machines.
+	// Although a MachineSet could belong to MachineAPI or ClusterAPI, only MAPI is currently supported.
+	MachineSets MachineManagerMachineSetsResourceType = "machinesets"
+)
+
+// MachineManagerMachineSetsAPIGroupType is a string enum used in the MachineManager type to describe the APIGroup
+// of the resource type being registered.
+// +kubebuilder:validation:Enum:="machine.openshift.io"
+type MachineManagerMachineSetsAPIGroupType string
+
+const (
+	// MachineAPI represents the traditional MAPI Group that a machineset may belong to.
+	// This feature only supports MAPI machinesets at this time.
+	MachineAPI MachineManagerMachineSetsAPIGroupType = "machine.openshift.io"
+)
 ```
-##### MatchSelector
+Here is a YAML snippet of what this config could look like:
 ```
-apiVersion: operator.openshift.io/v1
-kind: MachineConfiguration
-metadata:
-  name: default
-  labels:
-spec:
-  managedBootImageConfig:
-    mode: CustomConfig
-    CustomConfig:
-      machineSetSelector:
-        matchLabels:
-          machineconfiguration.openshift.io/mco-managed-machineset: ""
+managedBootImages:
+  machineManagers:
+    - resource: machinesets
+      apiGroup: cluster.x-k8s.io
+      selection:
+        mode: Partial
+        partial:
+          matchLabels: {}
+    - resource: machinesets
+      apiGroup: machine.openshift.io
+      selection:
+        mode: All
 ```
-Note: While in this mode, the label added to the selector will have to be added to the `machineset` object.
+The above example selects the CAPI MachineSets that match the label selector (an empty `matchLabels` matches all of them) and all MAPI MachineSets. CAPI entries such as the first one would also require extending the `apiGroup` enum above, which currently only accepts `machine.openshift.io`. Only one entry is allowed in `machineManagers` for each unique resource/APIGroup pair, which avoids giving conflicting instructions for the same type of machine resource. The user can then use the `partial` label selector if further customization is required.
 
-A [ValidatingAdmissionPolicy](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/) will be implemented via an MCO manifest that will restrict updating the `ManagedBootImageConfig` object to only supported platforms(initially, just GCP). This will be updated as we phase in support for other platforms. Here is a sample policy that would do this:
+A [ValidatingAdmissionPolicy](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/) will be implemented via an MCO manifest that will restrict updating the `ManagedBootImages` object to only supported platforms (initially, just GCP). This will be updated as we phase in support for other platforms. Here is a sample policy that would do this:
 ```
 apiVersion: admissionregistration.k8s.io/v1beta1
@@ -267,7 +337,7 @@ spec:
       operations: ["CREATE", "UPDATE"]
       resources: ["MachineConfiguration"]
   validations:
-    - expression: "has(object.spec.MachineBootImageConfig) && param.status.platformStatus.Type != `GCP`"
+    - expression: "!has(object.spec.managedBootImages) || params.status.platformStatus.type == 'GCP'"
       message: "This feature is only supported on these platforms: GCP"
 ```
 This would need an accompanying binding:
@@ -285,7 +355,7 @@ spec:
 ```
 
 #### Tracking boot image history
-This proposal will also introduce a new CR, `BootImageHistory` for tracking boot image history. As a starting point, here is a stub type definition for this:
+This is only an idea for now and is not planned to be included when the feature initially goes GA. Depending on customer feedback and team capacity, it will be implemented in a later release. Boot image history will be tracked by a new CR called `BootImageHistory`. The MCO will not directly consume from this CR. As a starting point, here is a stub type definition for this:
 
 ```
 type BootImageHistory struct {
@@ -298,17 +368,17 @@ type BootImageHistory struct {
 
 // BootImageHistorySpec defines the desired state of BootImageHistory
 type BootImageHistorySpec struct {
-}
-
-// BootImageHistoryStatus defines the observed state of BootImageHistory
-type BootImageHistoryStatus struct {
 	// machineResourceReference contains identifying information of the machine management resource being tracked.
 	// +kubebuilder:validation:Required
+	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="MachineResourceReference is immutable once set"
 	// +required
-	MachineResourceReference MachineResourceReference `json:"machineResourceReference"`
+	MachineResourceReference MachineResourceReference `json:"machineResourceReference"`}
+
+// BootImageHistoryStatus defines the observed state of BootImageHistory
+type BootImageHistoryStatus struct {
 	// details is a list of boot image history entries of the machine resource.
 	// +optional
-	Details []BootImageHistoryDetail `json:"details,omitempty"`
+	Details []BootImageHistoryDetail `json:"details"`
 }
 
 type MachineResourceReference struct {
@@ -316,10 +386,11 @@ type MachineResourceReference struct {
 	// +kubebuilder:validation:Required
 	// +required
 	Name string `json:"name"`
-	// kind is the machine management resource's kind
+	// resource is the machine management resource's type.
+	// Example: "machineset", "machinedeployment", etc.
 	// +kubebuilder:validation:Required
 	// +required
-	Kind string `json:"kind"`
+	Resource string `json:"resource"`
 	// apiGroup is name of the APIGroup that the machine management resource belongs to. This is for disambiguating
 	// between Cluster API and Machine API backed resources.
 	// +kubebuilder:validation:Required
@@ -332,9 +403,12 @@ type BootImageHistoryDetail struct {
 	// updateTime records the timestamp at which the update took place.
 	// +required
 	UpdateTime metav1.Time `json:"updatedTime"`
-	// bootImageRef records the new boot image reference to which the update took place.
+	// bootImageVersion records the RHCOS version string to which this update took place.
 	// +required
-	BootImageRef string `json:"bootImageRef"`
+	BootImageVersion string `json:"bootImageVersion"`
+	// configMapGeneration records the generation of the golden configmap at the time of this update.
+	// +required
+	ConfigMapGeneration int64 `json:"configMapGeneration"`
 }
 
 // BootImageHistoryList contains a list of BootImageHistory
@@ -345,26 +419,27 @@ type BootImageHistoryList struct {
 }
 ```
 
-There will be one instance of this per machine management resource(which can be a MachineSet[MAPI or CAPI], MachineDeployment...etc). It will be named the same as the resource being tracked. The MSBIC is responsible for creating and updating this CR when a boot image update takes place. This CR will exist in the same namespace as the resource.
+There will be one instance of this per machine management resource (which can be a MachineSet [MAPI or CAPI], a MachineDeployment, etc.). It will be named in the following format: `$(name)-$(resource)`. The MSBIC is responsible for creating and updating this CR when a boot image update takes place. This CR will exist in the same namespace as the resource.
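The following minimal Go sketch shows how the MSBIC might create a `BootImageHistory` for a machine management resource and append entries to it. It follows the `$(name)-$(resource)` format from the paragraph above literally; the YAML examples additionally embed a `mapi`/`capi` qualifier. Several parts are simplifying assumptions rather than anything the proposal specifies:

- The CR is flattened into a plain struct.
- `time.Time` stands in for `metav1.Time`.
- The rule that skips an update whose golden configmap generation matches the newest entry is hypothetical.
- The `previous` helper, which returns the entry a user might restore to, is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// MachineResourceReference mirrors the proposed spec field.
type MachineResourceReference struct {
	Name     string
	Resource string
	APIGroup string
}

// BootImageHistoryDetail mirrors the proposed status entry; time.Time stands
// in for metav1.Time so the sketch has no Kubernetes dependencies.
type BootImageHistoryDetail struct {
	UpdateTime          time.Time
	BootImageVersion    string
	ConfigMapGeneration int64
}

// BootImageHistory is a flattened, hypothetical view of the CR.
type BootImageHistory struct {
	Name      string
	Namespace string
	Ref       MachineResourceReference
	Details   []BootImageHistoryDetail
}

// historyName applies the `$(name)-$(resource)` format. Kubernetes object
// names must be lowercase, so the result is lowercased.
func historyName(ref MachineResourceReference) string {
	return strings.ToLower(ref.Name + "-" + ref.Resource)
}

// newHistory creates the CR in the same namespace as the tracked resource.
func newHistory(ref MachineResourceReference, namespace string) *BootImageHistory {
	return &BootImageHistory{Name: historyName(ref), Namespace: namespace, Ref: ref}
}

// recordUpdate appends an entry unless the newest entry already has the same
// golden configmap generation (an assumed guard against duplicate syncs).
// It reports whether an entry was appended.
func (h *BootImageHistory) recordUpdate(d BootImageHistoryDetail) bool {
	if n := len(h.Details); n > 0 && h.Details[n-1].ConfigMapGeneration == d.ConfigMapGeneration {
		return false
	}
	h.Details = append(h.Details, d)
	return true
}

// previous returns the entry before the newest one, i.e. the state a user
// might manually restore to; ok is false when there is no earlier entry.
func (h *BootImageHistory) previous() (BootImageHistoryDetail, bool) {
	if len(h.Details) < 2 {
		return BootImageHistoryDetail{}, false
	}
	return h.Details[len(h.Details)-2], true
}

func main() {
	ref := MachineResourceReference{Name: "djoshy10-2tcqv-worker-a", Resource: "MachineSet", APIGroup: "machine.openshift.io"}
	h := newHistory(ref, "openshift-machine-api")
	h.recordUpdate(BootImageHistoryDetail{UpdateTime: time.Date(2023, 12, 14, 12, 0, 0, 0, time.UTC), BootImageVersion: "414.92.202308032115-0", ConfigMapGeneration: 2})
	h.recordUpdate(BootImageHistoryDetail{UpdateTime: time.Date(2023, 12, 14, 14, 30, 0, 0, time.UTC), BootImageVersion: "415.92.202311241643-0", ConfigMapGeneration: 3})
	if prev, ok := h.previous(); ok {
		fmt.Println(h.Name, "could be restored to", prev.BootImageVersion)
	}
}
```

Keying the de-duplication on `configMapGeneration` rather than on the version string is one possible choice. With that key, two golden configmap updates that happen to land on the same RHCOS version would still be recorded as separate lineage entries.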
 
 YAML Example for a MAPI backed machineset scenario:
 ```
 apiVersion: machineconfiguration.openshift.io/v1alpha1
 kind: BootImageHistory
 metadata:
-  name: djoshy10-2tcqv-worker-a
-spec: {}
-status:
+  name: djoshy10-2tcqv-worker-a-mapi-machineset
+spec:
   machineResourceReference:
     name: djoshy10-2tcqv-worker-a
-    kind: MachineSet
-    apiGroup: cluster.x-k8s.io/v1alpha3
+    resource: MachineSet
+    apiGroup: machine.openshift.io
+status:
   details:
     - updateTime: "2023-12-14T12:00:00Z"
-      bootImageRef: "projects/rhcos-cloud/global/images/rhcos-414-92-202308032115-0-gcp-x86-64"
+      bootImageVersion: "414.92.202308032115-0"
+      configMapGeneration: 2
     - updateTime: "2023-12-14T14:30:00Z"
-      bootImageRef: "projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64"
-
+      bootImageVersion: "415.92.202311241643-0"
+      configMapGeneration: 3
 ```
 
 YAML Example for a CAPI backed machineset scenario:
@@ -373,20 +448,22 @@ apiVersion: machineconfiguration.openshift.io/v1alpha1
 kind: BootImageHistory
 metadata:
-  name: djoshy10-2tcqv-worker-a
-spec: {}
-status:
+  name: djoshy10-2tcqv-worker-a-capi-machineset
+spec:
   machineResourceReference:
     name: djoshy10-2tcqv-worker-a
-    kind: MachineSet
-    apiGroup: machine.openshift.io/v1beta1
+    resource: MachineSet
+    apiGroup: cluster.x-k8s.io
+status:
   details:
     - updateTime: "2023-12-14T12:00:00Z"
-      bootImageRef: "projects/rhcos-cloud/global/images/rhcos-414-92-202308032115-0-gcp-x86-64"
+      bootImageVersion: "414.92.202308032115-0"
+      configMapGeneration: 2
     - updateTime: "2023-12-14T14:30:00Z"
-      bootImageRef: "projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64"
+      bootImageVersion: "415.92.202311241643-0"
+      configMapGeneration: 3
 ```
 
-The goal of this is to provide information about the "lineage" of a machine management resource to the user. The user can then manually restore their machine management resource to an earlier state if they wish to do so by following documentation. The MCO will not directly consume from this CR. This is not planned to be part of the initial release, but more of a nice to have.
+The goal of this is to provide the user with information about the "lineage" of a machine management resource. The user can then follow documentation to manually restore their machine management resource to an earlier state if they wish to do so.
 
 ### Implementation Details/Notes/Constraints [optional]
 
@@ -452,7 +529,8 @@ Additionaly, a phased approach such as the following is the proposed:
 #### Phase 2
 - Tracking boot image history
 - User facing documentation for manual restoration
-- User customization of ignition stub
+- User customization of the Ignition stub secret
+- Canary testing a patched MachineSet, gated by a flag.
 
 #### Removing a deprecated feature