Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MachinePool does not automatically rollout changes to the bootstrap.configRef #6563

Closed
sbueringer opened this issue May 27, 2022 · 14 comments · Fixed by #8667
Closed

MachinePool does not automatically rollout changes to the bootstrap.configRef #6563

sbueringer opened this issue May 27, 2022 · 14 comments · Fixed by #8667
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@sbueringer
Copy link
Member

sbueringer commented May 27, 2022

What steps did you take and what happened:
[A clear and concise description on how to REPRODUCE the bug.]

I changed the bootstrap.configRef in a MachinePool and the MachinePool didn't rollout the change.

What did you expect to happen:

I expected the MachinePool to rollout the changes just like the MachineDeployment does.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Some more details:

Initial MachinePool

spec:
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: k8s-upgrade-and-conformance-uiq8gp-mp-0-config-cgroupfs
          namespace: k8s-upgrade-and-conformance-hf25m6
        dataSecretName: k8s-upgrade-and-conformance-uiq8gp-mp-0-config-cgroupfs

I updated .spec.template.spec.bootstrap.configRef to k8s-upgrade-and-conformance-uiq8gp-mp-0-config and the change was not rolled out.

The cause of this is that the dataSecretName is only set by the MachinePool controller if it is empty. If I remove the dataSecretName from the MachinePool the MachinePool controller picks up the data secret of the new BootstrapConfig and the change is rolled out.

In my opinion we should update the dataSecretName when the bootstrap configref is changed. Maybe even if the ref stays the same and only the dataSecretName in the status of the referenced resource is updated, this would probably be easier to implement as we just have to always propagate the dataSecretName of the referenced BootstrapConfig.

Notes:

Environment:

  • Cluster-api version: latest

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 27, 2022
@sbueringer
Copy link
Member Author

cc @CecileRobertMichon @mboersma
I think I got everything from the Slack thread

@fabriziopandini fabriziopandini added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jul 29, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 27, 2022
@fabriziopandini
Copy link
Member

/triage accepted
/lifecycle frozen
cc @CecileRobertMichon this IMO is something to address before graduating MP

/help

@k8s-ci-robot
Copy link
Contributor

@fabriziopandini:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/triage accepted
/lifecycle frozen
cc @CecileRobertMichon this IMO is something to address before graduating MP

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 2, 2022
@sbueringer
Copy link
Member Author

@richardcase @fabriziopandini @CecileRobertMichon

Just fyi, I think this is a problem / limitation when we try to rollout changes to the BootstrapConfig of a MachinePool via ClusterClass.

Not sure what the expected UX for MachinePools is. Is the expectation that users (/ClusterClass) unset the dataSecretName when they change the bootstrap ref? I think with ClusterClass we have two options:

  • Change the behavior of MachinePools (i.e. rollout when the bootstrap ref is changed)
  • Unset the dataSecretName when bootstrap ref is changed (feels a bit strange to me).

Opinions?

@CecileRobertMichon
Copy link
Contributor

when we try to rollout changes to the BootstrapConfig of a MachinePool via ClusterClass.

AFAIK this isn't supported yet? I thought @richardcase was working on it?

But agreed, we should probably fix this.

@sbueringer
Copy link
Member Author

sbueringer commented May 10, 2023

It's not supported yet.

I just think it's probably better to fix this before we implement unsetting the dataSecretName in ClusterClass to make it work (assuming rollout after ref change is the expected UX).

Mostly wanted to know what the expected UX is.

@Jont828
Copy link
Contributor

Jont828 commented May 10, 2023

@sbueringer I'm happy to take a look at this. So at a high level, do we just want to compare the data secret name referenced in bootstrapConfig.status.dataSecretName with machinePool.spec.template.spec.bootstrap.dataSecretName?

@Jont828
Copy link
Contributor

Jont828 commented May 10, 2023

Also could you point me to where the bootstrap.configRef change roll out happens for MachineDeployments? I'm thinking we can probably mirror what it's doing there.

@sbueringer
Copy link
Member Author

@sbueringer I'm happy to take a look at this.

Thx!

So at a high level, do we just want to compare the data secret name referenced in bootstrapConfig.status.dataSecretName with machinePool.spec.template.spec.bootstrap.dataSecretName?

Roughly, yes. Today the MachinePool controller is only writing MP.spec.template.spec.bootstrap.dataSecretName when it is nil. I think it should always keep the field sync with BootstrapConfig.status.dataSecretName. The problem is here:

if m.Spec.Template.Spec.Bootstrap.DataSecretName != nil {
m.Status.BootstrapReady = true
conditions.MarkTrue(m, clusterv1.BootstrapReadyCondition)
return ctrl.Result{}, nil
}

As soon as the field is not nil reconcileBootstrap is done. I think it should always run the entire func.

Also could you point me to where the bootstrap.configRef change roll out happens for MachineDeployments? I'm thinking we can probably mirror what it's doing there.

I think MachineDeployment is too different. MachineDeployment has BootstrapConfigTemplates instead of a BootstrapConfig and it is creating a new MachineSet as soon as a BootstrapConfig ref points to another template.

@sbueringer
Copy link
Member Author

Just my opinion, would be good to get confirmation from someone who is more familiar with MachinePools (cc @CecileRobertMichon @mboersma)

@CecileRobertMichon
Copy link
Contributor

As soon as the field is not nil reconcileBootstrap is done. I think it should always run the entire func.

Had to refresh my memory on this one. I think we only want to reconcile if the bootstrap.configRef is not nil. That's because we need to preserve the case where the user passes in their own DataSecretName (as described in https://github.com/kubernetes-sigs/cluster-api/blob/main/api/v1beta1/machine_types.go#L253) and doesn't use the bootstrap controller.

More context in this slack thread: https://kubernetes.slack.com/archives/C8TSNPY4T/p1653424336790119

@Jont828
Copy link
Contributor

Jont828 commented May 15, 2023

@CecileRobertMichon @sbueringer So just to clarify we want to run these lines

// If the bootstrap config is being deleted, return early.
if !bootstrapConfig.GetDeletionTimestamp().IsZero() {
return ctrl.Result{}, nil
}
// Determine if the bootstrap provider is ready.
ready, err := external.IsReady(bootstrapConfig)
if err != nil {
return ctrl.Result{}, err
}
// Report a summary of current status of the bootstrap object defined for this machine pool.
conditions.SetMirror(m, clusterv1.BootstrapReadyCondition,
conditions.UnstructuredGetter(bootstrapConfig),
conditions.WithFallbackValue(ready, clusterv1.WaitingForDataSecretFallbackReason, clusterv1.ConditionSeverityInfo, ""),
)
if !ready {
log.V(2).Info("Bootstrap provider is not ready, requeuing")
return ctrl.Result{RequeueAfter: externalReadyWait}, nil
}
// Get and set the name of the secret containing the bootstrap data.
secretName, _, err := unstructured.NestedString(bootstrapConfig.Object, "status", "dataSecretName")
if err != nil {
return ctrl.Result{}, errors.Wrapf(err, "failed to retrieve dataSecretName from bootstrap provider for MachinePool %q in namespace %q", m.Name, m.Namespace)
} else if secretName == "" {
return ctrl.Result{}, errors.Errorf("retrieved empty dataSecretName from bootstrap provider for MachinePool %q in namespace %q", m.Name, m.Namespace)
}
m.Spec.Template.Spec.Bootstrap.DataSecretName = pointer.String(secretName)
m.Status.BootstrapReady = true
return ctrl.Result{}, nil

in the event that m.Spec.Template.Spec.Bootstrap.DataSecretName == nil or that m.Spec.Template.Spec.Bootstrap.DataSecretName != nil && m.Spec.Template.Spec.Bootstrap.ConfigRef != nil?

@sbueringer
Copy link
Member Author

sbueringer commented May 16, 2023

I think in the case that m.Spec.Template.Spec.Bootstrap.ConfigRef != nil. Doesn't matter if DataSecretName is nil or not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants