Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

📖 Add proposal for MachinePool Machines #6088

Merged
merged 1 commit into from
May 25, 2022

Conversation

mboersma
Copy link
Contributor

@mboersma mboersma commented Feb 9, 2022

What this PR does / why we need it:

Adds a proposal for MachinePool Machines: MachinePools should be enhanced to own Machines that represent each of their replicas.

This should still be considered a draft or WIP, but any feedback or reviewing would be highly appreciated!

There is an accompanying proof-of-concept implementation for Docker in #6089.
There is an accompanying implementation of MachinePool support in cluster-autoscaler in kubernetes/autoscaler#4676.

These "MachinePool Machines" will open up the following opportunities:

  • Share common behavior for Machines in MachinePools, which otherwise would need to be implemented by each infrastructure provider.
  • Allow both Cluster API-driven and externally driven interactions with individual Machines in a MachinePool.
  • Enable MachinePools to use deployment strategies similarly to MachineDeployments.

Which issue(s) this PR fixes:

Fixes #4063

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 9, 2022
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 9, 2022
Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mboersma sorry if I get to this only now
+1 for the direction, some questions WRT to the implementation details; happy to discuss by person if this can help to clarify my comments

docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
@sbueringer
Copy link
Member

Did another round of review, no further comments from my side, apart from the ones that are currently open.

@enxebre
Copy link
Member

enxebre commented Apr 6, 2022

dropped few more suggestions, lgtm overall I have no objections to proceed.

@CecileRobertMichon
Copy link
Contributor

lgtm pending selector field naming comment resolution

Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

squash?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 7, 2022

When a MachinePool Machine is deleted manually, the system will delete the corresponding provider-specific resource. The opposite is also true: when a provider-specific resource is deleted, the system will delete the corresponding MachinePool Machine. This happens by virtue of the infrastructureRef <-> ownerRef relationship.

In both cases, the MachinePool will notice the missing replica and create a new one in order to maintain the desired number of replicas. To scale down by removing a specific instance, that Machine should be given the "cluster.x-k8s.io/delete-machine" annotation and then the replicaCount on the MachinePool should be decremented.
Copy link

@shyamradhakrishnan shyamradhakrishnan Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who will decrement the replica count? Is it common capi controller or the individual machinepool controller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the cluster-autoscaler case, so the annotation is applied to a specific Machine and then the replicaCount is decremented by an external actor. Either a user or software like cluster-autoscaler.


#### Story U3

A cluster admin updates a MachinePool to a newer Kubernetes version and would like to configure the strategy for that deployment so that the MachinePool will progressively roll out the new version of the machines. They would like this operation to cordon and drain each node to minimize workload disruptions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I somehow dont understand which component will own this? Will CAPI machinepool controller do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the CAPI machinepool controller's existing logic should own everything. Behind this point is the idea that CAPZ has already implemented cordon + drain separately for its own MachinePool Machine implementation, but should be able to remove some of that duplicated logic now.

@shyamradhakrishnan I apologize for missing these comments for so long! Let me know what other questions you have.

Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of machine pools relying on selectors to identify machines.
However, I have some aspects I still need to properly grok:

  • Cohesistance of MachinePools with and without machines
  • Creations of a new kind of infra machines (providerMachinePoolMachine)
  • Ownership of Machines management assigned to providers

I will try to make another pass next week, but if it could help to speed up things I'm also available for a call discussing above points in person

docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 20, 2022
@elmiko
Copy link
Contributor

elmiko commented May 11, 2022

agreed at the 11 May 2022 meeting, we are starting a 1 week lazy consensus today. if there are no objections it will merge on the 18th.

Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, I like the UX consistency + being able to reuse the same InfraMachine Kind revisions as those will make adoption easier for more providers

just a few minor comments from the re-read

docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
Copy link
Member

@sbueringer sbueringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits from my side.

I'm unfortunately not that familiar with MachinePools so I have a hard time reviewing this proposal, but no objections from my side.

Would be great to get a review from some CAPA folks
(cc @richardcase @sedefsavas @pydctw)

docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
docs/proposals/20220209-machinepool-machines.md Outdated Show resolved Hide resolved
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2022
@fabriziopandini
Copy link
Member

/lgtm
looking forward to seeing this effort moving, I really appreciate the effort to make MP consistent with the rest of CAPI

@sbueringer
Copy link
Member

Thx!
/lgtm
(sorry for the late response, couldn't find the time during KubeCon)

@CecileRobertMichon
Copy link
Contributor

/approve
/hold

Will leave the hold until EOD in case anyone wants to take another look, we're past lazy consensus and there are no objections

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2022
@fabriziopandini
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 25, 2022
@k8s-ci-robot k8s-ci-robot merged commit b66cb27 into kubernetes-sigs:main May 25, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.2 milestone May 25, 2022
@mboersma mboersma deleted the mpms-proposal branch May 25, 2022 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resources representing MachinePool Machines