
Add Declarative Single Entity Control Plane proposal #278

Closed
wants to merge 23 commits

Conversation

enxebre
Member

@enxebre enxebre commented Apr 3, 2020

This proposal outlines a solution for declaratively managing, as a single entity, the compute resources that host the OCP Control Plane components.

cc @abhinavdahiya for Installer bootstrapping, @hexfusion for Etcd, @dgoodwin for Hive, @JoelSpeed for machine API, @jeremyeder @smarterclayton @derekwaynecarr @deads2k

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Apr 3, 2020
@enxebre
Member Author

enxebre commented Apr 3, 2020

Not sure why this automatically gets the approved label; holding to make sure automation does not merge it by accident.
/hold

@openshift-ci-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Apr 3, 2020
@enxebre force-pushed the control-plane branch 4 times, most recently from 90cfeff to 760416d on April 3, 2020 at 11:11
#### Declarative horizontal scaling
##### Scale out
1 - The controller always reconciles towards expected number of replicas. This must be an odd number.
2 - Fetch all existing control plane Machine resources by ownerRef. Adopt any other machine having a targeted label.
Contributor

Just a quick note on the owner references, will we block deletion of the ControlPlane CRD entirely? I wouldn't want kube garbage collecting these machines in any scenario.

As an alternative would just relying on labels be sufficient?

Contributor

Do we currently block the CRD for Machines from being deleted? Deleting this would have a similar effect right? If this was already considered there, there may be a mechanism that could be reused

@jim-minter

@enxebre thanks for this! What about onboarding existing clusters? Say this lands in 4.N. Is there an adoption path from a cluster of version 4.(N-1)?

### Goals

- To have a declarative mechanism to manage the Control Plane as a single entity.
- To support declarative safe horizontal scaling towards an odd number replicas for the control plane compute resources.
Contributor

Suggested change
- To support declarative safe horizontal scaling towards an odd number replicas for the control plane compute resources.
- To support declarative safe horizontal scaling towards an odd number of replicas for the control plane compute resources.

Contributor

Having an even number of replicas is not inherently bad.
You don't achieve any additional failure tolerance by adding the last node, but maybe there is a performance benefit.
So perhaps it is not beneficial to enforce in code.

Contributor

Is it fair to say etcd docs argue strongly that an even number of replicas is inherently bad?

Although adding a node to an odd-sized cluster appears better since there are more machines, the fault tolerance is worse since exactly the same number of nodes may fail without losing quorum but there are more nodes that can fail.

during a network partition, an odd number of members guarantees that there will always be a majority partition that can continue to operate and be the source of truth when the partition ends

Contributor

@beekhof beekhof Apr 7, 2020

I'm sure they do; if one knows nothing about a given setup, "odd" is the most reliable advice to give.

However, it is possible to construct scenarios where 2N nodes are less vulnerable to outages than 2N + 1, by varying how the machines are organised/connected (i.e. the number of racks).

So advise an odd number of nodes: absolutely
Enforce an odd number of nodes: my vote is no
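For reference on the fault-tolerance trade-off being debated in this thread, here is the usual quorum arithmetic for etcd voting members as a small, self-contained illustration (just the math, not anything from the proposal):

```go
package main

import "fmt"

func main() {
	// For n voting members: quorum = floor(n/2) + 1, tolerated failures = n - quorum.
	for _, n := range []int{3, 4, 5} {
		quorum := n/2 + 1
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum, n-quorum)
	}
	// Prints 3 -> 1, 4 -> 1, 5 -> 2: going from 3 to 4 members adds no extra
	// failure tolerance, which is the etcd docs' argument; the counterpoint
	// above is that rack/network layout can still change the practical picture.
}
```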


#### Declarative horizontal scaling
##### Scale out
1 - The controller always reconciles towards expected number of replicas. This must be an odd number.
Contributor

Can the odd number be enforced somehow?

Contributor

I would argue that it shouldn't be. Odd is just an availability-driven convention, but there may be other reasons (capacity) to add an extra node (making the total even).

This is effectively a rolling upgrade with maxUnavailable 0 and maxSurge 1.

#### Self healing (MHC + reconciling)
1 - The Controller should always reconcile by removing Etcd members for voluntary Machine disruptions, i.e machine deletion.
Contributor

MHC will delete the Machine, causing draining to happen; does this affect this at all?

Contributor

I think etcd is a static pod and therefore immune to drains

Contributor

I was not very clear about what I was thinking here. I was thinking more along the lines of: do we need to remove the etcd member from the etcd cluster somehow when doing this, or is there something else that will notice the member has gone and remove it from the cluster?

Status ControlPlaneStatus `json:"status,omitempty"`
}

type ControlPlaneSpec struct {
Contributor

Should this use a MachineTemplateSpec in the same way a MachineSet does so that they look similar + get objectmeta label copying?

Member Author

Why would we expose anything more than we really need in the API? Couldn't anything other than the providerSpec be generated by the controller?

Contributor

My thought was that users may want to add labels to their control-plane machines; that's all we use the objectmeta for on the MachineTemplateSpec within MachineSets, right?
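To make the alternative being discussed concrete, here is a rough sketch of the MachineSet-style shape; reusing the machine API's MachineTemplateSpec type and these exact field names are assumptions for illustration, not part of the proposal:

```go
// ControlPlaneSpec reshaped to embed a machine template the way MachineSetSpec
// does, so labels set on Template.ObjectMeta get copied onto each generated
// control plane Machine.
type ControlPlaneSpec struct {
	// Replicas is the number of desired control plane machines.
	Replicas *int32 `json:"replicas,omitempty"`

	// Template describes the Machines that will be created.
	Template MachineTemplateSpec `json:"template"`
}
```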

### Version Skew Strategy

## Implementation History
- "https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/proposals/20191017-kubeadm-based-control-plane.md"
Contributor

How similar is this to upstream? Given we want to eventually converge somewhat with upstream, does this fall in line with the assumptions for a Control Plane upstream?

Member Author

This proposal is fairly aligned with the CAPI kubeadm control plane one. The main difference is that here we don't need to manage the lifecycle of etcd via kubeadm; we just want to check that it's healthy as a gate for proceeding with scale operations.
See also kubernetes-sigs/cluster-api#2836

## Alternatives

- Same approach but different details:
The Control Plane controller scaling logic could be built atop machineSets just like a deployment uses replica sets. The scenarios to be supported in this proposal are very specific, and the flexibility that managing machineSets would provide is not needed. The Control Plane is critical and we want to favour control over flexibility here, therefore this proposes ad-hoc controller logic to avoid unnecessary layers of complexity.
Contributor

line break every 120 or every sentence (whichever is shorter) to make commenting easier.

Can you expand on why we would not re-use this existing metaphor with some admission webhook validators to restrict how far the adjustments can go? This seems to be the "standard" API that exists for scaling up and down, and I would have expected us to double down on the existing metaphor instead of creating something new and special.

Contributor

I would prefer we layer this on top of machinesets as it allows:

  1. the installer to define the failure domains
  2. no new API for the installer to configure the placement
  3. the new control-plane controller to become almost similar to the autoscaler, with added niceness

Contributor

Aside from auto-scaling - which it seems isn't the goal of the proposal as it stands, although it makes sense to me to add it - what would be the main function of this controller if it was built on machinesets?

Say for scale-up, would this come down to choose-which-control-plane-machineset-to-scale-up and an (all Etcd members are healthy AND all owned machines have a backed ready node) safety wrapper?

Contributor

Say for scale-up, would this come down to choose-which-control-plane-machineset-to-scale-up and an (all Etcd members are healthy AND all owned machines have a backed ready node) safety wrapper?

For this scale-up case, could we add additional extra layers of safety around cluster-etcd-operator's decision to add a new etcd member, such that it's safe to add new control plane compute capacity and have cluster-etcd-operator worry about the etcd operational aspects?

Member Author

@enxebre enxebre Apr 8, 2020

If we could drop that safety check and assume scaling compute in/out is always completely safe, this controller would come down to:

  • Ensuring an even spread of compute resources across failure domains / machineSets.
  • Rolling upgrades for vertical resizing.
  • Managing the underlying MHC for safety, with the maxUnhealthy value based on replicas.

I'd love that, but I'm not sure that would ever be feasible for all scenarios. Say, e.g., for scaling in when 1 etcd member out of 3 is unhealthy, a "cattle" scale down can easily break quorum. I tried to elaborate on the etcd scenarios that concern me in the alternatives and proposal sections.

Member Author

cc @hexfusion for feedback on the etcd scenarios and how to possibly decouple more compute from etcd healthiness

Member Author

If we can make sure any scenario of concern, as in https://github.com/openshift/enhancements/pull/278/files#diff-dcd7423e408e8b740b1b9b14e53eeda1R293-R316, would be handled gracefully by the etcd operator, then we can drop any etcd "pet" logic from this proposal and treat this as regular compute scaling operations.

Comment on lines 117 to 118
7 - Choose a failure domain.
8 - Create new Machine object with a templated spec. Go to 1.
Contributor

1- the doc is missing how this is going to be calculated.

2- I think we should layer this on top of machinesets instead of machine objects, because each machineset object already defines a failure domain.

Contributor

I like the idea of using machinesets, but are they compatible with UPI?

Member Author

like the idea of using machinesets, but are they compatible with UPI?

Any machine API resource is tied to the API and controllers being operational, regardless of how you bootstrap the product (UPI/IPI).

Member Author

@enxebre enxebre Apr 8, 2020

I think we should layer this on top of machinesets instead of machine objects, because each machineset object already defines a failure domain.

Regardless of whether we layer on machines or machineSets, we need something on top to decide which failure domain, out of a given list, to pick for the machine/machineSet, so we are able to scale out evenly. Nothing does that for you today unless we assume the machineSets would always be pre-created by something external to the controller, i.e. the installer.

Member Author

@abhinavdahiya let me think about this a bit.

Contributor

Nothing does that for you today unless we assume the machineSets would be always pre-created by something external to the controller i.e the installer.

the installer today creates workers and masters across failure domains..

for workers we have a machineset in each failure domain, and we set replicas so that they are evenly distributed.

expecting that each machineset defines a failure domain is probably a good idea imo...
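To make the "scale out evenly" requirement concrete, here is a minimal sketch of what the layer on top could do, assuming it is handed the list of failure domains (or per-domain machineSets) and the domain of each existing control plane machine; the helper name is hypothetical, not from the proposal:

```go
package controlplane

// leastPopulatedDomain returns the failure domain with the fewest existing
// control plane machines; ties resolve to the first domain in the list.
// domains must be non-empty.
func leastPopulatedDomain(domains []string, machineDomains []string) string {
	counts := make(map[string]int, len(domains))
	for _, d := range machineDomains {
		counts[d]++
	}
	best := domains[0]
	for _, d := range domains[1:] {
		if counts[d] < counts[best] {
			best = d
		}
	}
	return best
}
```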

Comment on lines +164 to +166
// Number of desired machines. Defaults to 1.
// Only odd numbers are permitted.
Replicas *int32 `json:"replicas,omitempty"`
Contributor

How and where will the installer define the failure domains?

EnableAutorepair bool `json:"enableautorepair,omitempty"`

// ProviderSpec details Provider-specific configuration to use during node creation.
ProviderSpec ProviderSpec `json:"providerSpec"`
Contributor

How is the installer supposed to define things like placement (availability zone, subnet, etc.), since these are defined per Machine object today?

Contributor

+1.

What is providerSpec in this context?

My thoughts were: rather than trying to dynamically build Machines from templates and several data sources that differ per provider, the installer can populate a list of templates for each supported AZ at creation time. Basically a machineset, but the controller chooses 1 of N templates rather than just the one.
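One hypothetical shape for that idea, purely as a sketch (type and field names are illustrative, not from the proposal): the installer fills in one provider template per failure domain at creation time and the controller chooses among them.

```go
// FailureDomainTemplate pairs a failure domain (e.g. an availability zone)
// with the provider-specific configuration the installer generated for it.
type FailureDomainTemplate struct {
	// Name of the failure domain.
	Name string `json:"name"`

	// ProviderSpec carries the per-domain placement details (zone, subnet,
	// instance profile, ...) to use when creating a Machine there.
	ProviderSpec ProviderSpec `json:"providerSpec"`
}

// In ControlPlaneSpec, the single ProviderSpec field would then become:
//   FailureDomains []FailureDomainTemplate `json:"failureDomains"`
// and the controller would pick the least populated entry for each new Machine.
```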

Comment on lines +86 to +88

#### Story 2
- As an operator running User Provisioned Infrastructure (UPI), I want to expose my non-machine API machines and offer them to the Control Plane controller so I can have the ability to resize the control plane in a declarative, automated and seamless manner.
Contributor

the doc doesn't contain information on how this will be achieved.

Comment on lines 259 to 261
During the cluster upgrade for the targeted release the Machine API Operator (MAO) will let the CVO instantiate the new CRD `controlPlane` and it will run the backing controller, making this functionality opt-in for existing clusters. The user can create an instance of the new CRD if they choose to do so.

New IPI clusters deployed after the targeted release will run the `controlPlane` instance deployed by the installer out of the box.
Contributor

So I think this is going to become difficult to maintain as documentation, because every document would have to say:

## if you installed the cluster before 4.x

## if you installed the cluster on or after 4.x

We should at least make/provide a migration step as part of some binary to perform this migration so that most users can move to this.
Or I think we should try the migration for ALL platforms and warn users to perform manual actions to correctly set up as part of the upgrade.

The current install-only approach will be difficult in the long run.

I'm open to adding this migration to the installer binary if that suits best.

### Goals

- To have a declarative mechanism to manage the Control Plane as a single entity.
- To support declarative safe horizontal scaling towards an odd number replicas for the control plane compute resources.
Contributor

Having an even number of replicas is not inherently bad.
You don't achieve any additional failure tolerance by adding the last node, but maybe there is a performance benefit.
So perhaps it is not beneficial to enforce in code.


#### Declarative horizontal scaling
##### Scale out
1 - The controller always reconciles towards expected number of replicas. This must be an odd number.
Contributor

I would argue that it shouldn't be. Odd is just an availability-driven convention, but there may be other reasons (capacity) to add an extra node (making the total even).

3 - Compare with expected replicas number. If expected is higher than current then:
4 - Check all owned machines have a backed ready node.
5 - Check all Etcd members for all owned machines are healthy via Cluster Etcd Operator status signalling.
6 - If (NOT all Etcd members are healthy OR NOT all owned machines have a backed ready node) then controller short circuits here, log, update status and requeue. Else:
Contributor

With the assumption that MHC will eventually fix that situation?
Might be worth calling out
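As a minimal sketch of the short-circuit described in steps 4-6 (not the proposal's code; it assumes the caller has already resolved, per owned Machine, whether its Node is Ready and whether the cluster-etcd-operator reports its etcd member healthy):

```go
package controlplane

import "time"

type memberState struct {
	NodeReady   bool // step 4: the owned Machine has a backing Ready node
	EtcdHealthy bool // step 5: cluster-etcd-operator reports the member healthy
}

// scaleOutAllowed captures step 6: proceed only when every owned machine has a
// Ready node and a healthy etcd member; otherwise short circuit and requeue,
// relying on MHC or manual intervention to eventually fix the bad member.
func scaleOutAllowed(members []memberState) (allowed bool, requeueAfter time.Duration) {
	for _, m := range members {
		if !m.NodeReady || !m.EtcdHealthy {
			return false, 30 * time.Second
		}
	}
	return true, 0
}
```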

Comment on lines 117 to 118
7 - Choose a failure domain.
8 - Create new Machine object with a templated spec. Go to 1.
Contributor

I like the idea of using machinesets, but are they compatible with UPI?

This is effectively a rolling upgrade with maxUnavailable 0 and maxSurge 1.

#### Self healing (MHC + reconciling)
1 - The Controller should always reconcile by removing Etcd members for voluntary Machine disruptions, i.e machine deletion.
Contributor

I think etcd is a static pod and therefore immune to drains

2 - Fetch all existing control plane Machine resources by ownerRef. Adopt any other machine having a targeted label.
3 - Compare with expected replicas number. If expected is higher than current then:
4 - Check all owned machines have a backed ready node.
5 - Check all Etcd members for all owned machines are healthy via Cluster Etcd Operator status signalling.
Contributor

It would be worth calling out the scenario when etcd has lost quorum and all API calls start failing.

Comment on lines +240 to +241
- Scale from 3 to 5.
- Scale from 5 to 3.
Contributor

Are we assuming 1 to 3 is covered by existing installation tests?


For clarity in this doc we set the definition for "Control Plane" as "The collection of stateless and stateful processes which enable a Kubernetes cluster to meet minimum operational requirements". This includes: kube-apiserver, kube-controller-manager, kube-scheduler, kubelet and Etcd.

This proposal outlines a solution for declaratively managing, as a single entity, the compute resources that host the OCP Control Plane components. It introduces scaling and self-healing capabilities for these compute resources while honouring inviolable Etcd expectations and without disrupting the lifecycle of Control Plane components.
Contributor

A little more specifics on the proposal here would avoid "burying the lede", e.g. "proposes a ControlPlane CR and controller whose primary function is to manage the number of control plane machines".


Would that CR allow alerts and remediation as well? (Assuming no for the most critical cases, which imply at best we have a read-only etcd.)

## Alternatives

- Same approach but different details:
The Control Plane controller scaling logic could be built atop machineSets just like a deployment uses replica sets. The scenarios to be supported in this proposal are very specific, and the flexibility that managing machineSets would provide is not needed. The Control Plane is critical and we want to favour control over flexibility here, therefore this proposes ad-hoc controller logic to avoid unnecessary layers of complexity.
Contributor

Aside from auto-scaling - which it seems isn't the goal of the proposal as it stands, although it makes sense to me to add it - what would be the main function of this controller if it was built on machinesets?

Say for scale-up, would this come down to choose-which-control-plane-machineset-to-scale-up and an (all Etcd members are healthy AND all owned machines have a backed ready node) safety wrapper?

@enxebre force-pushed the control-plane branch 2 times, most recently from 0aca4f1 to e8b7321 on April 8, 2020 at 12:26
// This will autorepair faulty machine/nodes in scenarios where quorum would not be violated.
// e.g 1 unhealthy out of 3.
// Defaults to true.
EnableAutorepair bool `json:"enableautorepair,omitempty"`
Contributor

Why would anyone ever set this to false?

Member Author

Good point. I'm thinking of unpredictable environments/scenarios where you want manual troubleshooting and therefore opt-out might be useful.

Contributor

Maybe the flag should be DisableAutorepair then, so the default of false gives the desired behavior?
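A sketch of that suggestion (the field name follows this thread, not the proposal's current API), so the Go zero value matches the desired default of auto repair staying on:

```go
// ControlPlaneSpec excerpt with the flag inverted, so the Go zero value
// (false) keeps auto repair enabled by default.
type ControlPlaneSpec struct {
	// DisableAutorepair turns off automatic remediation of faulty machines/nodes
	// in scenarios where quorum would not be violated (e.g. 1 unhealthy out of 3).
	// Defaults to false, i.e. auto repair is enabled.
	DisableAutorepair bool `json:"disableAutorepair,omitempty"`

	// ...remaining fields unchanged...
}
```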


- With only machineSet + MHC:
- New machines start to come up.
- As soon as the etcd API is notified of a new member( total 4, healthy 2) the cluster
Contributor

I don't think anyone other than the etcd-operator should be doing this decision making. If the etcd-operator thinks adding a new member would cause failure, it shouldn't add those members.

This new controller should be doing this decision making imo.

- New machines start to come up.
- As soon as the etcd API is notified of a new member( total 4, healthy 2) the cluster
loses quorum until that new member starts and joins the cluster.
- There's no automation mechanism for Machines to spread evenly across failure domains.
Contributor

Why is the control plane the only one with this problem?
Currently we have one machineset for each failure domain for compute in the cluster; autoscaling compute should also benefit from taking failure domains into account.

Member Author

@enxebre enxebre Apr 13, 2020

For regular nodes, the autoscaler has the ability to ensure an even spread via the balance-similar-node-groups flag.

If we ever put something on top of machineSets, or have a different complementary scalable resource abstraction, it could include the ability to choose failure domains dynamically for manual scaling operations as a feature.

Since we are figuring out a scalable resource for the control plane here, we can try to have functionality similar to balance-similar-node-groups.

Comment on lines +307 to +309
- Machines start to get deleted.
- etcd peer membership is not removed.
- etcd guard blocks on drain before losing quorum. etcd remains degraded.
Contributor

Since machine-api always drains, and the etcd operator is the one that decides the disruption.
Here:
The PDB is 3 out of 5.
One member is already down.

machine-api is asked to go down from 5 -> 3.
One machine can actually go down.
The second machine cannot, because the PDB makes the drain fail.
So etcd is not degraded, because it still has 3 healthy etcd members?
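To spell out the PDB arithmetic in this scenario (purely illustrative; the real quorum guard uses a PodDisruptionBudget with minAvailable):

```go
package main

import "fmt"

func main() {
	// 5 etcd members, quorum guard PDB with minAvailable 3, one member already down.
	healthy, minAvailable := 4, 3
	disruptionsAllowed := healthy - minAvailable
	fmt.Println("voluntary disruptions allowed:", disruptionsAllowed) // 1
	// The first drain is allowed; the second is blocked until a replacement
	// member is healthy again, so etcd keeps at least 3 healthy members.
}
```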

5. Check all etcd members for all owned machines are healthy via Cluster etcd Operator status signalling.
6. If (NOT all etcd members are healthy OR NOT all owned machines have a backed ready node) then controller short circuits here, log, update status and requeue. Else:
7. Pick oldest machine in more populated failure domain out of a candidates to be deleted list (by default all owned Machines).
8. Remove etcd member.


Isn't there a 'remove-node' API? So the etcd member removal will be done by the CEO rather than this controller?

Contributor

scaling etcd is owned by cluster-etcd-operator.
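For step 7 of scale in ("pick oldest machine in more populated failure domain"), here is a self-contained sketch of one way to do the selection; the candidate type is a simplified stand-in for a Machine, not the proposal's API:

```go
package controlplane

import (
	"sort"
	"time"
)

type candidate struct {
	Name          string
	FailureDomain string
	Created       time.Time // Machine creation timestamp
}

// pickForDeletion returns the oldest candidate within the most populated
// failure domain (ties between equally populated domains go to the domain
// holding the oldest machine). It sorts the slice in place.
func pickForDeletion(cands []candidate) *candidate {
	if len(cands) == 0 {
		return nil
	}
	counts := map[string]int{}
	for _, c := range cands {
		counts[c.FailureDomain]++
	}
	sort.Slice(cands, func(i, j int) bool { return cands[i].Created.Before(cands[j].Created) })
	best := &cands[0]
	for i := range cands {
		if counts[cands[i].FailureDomain] > counts[best.FailureDomain] {
			best = &cands[i]
		}
	}
	return best
}
```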

Contributor

@michaelgugino michaelgugino left a comment

Some of the basic items like "What happens when I delete all the machines" aren't really called out here. Is there any kind of race condition possible that will attempt to remove too many etcd members too quickly?

There's a lot of functionality that needs to be defined still. My preference is to create this object without scalability first, and after it matures, add scalability later if appropriate.

Some things I think we should add to this particular proposal are user stories for monitoring, metrics, and alerting; UX process for replacing a control plane machine; UX for vertically sizing existing machines.

### User Stories [optional]

#### Story 1
- As an operator running Installer Provisioned Infrastructure (IPI), I want flexibility to run [large or small clusters](https://kubernetes.io/docs/setup/best-practices/cluster-large/#size-of-master-and-master-components) so I need to have the ability to resize the control plane in a declarative, automated and seamless manner.
Contributor

Resizing doesn't make a lot of sense out of the box. Oftentimes, there are AZs we're not utilizing for the CP. Should we attempt to utilize all AZs or just existing AZs? Do we support resizing to 1?

- As an operator running User Provisioned Infrastructure (UPI), I want to expose my non-machine API machines and offer them to the Control Plane controller so I can have the ability to resize the control plane in a declarative, automated and seamless manner.

#### Story 3
- As an operator of an OCP Dedicated Managed Platform, I want to give users flexibility to add as many worker nodes as they want or to enable autoscaling on worker nodes, so I need to have the ability to resize the control plane instances in a declarative, automated and seamless manner to react quickly to cluster growth.
Contributor

Does this actually align with the long term needs of Dedicated? Is there an RFE or Bug to support this story?

#### Declarative horizontal scaling
##### Scale out
1. The controller always reconciles towards expected number of replicas. Validation enforce this to be an odd number.
2. Fetch all existing control plane Machine resources by ownerRef. Adopt any other machine having a targeted label e.g `node-role.kubernetes.io/master`.
Contributor

What happens if we misconfigure targeted labels or users add labels to workers?

3. Compare with expected replicas number. If expected is higher than current then:
4. Check all owned machines have a backed ready node.
5. Check all etcd members for all owned machines are healthy via Cluster etcd Operator status signalling.
6. If (NOT all etcd members are healthy OR NOT all owned machines have a backed ready node) then controller short circuits here, log, update status and requeue. Else:
Contributor

What happens when we're scaling from 3 to 5, the 4th comes up and the 5th doesn't? Are we relying on the MHC to kick in at some point? Any alerting for a specific set of conditions here?

##### Scale out
1. The controller always reconciles towards expected number of replicas. Validation enforce this to be an odd number.
2. Fetch all existing control plane Machine resources by ownerRef. Adopt any other machine having a targeted label e.g `node-role.kubernetes.io/master`.
3. Compare with expected replicas number. If expected is higher than current then:
Contributor

What are we considering 'replicas' in this context? Machines?

7. Trigger scale out workflow (starting in 4). Set "replaced" annotation. Requeue.
This is effectively a rolling upgrade with maxUnavailable 0 and maxSurge 1.

#### Self healing (MHC + reconciling)
Contributor

Is MHC always mandatory if we introduce this component?

This is effectively a rolling upgrade with maxUnavailable 0 and maxSurge 1.

#### Self healing (MHC + reconciling)
1. The Controller should always reconcile by removing etcd members for voluntary Machine disruptions, i.e machine deletion.
Contributor

What if a user deletes the machine intentionally? What is this controller supposed to do?

For MachineSets, a new machine is created instantly, and as soon as a machine is drained, it's deleted.

This behavior is undesirable for control planes. If you are deleting a machine, you probably want the new machine to come online and join the cluster before actually deleting the instance in the cloud provider (for worst-case DR scenarios).

We need to work out the particular steps for the scenario where a machine has been deleted and then the control plane is scaled.

Contributor

To this end, how do we ensure that a user gets a replacement machine in the same AZ as the machine being deleted?

EnableAutorepair bool `json:"enableautorepair,omitempty"`

// ProviderSpec details Provider-specific configuration to use during node creation.
ProviderSpec ProviderSpec `json:"providerSpec"`
Contributor

+1.

What is providerSpec in this context?

My thoughts were: rather than trying to dynamically build Machines from templates and several data sources that differ per provider, the installer can populate a list of templates for each supported AZ at creation time. Basically a machineset, but the controller chooses 1 of N templates rather than just the one.

}

// ProviderSpec defines the configuration to use during node creation.
type ProviderSpec struct {
Contributor

Are we declaring this to not be reconciled similar to MachineSets?

This proposes an ad-hoc CRD and controller for declaratively managing as a single entity the compute resources that host the OCP Control Plane components.

This controller differs from a regular machineSet in that:
- It ensures that scaling operations are non disruptive for etcd:
Contributor

So would this replace the need for the etcd quorum-guard to enforce the PDB, or will that still be required?

This controller differs from a regular machineSet in that:
- It ensures that scaling operations are non disruptive for etcd:
- It scales one resource at a time.
- It lets scaling operations proceed only when all etcd members are healthy and all the owned machines have a backed ready node.
Contributor

I think we can let cluster-etcd-operator conclude when scaling etcd is appropriate. Meaning, if we have a 3-node cluster and are going to replace node N, cluster-etcd-operator will watch for changes to your $resource and then scale up or down based on those observations. You can't roll the next node to upgrade until etcd comes back up, as per the PDB. We already watch Nodes, so how will this improve our observability of change?

- It scales one resource at a time.
- It lets scaling operations proceed only when all etcd members are healthy and all the owned machines have a backed ready node.
- It ensures an even spread of compute resources across failure domains.
- It removes etcd membership for voluntary machine disruptions (Question: can rather etcd operator somehow handle this?).
Contributor

yes cluster-etcd-operator owns etcd scaling

5. Check all etcd members for all owned machines are healthy via Cluster etcd Operator status signalling.
6. If (NOT all etcd members are healthy OR NOT all owned machines have a backed ready node) then controller short circuits here, log, update status and requeue. Else:
7. Pick oldest machine in more populated failure domain out of a candidates to be deleted list (by default all owned Machines).
8. Remove etcd member.
Contributor

scaling etcd is owned by cluster-etcd-operator.


### Risks and Mitigations

During horizontal scaling operations there are sensitive scenarios like scaling from 1 to 2. As soon as the etcd API is notified of the new member, the cluster loses quorum until that new member starts and joins the cluster. This must still be handled by the [Cluster etcd Operator](https://github.com/openshift/enhancements/blob/master/enhancements/etcd/cluster-etcd-operator.md#motivation), while the Control Plane controller should honour this and short-circuit when it meets the etcd unhealthiness criteria described in the workflows above.
Contributor

Currently scaling from 1 to 2 only happens during bootstrap or disaster recovery. Can you outline a scenario where we would go from 1 to 2 nodes?

Contributor

My guess is scaling from 1 to 3 entails a stop, however brief, at 2.

@enxebre
Member Author

enxebre commented Apr 15, 2020

Based on etcd-team feedback, it should be safe to add/remove (remove coming soon) new control plane compute capacity while honouring the PDB, and to have cluster-etcd-operator exclusively handle the etcd operational aspects orthogonally. I'll update the proposal to drop the etcd "pet" logic and possibly build atop machineSets.

@enxebre
Member Author

enxebre commented Apr 23, 2020

Based on etcd-team feedback, it should be safe to add/remove (remove coming soon) new control plane compute capacity while honouring the PDB, and to have cluster-etcd-operator exclusively handle the etcd operational aspects orthogonally. I'll update the proposal to drop the etcd "pet" logic and possibly build atop machineSets.

I revamped this proposal with a new PR that proposes just adding machineSets and an MHC for masters, while the etcd operator handles all of its operational aspects gracefully underneath: #292

@enxebre enxebre closed this Apr 23, 2020
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.