
Drain not being performed for KCP machines with K8s v1.31.x #11138

Closed
fabriziopandini opened this issue Sep 5, 2024 · 2 comments · Fixed by #11137
Assignees
Labels
area/provider/control-plane-kubeadm Issues or PRs related to KCP kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@fabriziopandini
Member

fabriziopandini commented Sep 5, 2024

What steps did you take and what happened?

This issue was detected while triaging E2E failures on #11127

What did you expect to happen?

When KCP deletes a machine (due to remediation or scale down) this is what happens:

  1. KCP - if the deleting machine is the etcd leader, KCP forwards etcd leadership to another etcd member
  2. KCP - deletes the etcd member
  3. KCP - deletes the Machine
  4. Machine controller - takes over and starts draining the Node, waits for volumes to be detached, deletes the VM, and deletes the Node (see the drain sketch below)
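
For illustration, here is a minimal Go sketch of what the drain in step 4 boils down to, using the kubectl drain helper (`k8s.io/kubectl/pkg/drain`). This is not the Machine controller's actual code path; the node name, kubeconfig handling, and helper settings are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/kubectl/pkg/drain"
)

// drainNode cordons the Node and then evicts/deletes its Pods, i.e. the
// "draining the Node" part of step 4. Illustrative only; not the Machine
// controller's actual implementation.
func drainNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	helper := &drain.Helper{
		Ctx:                 ctx,
		Client:              client,
		Force:               true, // also evict Pods not managed by a controller
		IgnoreAllDaemonSets: true,
		DeleteEmptyDirData:  true,
		GracePeriodSeconds:  -1, // use each Pod's own grace period
		Timeout:             5 * time.Minute,
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}

	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Cordon first so no new Pods get scheduled, then evict the existing ones.
	if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
		return err
	}
	return drain.RunNodeDrain(helper, nodeName)
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	if err := drainNode(context.Background(), client, "my-control-plane-node"); err != nil {
		fmt.Fprintln(os.Stderr, "drain failed:", err)
	}
}
```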

However, special considerations apply to KCP machines with K8s v1.31.x, where the kubelet talks to the local API server pod (for context, see below in this issue).

When the kubelet is talking to the local API server pod, right after step 2 of the sequence above the entire local control plane on the machine starts failing, and thus the kubelet starts to fail as well (it cannot react to new data from the local API server, because that API server is down).

This prevents the drain at step 4 from completing properly, because the kubelet on the Node doesn't see the deletionTimestamps added to the Pods.

Why did we not catch this before?

Even though draining was not working properly, Machine deletion would ultimately complete, and thus our tests were passing.

This is because, after some time, K8s starts to consider the Node unreachable; when this happens the node.kubernetes.io/unreachable:NoExecute taint is applied, and one of the side effects of this taint is that Pods are deleted immediately (see https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#concepts).
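
As a side note, here is a tiny Go sketch of what detecting that taint looks like (the key/effect constants are the upstream ones; the helper itself is illustrative, not Cluster API code):

```go
package example

import corev1 "k8s.io/api/core/v1"

// hasUnreachableNoExecuteTaint reports whether the Node carries the
// node.kubernetes.io/unreachable:NoExecute taint, i.e. the state in which
// Pods without a matching toleration get deleted.
func hasUnreachableNoExecuteTaint(node *corev1.Node) bool {
	for _, t := range node.Spec.Taints {
		if t.Key == corev1.TaintNodeUnreachable && t.Effect == corev1.TaintEffectNoExecute {
			return true
		}
	}
	return false
}
```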

Also, at a certain point the Machine controller would detect that the Node is unreachable and go through a simplified deletion workflow.

So, the unreachable Node handling & the simplified deletion workflow were hiding the issue in our CI.

Why did we detect this now?

#11127 introduced a sophisticated drain test that surfaced this issue.
More specifically, while checking that a PDB blocks drain (as expected), we identified the issue of Machine deletion going through without actually draining Pods.
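
To give an idea of the kind of blocker the test relies on, here is a hypothetical PDB created via client-go that allows zero voluntary disruptions (not the actual fixture from #11127; namespace, name, and labels are made up):

```go
package example

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// createBlockingPDB creates a PodDisruptionBudget with maxUnavailable=0 for the
// selected Pods, so any eviction attempted during drain is rejected.
func createBlockingPDB(ctx context.Context, client kubernetes.Interface) error {
	maxUnavailable := intstr.FromInt32(0)
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "block-drain", Namespace: "default"},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MaxUnavailable: &maxUnavailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "drain-test"},
			},
		},
	}
	_, err := client.PolicyV1().PodDisruptionBudgets(pdb.Namespace).Create(ctx, pdb, metav1.CreateOptions{})
	return err
}
```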

Cluster API version

>= 1.8.0

Kubernetes version

>= 1.31.0

Anything else you would like to add?

When creating CP machines with K8s v1.31.x, KCP forces kubeadm to use the ControlPlaneKubeletLocalMode feature gate (see #10947, kubernetes/kubernetes#125582).
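
To make that concrete, here is a hypothetical sketch of the defaulting behaviour (function name and structure are invented here, not the actual KCP code):

```go
package main

import (
	"fmt"

	"github.com/blang/semver/v4"
)

// defaultFeatureGates is a hypothetical helper mirroring the behaviour described
// above: for control plane machines running Kubernetes >= v1.31.0, the
// ControlPlaneKubeletLocalMode kubeadm feature gate is forced on.
func defaultFeatureGates(k8sVersion semver.Version, gates map[string]bool) map[string]bool {
	if gates == nil {
		gates = map[string]bool{}
	}
	if k8sVersion.GTE(semver.MustParse("1.31.0")) {
		gates["ControlPlaneKubeletLocalMode"] = true
	}
	return gates
}

func main() {
	fmt.Println(defaultFeatureGates(semver.MustParse("1.31.1"), nil))
	// Output: map[ControlPlaneKubeletLocalMode:true]
}
```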

With this feature gate on, the kubelet on CP nodes talks to the local API server pod instead of the control plane endpoint (which load balances traffic across all the API server instances).

Talking to the local API server pod is required to prevent the K8s v1.31.x kubelet from talking to v1.30.x API servers during upgrades, because this violates the version skew policy; even though it worked for a long time, it started failing when the v1.31.x kubelet started using field selectors for spec.clusterIP, which are available only in v1.31.x API servers (see #10947 for the full explanation).
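
For reference, the kind of request involved looks roughly like the list below (the exact selector value the kubelet uses is an assumption here); a v1.31.x API server supports filtering Services on spec.clusterIP server-side, while a v1.30.x one does not, which is why the v1.31.x kubelet must talk to a same-version API server:

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// List Services using a field selector on spec.clusterIP (e.g. to skip
	// headless Services). This selector is only understood by v1.31.x+ API servers.
	svcs, err := client.CoreV1().Services(metav1.NamespaceAll).List(context.Background(), metav1.ListOptions{
		FieldSelector: "spec.clusterIP!=None",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("got %d Services with a ClusterIP\n", len(svcs.Items))
}
```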

Label(s) to be applied

/kind bug
One or more /area labels. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 5, 2024
@sbueringer sbueringer added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. area/provider/control-plane-kubeadm Issues or PRs related to KCP labels Sep 5, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates an issue lacks a `priority/foo` label and requires one. label Sep 5, 2024
@sbueringer
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 5, 2024
@fabriziopandini fabriziopandini added this to the v1.9 milestone Sep 5, 2024
@sbueringer
Member

/assign
