
Drain not being performed for KCP machines with K8s v1.31.x #11138

Closed
fabriziopandini opened this issue Sep 5, 2024 · 2 comments · Fixed by #11137
Assignees
Labels
area/provider/control-plane-kubeadm Issues or PRs related to KCP kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@fabriziopandini
Member

fabriziopandini commented Sep 5, 2024

What steps did you take and what happened?

This issue was detected while triaging E2E failures on #11127

What did you expect to happen?

When KCP deletes a machine (due to remediation or scale down) this is what happens:

  1. KCP - if the deleting machine is the etcd leader, KCP forwards etcd leadership to another etcd member
  2. KCP - deletes the etcd member
  3. KCP - deletes the Machine
  4. Machine controller - takes over and starts draining the Node, waits for volumes to be detached, deletes the VM, and deletes the Node (see the drain sketch below)
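
For illustration, here is a minimal Go sketch of what the drain in step 4 boils down to, using the kubectl drain helper (`k8s.io/kubectl/pkg/drain`). This is not the Machine controller's actual code path; the node name, kubeconfig handling, and helper settings are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/kubectl/pkg/drain"
)

// drainNode cordons the Node and then evicts/deletes its Pods, i.e. the
// "draining the Node" part of step 4. Illustrative only; not the Machine
// controller's actual implementation.
func drainNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	helper := &drain.Helper{
		Ctx:                 ctx,
		Client:              client,
		Force:               true, // also evict Pods not managed by a controller
		IgnoreAllDaemonSets: true,
		DeleteEmptyDirData:  true,
		GracePeriodSeconds:  -1, // use each Pod's own grace period
		Timeout:             5 * time.Minute,
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}

	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Cordon first so no new Pods get scheduled, then evict the existing ones.
	if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
		return err
	}
	return drain.RunNodeDrain(helper, nodeName)
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	if err := drainNode(context.Background(), client, "my-control-plane-node"); err != nil {
		fmt.Fprintln(os.Stderr, "drain failed:", err)
	}
}
```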

However, special considerations apply to KCP machines with K8s v1.31.x, where the kubelet talks to the local API server pod (for context, see below in this issue).

When the kubelet is talking to the local API server pod, right after step 2 of the sequence above the entire local control plane on the machine starts failing, and thus the kubelet starts to fail as well (it cannot react to new data from the local API server, because that API server is down).

This prevents the drain at step 4 from completing properly, because the kubelet on the Node doesn't see the deletionTimestamps added to the Pods.

Why did we not catch this before?

Even though draining was not working properly, Machine deletion would ultimately complete, and thus our tests were passing.

This is because, after some time, K8s starts to consider the Node unreachable; when this happens the node.kubernetes.io/unreachable:NoExecute taint is applied, and one of the side effects of this taint is that Pods are deleted immediately (see https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#concepts).
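
As a side note, here is a tiny Go sketch of what detecting that taint looks like (the key/effect constants are the upstream ones; the helper itself is illustrative, not Cluster API code):

```go
package example

import corev1 "k8s.io/api/core/v1"

// hasUnreachableNoExecuteTaint reports whether the Node carries the
// node.kubernetes.io/unreachable:NoExecute taint, i.e. the state in which
// Pods without a matching toleration get deleted.
func hasUnreachableNoExecuteTaint(node *corev1.Node) bool {
	for _, t := range node.Spec.Taints {
		if t.Key == corev1.TaintNodeUnreachable && t.Effect == corev1.TaintEffectNoExecute {
			return true
		}
	}
	return false
}
```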

Also, at a certain point the Machine controller would detect that the Node is unreachable and go through a simplified deletion workflow.

So, the unreachable Node handling & the simplified deletion workflow were hiding the issue in our CI.

Why did we detect this now?

#11127 introduced a sophisticated drain test that surfaced this issue.
More specifically, while checking that a PDB blocks drain (as expected), we identified the issue of Machine deletion going through without actually draining Pods.
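
To give an idea of the kind of blocker the test relies on, here is a hypothetical PDB created via client-go that allows zero voluntary disruptions (not the actual fixture from #11127; namespace, name, and labels are made up):

```go
package example

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// createBlockingPDB creates a PodDisruptionBudget with maxUnavailable=0 for the
// selected Pods, so any eviction attempted during drain is rejected.
func createBlockingPDB(ctx context.Context, client kubernetes.Interface) error {
	maxUnavailable := intstr.FromInt32(0)
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "block-drain", Namespace: "default"},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MaxUnavailable: &maxUnavailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "drain-test"},
			},
		},
	}
	_, err := client.PolicyV1().PodDisruptionBudgets(pdb.Namespace).Create(ctx, pdb, metav1.CreateOptions{})
	return err
}
```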

Cluster API version

>= 1.8.0

Kubernetes version

>= 1.31.0

Anything else you would like to add?

When creating CP machines with K8s v1.31.x, KCP forces kubeadm to use the ControlPlaneKubeletLocalMode feature gate (see #10947, kubernetes/kubernetes#125582).
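
To make that concrete, here is a hypothetical sketch of the defaulting behaviour (function name and structure are invented here, not the actual KCP code):

```go
package main

import (
	"fmt"

	"github.com/blang/semver/v4"
)

// defaultFeatureGates is a hypothetical helper mirroring the behaviour described
// above: for control plane machines running Kubernetes >= v1.31.0, the
// ControlPlaneKubeletLocalMode kubeadm feature gate is forced on.
func defaultFeatureGates(k8sVersion semver.Version, gates map[string]bool) map[string]bool {
	if gates == nil {
		gates = map[string]bool{}
	}
	if k8sVersion.GTE(semver.MustParse("1.31.0")) {
		gates["ControlPlaneKubeletLocalMode"] = true
	}
	return gates
}

func main() {
	fmt.Println(defaultFeatureGates(semver.MustParse("1.31.1"), nil))
	// Output: map[ControlPlaneKubeletLocalMode:true]
}
```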

With this feature gate on, the kubelet on CP nodes talks to the local API server pod instead of the control plane endpoint (which load balances traffic across all the API server instances).

Talking to the local API server pod is required to prevent the K8s v1.31.x kubelet from talking to v1.30.x API servers during upgrades, because this violates the version skew policy; even though it worked for a long time, it started failing when the v1.31.x kubelet started using field selectors for spec.clusterIP, which are available only in v1.31.x API servers (see #10947 for the full explanation).
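
For reference, the kind of request involved looks roughly like the list below (the exact selector value the kubelet uses is an assumption here); a v1.31.x API server supports filtering Services on spec.clusterIP server-side, while a v1.30.x one does not, which is why the v1.31.x kubelet must talk to a same-version API server:

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// List Services using a field selector on spec.clusterIP (e.g. to skip
	// headless Services). This selector is only understood by v1.31.x+ API servers.
	svcs, err := client.CoreV1().Services(metav1.NamespaceAll).List(context.Background(), metav1.ListOptions{
		FieldSelector: "spec.clusterIP!=None",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("got %d Services with a ClusterIP\n", len(svcs.Items))
}
```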

Label(s) to be applied

/kind bug
One or more /area labels. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 5, 2024
@sbueringer sbueringer added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. area/provider/control-plane-kubeadm Issues or PRs related to KCP labels Sep 5, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates an issue lacks a `priority/foo` label and requires one. label Sep 5, 2024
@sbueringer
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 5, 2024
@fabriziopandini fabriziopandini added this to the v1.9 milestone Sep 5, 2024
@sbueringer
Member

/assign
