
Upscaling of coredns pods leads to DNS timeout errors #113080

Closed
sli720 opened this issue Oct 15, 2022 · 16 comments
Labels: kind/bug, needs-triage, sig/network

Comments

@sli720 commented Oct 15, 2022

What happened?

Since upgrading to CentOS 9 and Kubernetes 1.24.6 (on the same hardware), sporadic DNS resolution errors occur when too many coredns pods are running at the same time. If I reduce them to <= 3, no more errors occur. Once the error occurs, you see lines in the logs of nodelocaldns pods like:

[ERROR] plugin/errors: 2 git-cache.ci.svc.cluster.local. A: select tcp 10.233.0.3:53: i/o timeout

It looks like the nodelocaldns pod sometimes can't contact the coredns pods for some reason. There are no errors in the logs of the coredns pods, nor in the logs of the calico pods. The problem also occurs under low load (CPU, network, disk) on the cluster. Could this be a bug in nodelocaldns or coredns, or a misconfiguration of /etc/resolv.conf? It is strange that the problem disappears when I reduce the number of coredns pods.

What did you expect to happen?

nodelocaldns pods should always be able to reach the coredns pods

How can we reproduce it (as minimally and precisely as possible)?

Run the nslookup command many times; sometimes it fails, sometimes it doesn't. (A loop version is sketched after the two example runs below.)

❯ kubectl exec -i -t dnsutils -- nslookup nexus-service.ci.svc.cluster.local
Server:		169.254.25.10
Address:	169.254.25.10#53

** server can't find nexus-service.ci.svc.cluster.local.default.svc.cluster.local: SERVFAIL

command terminated with exit code 1

❯ kubectl exec -i -t dnsutils -- nslookup nexus-service.ci.svc.cluster.local
Server:		169.254.25.10
Address:	169.254.25.10#53

Name:	nexus-service.ci.svc.cluster.local
Address: 10.233.13.178
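
A minimal loop for the reproduction above (just a sketch; it assumes the dnsutils debug pod from the examples is running in the default namespace, and the iteration count is arbitrary):

for i in $(seq 1 100); do
  kubectl exec dnsutils -- nslookup nexus-service.ci.svc.cluster.local >/dev/null 2>&1 || echo "lookup $i failed"
done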

Anything else we need to know?

Here is the resolv.conf of the pod used to test:

nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
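
Side note on the SERVFAIL output above: with ndots:5, a name with fewer than five dots such as nexus-service.ci.svc.cluster.local is first tried with the search domains appended, which is why nexus-service.ci.svc.cluster.local.default.svc.cluster.local shows up in the failing lookup. Appending a trailing dot makes the name fully qualified and skips the search-list expansion; this doesn't address the timeouts, it just reduces the number of upstream queries per lookup:

❯ kubectl exec -i -t dnsutils -- nslookup nexus-service.ci.svc.cluster.local.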

Kubernetes version

1.24.6

Cloud provider

on-premise (kubespray see kubernetes-sigs/kubespray#9328)

OS version

CentOS Stream 9 (Kernel Linux 5.14.0-160.el9.x86_64 x86_64)

Install tools

ansible through kubespray

Container runtime (CRI) and version (if applicable)

containerd 1.6.8 (also tested with docker 20.10)

Related plugins (CNI, CSI, ...) and versions (if applicable)

calico

@sli720 added the kind/bug label Oct 15, 2022
@k8s-ci-robot added the needs-sig and needs-triage labels Oct 15, 2022
@k8s-ci-robot (Contributor)

@sli720: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sli720 (Author) commented Oct 15, 2022

/sig network

@k8s-ci-robot added the sig/network label and removed the needs-sig label Oct 15, 2022
@sli720 changed the title from "Sometimes namespace.svc.cluster.local is appended twice in DNS requests" to "Upscaling of coredns pods leads to DNS timeout errors" Oct 16, 2022
@chrisohaver (Contributor)

Perhaps the issue is with networking to a particular node, and increasing the number of instances > 3 results in a coredns pod running on a problematic node?
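
One quick way to check that correlation would be to record which nodes the coredns pods land on each time the deployment is scaled, and compare against the nodes whose nodelocaldns logs show the i/o timeouts (a sketch; k8s-app=kube-dns is the usual CoreDNS label but may differ per install):

kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide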

@sli720 (Author) commented Oct 19, 2022

I've tried it on specific nodes and I don't see a relation to the hardware. Every time I increase the number of pods I get this issue. I also don't see any issues in the calico or system/kernel logs. Can I somehow debug if that issue is related to a specific node?

@chrisohaver (Contributor) commented Oct 20, 2022

Can I somehow debug if that issue is related to a specific node?

Perhaps try issuing TCP queries to each individual CoreDNS pod IP directly. Do they all exhibit the same degree of sporadic timeout? Or some more so than others.

Note: The forward timeout is 2 seconds in nodelocal/coredns.
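
A sketch of such a per-pod TCP probe (assuming the CoreDNS pods carry the k8s-app=kube-dns label and the dnsutils image ships dig):

for ip in $(kubectl -n kube-system get pods -l k8s-app=kube-dns -o jsonpath='{.items[*].status.podIP}'); do
  echo "== $ip =="
  kubectl exec dnsutils -- dig +tcp +time=2 @"$ip" kubernetes.default.svc.cluster.local | grep -E 'status:|Query time:'
done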

@khenidak (Contributor)

How healthy are your kube-proxies (specifically on the nodes that host the pods that can't resolve DNS)?
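
A couple of checks that would answer this (a sketch; it assumes kube-proxy runs as a DaemonSet labeled k8s-app=kube-proxy and in iptables mode, which may not match a kubespray/IPVS setup):

# recent errors from kube-proxy across nodes
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=200 | grep -iE 'error|fail'

# on a suspect node: is the kube-dns ClusterIP from the timeout message programmed?
iptables -t nat -L KUBE-SERVICES -n | grep 10.233.0.3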

@sli720 (Author) commented Oct 22, 2022

Perhaps try issuing TCP queries to each individual CoreDNS pod IP directly. Do they all exhibit the same degree of sporadic timeout? Or some more so than others.

I ran a fast nslookup loop against all coredns pods from different hosts and it always resolved successfully. Only when nodelocaldns sits in between does it sometimes fail, and each time on different hosts.

@sli720 (Author) commented Oct 22, 2022

How healthy are your kube-proxies (specifically on the nodes that host the pods that can't resolve DNS)?

They have never crashed so far, if that's what you mean, and there are no errors in their logs.

@thockin (Member) commented Dec 8, 2022

Do we have any updates here?

@thockin closed this as completed Dec 21, 2022
@jaswanthikolla commented Apr 23, 2023

My hypothesis on why it's happening:

LocalCoreDNS (nodelocaldns) uses the CoreDNS kube-dns ClusterIP for upstream cluster.local DNS resolution. A ClusterIP is implemented with iptables DNAT, which is subject to conntrack race conditions, and more CoreDNS pods mean more iptables rules/endpoints.

So, if simultaneous connections are made (within 2 ns) and there are multiple rules with multiple endpoints, packets can be sent to the wrong pod/node (see race #3). The probability of sending the packet to the correct pod/node therefore decreases as the number of coredns pods increases.

Also, there are others who have faced this issue.
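
If this hypothesis holds, the per-node conntrack statistics should show it while CoreDNS is scaled up (a sketch; conntrack -S comes from conntrack-tools and has to be run on the node itself, and kube-dns is the usual name of the CoreDNS Service):

# endpoints behind the kube-dns ClusterIP (more CoreDNS pods, more DNAT targets)
kubectl -n kube-system get endpoints kube-dns -o jsonpath='{.subsets[*].addresses[*].ip}'; echo

# on each node: insert_failed incrementing is the signature of the DNAT race
conntrack -S | grep -o 'insert_failed=[0-9]*'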

@aojea (Member) commented Apr 23, 2023

on-premise (kubespray)

there are no errors in the logs of the coredns pods, nor in the logs of the calico pods

many moving parts here ;)

@sli720 (Author) commented Apr 23, 2023

I've disabled nodelocaldns completely and scaled up the coredns pods again. No problems anymore.
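
For what it's worth, a quick way to confirm that test pods are no longer going through nodelocaldns after such a change (169.254.25.10 and 10.233.0.3 are the nodelocaldns and kube-dns addresses from this report):

kubectl exec dnsutils -- grep nameserver /etc/resolv.conf
# expected: nameserver 10.233.0.3 instead of nameserver 169.254.25.10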

@jaswanthikolla

I've disabled nodelocaldns completely and scaled up the coredns pods again. No problems anymore.

It's possible that you simply don't have visibility into DNS errors anymore. Did you validate that? Earlier, LocalCoreDNS was the central place logging the errors. It's interesting how that would fix the error.

@chrisohaver (Contributor)

My hypothesis on why it's happening:

LocalCoreDNS (nodelocaldns) uses the CoreDNS kube-dns ClusterIP for upstream cluster.local DNS resolution. A ClusterIP is implemented with iptables DNAT, which is subject to conntrack race conditions, and more CoreDNS pods mean more iptables rules/endpoints.

Nodelocaldns instances use TCP to forward DNS requests to the ClusterIP DNS service, which would mitigate the conntrack issue: requests get resent when the sender does not receive ACKs.
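
That can be checked against the live config: the cluster.local block of the node-local-dns Corefile should contain force_tcp in its forward stanza (a sketch; the ConfigMap is named nodelocaldns in kubespray, node-local-dns in the upstream manifest):

kubectl -n kube-system get configmap nodelocaldns -o jsonpath='{.data.Corefile}' | grep -B1 -A2 force_tcp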

@jaswanthikolla commented Apr 24, 2023

which would mitigate the conntrack issue -

Yes. One case where it will still fail is if the SYN-ACK is lost: as per this doc, recovery would then take at least 3 seconds, which is much more than the CoreDNS timeout. I wonder what the impact of that is on other requests, which I asked as a separate question here.
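
If lost SYNs/SYN-ACKs are the suspicion, the kernel retransmission counters on the affected nodes should be growing while the failures happen (a sketch; nstat ships with iproute2 and reads the kernel SNMP counters, so it has to run on the node):

nstat -az TcpRetransSegs TcpExtTCPSynRetrans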

@tanvp112 commented Apr 26, 2023

which would mitigate the conntrack issue -

Yes. One case where it will still fail is if the SYN-ACK is lost: as per this doc, recovery would then take at least 3 seconds, which is much more than the CoreDNS timeout. I wonder what the impact of that is on other requests, which I asked as a separate question here.

hi @jaswanthikolla, I was trying to reproduce the SYN-ACK issue, but ran into something else: nodelocaldns always seems to go back to coredns for name resolution. I tested by running a 1s loop of nslookup kubernetes.default.svc.cluster.local on a node with nodelocaldns running. All goes well, but as soon as I scale CoreDNS to zero the nslookup fails immediately and the nodelocaldns log reports connection refused to CoreDNS. I thought there was a 5s TTL set by CoreDNS, so nodelocaldns should not fail immediately and should respond with the cached record? Once I scale CoreDNS back up, resolution returns to normal. This looks like nodelocaldns didn't really cache any results to reduce calls to CoreDNS... I was using the stock nodelocaldns.yaml; besides the 3 standard environment variables that have to be changed, nothing else was changed.
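
A sketch of that test for anyone who wants to repeat it (it assumes the CoreDNS Deployment is named coredns, which is the common default but worth verifying):

# 1s lookup loop, run while watching the nodelocaldns logs
while true; do
  kubectl exec dnsutils -- nslookup kubernetes.default.svc.cluster.local >/dev/null 2>&1 || echo "$(date +%T) lookup failed"
  sleep 1
done

# in another shell: take CoreDNS away, then bring it back
kubectl -n kube-system scale deployment coredns --replicas=0
kubectl -n kube-system scale deployment coredns --replicas=2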
