nodelocaldns resolution issues #9328
Could one of these issues be related? kubernetes/dns#387. Is it possible to change nodelocaldns to use only UDP instead of TCP through kubespray? Or were the coredns pods scaled down for some reason? Or do I have to set single-request-reopen in the resolv.conf? etc.
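One quick way to answer the UDP-vs-TCP question is to look for force_tcp in the nodelocaldns Corefile. This is a sketch under an assumption: the ConfigMap name nodelocaldns in kube-system matches default kubespray deployments, and may differ in other setups.

```shell
# Inspect the nodelocaldns Corefile for the forward block to coredns.
# Assumption: kubespray stores it in a ConfigMap named "nodelocaldns"
# in the kube-system namespace; adjust if your deployment differs.
kubectl -n kube-system get configmap nodelocaldns -o yaml | grep -n force_tcp
```

If force_tcp appears in the forward block, nodelocaldns upgrades its upstream queries to coredns to TCP, which is the transport the "dial tcp" errors below refer to.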
I've tried that out, but it did not help.
It seems to be related only to old Alpine containers, which is not the case here.
If I scale the coredns pods down to only one, there are no issues. I'm already running an older k8s cluster with multiple coredns pods without any issues under CentOS 7. Could this problem be related to CentOS 9 (e.g. the kernel)?
Could you execute the command? If that doesn't help, this procedure may help you: #9160
Yes, they are all running. The coredns pods show no errors, but the nodelocaldns pods sometimes show the error:
When I scale down to 1-3 pods there are no issues, but when using the DNS autoscaler (which scales up to 7 pods) the issues occur.
Interesting. In my last tests I found that sometimes .default.svc.cluster.local is appended to the request and sometimes not. Could this be a bug in nodelocaldns or coredns, or a misconfiguration of /etc/resolv.conf?
Here is the resolv.conf of the pod used for testing:
It is strange that the problem disappears when I reduce the number of coredns pods. Any idea why?
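The intermittent .default.svc.cluster.local suffix is consistent with the resolver search-list rule driven by the ndots option in /etc/resolv.conf. A minimal sketch of that rule (ndots:5 is the Kubernetes pod default; the names are taken from this report):

```shell
#!/bin/sh
# Sketch of the resolver search rule: a name with fewer dots than
# "ndots" is first tried with each search suffix appended, which is how
# ".default.svc.cluster.local" can end up on a query. Kubernetes pods
# default to ndots:5.
ndots=5
needs_search() {
  case $1 in
    *.) echo no; return ;;                       # trailing dot: always absolute
  esac
  dots=$(printf '%s' "$1" | tr -cd '.' | wc -c)  # count the dots in the name
  if [ "$dots" -lt "$ndots" ]; then echo yes; else echo no; fi
}
needs_search jenkins-operator-http-testing.ci.svc.cluster.local   # 4 dots -> yes
needs_search jenkins-operator-http-testing.ci.svc.cluster.local.  # absolute -> no
```

Querying with a trailing dot (or lowering ndots in the pod's dnsConfig) bypasses the search list, which is a common way to isolate this class of problem.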
I've opened an issue in the kubernetes project, so I think this one can be closed.
For people who encounter this and found this issue page via Google:
Environment:
- Cloud provider or hardware configuration: self-hosted (10 hosts)
- OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): CentOS Stream 9 (kernel Linux 5.14.0-160.el9.x86_64 x86_64)
- Version of Ansible (ansible --version): 2.12.5
- Version of Python (python --version): 3.9.13
- Kubespray version (commit) (git rev-parse --short HEAD): 6dff393
- Network plugin used: calico
- Version of Kubernetes: 1.24.6
The DNS cache is not always resolving internal DNS names e.g.
And on just another try it works again
I've also tried it directly on all hosts with:
nslookup jenkins-operator-http-testing.ci.svc.cluster.local 169.254.25.10
and it sometimes fails on every host.
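To quantify how often the lookup fails, the one-off nslookup above can be wrapped in a loop. This is a sketch: the name and the 169.254.25.10 nodelocaldns address come from this report, and the loop count of 50 is arbitrary.

```shell
#!/bin/sh
# Repeat the lookup against nodelocaldns and count failures; an
# intermittent problem like the one described typically shows up as a
# small but non-zero failure count.
fail=0; total=50
i=1
while [ "$i" -le "$total" ]; do
  nslookup jenkins-operator-http-testing.ci.svc.cluster.local 169.254.25.10 \
    >/dev/null 2>&1 || fail=$((fail + 1))
  i=$((i + 1))
done
echo "failed $fail/$total lookups"
```

Running the same loop on every host makes it easy to see whether the failure rate correlates with the number of coredns replicas.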
It is not related to the name I'm trying to resolve; it also happens with other names. The only thing I see in the nodelocaldns logs is the following occasional error saying it cannot connect to coredns:
[ERROR] plugin/errors: 2 jenkins-operator-http-testing.ci.svc.cluster.local. A: dial tcp 10.233.0.3:53: i/o timeout
But if I run nslookup against the coredns IP directly, there is no issue. There are no issues in the coredns or calico pod logs either. It is also strange that the nodelocaldns cache forgets entries within a few seconds and asks coredns again.
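Since the logged error is on "dial tcp", one way to narrow it down is to compare UDP and TCP queries against the coredns service IP from the log (10.233.0.3). The dig +notcp/+tcp flags select the transport; the timeouts here are illustrative.

```shell
#!/bin/sh
# Compare UDP and TCP paths to coredns; if only the +tcp query times
# out, the problem is specific to the TCP connections nodelocaldns
# opens to its upstream.
dig +notcp +time=2 +tries=1 @10.233.0.3 jenkins-operator-http-testing.ci.svc.cluster.local
dig +tcp   +time=2 +tries=1 @10.233.0.3 jenkins-operator-http-testing.ci.svc.cluster.local
```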
I'm not sure where to ask, so I've created the ticket here first.