DNS intermittent delays of 1/2s with nodelocaldns #387
Comments
Any logs in the node-local-dns pods? There will be error messages if node-local-dns does not get a response back from the CoreDNS upstream within 2s.
No errors in node-local-dns logs - I've set them to log everything.
This leads me to the thought that maybe it is not DNS related at all (basically they are all Redis calls from the predis PHP library). I'm going to take another round of testing tomorrow to confirm the DNS-related problem and will close this issue if not.
Maybe the app (jaeger) was using some invalid FQDN in the DNS query, auto-completed by the underlying C runtime library (libc), or there were some incorrect DNS settings.
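For context on how libc auto-completes names: a typical pod `/etc/resolv.conf` under the default `ClusterFirst` DNS policy looks roughly like the sketch below (the cluster domain, namespace, and nameserver IP are illustrative). With `ndots:5`, a name like `redis.example.com` has fewer than five dots, so libc first tries it against each search domain before sending it as-is, and each failing suffix can add a round trip or a timeout.

```
# Illustrative pod /etc/resolv.conf (values are assumptions, not from this issue)
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```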
I'm going to dig into tcpdump, but this will take time. However, I was able to run a simple test: changing the DNS domain name of the Redis service to a concrete IP address in the service configs removed the delays completely.
We have found the cause of the delays for our setup, and it is DNS (as always) 😄 I'm closing the issue since the node-local-dns setup works as expected here. However, I have a few questions and I'll be grateful if someone can clarify them:
I am not sure this will fix the issue. How do requests from pods reach the AWS resolver (10.2.02) today? Has it been configured as a stubDomain? Replacing the upstream server will not disable search-path expansion; I'm not sure if that's the intent.
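For reference, a stubDomain in a kube-dns ConfigMap would look roughly like the sketch below; the domain name and resolver IP here are placeholders, not values from this issue:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"internal.example.com": ["10.2.0.2"]}
```

Queries for names under the stub domain are forwarded to the listed resolver, while everything else follows the normal upstream path, so search-path expansion still applies to unqualified names.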
You can use the prefetch option for this - https://coredns.io/plugins/cache/
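In the Corefile, prefetch is an option inside the cache plugin block; the sketch below uses illustrative values, not a recommendation:

```
.:53 {
    cache 30 {
        # prefetch AMOUNT [DURATION [PERCENTAGE%]]:
        # refresh entries queried at least 10 times per minute
        # before they expire, instead of waiting for a cache miss
        prefetch 10 1m 10%
    }
    forward . /etc/resolv.conf
}
```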
I believe it would open separate requests for it, since upon a cache miss, control goes to the next plugin, which I think will be "forward" in this case. https://github.com/coredns/coredns/blob/614d08cba29ed4904d11008e795c081c4f392b77/plugin/cache/handler.go#L35
That's correct.
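The cache-miss behavior discussed above can be sketched as a plugin chain. This is not CoreDNS code; the class and method names are illustrative, and the "answers" are placeholder strings:

```python
class ForwardPlugin:
    """Stands in for the 'forward' plugin talking to the upstream resolver."""
    def __init__(self):
        self.upstream_queries = 0

    def serve(self, qname):
        self.upstream_queries += 1
        return f"A-record-for-{qname}"  # placeholder answer


class CachePlugin:
    """Stands in for the 'cache' plugin: hit -> answer, miss -> next plugin."""
    def __init__(self, next_plugin):
        self.next = next_plugin
        self.store = {}

    def serve(self, qname):
        if qname in self.store:          # cache hit: no upstream traffic
            return self.store[qname]
        answer = self.next.serve(qname)  # cache miss: control falls through
        self.store[qname] = answer       # store the upstream answer
        return answer


forward = ForwardPlugin()
cache = CachePlugin(forward)
cache.serve("redis.default.svc.cluster.local")  # miss -> goes upstream
cache.serve("redis.default.svc.cluster.local")  # hit  -> served from cache
print(forward.upstream_queries)  # 1
```

The point being confirmed: only a cache hit avoids the upstream request; a miss always reaches "forward".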
Thank you very much for your answers.
We have an issue that we have been troubleshooting for almost a week now, related to the well-known DNS conntrack race condition (kubernetes/kubernetes#56903), and I hope someone has hit it too and can at least confirm we are not alone here.

We checked `iptables -L` and `conntrack -S`: there are almost no `insert_failed` on the node (at least not as many as the delays we see, and the numbers are pretty static).

`conntrack -L` shows no connections with the src/dst IP/port of DNS UDP traffic, only TCP connections to the upstream CoreDNS, so I assume it all goes directly to the node cache.

However, we still see intermittent 2s delays in some calls in Jaeger (PHP app, a new process and a new DNS search per request). When I set `dnsOptions` in the pod to `single-request-reopen`, `ndots: 2` and `timeout: 1`, the delays are still present, but now they are all 1s. So this all looks like some issue with DNS traffic from the pod to node-cache, but I can't figure out what else to look at here.