Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DNS latency of 5s when use iptables forward #62764

Closed
wants to merge 2 commits into from
Closed

Fix DNS latency of 5s when use iptables forward #62764

wants to merge 2 commits into from

Conversation

xiaoxubeii
Copy link
Member

What this PR does / why we need it:
Fix the dns latency of 5s when uses iptables forward.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #62628

Special notes for your reviewer:

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 18, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xiaoxubeii
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: danwinship

Assign the PR to them by writing /assign @danwinship in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@xiaoxubeii xiaoxubeii changed the title Fix 62628 Fix DNS latency of 5s when uses iptables forward Apr 18, 2018
@xiaoxubeii xiaoxubeii changed the title Fix DNS latency of 5s when uses iptables forward Fix DNS latency of 5s when use iptables forward Apr 18, 2018
@dims
Copy link
Member

dims commented Apr 18, 2018

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 18, 2018
@xiaoxubeii
Copy link
Member Author

/assign @danwinship

@xiaoxubeii
Copy link
Member Author

/unassign danwinship

@xiaoxubeii
Copy link
Member Author

/assign mrhohn

@xiaoxubeii
Copy link
Member Author

/retest

@xiaoxubeii
Copy link
Member Author

@MrHohn for review and approval : )

@MrHohn
Copy link
Member

MrHohn commented May 14, 2018

@bowei @thockin

@MrHohn
Copy link
Member

MrHohn commented May 14, 2018

Would be great if we can have a cluster wide knob for tweaking this defaultDNSOptions so user can place this workaround via that. I believe this might help mitigate the packet dropping bug for pod DNS resolution, but whether there would be any other side effect is unclear to me and that's why I'm hesitated.

Another pod wide option would be something like below (similar to what you posted on weaveworks/weave#3287 (comment)):

apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsPolicy: "ClusterFirst"
  dnsConfig:
    options:
      - name: single-request-reopen

The given options will be merged with the pre-set ones (e.g. ndots:5).

@Quentin-M
Copy link
Contributor

Quentin-M commented May 15, 2018

I would just like to add here that the single-request(-reopen) workaround does not work with Alpine-based containers, as musl does not support the option (see below). Unfortunately, Alpine Linux is the base image for 90% of our infrastructure.

src/network/resolvconf.c

                if (!strncmp(line, "options", 7) && isspace(line[7])) {
                        p = strstr(line, "ndots:");
                        if (p && isdigit(p[6])) {
                                p += 6;
                                unsigned long x = strtoul(p, &z, 10);
                                if (z != p) conf->ndots = x > 15 ? 15 : x;
                        }
                        p = strstr(line, "attempts:");
                        if (p && isdigit(p[9])) {
                                p += 9;
                                unsigned long x = strtoul(p, &z, 10);
                                if (z != p) conf->attempts = x > 10 ? 10 : x;
                        }
                        p = strstr(line, "timeout:");
                        if (p && (isdigit(p[8]) || p[8]=='.')) {
                                p += 8;
                                unsigned long x = strtoul(p, &z, 10);
                                if (z != p) conf->timeout = x > 60 ? 60 : x;
                        }
                        continue;
                }

src/network/lookup.h

struct resolvconf {
        struct address ns[MAXNS];
        unsigned nns, attempts, ndots;
        unsigned timeout;
};

@Quentin-M
Copy link
Contributor

Here is the workaround we are about to use: weaveworks/weave#3287 (comment)

@xiaoxubeii
Copy link
Member Author

@MrHohn OK, that's a compromise settlement, i will close the pr : )

@xiaoxubeii
Copy link
Member Author

/close

@steven-sheehy
Copy link

Since we run various helm charts, not every pod we run is under our control to be able to add a custom dnsConfig to specify single-request-reopen. A custom kubelet flag to enable this would help, but I think this should be enabled by kubelet by default. From my understanding of the option, single-request-reopen sounds pretty safe since it provides a fallback for the existing functionality to try a new socket. If underlying libraries don't support it like alpine, it will just be ignored. Anyway this PR could be reopened?

@Quentin-M
Copy link
Contributor

I just posted a little write-up about our journey troubleshooting the issue, and how we are worked around it in production: https://blog.quentin-machu.fr/2018/06/24/5-15s-dns-lookups-on-kubernetes/.

@steven-sheehy Our workaround does not involve setting dnsConfig, nor does it require any change from the users.

@steven-sheehy
Copy link

@Quentin-M I tried your workaround and it still occurs. Left a comment on your blog. We may be suffering from the SNAT race condition as well. Regardless, we at least need a cluster level option to tweak this.

@Quentin-M
Copy link
Contributor

Quentin-M commented Jun 27, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DNS latency of 5s when uses iptables forward in pods network traffic
7 participants