[Q&A + HELP] How to make k8s coredns work with OVN DNS #228

Open
bzhaoopenstack opened this issue Dec 18, 2023 · 12 comments

@bzhaoopenstack

Hello team,

We have a DNS-specific problem. We want the K8s CoreDNS to forward DNS requests from pods into the OVN world, so that OVN can return the OpenStack DNS records.

We have an OpenStack-based cloud with OVN, with domain and DNS support enabled in Neutron. The VMs can be pinged by domain name using the correct FQDN. We then deployed a K8s environment with CoreDNS on those VMs. The pods work well with the K8s-internal DNS names, but cannot resolve names from the VMs' DNS world. Both the OpenStack and K8s deployments are configured with the same internal DNS service.

The DNS requests from VMs are "hijacked" by OVN. CoreDNS forwards DNS requests from pods to outside of K8s, so on the wire they should look similar to requests sent by the VM itself. However, I saw some very strange packets on the compute node; the captures below were taken with tcpdump on the VM NIC.

VM pings the domain name in the OpenStack world (works well):

23:52:35.428669 fa:16:3e:74:c8:b8 (oui Unknown) > fa:16:3e:d3:4a:85 (oui Unknown), ethertype IPv4 (0x0800), length 96: (tos 0x0, ttl 64, id 43939, offset 0, flags [DF], proto UDP (17), length 82)
    192.168.200.200.41224 > cedar.internal.com.domain: [bad udp cksum 0x9a35 -> 0x16d5!] 30257+ A? k8s-tt-new-bz.zonea.teststack. (54)
23:52:35.429924 fa:16:3e:d3:4a:85 (oui Unknown) > fa:16:3e:74:c8:b8 (oui Unknown), ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 43939, offset 0, flags [DF], proto UDP (17), length 134)
    cedar.internal.com.domain > 192.168.200.200.41224: [no cksum] 30257- q: A? k8s-tt-new-bz.zonea.teststack. 1/0/0 k8s-tt-new-bz.zonea.teststack. [1h] A 192.168.200.44 (106)

Pod pings the domain name in the OpenStack world (does NOT work):

23:47:50.668761 fa:16:3e:74:c8:b8 (oui Unknown) > fa:16:3e:d3:4a:85 (oui Unknown), ethertype IPv4 (0x0800), length 107: (tos 0x0, ttl 63, id 58906, offset 0, flags [DF], proto UDP (17), length 93)
    192.168.200.200.5448 > cedar.internal.com.domain: [bad udp cksum 0x9a40 -> 0x79bd!] 30058+ [1au] A? k8s-tt-new-bz.zonea.teststack. ar: . OPT UDPsize=2048 DO (65)
23:47:50.669640 fa:16:3e:d3:4a:85 (oui Unknown) > fa:16:3e:74:c8:b8 (oui Unknown), ethertype IPv4 (0x0800), length 159: (tos 0x0, ttl 63, id 58906, offset 0, flags [DF], proto UDP (17), length 145)
    cedar.internal.com.domain > 192.168.200.200.5448: [no cksum] 30058- q: A? k8s-tt-new-bz.zonea.teststack. 1/0/0 . OPT UDPsize=2048 DO (117)

The last packet the VM received looks strange: it contains no A record. I'm not sure whether OVN somehow fails in dns_lookup and returns a wrong DNS response.
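For readers comparing the two captures: the only difference in the pod's query is the EDNS OPT pseudo-record in the additional section (`[1au] ... ar: . OPT UDPsize=2048 DO`). The following is a minimal sketch of how such a query is laid out on the wire, assuming a plain A/IN question. The helper names `encode_name` and `build_query` are hypothetical and exist only for this illustration; they are not part of OVN or CoreDNS.

```python
import struct
from typing import Optional


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qid: int,
                edns_udp_size: Optional[int] = None,
                do_bit: bool = False) -> bytes:
    """Build an A/IN query, optionally with an EDNS OPT record (RFC 6891)."""
    arcount = 0 if edns_udp_size is None else 1
    # Header: ID, flags (only RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, arcount)
    # Question: QNAME, QTYPE=1 (A), QCLASS=1 (IN).
    question = encode_name(name) + struct.pack("!HH", 1, 1)
    if edns_udp_size is None:
        return header + question
    # OPT pseudo-RR: root name, TYPE=41, CLASS carries the UDP payload size,
    # the TTL field carries extended RCODE/version/flags (DO is the top flag
    # bit), and RDLENGTH=0 because no EDNS options are attached.
    ttl = 0x8000 if do_bit else 0
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_udp_size, ttl, 0)
    return header + question + opt


plain = build_query("k8s-tt-new-bz.zonea.teststack.", 30257)
edns = build_query("k8s-tt-new-bz.zonea.teststack.", 30058,
                   edns_udp_size=2048, do_bit=True)
print(len(edns) - len(plain))  # size of the OPT record in bytes
```

An OPT record with no options is 11 bytes long, which matches the difference between the two captured request payloads above (65 versus 54 bytes).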

Could anyone suggest how to make this work in our situation? Thank you.

@dceara
Collaborator

dceara commented Dec 19, 2023

@bzhaoopenstack thanks for the bug report! It really looks like ovn-controller can't handle DNS requests that include OPT UDPsize=X. I did a quick test:

$ ovn-nbctl list dns
_uuid               : 7d2b8afe-7d6f-4745-a28d-3aaf2a4cbe5d
external_ids        : {}
records             : {google.com="42.42.42.3"}

$ ovn-nbctl show
switch 1949fdae-2502-4b81-8e4d-d739a1bdf906 (ls)
    port vm2
        addresses: ["00:00:00:00:00:02"]
    port vm1
        addresses: ["00:00:00:00:00:01"]

Without bufsize as an option:

$ ip netns exec vm1 dig google.com +bufsize=0

; <<>> DiG 9.11.36-RedHat-9.11.36-11.el8_9 <<>> google.com +bufsize=0
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18412
;; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             3600    IN      A       42.42.42.3

;; Query time: 0 msec
;; SERVER: 10.38.5.26#53(10.38.5.26)
;; WHEN: Tue Dec 19 10:44:06 CET 2023
;; MSG SIZE  rcvd: 54

With bufsize set:

$ ip netns exec vm1 dig google.com +bufsize=2048

; <<>> DiG 9.11.36-RedHat-9.11.36-11.el8_9 <<>> google.com +bufsize=2048
;; global options: +cmd
;; connection timed out; no servers could be reached
[...]
10:44:46.428097 00:00:00:00:00:01 > 00:00:00:00:01:00, ethertype IPv4 (0x0800), length 93: (tos 0x0, ttl 64, id 32910, offset 0, flags [none], proto UDP (17), length 79)                                                                    
    42.42.42.2.48523 > 10.11.5.19.53: [bad udp cksum 0x6396 -> 0xd37f!] 64569+ [1au] A? google.com. ar: . OPT UDPsize=2048 (51)

@dceara
Collaborator

dceara commented Dec 19, 2023

Ah, this is due to 4b10571

@brianphaley do you think there's a safe way to ignore additional (EDNS) records or accept some of the common ones? In this case dig seems to set +bufsize by default (I see it set at 4K on my system).

@bzhaoopenstack
Author

bzhaoopenstack commented Dec 20, 2023

@dceara Thanks so much for verifying. I also found the associated bug in OpenStack: https://bugs.launchpad.net/neutron/+bug/2030294

We are using OVN-based OpenStack, so we are confused about why the behavior differs from a Linux bridge/OVS-based deployment.

When we dug into the DNS response packet, we found that it is very strange and malformed. For example:
[image]

That's why the tcpdump output looks so confusing. ;-)

@bzhaoopenstack
Author

In our product deployment, we use CoreDNS for K8s, and K8s runs on the OpenStack-based cloud. It seems CoreDNS enables EDNS by default, so we are considering replacing CoreDNS with another DNS service. Any further advice on this issue would be really helpful. Thank you.

@dceara
Collaborator

dceara commented Jan 3, 2024

Disabling EDNS would be a way forward for now. However, I really think OVN should try to handle this and ignore resource requests of unknown type.
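To illustrate the idea of ignoring additional records, here is a minimal sketch of a responder that reads only the header and the first question and never looks at what follows (such as an EDNS OPT record). It is not OVN's implementation (ovn-controller is written in C); the function names `parse_question` and `answer_query` and the `records` dictionary are hypothetical and chosen only for this example.

```python
import socket
import struct
from typing import Dict, Optional, Tuple


def parse_question(msg: bytes) -> Tuple[str, int, int, int]:
    """Return (name, qtype, qclass, end_offset) of the first question."""
    off = 12  # the question section starts right after the 12-byte header
    labels = []
    while True:
        length = msg[off]
        off += 1
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("compressed name not expected in a question")
        labels.append(msg[off:off + length].decode("ascii").lower())
        off += length
    qtype, qclass = struct.unpack("!HH", msg[off:off + 4])
    return ".".join(labels), qtype, qclass, off + 4


def answer_query(msg: bytes, records: Dict[str, str]) -> Optional[bytes]:
    """Answer a single A/IN question from `records`, or return None."""
    try:
        qid, flags, qdcount, _an, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
        if flags & 0x8000 or qdcount != 1:  # a response, or not one question
            return None
        name, qtype, qclass, q_end = parse_question(msg)
    except (IndexError, ValueError, struct.error):
        return None  # truncated or malformed request
    ip = records.get(name)
    if ip is None or qtype != 1 or qclass != 1:
        return None
    # Bytes after q_end (the additional section) are deliberately ignored;
    # the reply advertises ARCOUNT=0, as a server without EDNS support would.
    rflags = 0x8000 | (flags & 0x0100)  # set QR, copy RD from the request
    header = struct.pack("!HHHHHH", qid, rflags, 1, 1, 0, 0)
    question = msg[12:q_end]
    # Answer RR: pointer to the name at offset 12, TYPE=A, CLASS=IN,
    # TTL=3600, RDLENGTH=4, followed by the IPv4 address.
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, 3600, 4) + socket.inet_aton(ip)
    return header + question + answer


# A query for google.com carrying an OPT record with UDPsize=2048.
query = (struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 1)
         + b"\x06google\x03com\x00" + struct.pack("!HH", 1, 1)
         + b"\x00" + struct.pack("!HHIH", 41, 2048, 0, 0))
reply = answer_query(query, {"google.com": "42.42.42.3"})
```

With this approach, a query with an OPT record gets the same reply as the equivalent query without one, so a client that sent EDNS simply sees a server that does not support it.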

dceara added a commit to dceara/ovn that referenced this issue Jan 5, 2024
EDNS is backwards compatible so it's safe to just ignore additional ARs.

Reported-at: ovn-org#228
Reported-at: https://issues.redhat.com/browse/FDP-222
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
@dceara
Collaborator

dceara commented Jan 5, 2024

@bzhaoopenstack Would you be able to test the following change?

It's quite a rough first approach and not really tested but I think it should work:
dceara@d0ea907

@brianphaley
Contributor

Sorry, I was on extended leave until today, but hopefully that change can be tested and verified. My original change might have been too big a hammer, even if it did seem to work for us.

@bzhaoopenstack
Author

@dceara Thank you so much for such a quick fix. Let me check whether we have any resources to verify this.

@dceara
Collaborator

dceara commented Jan 9, 2024

@brianphaley @bzhaoopenstack OK, I'll wait a few days for confirmation from you guys before posting this as a formal patch for review on ovs-dev.

Thanks,
Dumitru

@dceara
Collaborator

dceara commented Jan 19, 2024

@brianphaley @bzhaoopenstack Did you happen to have time to try out the potential fix?

Thanks!

@dceara
Collaborator

dceara commented Jan 23, 2024

I went ahead and posted the patch on the dev mailing list:
https://patchwork.ozlabs.org/project/ovn/patch/20240123141545.2093189-1-dceara@redhat.com/

ovsrobot pushed a commit to ovsrobot/ovn that referenced this issue Jan 24, 2024
EDNS is backwards compatible so it's safe to just ignore additional ARs.

Reported-at: ovn-org#228
Reported-at: https://issues.redhat.com/browse/FDP-222
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
@dceara
Collaborator

dceara commented Feb 9, 2024

I posted a v2 (it's the same fix as in v1; it just adds a test case):
https://patchwork.ozlabs.org/project/ovn/patch/20240209152117.957387-1-dceara@redhat.com/

ovsrobot pushed a commit to ovsrobot/ovn that referenced this issue Feb 9, 2024
EDNS is backwards compatible so it's safe to just ignore additional ARs.

Reported-at: ovn-org#228
Reported-at: https://issues.redhat.com/browse/FDP-222
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
dceara added a commit to dceara/ovn that referenced this issue Feb 12, 2024
EDNS is backwards compatible so it's safe to just ignore additional ARs.

Reported-at: ovn-org#228
Reported-at: https://issues.redhat.com/browse/FDP-222
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
dceara added a commit to dceara/ovn that referenced this issue Feb 12, 2024
EDNS is backwards compatible so it's safe to just ignore additional ARs.

Reported-at: ovn-org#228
Reported-at: https://issues.redhat.com/browse/FDP-222
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
(cherry picked from commit b7fe2c8)