Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support inspecting fragmented DNS message for FQDN policy #4732

Merged
merged 1 commit into from
Mar 22, 2023

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Mar 21, 2023

A DNS message sent over TCP will be fragmented if its length is greater than MTU, in which case FQDN policy wouldn't work as expected because the used DNS library expects the entire message. However, it's been observed that DNS message was fragmented frequently in tests, which may be applicable to production environments as well. On the other hand, we don't really need the entire message to support FQDN policy, and the first fragment normally already contains the information we need, the question and answer sections.

This patch forks and modifies the message Unpack function to be able to handle fragmented messages.

@tnqn
Copy link
Member Author

tnqn commented Mar 21, 2023

/test-e2e
/test-ipv6-e2e
/test-ipv6-only-e2e

@tnqn tnqn requested review from GraysonWu and Dyanngg March 21, 2023 12:43
@tnqn tnqn marked this pull request as ready for review March 21, 2023 12:43
@tnqn
Copy link
Member Author

tnqn commented Mar 21, 2023

@GraysonWu ACNPFQDNPolicyTCP test failed because of fragmented DNS message, so I assume it could happen in production envrionments. I think better to fix it before we declare the support of DNS over TCP officially.

=== RUN   TestAntreaPolicy/TestGroupNoK8sNP/Case=ACNPFQDNPolicyTCP
I0320 13:58:09.130753  469843 k8s_util.go:943] Creating/updating ClusterNetworkPolicy test-acnp-fqdn-tcp
D0320 13:58:09.132515  469843 k8s_util.go:946] Creating ClusterNetworkPolicy test-acnp-fqdn-tcp
    antreapolicy_test.go:4576: Waiting for ACNP 'test-acnp-fqdn-tcp' to be realized
I0320 13:58:09.242977  469843 util.go:44] Confirming ready status costs 102.817845ms
T0320 13:58:09.245975  469843 k8s_util.go:296] Running: kubectl exec y-qbd8guota-577856bfd5-x5p5b -c c80 -n y-qbd8guot -- /bin/sh -c dig github.com +tcp
T0320 13:58:09.316675  469843 k8s_util.go:298] a -> github.com: error when running command: err - <nil> /// stdout - 
; <<>> DiG 9.16.6 <<>> github.com +tcp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64172
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 13, ADDITIONAL: 27

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 72768b34582473d2 (echoed)
;; QUESTION SECTION:
;github.com.			IN	A

;; ANSWER SECTION:
github.com.		30	IN	A	192.30.255.112

;; AUTHORITY SECTION:
.			30	IN	NS	b.root-servers.net.
.			30	IN	NS	h.root-servers.net.
.			30	IN	NS	d.root-servers.net.
.			30	IN	NS	l.root-servers.net.
.			30	IN	NS	c.root-servers.net.
.			30	IN	NS	m.root-servers.net.
.			30	IN	NS	g.root-servers.net.
.			30	IN	NS	f.root-servers.net.
.			30	IN	NS	k.root-servers.net.
.			30	IN	NS	e.root-servers.net.
.			30	IN	NS	i.root-servers.net.
.			30	IN	NS	a.root-servers.net.
.			30	IN	NS	j.root-servers.net.

;; ADDITIONAL SECTION:
e.root-servers.net.	30	IN	A	192.203.230.10
c.root-servers.net.	30	IN	AAAA	2001:500:2::c
d.root-servers.net.	30	IN	A	199.7.91.13
l.root-servers.net.	30	IN	AAAA	2001:500:9f::42
m.root-servers.net.	30	IN	A	202.12.27.33
j.root-servers.net.	30	IN	A	192.58.128.30
a.root-servers.net.	30	IN	AAAA	2001:503:ba3e::2:30
b.root-servers.net.	30	IN	AAAA	2001:500:200::b
j.root-servers.net.	30	IN	AAAA	2001:503:c27::2:30
f.root-servers.net.	30	IN	AAAA	2001:500:2f::f
a.root-servers.net.	30	IN	A	198.41.0.4
m.root-servers.net.	30	IN	AAAA	2001:dc3::35
e.root-servers.net.	30	IN	AAAA	2001:500:a8::e
i.root-servers.net.	30	IN	A	192.36.148.17
h.root-servers.net.	30	IN	A	198.97.190.53
l.root-servers.net.	30	IN	A	199.7.83.42
f.root-servers.net.	30	IN	A	192.5.5.241
c.root-servers.net.	30	IN	A	192.33.4.12
i.root-servers.net.	30	IN	AAAA	2001:7fe::53
k.root-servers.net.	30	IN	A	193.0.14.129
d.root-servers.net.	30	IN	AAAA	2001:500:2d::d
g.root-servers.net.	30	IN	AAAA	2001:500:12::d0d
b.root-servers.net.	30	IN	A	199.9.14.201
h.root-servers.net.	30	IN	AAAA	2001:500:1::53
g.root-servers.net.	30	IN	A	192.112.36.4
k.root-servers.net.	30	IN	AAAA	2001:7fd::1

;; Query time: 4 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Mon Mar 20 13:58:09 UTC 2023
;; MSG SIZE  rcvd: 1520

 /// stderr - 
T0320 13:58:09.316728  469843 antreapolicy_test.go:3354] Probing: a -> github.com(	192.30.255.112)
T0320 13:58:09.316772  469843 k8s_util.go:158] Running: kubectl exec y-qbd8guota-577856bfd5-x5p5b -c c80 -n y-qbd8guot -- /bin/sh -c for i in $(seq 1 3); do /agnhost connect 	192.30.255.112:80 --timeout=1s --protocol=tcp; done;
    antreapolicy_test.go:3360: Failure -- wrong results for probe: Source y-qbd8guot/a --> Dest github.com:80 connectivity: Con, expected: Drp
I0320 13:58:09.488766  469843 k8s_util.go:972] Deleting AntreaClusterNetworkPolicies test-acnp-fqdn-tcp

The same test has passed stably with the patch.

=== RUN   TestAntreaPolicy/TestGroupNoK8sNP/Case=ACNPFQDNPolicyTCP
I0321 08:25:15.123552 2185211 k8s_util.go:943] Creating/updating ClusterNetworkPolicy test-acnp-fqdn-tcp
D0321 08:25:15.126301 2185211 k8s_util.go:946] Creating ClusterNetworkPolicy test-acnp-fqdn-tcp
    antreapolicy_test.go:4576: Waiting for ACNP 'test-acnp-fqdn-tcp' to be realized
I0321 08:25:15.238703 2185211 util.go:44] Confirming ready status costs 104.794228ms
T0321 08:25:15.241870 2185211 k8s_util.go:296] Running: kubectl exec y-hpqmkz8ja-68f76d8bc8-kd86n -c c80 -n y-hpqmkz8j -- /bin/sh -c dig github.com +tcp
T0321 08:25:15.311400 2185211 k8s_util.go:298] a -> github.com: error when running command: err - <nil> /// stdout - 
; <<>> DiG 9.16.6 <<>> github.com +tcp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42373
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 13, ADDITIONAL: 27

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: c76919210c0c5c8c (echoed)
;; QUESTION SECTION:
;github.com.			IN	A

;; ANSWER SECTION:
github.com.		5	IN	A	192.30.255.113

;; AUTHORITY SECTION:
.			5	IN	NS	a.root-servers.net.
.			5	IN	NS	f.root-servers.net.
.			5	IN	NS	b.root-servers.net.
.			5	IN	NS	i.root-servers.net.
.			5	IN	NS	m.root-servers.net.
.			5	IN	NS	h.root-servers.net.
.			5	IN	NS	l.root-servers.net.
.			5	IN	NS	j.root-servers.net.
.			5	IN	NS	g.root-servers.net.
.			5	IN	NS	d.root-servers.net.
.			5	IN	NS	c.root-servers.net.
.			5	IN	NS	k.root-servers.net.
.			5	IN	NS	e.root-servers.net.

;; ADDITIONAL SECTION:
f.root-servers.net.	5	IN	AAAA	2001:500:2f::f
l.root-servers.net.	5	IN	A	199.7.83.42
a.root-servers.net.	5	IN	AAAA	2001:503:ba3e::2:30
b.root-servers.net.	5	IN	AAAA	2001:500:200::b
d.root-servers.net.	5	IN	A	199.7.91.13
g.root-servers.net.	5	IN	A	192.112.36.4
k.root-servers.net.	5	IN	AAAA	2001:7fd::1
g.root-servers.net.	5	IN	AAAA	2001:500:12::d0d
h.root-servers.net.	5	IN	A	198.97.190.53
i.root-servers.net.	5	IN	AAAA	2001:7fe::53
c.root-servers.net.	5	IN	AAAA	2001:500:2::c
m.root-servers.net.	5	IN	A	202.12.27.33
c.root-servers.net.	5	IN	A	192.33.4.12
b.root-servers.net.	5	IN	A	199.9.14.201
e.root-servers.net.	5	IN	AAAA	2001:500:a8::e
j.root-servers.net.	5	IN	A	192.58.128.30
d.root-servers.net.	5	IN	AAAA	2001:500:2d::d
a.root-servers.net.	5	IN	A	198.41.0.4
j.root-servers.net.	5	IN	AAAA	2001:503:c27::2:30
e.root-servers.net.	5	IN	A	192.203.230.10
l.root-servers.net.	5	IN	AAAA	2001:500:9f::42
k.root-servers.net.	5	IN	A	193.0.14.129
f.root-servers.net.	5	IN	A	192.5.5.241
i.root-servers.net.	5	IN	A	192.36.148.17
h.root-servers.net.	5	IN	AAAA	2001:500:1::53
m.root-servers.net.	5	IN	AAAA	2001:dc3::35

;; Query time: 4 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Tue Mar 21 08:25:15 UTC 2023
;; MSG SIZE  rcvd: 1520

 /// stderr - 
T0321 08:25:15.311460 2185211 antreapolicy_test.go:3354] Probing: a -> github.com(	192.30.255.113)
T0321 08:25:15.311491 2185211 k8s_util.go:158] Running: kubectl exec y-hpqmkz8ja-68f76d8bc8-kd86n -c c80 -n y-hpqmkz8j -- /bin/sh -c for i in $(seq 1 3); do /agnhost connect 	192.30.255.113:80 --timeout=1s --protocol=tcp; done;
T0321 08:25:18.430264 2185211 k8s_util.go:165] y-hpqmkz8j/a -> 	192.30.255.113: error when running command: err - command terminated with exit code 1 /// stdout -  /// stderr - TIMEOUT
TIMEOUT
TIMEOUT

I0321 08:25:18.430309 2185211 k8s_util.go:972] Deleting AntreaClusterNetworkPolicies test-acnp-fqdn-tcp

@tnqn tnqn added this to the Antrea v1.11 release milestone Mar 21, 2023
@tnqn
Copy link
Member Author

tnqn commented Mar 21, 2023

/test-all

@tnqn tnqn requested a review from antoninbas March 21, 2023 13:26
Copy link
Contributor

@Dyanngg Dyanngg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, thanks Quan for adding this

pkg/agent/controller/networkpolicy/fqdn.go Show resolved Hide resolved
antoninbas
antoninbas previously approved these changes Mar 21, 2023
A DNS message sent over TCP will be fragmented if its length is greater
than MTU, in which case FQDN policy wouldn't work as expected because
the used DNS library expects the entire message. However, it's been
observed that DNS message was fragmented frequently in tests, which may
be applicable to production environments as well. On the other hand, we
don't really need the entire message to support FQDN policy, and the
first fragment normally already contains the information we need, the
question and answer sections.

This patch forks and modifies the message Unpack function to be able to
handle fragmented messages.

Signed-off-by: Quan Tian <qtian@vmware.com>
@tnqn
Copy link
Member Author

tnqn commented Mar 22, 2023

/skip-all which have passed before the last comment update.

Copy link
Contributor

@Dyanngg Dyanngg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

},
}

// Pack msg and then unpack into partialMsg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Pack msg and then unpack into partialMsg
// Pack msg and unpack into partialMsg.

// If we cannot unpack the whole array, then it will return nil
func unpackRRslice(l int, msg []byte, off int) (dst1 []godns.RR, off1 int, err error) {
var r godns.RR
// Don't pre-allocate, l may be under attacker control
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Don't pre-allocate, l may be under attacker control
// Don't pre-allocate, 'l' may be under attacker control.

}

// unpackRRslice unpacks msg[off:] into an []RR.
// If we cannot unpack the whole array, then it will return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If we cannot unpack the whole array, then it will return nil
// If we cannot unpack the whole array, then it will return nil.

}

dns.Answer, off, err = unpackRRslice(int(dh.Ancount), msg, off)
// The header counts might have been wrong so we need to update it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The header counts might have been wrong so we need to update it
// The header counts might have been wrong so we need to update it.

Comment on lines +119 to +121
// Qdcount, Ancount, Nscount, Arcount can't be trusted, as they are
// attacker controlled. This means we can't use them to pre-allocate
// slices.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Qdcount, Ancount, Nscount, Arcount can't be trusted, as they are
// attacker controlled. This means we can't use them to pre-allocate
// slices.
// Qdcount, Ancount, Nscount, Arcount can't be trusted, as they are
// under the control of attackers. This means that we can't use them to pre-allocate
// slices.

// header. This can still be useful to the caller. 9.9.9.9 sends these
// when responding with REFUSED for instance.
if off == len(msg) {
// reset sections before returning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// reset sections before returning
// Reset sections before returning.

// the additional section.
func unpackPartially(dns *godns.Msg, dh godns.Header, msg []byte, off int) (err error) {
// If we are at the end of the message we should return *just* the
// header. This can still be useful to the caller. 9.9.9.9 sends these
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// header. This can still be useful to the caller. 9.9.9.9 sends these
// header. This can be still useful to the caller. 9.9.9.9 sends these

off = len(msg)
break
}
// If offset does not increase anymore, l is a lie
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If offset does not increase anymore, l is a lie
// If offset does not increase anymore, 'l' is a lie.

@tnqn
Copy link
Member Author

tnqn commented Mar 22, 2023

@hongliangl the code and comments are copied from third party as the directory indicates. I don't feel it's worth to update them as commented to introduce unncessary diff.

@hongliangl
Copy link
Contributor

hongliangl commented Mar 22, 2023

@hongliangl the code and comments are copied from third party as the directory indicates. I don't feel it's worth to update them as commented to introduce unncessary diff.

If so, it's not worth to update them, just keep them.

@tnqn tnqn merged commit a6f5f0a into antrea-io:main Mar 22, 2023
@tnqn tnqn deleted the fix-fragmented-dns branch March 22, 2023 03:58
@tnqn tnqn mentioned this pull request Mar 24, 2023
jainpulkit22 pushed a commit to urharshitha/antrea that referenced this pull request Apr 28, 2023
…4732)

A DNS message sent over TCP will be fragmented if its length is greater
than MTU, in which case FQDN policy wouldn't work as expected because
the used DNS library expects the entire message. However, it's been
observed that DNS message was fragmented frequently in tests, which may
be applicable to production environments as well. On the other hand, we
don't really need the entire message to support FQDN policy, and the
first fragment normally already contains the information we need, the
question and answer sections.

This patch forks and modifies the message Unpack function to be able to
handle fragmented messages.

Signed-off-by: Quan Tian <qtian@vmware.com>
@tnqn tnqn added the kind/bug Categorizes issue or PR as related to a bug. label Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants