Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix L7NP enable logging wrong packet #6651

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

qiyueyao
Copy link
Contributor

@qiyueyao qiyueyao commented Sep 4, 2024

Current logs by Suricata when enableLogging is set, logs the wrong packet of RST instead of the original TCP packet, and remains an existing bug in Suricata.

This solution modifies the design to remove enableLogging in L7, keep it in L4, and instead configure Suricata to log Packet info for all alert events.

Also add Packet test to E2E tests.

Decode the packet dtwWezuaHlOhfWpNgQAAAQgARQAAjbT0QABABm9cCgoBBAoKAQOJUgBQa2w1WZlax6yAGAH7nAIAAAEBCAorcsv8RSTwQkdFVCAvYWRtaW4vaW5kZXguaHRtbCBIVFRQLzEuMQ0KSG9zdDogMTAuMTAuMS4zDQpVc2VyLUFnZW50OiBjdXJsLzcuNzQuMA0KQWNjZXB0OiAqLyoNCg0K

-> 10.10.1.4 → 10.10.1.3 HTTP GET /admin/index.html HTTP/1.1

Fixes #6636.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could validate the logged packet in our e2e test?

"dest_port": 80,
"proto": "TCP",
"pkt_src": "wire/pcap",
"tenant_id": 2,
"pkt_src": "stream (flow timeout)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the pkt src is stream (flow timeout), you are probably looking at the wrong message in the logs. There should be a message with src wire/pcap that shows up immediately, instead of after a flow timeout (which takes 30s to a minute IIRC)

pkg/agent/controller/networkpolicy/l7engine/reconciler.go Outdated Show resolved Hide resolved
@@ -178,7 +178,7 @@ func generateTenantRulesData(policyName string, protoKeywords map[string]sets.Se
// Refer to Suricata detect engine in codebase for detailed tag keyword configuration.
var tagKeyword string
if enableLogging {
tagKeyword = " tag: session, 30, seconds;"
tagKeyword = " tag:host;"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the previous version was not working, it feels like session should also have included the request packet.

BTW, should we use something like tag:host,1,packets as we only need the first packet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some further investigation and found some problems:
Logging the wrong packet is a known bug/expected behavior documented https://redmine.openinfosecfoundation.org/issues/3480 and https://redmine.openinfosecfoundation.org/issues/2069. Basically the idea is that the log advanced past the first packet that triggered the event, so this will be the case for each session. If changed to host, the first alert stays the same, but the second alert would capture the initial packet as required.
Below is the complete log:

-----(trigger 1st time)
alert
10.10.1.3 → 10.10.1.4 TCP 80 → 47588 [RST, ACK] 
-----(trigger 2nd time)
10.10.1.4 → 10.10.1.3 TCP 55344 → 80 [SYN] SACK_PERM 
10.10.1.3 → 10.10.1.4 TCP 80 → 55344 [SYN, ACK] SACK_PERM 
10.10.1.4 → 10.10.1.3 TCP 55344 → 80 [ACK]
alert
10.10.1.4 → 10.10.1.3 HTTP GET /admin/index.html HTTP/1.1 
10.10.1.3 → 10.10.1.4 TCP 80 → 55344 [RST, ACK] 
alert: timeout
http: timeout

Given this fact, for the e2e it is possible to test it by triggering twice and look for the original packet. But I'm not sure of the necessity of enableLogging actually if this issue continues in Suricata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the first packet is dropped and we have to wait for the next request, then I would argue that this is not a correct implementation of the feature. It's a different request in a different TCP session (source port is different).
So it looks like it is not possible to capture the correct packet at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK if we use the existing Suricata log for this feature, it's not possible to capture that initial packet =(
I can remove the tagged-packet if capturing the initial packet is the only purpose 😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean removing the feature completely?

Copy link
Contributor Author

@qiyueyao qiyueyao Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I modified the alert configuration, it was able to log the initial packet

{"timestamp":"2024-09-06T22:56:25.262709+0000","flow_id":561336788716570,"in_iface":"antrea-l7-tap0","event_type":"alert","vlan":[2],"src_ip":"10.10.1.4","src_port":46094,"dest_ip":"10.10.1.3","dest_port":80,"proto":"TCP","pkt_src":"wire/pcap","tenant_id":2,"alert":{"action":"blocked","gid":1,"signature_id":1,"rev":0,"signature":"Reject by AntreaNetworkPolicy:default/allow-privileged-url-to-admin-role","category":"","severity":3,"tenant_id":2},"app_proto":"http","direction":"to_server","flow":{"pkts_toserver":3,"pkts_toclient":1,"bytes_toserver":307,"bytes_toclient":78,"start":"2024-09-06T22:56:25.261768+0000","src_ip":"10.10.1.4","dest_ip":"10.10.1.3","src_port":46094,"dest_port":80},"packet":"dtwWezuaHlOhfWpNgQAAAggARQAAjd+4QABABkSYCgoBBAoKAQO0DgBQH6jA1rH7p7qAGAH7zPsAAAEBCAou2X6GSIuizUdFVCAvYWRtaW4vaW5kZXguaHRtbCBIVFRQLzEuMQ0KSG9zdDogMTAuMTAuMS4zDQpVc2VyLUFnZW50OiBjdXJsLzcuNzQuMA0KQWNjZXB0OiAqLyoNCg0K","packet_info":{"linktype":1}}

But it applies to all rules because it's in the Suricata configuration, so cannot be controlled by enableLogging. Does it sound good to remove enableLogging which relates to tagged-packet, but add the packet support in alert that applies to all rules?
Or we can configure whether we want packet to be logged additionally in alert upon start of Suricata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be a good idea to discuss this at the community meeting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if this is for the alert event type, it only applies to rules that reject traffic, right?
We don't need the packet dump for allowed traffic, as we already have the app-layer metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, tested that it only applies to rules rejecting traffic ( packet will only be added to alert logged for reject, only http will be logged for allow )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do some perf testing using ab as @tnqn suggested, just to make sure that the number of rejected requests that we can handle stays roughly the same after enabling packet logging.

@qiyueyao
Copy link
Contributor Author

qiyueyao commented Sep 14, 2024

AB testing done:

With packet: yes for alert:

$ ab -c 10 -n 500 -r http://10.10.1.3/admin/index.html
Concurrency Level:      10
Time taken for tests:   3.151 seconds
Complete requests:      500
Failed requests:        1000
   (Connect: 0, Receive: 500, Length: 0, Exceptions: 500)
Total transferred:      0 bytes
HTML transferred:       0 bytes
Requests per second:    158.70 [#/sec] (mean)
Time per request:       63.011 [ms] (mean)
Time per request:       6.301 [ms] (mean, across all concurrent requests)
Transfer rate:          0.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    1   0.5      1       7
Processing:     4    5   1.4      5      23
Waiting:        0    0   0.0      0       0
Total:          4    6   1.6      6      26

With packet: no for alert:

$ ab -c 10 -n 500 -r http://10.10.1.3/admin/index.html
Concurrency Level:      10
Time taken for tests:   3.011 seconds
Complete requests:      500
Failed requests:        1000
   (Connect: 0, Receive: 500, Length: 0, Exceptions: 500)
Total transferred:      0 bytes
HTML transferred:       0 bytes
Requests per second:    166.08 [#/sec] (mean)
Time per request:       60.210 [ms] (mean)
Time per request:       6.021 [ms] (mean, across all concurrent requests)
Transfer rate:          0.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    1   0.6      1       8
Processing:     4    5   1.6      4      34
Waiting:        0    0   0.0      0       0
Total:          5    6   1.8      6      37

@antoninbas
Copy link
Contributor

antoninbas commented Sep 16, 2024

Thanks @qiyueyao. It seems that there is a noticeable impact, but nothing too significant (around 5%). Based on these results, I would lean towards enabling this by default, without introducing a new configuration parameter. But I'll wait until @tnqn chimes in, as he requested the benchmarking.

Out of curiosity, do you have a baseline (same testbed) for the same traffic but without any L7NP enforced?

Complete requests:      500
Failed requests:        1000

I am having trouble understanding these numbers. Shouldn't all requests fail (be rejected by Suricata)?

@qiyueyao
Copy link
Contributor Author

qiyueyao commented Sep 16, 2024

Thanks @qiyueyao. It seems that there is a noticeable impact, but nothing too significant (around 5%). Based on these results, I would lean towards enabling this by default, without introducing a new configuration parameter. But I'll wait until @tnqn chimes in, as he requested the benchmarking.

Out of curiosity, do you have a baseline (same testbed) for the same traffic but without any L7NP enforced?

Complete requests:      500
Failed requests:        1000

I am having trouble understanding these numbers. Shouldn't all requests fail (be rejected by Suricata)?

I only sent 500 requests for both cases, and kept the socket open on receive errors. Based on online sources, Receive means count of err_recv when status is not APR_SUCCESS, Exceptions means exceptions thrown.

Without specifying -r, the socket exits with just one request, output apr_socket_recv: Connection reset by peer (104).

Without the source code of Apache ab tool, my guess is that for each rejected packet, apr_socket_recv is counted towards Receive and Complete requests, reject is counted towards Exceptions, they in total sum as 1000 Failed requests.

Current logs by Suricata when enableLogging is set, logs the
wrong packet of RST instead of the original TCP packet, and
remains an existing bug in Suricata.

This solution modifies the design to remove enableLogging in
L7, keep it in L4, and instead configure Suricata to log Packet
info for all alert events.

Fixes antrea-io#6636.

Signed-off-by: Qiyue Yao <yaoq@vmware.com>
@tnqn
Copy link
Member

tnqn commented Sep 20, 2024

Thanks @qiyueyao. It seems that there is a noticeable impact, but nothing too significant (around 5%). Based on these results, I would lean towards enabling this by default, without introducing a new configuration parameter. But I'll wait until @tnqn chimes in, as he requested the benchmarking.

5% difference seems fine, enabling it without a new configuration parameter sounds good to me. We can add a configuration when there is a real need.

test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
return reflect.DeepEqual(e, x)
packetLooseEqual := conversion.EqualitiesOrDie(
func(a, b L7LogEntry) bool {
if a != b {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't really seem useful here, as it can never be true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it is needed, for the second case when both are bytes.

// ignore unexpected log entries and duplicates
if slices.ContainsFunc(expected, log.Matches) && !slices.ContainsFunc(actual, log.Matches) {

func (e *L7LogEntry) Equal(x *L7LogEntry) bool {
return reflect.DeepEqual(e, x)
packetLooseEqual := conversion.EqualitiesOrDie(
func(a, b L7LogEntry) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless I am missing something, this is incorrect because it completely ignores the other fields in L7LogEntry and only looks at the Packet field (all other fields are ignored in the comparison)?

I don't know exactly how conversion.EqualitiesOrDie is supposed to be used, but in this case it may be easier / more readable to just write the comparison manually:

return e1.EventType == e2.EventType && ... && packetMatches && reflect.DeepEqual(e1.Http, e2.Http) && reflect.DeepEqual(e1.Alert, e2.Alert)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the Matches logic.

@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Sep 20, 2024
Signed-off-by: Qiyue Yao <yaoq@vmware.com>
Comment on lines +453 to +464
packetMatch := func(e, x *L7LogEntry) bool {
var expectedPacket *regexp.Regexp
var packetBytes string
if e.expectedPacketRegex != nil {
expectedPacket = e.expectedPacketRegex
packetBytes = string(x.PacketBytes)
} else if x.expectedPacketRegex != nil {
expectedPacket = x.expectedPacketRegex
packetBytes = string(e.PacketBytes)
} else {
return bytes.Equal(e.PacketBytes, x.PacketBytes)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make it more symmetric:

if e.expectedPacketRegex != nil {
    // return false if x.PacketBytes don't match the regex
}
if x.expectedPacketRegex != nil {
    // return false if e.PacketBytes don't match the regex
}
if e.expectedPacketRegex == nil && x.expectedPacketRegex == nil {
    // return false if the byte slices don't match
}

Comment on lines +449 to +451
// Compares if two L7LogEntry are equal. They must be exactly equal if Packet is
// in the same base. Otherwise, if either of the Packet is in base64, they equal
// if Packet in base64 contains essential message in other Packet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the comment match the current function?
There is nothing related to base64 in the function implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong packet is logged by Suricata for L7NP when enabling logging for rule
3 participants