Improve IPFIX collector #338

yuntanghsu · 2023-12-12T21:35:59Z

In this PR:

We no longer print records as we receive them. Instead, we store the records in a string array.
Add an HTTP listener and handlers that receive requests to return or reset the stored records.
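The storage change above can be sketched in Go as follows. This is a minimal illustration, not the code from this PR: the `recordStore` type, its method names, the `flowRecords` JSON key, and the GET-only check are all assumptions. The mutex matters because the IPFIX receive loop and the HTTP handlers run in different goroutines.

```go
package main

import (
	"encoding/json"
	"net/http"
	"sync"
)

// recordStore is a hypothetical in-memory store for IPFIX records that have
// already been rendered to strings. The mutex guards the slice because the
// receive loop and the HTTP handlers run in different goroutines.
type recordStore struct {
	mu      sync.Mutex
	records []string
}

// add appends one rendered record instead of printing it to the Pod log.
func (s *recordStore) add(record string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.records = append(s.records, record)
}

// snapshot returns a copy so callers never share the underlying array.
func (s *recordStore) snapshot() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]string, len(s.records))
	copy(out, s.records)
	return out
}

// reset drops all stored records.
func (s *recordStore) reset() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.records = nil
}

// recordsHandler serves the stored records as a JSON object. Reading does
// not change server state, so GET is the only accepted method.
func (s *recordStore) recordsHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	jsonData, err := json.Marshal(map[string][]string{"flowRecords": s.snapshot()})
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(jsonData)
}
```

To serve it, one could register `store.recordsHandler` on an `http.ServeMux` at a path of your choosing and start `http.ListenAndServe`; the receive loop would then call `store.add` for each record instead of printing it.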

The goal of this change is to reduce the time needed to retrieve logs.
Previously, retrieving the logs with kubectl logs took a long time, and the time grew with the number of records held by the IPFIX collector Pod, ranging from ~0.8s to ~4s. After the changes above, retrieval takes ~80ms.

Functionality has been tested in this PR: https://github.com/antrea-io/antrea/actions/runs/7201229562/job/19617317869?pr=5770

codecov · 2023-12-12T21:36:55Z

Codecov Report

Merging #338 (5a11b76) into main (092fb4b) will decrease coverage by 0.15%.
Report is 1 commit behind head on main.
The diff coverage is n/a.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #338      +/-   ##
==========================================
- Coverage   72.87%   72.73%   -0.15%     
==========================================
  Files          19       19              
  Lines        2853     2853              
==========================================
- Hits         2079     2075       -4     
- Misses        602      604       +2     
- Partials      172      174       +2

Flag	Coverage Δ
integration-tests	`50.65% <ø> (-2.25%)`	⬇️
unit-tests	`71.74% <ø> (ø)`

Flags with carried forward coverage won't be shown.

see 1 file with indirect coverage changes

cmd/collector/collector.go

antoninbas · 2023-12-13T18:48:34Z

cmd/collector/collector.go

+	w.Write(jsonData)
+}
+
+func resetRecordHandler(w http.ResponseWriter, r *http.Request) {


Same as above, but you should require the HTTP method to be POST in this handler.
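A minimal sketch of a reset handler that enforces POST, as suggested in this comment. The package-level `records` slice, the `/resetrecords` path, and the `callReset` helper are hypothetical; `callReset` only exists to exercise the handler in-process with `net/http/httptest`.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync"
)

// Hypothetical package-level state standing in for the collector's records.
var (
	mu      sync.Mutex
	records []string
)

// resetRecordHandler clears the stored records. Because it mutates server
// state, it accepts only POST and answers any other method with 405.
func resetRecordHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		w.Header().Set("Allow", http.MethodPost)
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	mu.Lock()
	records = nil
	mu.Unlock()
	w.WriteHeader(http.StatusOK)
}

// callReset invokes the handler in-process with the given method and returns
// the response status code and the number of records left afterwards.
func callReset(method string) (int, int) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, "/resetrecords", nil)
	resetRecordHandler(rec, req)
	mu.Lock()
	defer mu.Unlock()
	return rec.Code, len(records)
}
```

With this guard, an accidental GET (for example from a browser or a health probe) cannot wipe the records.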

Why should this method be POST?
Do you think we still need this handler? After disabling log printing, the retrieval time decreased from ~4s to ~80ms.

@antoninbas
I think this concern hasn't been addressed yet?

Sorry, I didn't see the question before.
It should be POST because it mutates server state. That's an HTTP convention. You can look up which HTTP verbs are most appropriate for different types of APIs.
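To illustrate the convention from the client side, the sketch below reads records with GET (which leaves server state unchanged) and clears them with POST. The endpoint paths, the `flowRecords` JSON key, and the `newFakeCollector` stand-in server are assumptions for illustration, not the collector's confirmed API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// getRecords fetches records with GET, a safe method: it changes nothing.
func getRecords(baseURL string) ([]string, error) {
	resp, err := http.Get(baseURL + "/records")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var body struct {
		FlowRecords []string `json:"flowRecords"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	return body.FlowRecords, nil
}

// resetRecords clears records with POST, because the request mutates state.
func resetRecords(baseURL string) error {
	resp, err := http.Post(baseURL+"/resetrecords", "application/json", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

// newFakeCollector starts a hypothetical in-process stand-in for the
// collector. It uses no locking, so it is only suitable for sequential
// demo requests; a real server would guard records with a mutex.
func newFakeCollector(records []string) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/records", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string][]string{"flowRecords": records})
	})
	mux.HandleFunc("/resetrecords", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		records = nil
	})
	return httptest.NewServer(mux)
}
```

A test could, for example, call `resetRecords` before generating traffic and `getRecords` afterwards, so that each subtest only sees its own records.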

Do you think we still need this handler? After disabling log printing, the retrieval time decreased from ~4s to ~80ms.

I thought you wrote 0.8s in the PR description, which would be 800ms?

Regardless, let's add the reset method. It's easy to do, and if we don't end up using it in Antrea tests, so be it.
It's easier than opening a new PR in this repo later and publishing a new collector image tag.

0.8s to 4s is the range when we use "kubectl logs" to get the logs.
With the current changes, this time is approximately 80ms regardless of the number of records inside the IPFIX collector.

cmd/collector/collector.go

In this commit, we:
1. Change the order in which expired records are appended before our exporter exports them. For inter-Node traffic subject to an egress/ingress NetworkPolicy with action Drop, we receive records from both PacketIn and the conntrack table. If the record from the conntrack table is exported first, it has to be correlated at the Flow Aggregator (FA). In the egress case there is no issue, because we receive the records from both Nodes. However, if both records come from the same Node, the record is never sent to the collector because it keeps waiting for correlation. A similar approach is added in vmware/go-ipfix#338 as well.
2. Add a check to verify that Flow Exporters can successfully resolve the Flow Aggregator Service address before sending traffic.
3. Add a check to verify that the Flow Aggregator can successfully connect to ClickHouse before sending traffic.
4. Add labels to the External subtest to filter out irrelevant logs from the IPFIX collector Pod.
5. Confirm that a label is correctly added to a specific Pod after updating the Pod.
6. Adjust the flow-visibility end-to-end test by disabling the octetDeltaCount check. This is necessary because, when a dual-stack cluster is enabled, retrieving logs from the IPFIX collector Pod takes significantly longer (around 4 seconds). In the e2e test, we checked the logs every 500 milliseconds to ensure that we didn't receive the last record (where octetDeltaCount is 0). However, due to the delay, the PollImmediately() function doesn't actually execute every 500 milliseconds. Therefore, we removed the octetDeltaCount check and instead filter out all records with octetDeltaCount=0 when retrieving records from the IPFIX collector Pod.
7. Use the new image from the go-ipfix PR (vmware/go-ipfix#338), which improves the IPFIX collector by:
   a. Disabling printing records whenever we receive them. Instead, we store records in a string array.
   b. Adding an HTTP listener and handlers to receive requests to return or reset records.
   In this way, we reduce the log retrieval time from ~4s to ~80ms when the collector holds ~1900 records.

Signed-off-by: Yun-Tang Hsu <hsuy@vmware.com>


antoninbas

LGTM

antoninbas · 2023-12-14T01:31:17Z

I don't remember how tagging works for the collector image. @heanlan @dreamtalen, will we need a new release of go-ipfix?

heanlan · 2023-12-14T01:44:36Z

will we need a new release of go-ipfix?

Yes, we need a new release

pkg/intermediate/aggregate.go

In this commit, we:
1. Disable printing records whenever we receive them. Instead, we store records in a string array.
2. Add an HTTP listener and handlers to receive requests to return or reset records.

Signed-off-by: Yun-Tang Hsu <hsuy@vmware.com>

1. Disable printing records whenever we receive them. Instead, we store records in a string array.
2. Add an HTTP listener and handlers to receive requests to return or reset records.

Signed-off-by: Yun-Tang Hsu <hsuy@vmware.com>

vmwclabot added the cla-not-required label Dec 12, 2023

antoninbas reviewed Dec 13, 2023


yuntanghsu changed the title ~~Add http handler to reset/get records~~ Improve IPFIX collector Dec 13, 2023

yuntanghsu force-pushed the improve_ipfix branch from 4d30427 to 071a3eb, December 13, 2023 21:23

yuntanghsu mentioned this pull request Dec 13, 2023

Improvement for flow-visibility e2e test antrea-io/antrea#5770

Merged

antoninbas reviewed Dec 13, 2023


cmd/collector/collector.go (outdated, resolved)

cmd/collector/collector.go (outdated, resolved)

cmd/collector/collector.go (outdated, resolved)

yuntanghsu force-pushed the improve_ipfix branch from 071a3eb to 7ca0084, December 13, 2023 22:40

antoninbas reviewed Dec 13, 2023


cmd/collector/collector.go (outdated, resolved)

yuntanghsu force-pushed the improve_ipfix branch from 7ca0084 to ff5363a, December 13, 2023 22:50

antoninbas previously approved these changes Dec 14, 2023


yuntanghsu dismissed antoninbas’s stale review via fa19580 December 14, 2023 17:58

yuntanghsu force-pushed the improve_ipfix branch from ff5363a to fa19580, December 14, 2023 17:58

yuntanghsu requested a review from antoninbas December 14, 2023 18:00

antoninbas reviewed Dec 14, 2023


pkg/intermediate/aggregate.go (outdated, resolved)

Improve IPFIX collector

5a11b76


yuntanghsu force-pushed the improve_ipfix branch from fa19580 to 5a11b76, December 14, 2023 18:05

antoninbas approved these changes Dec 14, 2023


yuntanghsu requested a review from heanlan December 14, 2023 21:42

antoninbas merged commit 837050b into vmware:main Dec 15, 2023
8 of 9 checks passed
