Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce dataplane overhead caused by ct action #3858

Merged
merged 1 commit into from
Jun 6, 2022

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Jun 2, 2022

To support performing both SNAT and DNAT for traffic, Antrea uses two CT zones for SNAT and DNAT separately. For each packet, multiple CT actions are executed to go through the zones. And because SNAT is performed after DNAT, reply traffic wouldn't be unNATed correctly if they go through the zones in the same order as request traffic, an extra CT action for unSNAT was added before DNAT to resolve it. These CT actions introduce measurable overhead to the dataplane.

Since the first unSNAT action is for reply traffic of SNATed connections only, and there are only few cases needing SNAT, this patch adds conditions to the unSNAT flow to make irrelevant traffic bypass it.

With less CT action and less recirculation caused by it, the dataplane performance is significantly increased. TCP_RR and TCP_CRR improvement in a kind cluster is as below:

Test      old TPS     new TPS       delta
TCP_RR   14568.69    17826.26     +22.36%
TCP_CRR    2781.7     3498.12     +25.75%

Signed-off-by: Quan Tian qtian@vmware.com

@tnqn
Copy link
Member Author

tnqn commented Jun 2, 2022

/test-all
/test-windows-all

@tnqn
Copy link
Member Author

tnqn commented Jun 2, 2022

This should increase Windows dataplane performance as well since CT action is the main factor affecting the performance.

@tnqn tnqn added this to the Antrea v1.7 release milestone Jun 2, 2022
@tnqn
Copy link
Member Author

tnqn commented Jun 2, 2022

Adding it to milestone 1.7 since in increases dataplane performance and may simplify flows #3603 will add.

@codecov-commenter
Copy link

codecov-commenter commented Jun 2, 2022

Codecov Report

Merging #3858 (04af829) into main (e3d4352) will decrease coverage by 7.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3858      +/-   ##
==========================================
- Coverage   64.36%   57.31%   -7.05%     
==========================================
  Files         288      403     +115     
  Lines       41239    57057   +15818     
==========================================
+ Hits        26542    32703    +6161     
- Misses      12557    21740    +9183     
- Partials     2140     2614     +474     
Flag Coverage Δ
e2e-tests 45.64% <100.00%> (?)
integration-tests 38.09% <ø> (?)
kind-e2e-tests 50.15% <100.00%> (-1.56%) ⬇️
unit-tests 44.31% <5.55%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/openflow/framework.go 86.13% <100.00%> (ø)
pkg/agent/openflow/pipeline.go 74.37% <100.00%> (+0.20%) ⬆️
...agent/flowexporter/connections/deny_connections.go 44.82% <0.00%> (-35.64%) ⬇️
pkg/ipfix/ipfix_process.go 81.25% <0.00%> (-18.75%) ⬇️
.../flowexporter/connections/conntrack_connections.go 72.54% <0.00%> (-4.42%) ⬇️
pkg/agent/flowexporter/utils.go 72.34% <0.00%> (-4.26%) ⬇️
pkg/controller/networkpolicy/status_controller.go 68.54% <0.00%> (-3.23%) ⬇️
antrea/pkg/ovs/ovsctl/appctl.go 3.10% <0.00%> (ø)
antrea/pkg/agent/flowexporter/utils.go 14.89% <0.00%> (ø)
...ea/pkg/agent/ipassigner/responder/ndp_responder.go 0.00% <0.00%> (ø)
... and 128 more

wenyingd
wenyingd previously approved these changes Jun 2, 2022
Copy link
Contributor

@wenyingd wenyingd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

hongliangl
hongliangl previously approved these changes Jun 2, 2022
jianjuns
jianjuns previously approved these changes Jun 2, 2022
Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see the significant perf improvement!

antoninbas
antoninbas previously approved these changes Jun 2, 2022
@@ -219,14 +219,14 @@ func (f *featureService) getRequiredTables() []*Table {
return []*Table{DNATTable}
}
tables := []*Table{
SNATConntrackTable,
UNSNATTable,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think UnSNAT instead of UNSNAT makes more sense and is more readable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, thanks

To support performing both SNAT and DNAT for traffic, Antrea uses two
CT zones for SNAT and DNAT separately. For each packet, multiple CT
actions are executed to go through the zones. And because SNAT is
performed after DNAT, reply traffic wouldn't be unNATed correctly if
they go through the zones in the same order as request traffic, an
extra CT action for unSNAT was added before DNAT to resolve it. These
CT actions introduce measurable overhead to the dataplane.

Since the first unSNAT action is for reply traffic of SNATed connections
only, and there are only few cases needing SNAT, this patch adds
conditions to the unSNAT flow to make irrelevant traffic bypass it.

With less CT action and less recirculation caused by it, the dataplane
performance is significantly increased. TCP_RR and TCP_CRR improvement
in a kind cluster is as below:

```
Test      old TPS     new TPS       delta
TCP_RR   14568.69    17826.26     +22.36%
TCP_CRR    2781.7     3498.12     +25.75%
```

Signed-off-by: Quan Tian <qtian@vmware.com>
@tnqn
Copy link
Member Author

tnqn commented Jun 4, 2022

/test-all

@tnqn
Copy link
Member Author

tnqn commented Jun 6, 2022

/test-networkpolicy

@tnqn tnqn added the action/release-note Indicates a PR that should be included in release notes. label Jun 6, 2022
@tnqn tnqn merged commit 90419f2 into antrea-io:main Jun 6, 2022
@tnqn tnqn deleted the reduce-ct-action branch June 6, 2022 04:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants