Support for batch OVS flow updates #844

Merged
merged 3 commits into from
Jul 30, 2020

Dyanngg
Contributor

@Dyanngg Dyanngg commented Jun 17, 2020

Currently in Antrea, OVS flows for NetworkPolicies are installed asynchronously. On agent restart, however, the initial NetworkPolicy events received before the Bookmark (#704) can be processed as a batch and sent to the OVS bridge in a single bundle. This PR implements that approach.
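
As a rough illustration of the idea (not the code in this PR), the sketch below buffers flows until a bookmark is seen, sends the buffered flows as one bundle, and installs later flows individually. The `flow`, `bridge`, and `installer` types and their methods are hypothetical stand-ins and do not correspond to Antrea's actual types.

```go
package main

import "fmt"

// flow is a hypothetical stand-in for an OVS flow entry.
type flow string

// bridge is a hypothetical stand-in for the OVS bridge client. It counts
// how many separate messages (bundles) it receives.
type bridge struct {
	bundles int
	flows   []flow
}

// sendBundle installs all given flows in one message.
func (b *bridge) sendBundle(fs []flow) {
	if len(fs) == 0 {
		return
	}
	b.bundles++
	b.flows = append(b.flows, fs...)
}

// installer buffers flows until the bookmark is seen, then switches to
// installing each later flow on its own.
type installer struct {
	br         *bridge
	pending    []flow
	bookmarked bool
}

// onRule handles one rule event: buffered before the bookmark, sent
// immediately afterwards.
func (i *installer) onRule(f flow) {
	if !i.bookmarked {
		i.pending = append(i.pending, f)
		return
	}
	i.br.sendBundle([]flow{f})
}

// onBookmark flushes everything buffered so far as a single bundle.
func (i *installer) onBookmark() {
	i.br.sendBundle(i.pending)
	i.pending = nil
	i.bookmarked = true
}

func main() {
	br := &bridge{}
	in := &installer{br: br}
	in.onRule("rule-1")
	in.onRule("rule-2")
	in.onRule("rule-3")
	in.onBookmark() // the three initial rules go out together
	in.onRule("rule-4")
	fmt.Println("bundles:", br.bundles, "flows:", len(br.flows))
}
```

In this sketch the three initial rules reach the bridge in one message, and only the rule that arrives after the bookmark is sent separately.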

@antrea-bot
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-all: to trigger all tests.
  • /skip-all: to skip all tests.

These commands can only be run by members of the vmware-tanzu organization.

Member

@tnqn tnqn left a comment

I have only reviewed the batch update commit so far, and I have not finished that one either.

@Dyanngg Dyanngg force-pushed the batch-flow branch 4 times, most recently from 4c52c27 to ef7c13e Compare June 19, 2020 21:08
@Dyanngg Dyanngg force-pushed the batch-flow branch 2 times, most recently from 5ed4ccb to c3b9e96 Compare June 19, 2020 21:37
@Dyanngg Dyanngg force-pushed the batch-flow branch 4 times, most recently from ef9112b to 78243a2 Compare July 8, 2020 06:17
@Dyanngg Dyanngg requested a review from tnqn July 8, 2020 19:08
@Dyanngg Dyanngg changed the title [WIP] Add support for batch OVS flow updates Support for batch OVS flow updates Jul 8, 2020
@Dyanngg
Contributor Author

Dyanngg commented Jul 8, 2020

/test-all

@@ -711,21 +711,41 @@ func (c *policyRuleConjunction) getAddressClause(addrType types.AddressType) *cl
 // If there is an error in any clause's addAddrFlows or addServiceFlows, the conjunction action flow will never be hit.
 // If the default drop flow is already installed before this error, all packets will be dropped by the default drop flow,
 // Otherwise all packets will be allowed.
-func (c *client) InstallPolicyRuleFlows(ruleID uint32, rule *types.PolicyRule, npName, npNamespace string) error {
+func (c *client) InstallPolicyRuleFlows(ofPolicyRule types.OFPolicyRule) error {
Contributor

Would it be better to follow the workflow of BatchInstallPolicyRuleFlows? I mean, we don't need to implement two workflows.

Contributor Author

@Dyanngg Dyanngg Jul 15, 2020

I intentionally kept those two workflows separate. The reasons are:

  1. For InstallPolicyRuleFlows, the conjMatchFlowLock is needed to protect against concurrency issues. This lock is not needed for batch install because the rules are processed in sequence.
  2. For BatchInstallPolicyRuleFlows, as soon as the context changes for a rule are calculated, ctxChange.updateContextStatus() needs to be called so that the "conjunctive contexts" are up to date before the context changes for the next rule are calculated. I do not want to change the behavior of InstallPolicyRuleFlows, as that could cause regressions.

@Dyanngg
Contributor Author

Dyanngg commented Jul 14, 2020

/test-all

@Dyanngg Dyanngg requested a review from wenyingd July 15, 2020 22:06
@Dyanngg Dyanngg force-pushed the batch-flow branch 2 times, most recently from 8f50c85 to 8df0b15 Compare July 16, 2020 22:21
@Dyanngg
Contributor Author

Dyanngg commented Jul 16, 2020

/test-all

@Dyanngg
Contributor Author

Dyanngg commented Jul 16, 2020

/test-conformance /test-networkpolicy /test-windows-networkpolicy

@Dyanngg
Contributor Author

Dyanngg commented Jul 17, 2020

Looks like clusterAPI is failing for the Jenkins e2e builds, @lzhecheng:

=== Wait for workload cluster secret for 10 min ===
=== Get kubeconfig (try for 1m) ===
Error from server (NotFound): secrets "antrea-networkpolicy-for-pull-request-716-kubeconfig" not found
=== Get kubeconfig (try for 1m) ===
Error from server (NotFound): secrets "antrea-networkpolicy-for-pull-request-716-kubeconfig" not found
=== Get kubeconfig (try for 1m) ===
Error from server (NotFound): secrets "antrea-networkpolicy-for-pull-request-716-kubeconfig" not found
=== Get kubeconfig (try for 1m) ===

@lzhecheng
Contributor

@Dyanngg Yes, this may be related to IP allocation. Could you try again? It should work now.

This is an optimization for NetworkPolicy OVS flows to be collectively installed
in the agent restart case, so that those flows will be rolled out at once and avoid
potential priority re-assignments.
In the future, mechanisms can also be added so that once this operation is done,
NetworkPolicy flows from the previous round are deleted immediately after.
@Dyanngg
Contributor Author

Dyanngg commented Jul 29, 2020

/test-all

... as well as resolving comments
@Dyanngg
Contributor Author

Dyanngg commented Jul 29, 2020

/test-all

@Dyanngg
Contributor Author

Dyanngg commented Jul 29, 2020

/test-windows-networkpolicy

@antrea-bot
Collaborator

Can one of the admins verify this patch?

tnqn
tnqn previously approved these changes Jul 29, 2020
Member

@tnqn tnqn left a comment

LGTM except for two minor suggestions. If you apply the suggestions, feel free to merge once you get the other reviewers' approval back.

pkg/agent/openflow/network_policy.go (outdated, resolved)
pkg/agent/openflow/network_policy.go (outdated, resolved)
@Dyanngg
Contributor Author

Dyanngg commented Jul 29, 2020

/test-all

@Dyanngg
Contributor Author

Dyanngg commented Jul 29, 2020

/test-windows-conformance /test-windows-networkpolicy

@Dyanngg
Contributor Author

Dyanngg commented Jul 29, 2020

/test-windows-networkpolicy

@abhiraut
Contributor

/test-windows-networkpolicy

@Dyanngg Dyanngg requested a review from tnqn July 30, 2020 01:10
@abhiraut abhiraut merged commit d214b52 into antrea-io:master Jul 30, 2020
@abhiraut
Contributor

Looks like a flaky test result from the Windows NetworkPolicy job:

  • kubectl rollout status deployment/coredns -n kube-system
    Waiting for deployment spec update to be observed...
    Waiting for deployment "coredns" rollout to finish: 0 out of 2 new replicas have been updated...
    Waiting for deployment "coredns" rollout to finish: 0 out of 2 new replicas have been updated...
    Waiting for deployment "coredns" rollout to finish: 1 old replicas are pending termination...
    error: deployment "coredns" exceeded its progress deadline
    Build step 'Execute shell' marked build as failure

merging as it is unrelated

@abhiraut
Contributor

/cc @lzhecheng

@lzhecheng
Contributor

@abhiraut Thanks. There is something wrong with one testbed. I have disconnected it and am investigating.

GraysonWu pushed a commit to GraysonWu/antrea that referenced this pull request Sep 22, 2020
Support for batch OVS flow updates

This is an optimization for NetworkPolicy OVS flows to be collectively installed in the agent restart case, so that those
flows will be rolled out at once and avoid potential priority re-assignments.
In the future, mechanisms can also be added so that once this operation is done, NetworkPolicy flows from the
previous round are deleted immediately after.
@Dyanngg Dyanngg deleted the batch-flow branch September 30, 2020 23:46
8 participants