Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement NPL agent unification. #3936

Merged
merged 1 commit into from
Aug 8, 2022
Merged

Implement NPL agent unification. #3936

merged 1 commit into from
Aug 8, 2022

Conversation

XinShuYang
Copy link
Contributor

@XinShuYang XinShuYang commented Jun 24, 2022

  • Unify agent behavior across Linux and Windows. Linux agent should support
    allocating different nodeports for different protocols when the podports are the same.
  • Update port allocation related unit tests.
  • Enable windows e2e test.
  • Delete unused functions.

Signed-off-by: Shuyang Xin gavinx@vmware.com

@XinShuYang
Copy link
Contributor Author

/test-all

@codecov-commenter
Copy link

codecov-commenter commented Jun 24, 2022

Codecov Report

Merging #3936 (0a20df2) into main (4b788e7) will increase coverage by 3.45%.
The diff coverage is 51.64%.

❗ Current head 0a20df2 differs from pull request most recent head 8461ba4. Consider uploading reports for the commit 8461ba4 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3936      +/-   ##
==========================================
+ Coverage   61.42%   64.87%   +3.45%     
==========================================
  Files         296      294       -2     
  Lines       43864    44520     +656     
==========================================
+ Hits        26942    28882    +1940     
+ Misses      14672    13315    -1357     
- Partials     2250     2323      +73     
Flag Coverage Δ *Carryforward flag
e2e-tests 40.72% <17.17%> (?)
kind-e2e-tests 50.91% <21.29%> (+6.41%) ⬆️ Carriedforward from 2a37aec
unit-tests 44.35% <44.62%> (+0.10%) ⬆️ Carriedforward from 2a37aec

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files Coverage Δ
...icluster/cmd/multicluster-controller/controller.go 7.86% <0.00%> (-0.47%) ⬇️
multicluster/cmd/multicluster-controller/leader.go 0.00% <0.00%> (ø)
multicluster/cmd/multicluster-controller/main.go 0.00% <0.00%> (ø)
...uster/controllers/multicluster/controller_utils.go 26.66% <ø> (-0.89%) ⬇️
pkg/agent/openflow/client.go 68.30% <0.00%> (+2.05%) ⬆️
pkg/agent/openflow/multicast.go 0.00% <0.00%> (ø)
pkg/agent/openflow/pipeline.go 73.13% <0.00%> (+2.67%) ⬆️
pkg/agent/util/iptables/iptables.go 43.52% <ø> (ø)
pkg/ovs/openflow/ofctrl_group.go 47.82% <0.00%> (-2.95%) ⬇️
pkg/agent/route/route_linux.go 45.70% <7.14%> (+16.18%) ⬆️
... and 94 more

@XinShuYang XinShuYang marked this pull request as ready for review June 28, 2022 07:53
@XinShuYang
Copy link
Contributor Author

/test-all

@XinShuYang
Copy link
Contributor Author

/test-all
/test-windows-all

@XinShuYang
Copy link
Contributor Author

/test-all

pkg/agent/nodeportlocal/npl_agent_test.go Outdated Show resolved Hide resolved
pkg/agent/nodeportlocal/npl_agent_test.go Outdated Show resolved Hide resolved
pkg/agent/nodeportlocal/portcache/port_table_linux.go Outdated Show resolved Hide resolved
pkg/agent/nodeportlocal/portcache/port_table_linux.go Outdated Show resolved Hide resolved
@XinShuYang
Copy link
Contributor Author

/test-all
/test-windows-all

pkg/agent/nodeportlocal/portcache/port_table.go Outdated Show resolved Hide resolved
pkg/agent/nodeportlocal/portcache/port_table.go Outdated Show resolved Hide resolved
pkg/agent/nodeportlocal/portcache/port_table_linux.go Outdated Show resolved Hide resolved
pkg/agent/nodeportlocal/portcache/port_table_linux.go Outdated Show resolved Hide resolved
pkg/agent/nodeportlocal/portcache/port_table_linux.go Outdated Show resolved Hide resolved
pkg/agent/nodeportlocal/portcache/port_table_windows.go Outdated Show resolved Hide resolved
@XinShuYang
Copy link
Contributor Author

/test-all
/test-windows-all

@XinShuYang XinShuYang force-pushed the npluni branch 2 times, most recently from 9ac1c52 to e50a9e8 Compare July 22, 2022 02:30
@XinShuYang
Copy link
Contributor Author

/test-windows-proxyall-e2e

@XinShuYang
Copy link
Contributor Author

/test-all
/test-windows-all

@XinShuYang
Copy link
Contributor Author

/test-all
/test-windows-all

@tnqn tnqn added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Aug 2, 2022
@XinShuYang XinShuYang force-pushed the npluni branch 3 times, most recently from 7fedcb5 to bf9ac29 Compare August 2, 2022 14:09
@XinShuYang
Copy link
Contributor Author

Hi @antoninbas , do you have more comments for this PR?

pkg/agent/nodeportlocal/npl_agent_test.go Outdated Show resolved Hide resolved
// irrespective of which protocol is in use.
// In particular we make sure that a given NodePort is never used by more than one Pod.
// One Pod could use multiple Nodeports for different protocol with the same Pod port
// because of the new NPL unification implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that may not be a useful precision

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment on lines 776 to 777
// TestNodePortAlreadyBoundTo validates that when a port with TCP protocol is already bound to,
// the same port should be selected for NPL if any other protocol is available.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is incorrect.

the same port should be selected for NPL

we don't use the same port

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

pkg/agent/nodeportlocal/npl_agent_test.go Outdated Show resolved Hide resolved
pkg/agent/nodeportlocal/portcache/port_table.go Outdated Show resolved Hide resolved
@@ -20,9 +20,15 @@ import (
"net"
"sync"

"k8s.io/klog/v2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing klog seems to be causing build issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should not be removed.

func (pt *PortTable) getEntryByPodIPPortProto(ip string, port int, protocol string) *NodePortData {
return pt.PodEndpointTable[podIPPortProtoFormat(ip, port, protocol)]
}

func (pt *PortTable) GetEntry(ip string, port int, protocol string) *NodePortData {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is duplicated for the Linux and Windows implementations I think. I don't a difference between the 2 versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has been moved to port_table.go. So both linux and windows implementation can call it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you made this change, I don't see it?

Comment on lines 150 to 152
var protocolSocketData *ProtocolSocketData
protocolSocketData = &data.Protocols[0]
protocolSocketData = &data.Protocol
if protocolSocketData != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what we need protocolSocketData for here. I don't see similar logic in the Linux version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This data structure is shared by both linux and windows. Do you think we need to build a new data structure without socket variable for windows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand your reply.

Let me rephrase my question. This is the corresponding code for Linux:

	if err := pt.PodPortRules.DeleteRule(data.NodePort, podIP, podPort, protocol); err != nil {
		return err
	}

why do we need to check protocolSocketData / data.Protocol for Windows:

	var protocolSocketData *ProtocolSocketData
	protocolSocketData = &data.Protocols[0]
	protocolSocketData = &data.Protocol
	if protocolSocketData != nil {
		if err := pt.PodPortRules.DeleteRule(data.NodePort, podIP, podPort, protocol); err != nil {
			return err
		}
	}
  1. we don't seem to actually use protocolSocketData
  2. is it even possible for protocolSocketData to be nil? I am not sure, I don't see any other place where we check for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas I see, yes, protocolSocketData will never be nil if we can successfully get entry from cache table. I have removed it, thanks.

pt.addPortTableCache(npData)
} else {
// Only add rules for if the entry does not exist.
return 0, fmt.Errorf("existed windows nodeport entry for %s:%d:%s", podIP, podPort, protocol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

}

pt.NodePortTable[NodePortProtoFormat(nodePort, protocol)] = npData
pt.PodEndpointTable[podIPPortProtoFormat(podIP, podPort, protocol)] = npData
pt.addPortTableCache(npData)
} else {
// Only add rules for if the entry does not exist.
return 0, fmt.Errorf("existed windows nodeport entry for %s:%d:%s", podIP, podPort, protocol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return 0, fmt.Errorf("existed windows nodeport entry for %s:%d:%s", podIP, podPort, protocol)
return 0, fmt.Errorf("existing Windows NodePort entry for %s:%d:%s", podIP, podPort, protocol)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks.

@XinShuYang XinShuYang force-pushed the npluni branch 3 times, most recently from 16d5be1 to f00f3ae Compare August 4, 2022 23:16
@XinShuYang
Copy link
Contributor Author

/test-all
/test-windows-all

@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Aug 4, 2022
Comment on lines 150 to 152
var protocolSocketData *ProtocolSocketData
protocolSocketData = &data.Protocols[0]
protocolSocketData = &data.Protocol
if protocolSocketData != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand your reply.

Let me rephrase my question. This is the corresponding code for Linux:

	if err := pt.PodPortRules.DeleteRule(data.NodePort, podIP, podPort, protocol); err != nil {
		return err
	}

why do we need to check protocolSocketData / data.Protocol for Windows:

	var protocolSocketData *ProtocolSocketData
	protocolSocketData = &data.Protocols[0]
	protocolSocketData = &data.Protocol
	if protocolSocketData != nil {
		if err := pt.PodPortRules.DeleteRule(data.NodePort, podIP, podPort, protocol); err != nil {
			return err
		}
	}
  1. we don't seem to actually use protocolSocketData
  2. is it even possible for protocolSocketData to be nil? I am not sure, I don't see any other place where we check for this.

// TestNodePortAlreadyBoundTo validates that when a port is already bound to, a different port will
// be selected for NPL.
// TestNodePortAlreadyBoundTo validates that when a port with TCP protocol is already bound to,
// the next port should be selected for NPL if the same protocol is available.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the next port should be selected for NPL if the same protocol is available.
// the next sequential TCP port should be selected for NPL when it is available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

socket: socket,
})
func openSocketsForPort(localPortOpener LocalPortOpener, port int, protocol string) (ProtocolSocketData, error) {
// Port needs to be only available for the protocol used by NPL rule.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Port needs to be only available for the protocol used by NPL rule.
// Port only needs to be available for the protocol used by the NPL rule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

func (pt *PortTable) getEntryByPodIPPortProto(ip string, port int, protocol string) *NodePortData {
return pt.PodEndpointTable[podIPPortProtoFormat(ip, port, protocol)]
}

func (pt *PortTable) GetEntry(ip string, port int, protocol string) *NodePortData {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you made this change, I don't see it?

return nil
}

func (pt *PortTable) delPortTableCache(npData *NodePortData) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/del/delete

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

return fmt.Sprintf("%s:%d:%s", ip, port, protocol)
}

func (pt *PortTable) getEntryByPodIPPortProto(ip string, port int, protocol string) *NodePortData {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you made this change, I don't see it?
@antoninbas I moved getEntryByPodIPPortProto here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking about GetEntry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Move it to port_table.go, thanks.

* Unify agent behavior across Linux and Windows. Linux agent should support
allocating different nodeports for different protocols when the podports are the same.
* Replace map with cache.indexer for cachetable to reduce repeated insertion.
* Update port allocation related unit tests.
* Enable windows e2e test.
* Delete unused functions.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
@XinShuYang
Copy link
Contributor Author

/test-all
/test-windows-all

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@XinShuYang
Copy link
Contributor Author

/test-windows-e2e
/test-windows-proxyall-e2e

@XinShuYang
Copy link
Contributor Author

/test-windows-proxyall-e2e

@XinShuYang XinShuYang requested a review from tnqn August 7, 2022 11:34
Copy link
Contributor

@wenyingd wenyingd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Aug 8, 2022

/skip-integration tested manually

@tnqn tnqn merged commit cdd98b4 into antrea-io:main Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants