Fix Service update processing #4845
/test-all
In the PR description, you list a few scenarios that were not supported correctly. For example: "After updating InternalTrafficPolicy, the flows for ClusterIP isn't updated."
Should we add unit test coverage for these scenarios?
pkg/agent/proxy/proxier.go
Outdated
if exists && !needUpdate {
	return groupID, true
}
succeed := false
nit: s/succeed/success
updated
pkg/agent/proxy/proxier.go
Outdated
}

func serviceExternalAddressesChanged(svcInfo, pSvcInfo *types.ServiceInfo) bool {
	return svcInfo.NodePort() != pSvcInfo.NodePort() || !reflect.DeepEqual(svcInfo.LoadBalancerIPStrings(), pSvcInfo.LoadBalancerIPStrings())
nit: you can use https://pkg.go.dev/golang.org/x/exp/slices#Equal now instead of DeepEqual
Thanks for the pointer. I saw that we don't have any other dependency on this package. Given that it's declared experimental and unreliable, I used "k8s.io/utils/strings/slices", which has almost exactly the same code.
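For reference, a minimal sketch of the comparison being discussed. `equalStrings` and `externalAddressesChanged` are hypothetical names written here so the example has no external dependencies; `equalStrings` is assumed to behave like an element-wise `Equal` helper. One difference worth keeping in mind is that `reflect.DeepEqual` treats a nil slice and an empty slice as different, while an element-wise comparison does not.

```go
package main

import (
	"fmt"
	"reflect"
)

// equalStrings is a hypothetical stand-in for an element-wise Equal helper:
// two slices are equal when they have the same length and the same elements
// in the same order.
func equalStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// externalAddressesChanged is a simplified version of the check in the diff.
// It takes plain values instead of *types.ServiceInfo (an assumption made to
// keep the sketch self-contained).
func externalAddressesChanged(nodePort, prevNodePort int, lbIPs, prevLBIPs []string) bool {
	return nodePort != prevNodePort || !equalStrings(lbIPs, prevLBIPs)
}

func main() {
	var nilIPs []string
	emptyIPs := []string{}

	// DeepEqual distinguishes nil from empty; the element-wise helper does not.
	fmt.Println("DeepEqual(nil, empty):", reflect.DeepEqual(nilIPs, emptyIPs))
	fmt.Println("equalStrings(nil, empty):", equalStrings(nilIPs, emptyIPs))

	ips := []string{"192.0.2.10"}
	fmt.Println("same addresses:", externalAddressesChanged(30001, 30001, ips, ips))
	fmt.Println("NodePort changed:", externalAddressesChanged(30002, 30001, ips, ips))
}
```

With an element-wise comparison, a Service whose LoadBalancer IP list goes from nil to empty would not be reported as changed, which is likely the desired behavior for this check.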
8f10002
to
1392758
Compare
In the PR description, you list a few scenarios that were not supported correctly. For example: "After updating InternalTrafficPolicy, the flows for ClusterIP isn't updated."
Should we add unit test coverage for these scenarios?
@antoninbas all these scenarios were covered by unit tests, but the expectations were also wrong. I have added comments explaining how the changes to the test code validate the fixes.
mockOFClient.EXPECT().UninstallServiceFlows(svcIP, uint16(svcPort), bindingProtocol).Times(1)
mockOFClient.EXPECT().InstallServiceFlows(groupID, svcIP, uint16(svcPort), bindingProtocol, uint16(0), false, corev1.ServiceTypeClusterIP, false).Times(1)
The removal of these lines validates that there is no need to reinstall ClusterIP flows when only NodePort changes.
mockOFClient.EXPECT().UninstallServiceFlows(loadBalancerIP, uint16(svcPort), bindingProtocol)
mockOFClient.EXPECT().InstallServiceFlows(groupID, loadBalancerIP, uint16(svcPort), bindingProtocol, uint16(0), false, corev1.ServiceTypeLoadBalancer, false).Times(1)
mockRouteClient.EXPECT().DeleteLoadBalancer(loadBalancerIP).Times(1)
mockRouteClient.EXPECT().AddLoadBalancer(loadBalancerIP).Times(1)
The removal of these lines validates that there is no need to reinstall LoadBalancerIP flows when only NodePort changes.
@@ -2082,7 +2072,6 @@ func testServiceExternalTrafficPolicyUpdate(t *testing.T,
mockOFClient.EXPECT().InstallServiceFlows(groupID, svcIP, uint16(svcPort), bindingProtocol, uint16(0), true, corev1.ServiceTypeClusterIP, false).Times(1)

if svcType == corev1.ServiceTypeNodePort || svcType == corev1.ServiceTypeLoadBalancer {
	mockOFClient.EXPECT().InstallEndpointFlows(bindingProtocol, gomock.InAnyOrder(expectedAllEps)).Times(1)
The removal of this line validates that the 5th bug, "Endpoints are installed repeatedly", is fixed.
mockOFClient.EXPECT().UninstallEndpointFlows(bindingProtocol, expectedRemoteEps).Times(1)
mockOFClient.EXPECT().UninstallServiceGroup(groupID).Times(1)
This line validates that the 4th bug, "After updating InternalTrafficPolicy for a ClusterIP Service, the stale group is not removed", is fixed.
mockOFClient.EXPECT().InstallServiceGroup(groupIDLocal, false, expectedLocalEps).Times(1)
mockOFClient.EXPECT().InstallServiceFlows(groupIDLocal, svcIP, uint16(svcPort), bindingProtocol, uint16(0), false, corev1.ServiceTypeClusterIP, false).Times(1)
The uninstall and install validate that the 3rd bug, "After updating InternalTrafficPolicy, the flows for ClusterIP isn't updated", is fixed.
@@ -2344,16 +2335,24 @@ func testServiceStickyMaxAgeSecondsUpdate(t *testing.T,
mockOFClient.EXPECT().InstallEndpointFlows(bindingProtocol, expectedEps).Times(1)
mockOFClient.EXPECT().InstallServiceGroup(groupID, true, expectedEps).Times(1)
mockOFClient.EXPECT().InstallServiceFlows(groupID, svcIP, uint16(svcPort), bindingProtocol, uint16(affinitySeconds), false, corev1.ServiceTypeClusterIP, false).Times(1)

mockOFClient.EXPECT().UninstallServiceFlows(svcIP, uint16(svcPort), bindingProtocol).Times(1)
The uninstall and install validate that the 1st bug, "After updating stickyMaxAgeSeconds, the flow for ClusterIP isn't updated", is fixed.
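To illustrate why the test now expects an uninstall before the install, here is a minimal sketch of a keyed flow cache whose install call is a no-op when the key already exists. `flowCache`, its methods, and the key format are hypothetical and only model the behavior described for the 1st bug; they are not the actual OpenFlow client code.

```go
package main

import "fmt"

// flowCache is a hypothetical stand-in for a flow cache keyed by Service
// address. The value is the affinity timeout carried by the installed flow.
type flowCache map[string]int

// install mimics an install call that skips keys that are already cached,
// so calling it again with new attributes does not change anything.
func (c flowCache) install(key string, affinitySeconds int) {
	if _, exists := c[key]; exists {
		return
	}
	c[key] = affinitySeconds
}

// uninstall removes the cached flow so that a later install takes effect.
func (c flowCache) uninstall(key string) {
	delete(c, key)
}

func main() {
	cache := flowCache{}
	key := "10.96.0.1:80/TCP" // illustrative key format

	cache.install(key, 10)
	cache.install(key, 20) // skipped: the key already exists
	fmt.Println("after install-only update:", cache[key])

	cache.uninstall(key)
	cache.install(key, 20) // takes effect after the uninstall
	fmt.Println("after uninstall + install:", cache[key])
}
```

Under this model an install-only update leaves the stale timeout in place, which matches the symptom described for the 1st bug.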
mockOFClient.EXPECT().UninstallServiceFlows(vIP, uint16(svcNodePort), bindingProtocol).Times(1)
mockRouteClient.EXPECT().DeleteNodePort(nodePortAddresses, uint16(svcNodePort), bindingProtocol).Times(1)
mockOFClient.EXPECT().InstallServiceFlows(groupID, vIP, uint16(svcNodePort), bindingProtocol, uint16(updatedAffinitySeconds), false, corev1.ServiceTypeNodePort, false).Times(1)
mockRouteClient.EXPECT().AddNodePort(nodePortAddresses, uint16(svcNodePort), bindingProtocol).Times(1)
The uninstall and install validate that the 2nd bug, "After updating stickyMaxAgeSeconds, the flows for NodePort aren't updated", is fixed.
mockOFClient.EXPECT().UninstallServiceFlows(loadBalancerIP, uint16(svcPort), bindingProtocol).Times(1)
mockRouteClient.EXPECT().DeleteLoadBalancer(loadBalancerIP).Times(1)
mockOFClient.EXPECT().InstallServiceFlows(groupID, loadBalancerIP, uint16(svcPort), bindingProtocol, uint16(updatedAffinitySeconds), false, corev1.ServiceTypeLoadBalancer, false).Times(1)
mockRouteClient.EXPECT().AddLoadBalancer(loadBalancerIP).Times(1)
The uninstall and install validate that the 2nd bug, "After updating stickyMaxAgeSeconds, the flows for LoadBalancerIPs aren't updated", is fixed.
@@ -410,11 +464,18 @@ func (p *proxier) installServices() {

installedSvcPort, ok := p.serviceInstalledMap[svcPortName]
var pSvcInfo *types.ServiceInfo
var needRemoval, needUpdateService, needUpdateEndpoints bool
var needUpdateServiceExternalAddresses, needUpdateService, needUpdateEndpoints bool
Do we need more flags like these since updating some attributes of a Service will not affect all flows and configurations of a Service?
It may be possible to be more fine-grained. However, I don't think it's worth introducing a lot of special processing for infrequent operations just to save a few calls. Unless it can be done without adding much complexity, I feel it's unnecessary.
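As a rough sketch of the two-flag approach discussed here, the example below derives `needUpdateService` and `needUpdateServiceExternalAddresses` from the previous and current state of a Service. `serviceState`, `sameStrings`, and `updateFlags` are hypothetical, and which attributes trigger which flag is an assumption based on the scenarios in this thread, not the actual logic in `installServices`.

```go
package main

import "fmt"

// serviceState is a hypothetical, trimmed-down stand-in for the Service
// attributes discussed in this thread; it is not the real types.ServiceInfo.
type serviceState struct {
	stickyMaxAgeSeconds int
	internalPolicyLocal bool
	nodePort            int
	loadBalancerIPs     []string
}

// sameStrings reports whether two string slices hold the same elements in
// the same order.
func sameStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// updateFlags returns (needUpdateService, needUpdateServiceExternalAddresses).
// Assumption: a change to session affinity or internal traffic policy affects
// all flows of the Service, while a change to the NodePort or LoadBalancer IPs
// only affects the flows for external addresses.
func updateFlags(prev, cur serviceState) (bool, bool) {
	if prev.stickyMaxAgeSeconds != cur.stickyMaxAgeSeconds ||
		prev.internalPolicyLocal != cur.internalPolicyLocal {
		// A full update already covers the external addresses.
		return true, false
	}
	externalChanged := prev.nodePort != cur.nodePort ||
		!sameStrings(prev.loadBalancerIPs, cur.loadBalancerIPs)
	return false, externalChanged
}

func main() {
	prev := serviceState{stickyMaxAgeSeconds: 10, nodePort: 30001, loadBalancerIPs: []string{"192.0.2.1"}}

	onlyNodePort := prev
	onlyNodePort.nodePort = 30002
	all, external := updateFlags(prev, onlyNodePort)
	fmt.Println("NodePort change:", all, external)

	affinity := prev
	affinity.stickyMaxAgeSeconds = 20
	all, external = updateFlags(prev, affinity)
	fmt.Println("affinity change:", all, external)
}
```

Two flags keep the common cases cheap (a NodePort-only change leaves the ClusterIP flows alone) without a separate flag per attribute.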
1392758
to
8a0fcfb
Compare
LGTM
Currently the installServices method is somewhat long and redundant, making it hard to maintain and error-prone. In fact, a few bugs found in recent releases are related to it to some degree. While sorting out the code, I found there are actually more bugs in it:

1. After updating stickyMaxAgeSeconds, the flow for ClusterIP isn't updated because the installServiceFlows interface skips updating flows whose cache keys already exist.
2. After updating stickyMaxAgeSeconds, the flows for NodePort and LoadBalancerIPs aren't updated because the installServiceFlows interface is not even called.
3. After updating InternalTrafficPolicy, the flows for ClusterIP aren't updated.
4. After updating InternalTrafficPolicy for a ClusterIP Service, the stale group is not removed.
5. Endpoints are installed repeatedly even though there are already reference counters for them.

This patch refactors the method to make it easier to understand and maintain, and fixes all the above bugs. It makes the following changes:

1. Code redundancy is reduced by extracting shareable sub-procedures into sub-functions.
2. Calculation of variables that are required by only one sub-procedure is moved to the corresponding sub-function.
3. Repeated code that retrieves the group IDs is removed.
4. The ways of processing ClusterIP, NodePort, and LoadBalancerIPs are unified.
5. A method for installing Endpoints in the same way as uninstalling Endpoints is added.
6. needUpdateService indicates that all the flows of the Service need an update, and needUpdateServiceExternalAddresses indicates that only the flows related to ExternalAddresses need an update.

Signed-off-by: Quan Tian <qtian@vmware.com>
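The 5th bug mentions reference counters for Endpoints. A minimal sketch of that idea, assuming a simple per-Endpoint counter: flows are installed only on the first reference and uninstalled only when the last reference goes away. `endpointRefs` and its methods are hypothetical; the actual bookkeeping in the proxier is not shown in this thread.

```go
package main

import "fmt"

// endpointRefs is a hypothetical reference counter keyed by Endpoint address.
type endpointRefs map[string]int

// add records one more Service using the Endpoint. It returns true only for
// the first reference, which is when the Endpoint flows need to be installed.
func (r endpointRefs) add(ep string) bool {
	r[ep]++
	return r[ep] == 1
}

// remove drops one reference. It returns true only when the last reference is
// gone, which is when the Endpoint flows need to be uninstalled.
func (r endpointRefs) remove(ep string) bool {
	if r[ep] == 0 {
		return false // unknown Endpoint: nothing to uninstall
	}
	r[ep]--
	if r[ep] == 0 {
		delete(r, ep)
		return true
	}
	return false
}

func main() {
	refs := endpointRefs{}
	ep := "10.10.0.5:8080" // illustrative Endpoint address

	fmt.Println("first add installs:", refs.add(ep))
	fmt.Println("second add installs:", refs.add(ep))
	fmt.Println("first remove uninstalls:", refs.remove(ep))
	fmt.Println("second remove uninstalls:", refs.remove(ep))
}
```

Pairing install and uninstall through the same counter is one way to keep the two paths symmetric, in the spirit of change 5 in the list above.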
8a0fcfb
to
c67d40d
Compare
/test-all
@antoninbas could you take another look at this one?
/test-ipv6-e2e
/test-windows-proxyall-e2e