Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support EndpointSlice in Antrea proxy #1703

Merged
merged 2 commits into from
Feb 11, 2021

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Dec 24, 2020

Support EndpointSlice in AntreaProxy, ServiceTopology is not supported in this commit

@codecov-io
Copy link

codecov-io commented Dec 24, 2020

Codecov Report

❗ No coverage uploaded for pull request base (main@60d1d28). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1703   +/-   ##
=======================================
  Coverage        ?   52.80%           
=======================================
  Files           ?      188           
  Lines           ?    16347           
  Branches        ?        0           
=======================================
  Hits            ?     8632           
  Misses          ?     6856           
  Partials        ?      859           
Flag Coverage Δ
e2e-tests 26.78% <0.00%> (?)
unit-tests 42.12% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

@hongliangl hongliangl force-pushed the endpointslice branch 2 times, most recently from b8fbd91 to d0e83a1 Compare December 27, 2020 02:20
@hongliangl hongliangl force-pushed the endpointslice branch 2 times, most recently from 57cbc6c to 3866e27 Compare January 11, 2021 09:44
Makefile Outdated Show resolved Hide resolved
ci/check-manifest.sh Outdated Show resolved Hide resolved
cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/proxy/endpoints.go Outdated Show resolved Hide resolved
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. What is the behavior if some enabled the Antrea EndpointSlice feature gate in a K8s cluster which does not support the EndpointSlice API?

Makefile Outdated Show resolved Hide resolved
docs/feature-gates.md Outdated Show resolved Hide resolved
docs/feature-gates.md Outdated Show resolved Hide resolved
hack/generate-manifest.sh Outdated Show resolved Hide resolved
hack/generate-manifest.sh Outdated Show resolved Hide resolved
pkg/agent/proxy/endpointslicecache.go Show resolved Hide resolved
pkg/features/antrea_features.go Outdated Show resolved Hide resolved
.github/workflows/kind.yml Outdated Show resolved Hide resolved
pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
docs/feature-gates.md Show resolved Hide resolved
pkg/agent/proxy/endpoints.go Outdated Show resolved Hide resolved
pkg/agent/proxy/endpoints.go Outdated Show resolved Hide resolved
@@ -105,6 +109,7 @@ func (p *proxier) removeStaleServices() {
}
}
delete(p.serviceInstalledMap, svcPortName)
delete(p.endpointInstalledMap, svcPortName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a bug fix for memory leak?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tnqn Yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weiqiangt is it still supposed to remove the endpointInstalledMap here given that you also do it in removeStaleEndpoints?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hongliangl please reply here so we have the context to discuss.
Moving your reply here:

is it still supposed to remove the endpointInstalledMap here given that you also do it in removeStaleEndpoints?

Here removes all endpoints related with a stale service. Function removeStaleEndpoints removes stale endpoints of a normal service.

@weiqiangt From my understanding of #1815, you need the endpointInstalledMap to be there so that you can identify stale endpoints to delete, right? I think deleting it here cause openflow leak?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see that in this version, endpoints are still removed in removeStaleServices so it makes sense to delete p.endpointInstalledMap.
There will be a conflict between this PR and #1815. @weiqiangt could you cooperate with @hongliangl to see how to merge these PRs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it still supposed to remove the endpointInstalledMap here given that you also do it in removeStaleEndpoints?

Here removes all endpoints related with a stale service. Function removeStaleEndpoints removes stale endpoints of a normal service.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tnqn. I think this PR does not have many changes on the syncProxyRules and its sub-calls. I think we can first merge #1815 and then rebase #1703 onto #1815 without this line.

pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. I will defer to @tnqn for final approval.
One of my comments has not been addressed: I do not see any mention anywhere of the behavior if 1) AntreaProxy and EndpointSlice feature gates are enabled; 2) the EndpointSlice API is disabled in K8s. IMO, the behavior should at least be mentioned in docs/feature-gates.md and should probably also be included in the commit message.

docs/feature-gates.md Outdated Show resolved Hide resolved
hack/generate-manifest.sh Outdated Show resolved Hide resolved
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

.github/workflows/kind.yml Outdated Show resolved Hide resolved
pkg/agent/proxy/endpointslicecache.go Outdated Show resolved Hide resolved
@@ -106,6 +108,11 @@ case $key in
PROXY=false
shift
;;
--endpointslice)
PROXY=true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas AntreaProxy and EndpointSlice can be enabled at the same time. However it's unnecessary to enable both arguments. We can see here if EndpointSlice is enabled, AntreaProxy will also be set as enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, is this the answer to one of my questions? I'm confused

Copy link
Contributor Author

@hongliangl hongliangl Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of my comments has not been addressed: I do not see any mention anywhere of the behavior if 1) AntreaProxy and EndpointSlice feature gates are enabled; 2) the EndpointSlice API is disabled in K8s. IMO, the behavior should at least be mentioned in docs/feature-gates.md and should probably also be included in the commit message.

@antoninbas This is the answer of question 1).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am asking about the behavior when EndpointSlice is enabled in Antrea but the EndpointSlice API is disabled / not supported in Kubernetes. Is there a crash? Is there an error log message? Is it a no-op?

Copy link
Contributor Author

@hongliangl hongliangl Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas The restriction of using EndpointSlice in Antrea should be explained in docs. The explain is that this feature can be set as enabled if only that Kubernetes 1.16 (alpha) or later with EndpointSlice enabled. For users of Antrea, the behavior when EndpointSlice is enabled in Antrea but the EndpointSlice API is disabled / not supported in Kubernetes should be avoid. So far, there is not an exact evaluation about the unwanted behavior.

Copy link

@vicky-liu vicky-liu Jan 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, there is not an exact evaluation about the unwanted behavior.

Could we make a quick test to have a clear idea on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this scenario on a single node minikube cluster with Kubernetes v1.16.1.
The agent was not crash and no services/endpoints could be watched, and thus the agent can not connect to the controller.
There are logs in the agent indicate that the EndpointSlice resource is unavailable.

E0202 04:53:35.852020       1 reflector.go:178] pkg/mod/github.com/tnqn/client-go@v0.18.4-1/tools/cache/reflector.go:125: Failed to list *v1beta1.EndpointSlice: the server could not find the requested resource (get endpointslices.discovery.k8s.io)

I think we can first try to check if the EndpointSlice resource exists before really starting Antrea Proxy with EndpointSlice enabled.

@hongliangl hongliangl force-pushed the endpointslice branch 6 times, most recently from f0204ae to 77304a6 Compare January 20, 2021 16:33
cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/proxy/proxier.go Outdated Show resolved Hide resolved
docs/feature-gates.md Outdated Show resolved Hide resolved
docs/feature-gates.md Outdated Show resolved Hide resolved
build/yamls/base/conf/antrea-agent.conf Show resolved Hide resolved
cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
@@ -0,0 +1,364 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree!

Also thinking should we add comments for the changes we made to the original copied files, e.g. at the function level? Or there might be too many changes to track in this way?

@hongliangl hongliangl force-pushed the endpointslice branch 3 times, most recently from 1da8bbc to e7b242e Compare February 3, 2021 07:05
@hongliangl
Copy link
Contributor Author

./skip-all

@hongliangl hongliangl force-pushed the endpointslice branch 2 times, most recently from 56d7762 to 81639b8 Compare February 5, 2021 06:07
@@ -105,6 +109,7 @@ func (p *proxier) removeStaleServices() {
}
}
delete(p.serviceInstalledMap, svcPortName)
delete(p.endpointInstalledMap, svcPortName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weiqiangt is it still supposed to remove the endpointInstalledMap here given that you also do it in removeStaleEndpoints?

@@ -276,6 +281,9 @@ func (p *proxier) installServices() {
}

p.serviceInstalledMap[svcPortName] = svcPort
for _, endpoint := range endpointUpdateList {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Haven't L210-L213 done it?

Copy link
Contributor Author

@hongliangl hongliangl Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Haven't L210-L213 done it?

Agreed, it is not needed since endpointUpdateList is not updated after L210-L213.

Copy link
Contributor Author

@hongliangl hongliangl Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it still supposed to remove the endpointInstalledMap here given that you also do it in removeStaleEndpoints?

Here removes all endpoints related with a stale service. Function removeStaleEndpoints removes stale endpoints of a normal service.

@hongliangl hongliangl force-pushed the endpointslice branch 2 times, most recently from 867d6fe to 0ba1d22 Compare February 10, 2021 07:29
The EndpointSlice API version that AntreaProxy supports is v1beta1 for
now, and other EndpointSlice API versions are not supported. Endpoint
condition Serving,Terminating as well as ServiceTopology is not
supported in this commit.
tnqn
tnqn previously approved these changes Feb 10, 2021
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM

jianjuns
jianjuns previously approved these changes Feb 10, 2021
Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have some nits on comments, logs, and documentation. I am fine to merge the PR first, given Hongliang should be taking holiday leave already.


`EndpointSlice` enables Service EndpointSlice support in AntreaProxy. The
EndpointSlice API was introduced in Kubernetes 1.16 (alpha) and it is enabled
by default in Kubernetes 1.17 (beta). This flag will take no effect if AntreaProxy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably clearer to replace "This flag" with "The EndpointSlice feature gate".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for advice, I will handle today.

by default in Kubernetes 1.17 (beta). This flag will take no effect if AntreaProxy
is not enabled. The endpoint conditions of `Serving` and `Terminating` are not
supported currently. ServiceTopology is not supported either. Refer to this [link](https://kubernetes.io/docs/tasks/administer-cluster/enabling-endpointslices/)
for more information. EndpointSlice API version that AntreaProxy supports is v1beta1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EndpointSlice API version

for more information. EndpointSlice API version that AntreaProxy supports is v1beta1
currently, and other EndpointSlice API versions are not supported. If EndpointSlice is
enabled in AntreaProxy, but EndpointSlice API is disabled in Kubernetes or EndpointSlice
API version v1beta1 is not supported in Kubernetes, error messages will be logged by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably "Antrea Agent will log an error message and ..."

}

if _, _, err := endpointSliceCacheKeys(endpointSlice); err != nil {
klog.Warningf("Error getting endpoint slice cache keys: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use the same format for EndpointService (rather than all endpointSlice, EndpointSlice, endpoint slice..)

// limitations under the License.
//
// Original file https://raw.githubusercontent.com/kubernetes/kubernetes/0c0d4fea8dd6bdcd16b9e1d35da3f7d209341a6f/pkg/proxy/endpointslicecache.go
// If this file is located in third_party, there will be an import cycle issue when build Antrea as this file import
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when build -> when building

import -> imports

@antoninbas antoninbas added this to the Antrea v0.13.0 release milestone Feb 10, 2021
@antoninbas
Copy link
Contributor

/test-all
/test-ipv6-all
/test-ipv6-only-all

@antoninbas
Copy link
Contributor

All the test have passed. @hongliangl let me know if you only make documentation changes and log message changes (please consider pushing a separate commit instead of amending the existing one) and I will not need to run the IPv6 and dual-stack tests again.

@hongliangl hongliangl dismissed stale reviews from jianjuns and tnqn via 57e5e8f February 11, 2021 01:26
Including files:
- docs/feature-gates.md
- pkg/agent/proxy/endpoints.go
- pkg/agent/proxy/endpointslicecache.go
@hongliangl
Copy link
Contributor Author

All the test have passed. @hongliangl let me know if you only make documentation changes and log message changes (please consider pushing a separate commit instead of amending the existing one) and I will not need to run the IPv6 and dual-stack tests again.

I have pushed a separate commit that only includes documentation and log message changes. @antoninbas

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments!

@antoninbas
Copy link
Contributor

/skip-all
all the tests passed before the last commit

@antoninbas
Copy link
Contributor

Thanks @hongliangl I will merge this - enjoy your holiday!

@antoninbas antoninbas merged commit 5037859 into antrea-io:main Feb 11, 2021
@hongliangl
Copy link
Contributor Author

Thanks for everyone's help and advice, I've learned a lot.

@hongliangl hongliangl deleted the endpointslice branch April 28, 2022 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants