Is the network policy for VPC CNI designed to be stateful or stateless? #175
Comments
Moving to the Network Policy agent repo.
@khayong the network policy implementation is stateless. What does the PolicyEndpoint object show for this policy? You can see the output with `kubectl` (see the sketch below).
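For reference, a minimal sketch of inspecting the PolicyEndpoint custom resource with `kubectl`, assuming the `policyendpoints` resource name used by the network policy controller; the namespace and resource name placeholders are not from this issue:

```sh
# List the PolicyEndpoint custom resources created by the network policy
# controller in the namespace (resource name assumed: policyendpoints).
kubectl get policyendpoints -n <namespace>

# Dump the full object for the policy in question; substitute the name
# returned by the list above (typically <network-policy-name>-<suffix>).
kubectl get policyendpoints <policyendpoint-name> -n <namespace> -o yaml
```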
Yes, you are right, there is no need to explicitly define rules for return traffic. Can you check the number of entries in the network policy agent's conntrack table when the issue starts to happen? When the issue happens, is there any pod churn, or do just the established connections fail after a while? The steps to check are sketched below; in the example there, the conntrack map ID is 5.
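The exact commands were not preserved in this thread; the following is a hedged sketch assuming the `aws-eks-na-cli` tool that the network policy agent installs on the node, with the map ID (5) taken from the example above:

```sh
# On the affected node, list the eBPF maps loaded by the network policy agent
# and note the ID of its conntrack map.
sudo /opt/cni/bin/aws-eks-na-cli ebpf maps

# Dump the conntrack map by its ID (5 here, per the example above) and count
# the entries.
sudo /opt/cni/bin/aws-eks-na-cli ebpf dump-maps 5 | wc -l
```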
I have also encountered this issue, and it seems to relate to long-lived connections being removed from the conntrack table prematurely. There are other issues in this repository relating to this and the latest version of the CNI. If you enable policy logging on the VPC CNI (via the add-on configuration if deployed through the console, else the appropriate args in Helm/CLI; a sketch is below), you'll see a DENY entry logged for the return traffic.
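A hedged sketch of the kind of configuration being referred to, assuming the managed VPC CNI add-on and the `nodeAgent.enablePolicyEventLogs` configuration key; the cluster name is a placeholder, and Helm or self-managed installs would pass the equivalent node agent argument instead:

```sh
# Turn on policy decision logging for the node agent via the managed add-on's
# advanced configuration; allowed/denied flows are then logged on each node
# (typically under /var/log/aws-routed-eni/network-policy-agent.log).
aws eks update-addon \
  --cluster-name <cluster-name> \
  --addon-name vpc-cni \
  --configuration-values '{"nodeAgent": {"enablePolicyEventLogs": "true"}}'
```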
I think this could very well be the case, as we bin pack onto a small number of nodes to keep cost low. This would explain why we did not witness this issue in our development environment, which does not have more than 1 replica per deployment.
We will have a release candidate image soon if you are willing to try it out to see if it resolves the issue. The official release image containing #179 is targeting mid-January.
Thanks @jayanthvn, it works with the release candidate image.
I have observed the same behaviour. This is with a single pod in a ReplicaSet, so it is unrelated to the race condition, I think.
@khayong - There will be a few seconds (1-2s) of delay for the controller to reconcile and attach probes to the new pods. Traffic will be allowed until the probes are attached, and then policy enforcement will take effect based on the config. In this case, the probe was probably missing when the ingress traffic came in, so no conntrack entry was created. Regarding the second issue, do you have an active policy on the .54 pod? If yes, can you share the PolicyEndpoint (PE)?
Yes, here it is:
@khayong we are unable to repro this. Can we get on a call? Are you on the Kubernetes Slack? If so, we can connect in #aws-vpc-cni.
Can you please try with the latest v1.0.8 release? - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.16.3
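If it helps, a hedged sketch of moving the managed add-on to a newer release with the AWS CLI; the cluster name and the exact `--addon-version` string are placeholders, so check what is available for your cluster first:

```sh
# See which add-on versions are available for the cluster's Kubernetes version.
aws eks describe-addon-versions --addon-name vpc-cni --kubernetes-version 1.28

# Upgrade the managed VPC CNI add-on; the version string below is a
# placeholder -- use one returned by the command above.
aws eks update-addon \
  --cluster-name <cluster-name> \
  --addon-name vpc-cni \
  --addon-version v1.16.3-eksbuild.1
```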
Closing as the fix is available in the latest release.
What happened:
I have created an egress network policy allowing the web pod to establish connections with the backend server pod at port 4000.
It initially operated as intended, but after some time the packet log occasionally registers a DENY entry for some of the return traffic,
where 10.0.68.172 is the backend server and 10.0.74.123 is the web server.
To mitigate this issue, I have to define an ingress rule allowing an ephemeral port range for the return traffic, similar to the VPC network ACL configuration; a sketch is below.
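For illustration, a hedged reconstruction of the two policies described above. The namespace and the `app: web` / `app: backend` labels are assumptions, not taken from the report, and the second policy is the workaround only, allowing return traffic on ephemeral ports the way a stateless VPC network ACL would:

```sh
# Hypothetical reconstruction of the policies described in this report; the
# selectors and namespace are assumptions. Applied via a heredoc to keep the
# example self-contained.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-egress-to-backend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web               # the web pod (10.0.74.123 in this report)
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: backend   # the backend server pod (10.0.68.172)
      ports:
        - protocol: TCP
          port: 4000
---
# Workaround only: explicitly allow return traffic to the web pod's ephemeral
# ports, similar to a stateless VPC network ACL rule.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-allow-ephemeral-return
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 1024
          endPort: 65535
EOF
```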
Attach logs
What you expected to happen:
I expected Kubernetes Network Policies to behave statefully, meaning there is usually no need to explicitly define rules for return traffic.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): Server Version: v1.28.4-eks-8cb36c9
- CNI version: v1.16.0-eksbuild.1
- OS (e.g. `cat /etc/os-release`):
- Kernel (e.g. `uname -a`): Linux ip-10-0-64-172.ap-southeast-1.compute.internal 5.10.199-190.747.amzn2.x86_64 #1 SMP Sat Nov 4 16:55:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux