BPF map entries not being removed #131

Closed
alemuro opened this issue Nov 7, 2023 · 3 comments · Fixed by #133
Labels
bug Something isn't working

Comments

@alemuro

alemuro commented Nov 7, 2023

What happened:

Hello, since I enabled AWS VPC CNI Network Policies, some nodes in my EKS cluster fail randomly. After some debugging, I found that the aws-eks-nodeagent container keeps opening more and more files/processes. After a few hours the node becomes unresponsive, because services can no longer create files.
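One way to confirm a descriptor leak like this is to sample how many file descriptors the agent process holds over time. The following is a minimal Go sketch, assuming a Linux node with procfs mounted at /proc and that you already know the PID of the agent process. The program and its `countOpenFDs` helper are hypothetical and are not part of the network policy agent.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// countOpenFDs returns the number of entries in /proc/<pid>/fd, i.e. the
// number of file descriptors the process currently holds open (Linux only).
// Pass "self" to inspect the calling process.
func countOpenFDs(pid string) (int, error) {
	entries, err := os.ReadDir(filepath.Join("/proc", pid, "fd"))
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	// Usage: fdcount <pid>; defaults to the current process.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	n, err := countOpenFDs(pid)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("pid %s has %d open file descriptors\n", pid, n)
}
```

Running it periodically against the agent's PID and seeing a count that only ever grows would be consistent with the behavior described above. Reading another process's /proc/<pid>/fd usually requires root.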

Attach logs

I exec'd into the aws-eks-nodeagent container and found the following logs. It seems the container is unable to delete entries from the BPF map.

$ tail -f /var/log/aws-routed-eni/ebpf-sdk.log 
{"level":"error","ts":"2023-11-07T14:09:31.711Z","caller":"conntrack/conntrack_client.go:131","msg":"unable to delete map entry and ret -1 and err no such file or directory"}
{"level":"error","ts":"2023-11-07T14:09:31.711Z","caller":"conntrack/conntrack_client.go:131","msg":"unable to delete map entry and ret -1 and err no such file or directory"}
{"level":"error","ts":"2023-11-07T14:09:31.711Z","caller":"conntrack/conntrack_client.go:131","msg":"unable to delete map entry and ret -1 and err no such file or directory"}
{"level":"error","ts":"2023-11-07T14:09:31.711Z","caller":"conntrack/conntrack_client.go:131","msg":"unable to delete map entry and ret -1 and err no such file or directory"}
{"level":"error","ts":"2023-11-07T14:09:31.711Z","caller":"conntrack/conntrack_client.go:131","msg":"unable to delete map entry and ret -1 and err no such file or directory"}
{"level":"error","ts":"2023-11-07T14:09:31.711Z","caller":"conntrack/conntrack_client.go:131","msg":"unable to delete map entry and ret -1 and err no such file or directory"}
{"level":"error","ts":"2023-11-07T14:09:31.711Z","caller":"conntrack/conntrack_client.go:131","msg":"unable to delete map entry and ret -1 and err no such file or directory"}
{"level":"error","ts":"2023-11-07T14:09:31.711Z","caller":"conntrack/conntrack_client.go:131","msg":"unable to delete map entry and ret -1 and err no such file or directory"}
{"level":"error","ts":"2023-11-07T14:09:31.711Z","caller":"conntrack/conntrack_client.go:131","msg":"unable to delete map entry and ret -1 and err no such file or directory"}
{"level":"error","ts":"2023-11-07T14:09:31.711Z","caller":"conntrack/conntrack_client.go:131","msg":"unable to delete map entry and ret -1 and err no such file or directory"}

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

This is an "empty" cluster with one application and the following components:

  • Karpenter
  • EFS CSI driver

There are monitoring and ingress tools as well.

Environment:

  • Kubernetes version (use kubectl version): v1.27.7
  • CNI Version: v1.15.3
  • Network Policy Agent Version: v1.0.5
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): 5.10.197-186.748.amzn2.x86_64
@alemuro added the bug label on Nov 7, 2023
@jayanthvn
Contributor

jayanthvn commented Nov 7, 2023

@alemuro Can you please confirm whether enable-policy-event-logs is enabled or disabled? Do you still have the node? If so, can you collect the node logs via /opt/cni/bin/aws-cni-support.sh and the output of bpftool map show, and mail them to k8s-awscni-triage@amazon.com?

@jayanthvn
Contributor

Never mind, we were able to reproduce the issue and have a possible fix. For now, the mitigation is to use v1.15.1, i.e. with agent version v1.0.4.

@alemuro
Author

alemuro commented Nov 8, 2023

Hello @jayanthvn, I've sent the output of the aws-cni-support.sh script. Unfortunately, the instance is no longer running, so I cannot run the bpftool map show command.

I will try to downgrade to 1.15.1 and see if that fixes the issue. Thanks!


2 participants