Init container for vpc-cni fails on startup on systems with multiple/changed ENIs on EC2 #1323

Closed
davidgibbons opened this issue Dec 11, 2020 · 2 comments

@davidgibbons

What happened:
The CNI plugin fails to launch on our hosts that already have two ENIs. This is how our etcd hosts are set up: they boot with a random ENI and then attach a known ENI so that their IP allocation stays consistent.

Attach logs

From the logs of the init container:

~ ❯ ~/bin/k -n kube-system logs aws-node-kwpdh -c aws-vpc-cni-init
+ PLUGIN_BINS='loopback portmap bandwidth aws-cni-support.sh'
+ for b in '$PLUGIN_BINS'
+ '[' '!' -f loopback ']'
Copying CNI plugin binaries ...
+ for b in '$PLUGIN_BINS'
+ '[' '!' -f portmap ']'
+ for b in '$PLUGIN_BINS'
+ '[' '!' -f bandwidth ']'
+ for b in '$PLUGIN_BINS'
+ '[' '!' -f aws-cni-support.sh ']'
+ HOST_CNI_BIN_PATH=/host/opt/cni/bin
+ echo 'Copying CNI plugin binaries ... '
+ for b in '$PLUGIN_BINS'
+ install loopback /host/opt/cni/bin
+ for b in '$PLUGIN_BINS'
+ install portmap /host/opt/cni/bin
+ for b in '$PLUGIN_BINS'
+ install bandwidth /host/opt/cni/bin
+ for b in '$PLUGIN_BINS'
+ install aws-cni-support.sh /host/opt/cni/bin
+ echo 'Configure rp_filter loose... '
++ curl -X PUT http://169.254.169.254/latest/api/token -H 'X-aws-ec2-metadata-token-ttl-seconds: 60'
Configure rp_filter loose...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    56  100    56    0     0  56000      0 --:--:-- --:--:-- --:--:-- 56000
+ TOKEN=AQAEAIRBUrXmr38hU9BstXIoL9aDWAWfZGe_7D9WB7wN-IaHU4i9Dw==
++ curl -H 'X-aws-ec2-metadata-token: AQAEAIRBUrXmr38hU9BstXIoL9aDWAWfZGe_7D9WB7wN-IaHU4i9Dw==' http://169.254.169.254/latest/meta-data/local-ipv4
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    11  100    11    0     0  11000      0 --:--:-- --:--:-- --:--:-- 11000
+ HOST_IP=10.21.4.253
++ ip -4 -o a
++ awk '{print $2}'
++ grep 10.21.4.253
+ PRIMARY_IF=
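
The trace stops right after `PRIMARY_IF=` is assigned an empty string. The `ip -4 -o a`/`grep 10.21.4.253`/`awk '{print $2}'` pipeline produced no output for the address that instance metadata reported as `local-ipv4`.

The shell sketch below is a hypothetical illustration only. It is not the change made in 5d05d33, which this thread does not show. It compares the address exactly against the ADDRESS/PREFIX field of `ip -4 -o addr` output, and it fails loudly when no interface carries that address instead of leaving the variable empty. The function name `find_primary_if` is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the upstream fix): print the name of the interface
# whose IPv4 address exactly equals $1, reading `ip -4 -o addr` style lines
# from stdin. Prints an error and returns non-zero if nothing matches.
find_primary_if() {
  local host_ip="$1"
  local ifname
  # In one-line output, field 2 is the interface name and field 4 is ADDRESS/PREFIX.
  # Strip the prefix length and compare the address as an exact string.
  ifname=$(awk -v ip="$host_ip" '
    { split($4, addr, "/"); if (addr[1] == ip) { print $2; exit } }')
  if [ -z "$ifname" ]; then
    echo "no interface carries $host_ip" >&2
    return 1
  fi
  echo "$ifname"
}

# Usage on a real host (assumes iproute2 is installed):
#   PRIMARY_IF=$(ip -4 -o addr show | find_primary_if "$HOST_IP") || exit 1
```

Comparing whole addresses avoids two pitfalls of a plain `grep <ip>`. First, `.` is a regex wildcard. Second, a pattern such as `10.21.4.25` would also match a line containing `10.21.4.253`.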

From the pod describe:

  - containerID: docker://9e080ce170c35a6650aed7b3ba49e06e95ce3da44e2832902efe061dd9c7cb5b
    image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.7.3
    imageID: docker-pullable://602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init@sha256:13469f302daa41aeda14438b08513b063e1ac34be2eb9b0e1e139e92d2b62927
    lastState:
      terminated:
        containerID: docker://9e080ce170c35a6650aed7b3ba49e06e95ce3da44e2832902efe061dd9c7cb5b
        exitCode: 1
        finishedAt: "2020-12-11T00:10:47Z"
        reason: Error
        startedAt: "2020-12-11T00:10:46Z"
    name: aws-vpc-cni-init
    ready: false
    restartCount: 12
    state:
      waiting:
        message: back-off 5m0s restarting failed container=aws-vpc-cni-init pod=aws-node-kwpdh_kube-system(419b7edc-f371-4a1a-b60c-3971fdc7c3a3)
        reason: CrashLoopBackOff

What you expected to happen:
The CNI plugin runs and doesn't error out at startup.

How to reproduce it (as minimally and precisely as possible):
Bring up a system with one ENI, then attach a new/replacement one and detach the original, then add the vpc-cni.

Anything else we need to know?:
Based on my manual testing, this appears to be fixed by the updated regex committed to master in 5d05d33.
I think we just need a new bug-fix release cut.

Environment:
  • Kubernetes version: v1.17.5
  • CNI version: 1.7.5
  • OS: Flatcar 2492.0.0
  • Kernel: 5.4.35-flatcar
@jayanthvn
Contributor

Hi @davidgibbons

Thanks for testing out the commit, and glad it fixes the issue. Yes, we will take it up in the next release; I will be able to give you a timeline by next week.

@jayanthvn
Contributor

#1311 is merged to master and added to the next release milestone, so I'm closing this issue for now. Thanks.
