
using amazon-vpc-cni-k8s outside eks #2839

Open · is-it-ayush opened this issue Mar 12, 2024 · 23 comments
@is-it-ayush commented Mar 12, 2024

What happened:

Hi! I have an EC2 instance with containerd as the container runtime inside a private subnet (which has outbound internet access) in ap-south-1. I have initialized a new cluster with kubeadm init on this master node, and it ran successfully. I then wanted to install amazon-vpc-cni as the network plugin for my k8s cluster. I ran kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/master/aws-k8s-cni.yaml and checked the pods with kubectl get pods -n kube-system. One of the pods created by amazon-vpc-cni-k8s, named aws-node-xxxx, throws an error when trying to initialise. I ran kubectl describe pod aws-node-xxx -n kube-system and got the following.

Failed to pull image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.16.4": failed to pull and unpack image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.16.4": failed to resolve reference "amazon-k8s-cni-init:v1.16.4": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credential

I don't understand why this fails. Is it not possible to use amazon-vpc-cni outside EKS in a self-managed cluster? I also looked around in the issues here, and it seems other people have hit this before, but I was unable to resolve it myself. Here is my k8s_master_ecr policy inside a k8s_master role, which is attached to this master instance via an instance profile:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "K8sECR",
			"Effect": "Allow",
			"Action": [
				"ecr:GetAuthorizationToken",
				"ecr:BatchCheckLayerAvailability",
				"ecr:GetDownloadUrlForLayer",
				"ecr:GetRepositoryPolicy",
				"ecr:DescribeRepositories",
				"ecr:ListImages",
				"ecr:BatchGetImage"
			],
			"Resource": "*"
		}
	]
}
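
A quick way to rule out a pure IAM problem (as opposed to the kubelet simply having no ECR credentials) is to test the role from the node itself. This is only a sketch: it assumes the AWS CLI is installed on the node, and the us-west-2 registry is taken from the error above, since the upstream manifest defaults to it.

# Confirm the instance-profile credentials are visible on the node.
aws sts get-caller-identity

# Confirm the role can mint an ECR auth token (needs ecr:GetAuthorizationToken).
aws ecr get-login-password --region us-west-2 > /dev/null && echo "token OK"

# Pull the image manually with that token; if this works, IAM is fine and the
# failure is the kubelet having no way to obtain ECR credentials on its own.
sudo ctr images pull \
  602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.16.4 \
  --user AWS:$(aws ecr get-login-password --region us-west-2)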

Environment:

  • Kubernetes version (use kubectl version):
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
  • CNI Version: master branch
  • OS (e.g: cat /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel (e.g. uname -a): Linux ip-x-x-x-x.ap-south-1.compute.internal 6.1.0-13-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
@kwohlfahrt (Contributor)

We are running the AWS CNI outside of EKS. We also have the AWS credential provider installed; this allows the kubelet to use the instance credentials to pull from private ECR registries. Before Kubernetes 1.28 (I think, might be off by a version), this functionality was bundled as part of the kubelet.

@is-it-ayush (Author) commented Mar 12, 2024

That's interesting @kwohlfahrt! I've never used aws-credential-provider. After reading into it, I have a few questions:

  • Should I just deploy it by applying all the files with kubectl apply -f listed here on github.com/kubernetes/cloud-provider-aws/tree/master/examples/existing-cluster/base.
  • Where do I get the binary aws-credential-provider?
  • Does it work with containerd? I tried manually placing a username and password @ /etc/containerd/config.toml but it didn't work. I was able to manually pull the image with sudo ctr images pull 602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon-k8s-cni-init:v1.16.4 -u AWS:$TOKEN where TOKEN=$(aws ecr get-login-password --region ap-south-1), but it didn't really seem to fix the above problem.

@kwohlfahrt (Contributor)

Should I just deploy it by applying all the files with kubectl apply -f listed here on github.com/kubernetes/cloud-provider-aws/tree/master/examples/existing-cluster/base.

AFAIK, the credential provider can't be installed by applying manifests; it must be installed on your node, since you must change the kubelet flags to use it. The binary and configuration must be placed on disk, and then the kubelet's flags have to be modified to point to the configuration and to the path to search for the binary. This is documented on this page, which also includes an example config.
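
For reference, the two kubelet flags involved look roughly like this; the paths below are placeholders, not values from this cluster:

# Hypothetical locations; use wherever you place the config and the binary.
--image-credential-provider-config=/etc/kubernetes/credential-provider-config.yaml
--image-credential-provider-bin-dir=/usr/local/bin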

Where do I get the binary aws-credential-provider?

Pre-built binaries can be found here (source)

Does it work with containerd?

Yes, we've used it with containerd in the past, though we are using cri-o now. AFAIK, the container runtime never interacts with the credential provider directly - the credential provider is called by the kubelet, which then passes the received credentials on to your container runtime. So it shouldn't matter whether you are using containerd, crio, etc.

@is-it-ayush (Author) commented Mar 13, 2024

Thank you so much @kwohlfahrt! I was able to follow through and resolve this, and all the pods are running successfully now. These are the steps I took:

  • update cloud provider flag @ /etc/kubernetes/manifests/kube-controller-manager.yaml & /etc/kubernetes/manifests/kube-apiserver.yaml with --cloud-provider=external.
    • systemctl daemon-reload && systemctl restart kubelet.service
  • download ecr-credential-provider via curl -o ecr-credential-provider https://storage.googleapis.com/k8s-artifacts-prod/binaries/cloud-provider-aws/v1.29.0/linux/amd64/ecr-credential-provider-linux-amd64.
    • mv ecr-credential-provider /usr/bin/ecr-credential-provider
    • chmod +x /usr/bin/ecr-credential-provider
  • Create a credential-config.yaml with the following
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
  - name: ecr-credential-provider
    matchImages:
      - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1
    env:
  • update kubelet start variables @ /etc/systemd/system/kubelet.service.d/aws.conf with the following.
[Service]
Environment="KUBELET_EXTRA_ARGS=--node-ip=<x.x.x.x> --node-labels=node.kubernetes.io/node= --cloud-provider=external --image-credential-provider-config=/home/admin/.aws/ecr-credential-config.yaml --image-credential-provider-bin-dir=/usr/bin"
  • systemctl daemon-reload && systemctl restart kubelet.service
  • apply the CNI manifest: kubectl apply -f aws-vpc-cni.yaml (a quick sanity check is sketched below)
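
As a small sanity check (not part of the original steps), assuming a systemd-managed kubelet and the default labels from the upstream manifest:

# The running kubelet should carry both credential-provider flags.
ps -ef | grep '[k]ubelet' | tr ' ' '\n' | grep image-credential-provider

# The aws-node pods should now pull their images and reach Running.
kubectl get pods -n kube-system -l k8s-app=aws-node -w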


This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.

is-it-ayush reopened this Mar 14, 2024
@is-it-ayush (Author) commented Mar 14, 2024

Hey @kwohlfahrt! It seems this wasn't resolved entirely. As soon as I joined another node, I ran into trouble with the aws-node pod failing to communicate with ipamd from aws-vpc-cni, but the logs from ipamd didn't indicate any errors, so I was unable to understand what's wrong. The setup hasn't changed and I only added one worker (1 master [10.0.32.163], 1 worker [10.0.32.104]). Here are a few outputs from my master node:

  • kubectl get nodes -A
admin@ip-10-0-32-163:~$ kubectl get nodes -A
NAME                                         STATUS     ROLES           AGE   VERSION
ip-10-0-32-104.ap-south-1.compute.internal   NotReady   <none>          15h   v1.29.2
ip-10-0-32-163.ap-south-1.compute.internal   Ready      control-plane   16h   v1.29.2
  • kubectl get pods -A
admin@ip-10-0-32-163:~$ kubectl get pods -A
NAMESPACE     NAME                                                                 READY   STATUS             RESTARTS        AGE
kube-system   aws-cloud-controller-manager-khnq6                                   1/1     Running            1 (72m ago)     16h
kube-system   aws-node-56hf4                                                       1/2     CrashLoopBackOff   7 (4m55s ago)   19m
kube-system   aws-node-ghvzc                                                       2/2     Running            2 (72m ago)     16h
kube-system   coredns-76f75df574-rg724                                             0/1     CrashLoopBackOff   34 (63s ago)    16h
kube-system   coredns-76f75df574-svglz                                             0/1     CrashLoopBackOff   7 (4m43s ago)   22m
kube-system   etcd-ip-10-0-32-163.ap-south-1.compute.internal                      1/1     Running            1 (72m ago)     16h
kube-system   kube-apiserver-ip-10-0-32-163.ap-south-1.compute.internal            1/1     Running            2 (72m ago)     16h
kube-system   kube-controller-manager-ip-10-0-32-163.ap-south-1.compute.internal   1/1     Running            2 (72m ago)     16h
kube-system   kube-proxy-kj778                                                     1/1     Running            1 (72m ago)     15h
kube-system   kube-proxy-xgzzf                                                     1/1     Running            1 (72m ago)     16h
kube-system   kube-scheduler-ip-10-0-32-163.ap-south-1.compute.internal            1/1     Running            1 (72m ago)     16h
  • kubectl describe pods aws-node-56hf4 -n kube-system
Events:
  Type     Reason                 Age                     From               Message
  ----     ------                 ----                    ----               -------
  Warning  MissingIAMPermissions  7m42s (x2 over 7m42s)   aws-node           Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
  Warning  MissingIAMPermissions  6m8s (x2 over 6m9s)     aws-node           Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
  Warning  MissingIAMPermissions  4m38s (x2 over 4m39s)   aws-node           Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
  Warning  MissingIAMPermissions  3m8s (x2 over 3m9s)     aws-node           Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
  Warning  MissingIAMPermissions  98s (x2 over 99s)       aws-node           Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
  Warning  MissingIAMPermissions  8s (x2 over 9s)         aws-node           Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
  Normal   Scheduled              7m46s                   default-scheduler  Successfully assigned kube-system/aws-node-56hf4 to ip-10-0-32-104.ap-south-1.compute.internal
  Normal   Pulled                 7m45s                   kubelet            Container image "602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon-k8s-cni-init:v1.16.4" already present on machine
  Normal   Created                7m45s                   kubelet            Created container aws-vpc-cni-init
  Normal   Started                7m45s                   kubelet            Started container aws-vpc-cni-init
  Normal   Pulled                 7m44s                   kubelet            Container image "602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon-k8s-cni:v1.16.4" already present on machine
  Normal   Started                7m44s                   kubelet            Started container aws-eks-nodeagent
  Normal   Created                7m44s                   kubelet            Created container aws-eks-nodeagent
  Normal   Pulled                 7m44s                   kubelet            Container image "602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon/aws-network-policy-agent:v1.0.8" already present on machine
  Normal   Started                7m44s                   kubelet            Started container aws-node
  Normal   Created                7m44s                   kubelet            Created container aws-node
  Warning  Unhealthy              7m38s                   kubelet            Readiness probe failed: {"level":"info","ts":"2024-03-14T05:02:54.811Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
  Warning  Unhealthy              7m33s                   kubelet            Readiness probe failed: {"level":"info","ts":"2024-03-14T05:02:59.865Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
  Warning  Unhealthy              7m28s                   kubelet            Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:04.915Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
  Warning  Unhealthy              7m20s                   kubelet            Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:12.342Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
  Warning  Unhealthy              7m10s                   kubelet            Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:22.350Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
  Warning  Unhealthy              7m                      kubelet            Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:32.350Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
  Warning  Unhealthy              6m50s                   kubelet            Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:42.342Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
  Warning  Unhealthy              6m40s                   kubelet            Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:52.347Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
  Warning  Unhealthy              6m30s                   kubelet            Readiness probe failed: {"level":"info","ts":"2024-03-14T05:04:02.344Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
  Normal   Killing                6m10s                   kubelet            Container aws-node failed liveness probe, will be restarted
  Warning  Unhealthy              2m40s (x43 over 6m30s)  kubelet            (combined from similar events): Readiness probe failed: {"level":"info","ts":"2024-03-14T05:07:52.354Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"
  • kubectl logs coredns-76f75df574-rg724 -n kube-system
admin@ip-10-0-32-163:~$ kubectl logs coredns-76f75df574-rg724 -n kube-system
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration SHA512 = 591cf328cccc12bc490481273e738df59329c62c0b729d94e8b61db9961c2fa5f046dd37f1cf888b953814040d180f52594972691cd6ff41be96639138a43908
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:46941->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:48624->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:35195->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:36595->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:37395->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:53769->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:39372->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:49266->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[870704998]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:10:50.372) (total time: 30001ms):
Trace[870704998]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:20.374)
Trace[870704998]: [30.001959325s] [30.001959325s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1121138999]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:10:50.372) (total time: 30001ms):
Trace[1121138999]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:20.374)
Trace[1121138999]: [30.001824712s] [30.001824712s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[757947080]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:10:50.373) (total time: 30001ms):
Trace[757947080]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:20.374)
Trace[757947080]: [30.001669002s] [30.001669002s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:59870->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:36793->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[308293075]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:21.583) (total time: 30001ms):
Trace[308293075]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:51.584)
Trace[308293075]: [30.00153721s] [30.00153721s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1924537645]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:21.772) (total time: 30001ms):
Trace[1924537645]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:51.773)
Trace[1924537645]: [30.001441343s] [30.001441343s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1601989491]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:21.892) (total time: 30000ms):
Trace[1601989491]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:11:51.893)
Trace[1601989491]: [30.000541411s] [30.000541411s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1839797281]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:53.729) (total time: 30002ms):
Trace[1839797281]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30002ms (05:12:23.731)
Trace[1839797281]: [30.002135986s] [30.002135986s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[2131737096]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:54.116) (total time: 30001ms):
Trace[2131737096]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:24.117)
Trace[2131737096]: [30.001094761s] [30.001094761s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[342939726]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:54.708) (total time: 30001ms):
Trace[342939726]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:24.709)
Trace[342939726]: [30.001121228s] [30.001121228s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/kubernetes: Trace[731275138]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:28.220) (total time: 11342ms):
Trace[731275138]: [11.342820089s] [11.342820089s] END
[INFO] plugin/kubernetes: Trace[1946198945]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:28.081) (total time: 11481ms):
Trace[1946198945]: [11.481121164s] [11.481121164s] END
[INFO] plugin/kubernetes: Trace[1707910341]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:27.480) (total time: 12082ms):
Trace[1707910341]: [12.082670995s] [12.082670995s] END
  • kubectl logs coredns-76f75df574-svglz -n kube-system
admin@ip-10-0-32-163:~$ kubectl logs coredns-76f75df574-svglz -n kube-system
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration SHA512 = 591cf328cccc12bc490481273e738df59329c62c0b729d94e8b61db9961c2fa5f046dd37f1cf888b953814040d180f52594972691cd6ff41be96639138a43908
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:39153->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:34390->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:34202->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:44007->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:40443->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:47108->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:59620->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:39071->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[244891391]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:24.389) (total time: 30001ms):
Trace[244891391]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:54.390)
Trace[244891391]: [30.001548794s] [30.001548794s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[106582316]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:24.389) (total time: 30002ms):
Trace[106582316]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:54.391)
Trace[106582316]: [30.00208516s] [30.00208516s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1365423089]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:24.389) (total time: 30001ms):
Trace[1365423089]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:54.390)
Trace[1365423089]: [30.001969555s] [30.001969555s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:57291->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:52147->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1202752718]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:55.195) (total time: 30000ms):
Trace[1202752718]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:25.196)
Trace[1202752718]: [30.000482356s] [30.000482356s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[528314086]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:55.738) (total time: 30004ms):
Trace[528314086]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30004ms (05:12:25.742)
Trace[528314086]: [30.00474037s] [30.00474037s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[401932378]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:55.919) (total time: 30001ms):
Trace[401932378]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:12:25.921)
Trace[401932378]: [30.001416591s] [30.001416591s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1029911745]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:27.513) (total time: 30000ms):
Trace[1029911745]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:57.514)
Trace[1029911745]: [30.000923168s] [30.000923168s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1647125159]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:27.996) (total time: 30003ms):
Trace[1647125159]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:57.997)
Trace[1647125159]: [30.003270334s] [30.003270334s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1397932663]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:28.082) (total time: 30000ms):
Trace[1397932663]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:58.083)
Trace[1397932663]: [30.000758193s] [30.000758193s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s
  • /var/log/aws-routed-eni/ipamd.log:
    ipamd-log.tar.gz
  • /var/log/aws-routed-eni/plugin.log: (worker node)
{"level":"error","ts":"2024-03-14T04:10:43.568Z","caller":"routed-eni-cni-plugin/cni.go:283","msg":"Error received from DelNetwork gRPC call for container 75d411ca04ea3ea9d079947801458b9938aaf07cbefc8803364c316d28588972: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:50051: connect: connection refused\""}

I did assign the ec2:CreateTags permission that seemed to be missing, and I recreated my entire cluster. The readiness and liveness probes still throw the same x.x.x.x:xxx -> 10.x.0.x:53 errors and coredns is unable to get ready.
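
The :50051 probe failures can also be checked directly on the worker node. A rough sketch, assuming the default ipamd ports (50051 for gRPC, 61679 for the introspection endpoint):

# Is ipamd listening on its gRPC port at all?
sudo ss -ltnp | grep 50051

# The introspection endpoint answers on localhost when ipamd is healthy.
curl -s http://localhost:61679/v1/enis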

@kwohlfahrt (Contributor)

Hm, I'm not sure. My only suspicion is that you might be hitting #2840, which I reported the other day.

You can easily check by connecting to your node and seeing if /run/xtables.lock is a directory - it should be a file. If it is created as a directory, it causes kube-proxy to fail, which means the CNI cannot reach the API server.

You can see the linked PR in that issue for the fix (the volume needs to be defined with type: FileOrCreate); just make sure to SSH to the node and rmdir /run/xtables.lock after applying the fix.
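
The check and cleanup described above can be done roughly like this on the affected node:

# Should print "regular empty file"; "directory" reproduces #2840.
stat -c '%F' /run/xtables.lock

# After changing the manifest volume to type: FileOrCreate, remove the
# wrongly-created directory so a file can be created on the next mount.
sudo rmdir /run/xtables.lock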

@is-it-ayush (Author) commented Mar 15, 2024

Thank you @kwohlfahrt! I had some missing IAM permissions, which I added to the master node. It seems, though, that this still hasn't resolved the problem where "coredns" isn't reachable, as is apparent from the logs when running kubectl logs coredns-76f75df574-49gs5 -n kube-system. I'm not entirely sure what's causing this.

[ERROR] plugin/errors: 2 4999722014791650549.7690820414208347954. HINFO: read udp 10.0.43.148:57589->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 4999722014791650549.7690820414208347954. HINFO: read udp 10.0.43.148:38940->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout

@is-it-ayush (Author) commented Mar 21, 2024

Update! I was unable to resolve the coredns issues with aws-vpc-cni & aws-cloud-controller-manager. There are multiple issues:

  • It seems like both of them are broken. The controller-manager fails to get the providerID from the AWS cloud for nodes in random order, even if you set the hostname to the private IPv4 DNS name and add the correct tags. It fails to initialise newly joined nodes, or even the master node itself, which leads to the worker nodes getting deleted and the master node tainted as NotReady.
  • The coredns pod fails to run regardless of the first issue, and there is no way to debug why. The logs collected by /opt/cni/bin/aws-cni-support.sh are not enough to debug the coredns problem.

I switched to cilium and let go of my dream to connect k8s and aws.

@orsenthil (Member)

[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout

This seems like the coredns pod got the IP address, but it wasn't able to communicate with the API server, perhaps due to missing permissions? The nodes/pods should have the ability to communicate with the API server with the necessary permissions.

Were you able to narrow down to any permission issue?

@is-it-ayush (Author) commented May 1, 2024

[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout

This seems like the coredns pod got the IP address, but it wasn't able to communicate with the API server, perhaps due to missing permissions? The nodes/pods should have the ability to communicate with the API server with the necessary permissions.

Were you able to narrow down to any permission issue?

Not really! I did all I could and scanned all of journalctl to find something. I wrote about it here, and I couldn't get aws-vpc-cni working, as far as I remember. I double-checked permissions and instance roles, but it didn't seem like they were the problem.

It seems like both of them are broken. The controller-manager fails to get the providerID from the AWS cloud for nodes in random order, even if you set the hostname to the private IPv4 DNS name and add the correct tags. It fails to initialise newly joined nodes, or even the master node itself, which leads to the worker nodes getting deleted and the master node tainted as NotReady.
The coredns pod fails to run regardless of the first issue, and there is no way to debug why. The logs collected by /opt/cni/bin/aws-cni-support.sh are not enough to debug the coredns problem.

@terryjix commented May 3, 2024

I am hitting the same issue. The pod cannot communicate with any endpoints, including the following (a rough connectivity check is sketched after the list):

  • coredns
  • api server
  • 169.254.169.254
  • etc.
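
A rough way to check these from the node hosting the pod (not part of the original comment; the addresses are the ones that appear in this thread: 10.96.0.1 is the default kubernetes Service VIP from the CoreDNS logs, and 10.0.0.2 is the VPC resolver):

# Service VIP of the API server: any TLS-level reply (even 401/403) means
# kube-proxy's rules on this node are working.
curl -k --max-time 5 https://10.96.0.1:443/healthz

# kube-proxy should have programmed rules for the default/kubernetes Service.
sudo iptables-save | grep -c 'default/kubernetes'

# The VPC resolver and the instance metadata service, reached from the node.
dig +short amazonaws.com @10.0.0.2
curl -s --max-time 2 http://169.254.169.254/latest/meta-data/instance-id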

@orsenthil (Member)

@terryjix - This is a question about setting up the VPC CNI on a non-EKS cluster. How did you go about this?

@orsenthil (Member)

Closing this due to lack of more information.


This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.

@wtvamp commented Sep 16, 2024

This issue needs to be reopened - it seems to be a fairly ubiquitous issue when attempting to use the amazon-vpc-cni in a non-EKS environment.

I've also encountered it (coredns not able to communicate):

[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration SHA512 = 591cf328cccc12bc490481273e738df59329c62c0b729d94e8b61db9961c2fa5f046dd37f1cf888b953814040d180f52594972691cd6ff41be96639138a43908
CoreDNS-1.11.3
linux/amd64, go1.21.11, a6338e9
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:57241->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:42295->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:33996->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:50361->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:58932->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:35147->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:47365->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:60287->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[2115550610]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:24:38.357) (total time: 30000ms):
Trace[2115550610]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (19:25:08.358)
Trace[2115550610]: [30.000916518s] [30.000916518s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[935094613]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:24:38.358) (total time: 30000ms):
Trace[935094613]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (19:25:08.358)
Trace[935094613]: [30.000403807s] [30.000403807s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1423531700]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:24:38.358) (total time: 30000ms):
Trace[1423531700]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (19:25:08.359)
Trace[1423531700]: [30.000293311s] [30.000293311s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:44224->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:60914->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1341126722]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:25:09.591) (total time: 30000ms):
Trace[1341126722]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (19:25:39.592)
Trace[1341126722]: [30.000759936s] [30.000759936s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1646410435]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:25:09.695) (total time: 30001ms):
Trace[1646410435]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (19:25:39.696)
Trace[1646410435]: [30.001364482s] [30.001364482s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1072212733]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:25:09.753) (total time: 30000ms):
Trace[1072212733]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (19:25:39.754)
Trace[1072212733]: [30.000533915s] [30.000533915s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout

@wtvamp commented Sep 16, 2024

Closing this due to lack of more information.

@orsenthil Why was this closed? It seems like there's plenty of information and repro steps?

orsenthil reopened this Sep 16, 2024
@orsenthil (Member)

fairly ubiquitous issue when attempting to use the amazon-vpc-cni in a non-EKS environment.

We will need to reproduce this and investigate. Re-opened.

@wtvamp commented Sep 16, 2024

Thanks!

I've got a cluster that reproduces this, and I'm willing to screen-share/support as needed.

@terryjix

I've fixed my issue by running vpc-cni-k8s on the EKS-optimized AMI. The vpc-cni-k8s plugin conflicts with ec2-net-utils; ec2-net-utils adds extra route rules, which broke pod-to-pod communication in my case. The EKS-optimized AMI has already addressed this.
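
As a rough sketch of how to spot this conflict on a stock (non-EKS) Amazon Linux node, assuming ec2-net-utils is the culprit as described above:

# Is ec2-net-utils installed? (The EKS-optimized AMI avoids it, per the comment above.)
rpm -q ec2-net-utils

# Look for policy-routing rules and per-ENI route tables that ipamd did not
# create; unexpected rules here usually explain the broken pod-to-pod traffic.
ip rule show
ip route show table all | head -n 40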

@wtvamp commented Sep 16, 2024

I've fixed my issue by running vpc-cni-k8s on the EKS-optimized AMI. The vpc-cni-k8s plugin conflicts with ec2-net-utils; ec2-net-utils adds extra route rules, which broke pod-to-pod communication in my case. The EKS-optimized AMI has already addressed this.

Does this work even outside EKS? I think this bug was about running outside EKS (for example, I'm running self-managed on Ubuntu AMIs with kubeadm).

@terryjix commented Sep 16, 2024

Yes, I used kubeadm to create a Kubernetes cluster on an Amazon Linux 2 AMI and found that the pods could not communicate with the outside. Some strange rules created on the route table overwrite the rules vpc-cni created.

You can find an optimized Ubuntu AMI at https://cloud-images.ubuntu.com/aws-eks/ . Maybe it can fix your issue. You can build your self-managed Kubernetes control plane on these AMIs. The optimized AMI has disabled some services that may affect network configuration in the OS.

@wtvamp commented Sep 17, 2024

Yes, I used kubeadm to create a Kubernetes cluster on an Amazon Linux 2 AMI and found that the pods could not communicate with the outside. Some strange rules created on the route table overwrite the rules vpc-cni created.

You can find an optimized Ubuntu AMI at https://cloud-images.ubuntu.com/aws-eks/ . Maybe it can fix your issue. You can build your self-managed Kubernetes control plane on these AMIs. The optimized AMI has disabled some services that may affect network configuration in the OS.

It says clearly on the page: "These images are customised specifically for the EKS service, and are not intended as general OS images."
