
[EKS] unable to deploy the aws-efs-csi-driver #1111

Closed
fmedery opened this issue Sep 4, 2020 · 11 comments · Fixed by kubernetes-sigs/aws-efs-csi-driver#286
Assignees: webern
Labels: area/kubernetes (K8s including EKS, EKS-A, and including VMW), type/bug (Something isn't working), type/enhancement (New feature or request)

Comments


fmedery commented Sep 4, 2020

I deployed a sample EKS cluster with Bottlerocket nodes using sample-eksctl.yaml, then tried to deploy the aws-efs-csi-driver following the documentation on its GitHub page.

Image I'm using:
1.0.0

❯ kubectl get nodes -o wide
NAME                                          STATUS   ROLES    AGE     VERSION   INTERNAL-IP     EXTERNAL-IP    OS-IMAGE                KERNEL-VERSION   CONTAINER-RUNTIME
ip-192-168-20-34.us-east-2.compute.internal   Ready    <none>   2d23h   v1.17.9   192.168.20.34   3.131.94.184   Bottlerocket OS 1.0.0   5.4.50           containerd://1.3.7+unknown
ip-192-168-36-82.us-east-2.compute.internal   Ready    <none>   2d23h   v1.17.9   192.168.36.82   3.22.217.215   Bottlerocket OS 1.0.0   5.4.50           containerd://1.3.7+unknown
ip-192-168-75-77.us-east-2.compute.internal   Ready    <none>   2d23h   v1.17.9   192.168.75.77   52.14.56.84    Bottlerocket OS 1.0.0   5.4.50           containerd://1.3.7+unknown

What I expected to happen:
The efs-csi-node pods reach the Running state.

What actually happened:

❯ k get pod -n kube-system -l app=efs-csi-node
NAME                 READY   STATUS             RESTARTS   AGE
efs-csi-node-7qkdw   2/3     CrashLoopBackOff   27         116m
efs-csi-node-fbhbq   2/3     CrashLoopBackOff   27         116m
efs-csi-node-sfvdw   2/3     CrashLoopBackOff   27         116m

Logs:

❯ k -n kube-system logs -f efs-csi-node-7qkdw -c efs-plugin
I0904 15:44:54.508728       1 mount_linux.go:163] Cannot run systemd-run, assuming non-systemd OS
I0904 15:44:54.508789       1 mount_linux.go:164] systemd-run failed with: exit status 1
I0904 15:44:54.508798       1 mount_linux.go:165] systemd-run output:
I0904 15:44:54.508969       1 driver.go:87] Starting watchdog
I0904 15:44:54.509054       1 efs_watch_dog.go:174] Copying /etc/amazon/efs/efs-utils.conf since it doesn't exist
F0904 15:44:54.509907       1 main.go:50] open /etc/amazon/efs/efs-utils.conf: permission denied

How to reproduce the problem:

eksctl create cluster -f sample-eksctl.yaml

and when the cluster is ready:

kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.0"

@fmedery fmedery changed the title [EKS] unable to deploy the aws-efs-csi-drive [EKS] unable to deploy the aws-efs-csi-driver Sep 4, 2020

bcressey commented Sep 5, 2020

The issue seems to have appeared in kubernetes-sigs/aws-efs-csi-driver@b3baff82, which added an attempt to persist a key to /etc/amazon on the host.

This is incompatible with Bottlerocket in two ways:

  • /etc is not persistent across node restarts
  • /etc is not writable by unprivileged containers

We end up with errors like this on the host:

[ 88.166848] audit: type=1400 audit(1599324659.394:4): avc: denied { write } for pid=4892 comm="aws-efs-csi-dri" name="efs" dev="tmpfs" ino=15602 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=dir permissive=0

Ideally the EFS CSI driver would persist to a different location.
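As a sketch of the kind of change described above (the volume name and patch structure here are illustrative assumptions, not the driver's actual shipped manifests; verify against the upstream DaemonSet before applying anything like this), the driver's hostPath volume could point at a writable, persistent location such as /var/amazon/efs instead of /etc/amazon/efs:

```yaml
# Hypothetical strategic-merge patch for the efs-csi-node DaemonSet.
# The volume name "efs-utils-config" is assumed for illustration.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: efs-csi-node
  namespace: kube-system
spec:
  template:
    spec:
      volumes:
        - name: efs-utils-config
          hostPath:
            # On Bottlerocket, /etc is a non-persistent tmpfs that
            # unprivileged containers cannot write to; /var is
            # writable and survives reboots.
            path: /var/amazon/efs
            type: DirectoryOrCreate
```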


fmedery commented Sep 9, 2020

Thank you. I opened an issue in the aws-efs-csi-driver repo:
kubernetes-sigs/aws-efs-csi-driver#246


webern commented Nov 24, 2020

I'm reopening this issue and using it as the main tracking issue, since we are starting to see duplicates. It describes the problem crisply and links to the underlying cause, which we need to fix.

@webern webern reopened this Nov 24, 2020
@webern webern self-assigned this Nov 24, 2020
@webern webern added type/bug Something isn't working area/kubernetes K8s including EKS, EKS-A, and including VMW labels Nov 24, 2020

webern commented Nov 25, 2020

I tried a patched version of the EFS CSI driver, essentially the approach in kubernetes-sigs/aws-efs-csi-driver#247, but using a 'konfiguration' as discussed in that PR.

That got me past the avc denial, but then I found:

Nov 25 00:23:10 ip-192-168-9-184.us-west-2.compute.internal kubelet[3122]: E1125
00:23:10.932184    3122 nestedpendingoperations.go:301] Operation for
"{volumeName:kubernetes.io/csi/efs.csi.aws.com^fs-xxxxxxxx podName: nodeName:}"
failed. No retries permitted until 2020-11-25 00:25:12.932131419 +0000 UTC
m=+98140.130189479 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for
volume \"efs-pv\" (UniqueName: \"kubernetes.io/csi/efs.csi.aws.com^fs-xxxxxxxx\") pod
\"app1\" (UID: \"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\") : rpc error: code = Internal
desc = Could not mount \"fs-xxxxxxxx:/\" at \"/var/lib/kubelet
pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volumes/kubernetes.io~csi/efs-pv/mount\":
mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs fs-xxxxxxxx:/ /var/lib/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volumes/kubernetes.io~csi/efs-pv/mount
Output: Traceback (most recent call last):
  File \"/sbin/mount.efs\", line 1537, in <module>
    main()
  File \"/sbin/mount.efs\", line 1517, in main
    bootstrap_logging(config)
  File \"/sbin/mount.efs\", line 1187, in bootstrap_logging
    raw_level = config.get(CONFIG_SECTION, 'logging_level')
  File \"/lib64/python2.7/ConfigParser.py\", line 607, in get
    raise NoSectionError(section)
ConfigParser.NoSectionError: No section: 'mount'

That error is opaque; I can't tell what the underlying issue is. It's possible that my EFS security-group setup is incorrect, or that something else about Bottlerocket is at fault. Not sure.

Edit: I may have had an incorrect security group setup kubernetes-sigs/aws-efs-csi-driver#192 (comment)

@webern webern added the type/enhancement New feature or request label Nov 25, 2020

webern commented Dec 4, 2020

Progress update: the error shown above was likely due to a security-group misconfiguration. Proceeding more carefully, I was able to confirm that changing the directory location allows the EFS CSI driver to work on Bottlerocket.

We have a plan to change the directory location in the CSI driver's code and specs in a way that is backward compatible with (non-Bottlerocket) nodes that may already have EFS mounts. We don't have an ETA yet, but we are actively working on it.


faarshad commented Dec 4, 2020

FYI, I can confirm that when I deploy the CSI driver with version tag v1.0.0, I see no errors on either AL2 (Amazon Linux 2) or Bottlerocket (version 1.0.4); the single DaemonSet works across both types of nodes. But when I use master (c82831cc10a291af18b30e4ca4060b0f53db96eb), the driver works fine on AL2, while on Bottlerocket the csi-driver pods show the following:

# kubectl -nops logs efs-csi-node-twktb -c efs-plugin
I1204 19:51:11.033467       1 mount_linux.go:163] Cannot run systemd-run, assuming non-systemd OS
I1204 19:51:11.033583       1 mount_linux.go:164] systemd-run failed with: exit status 1
I1204 19:51:11.033596       1 mount_linux.go:165] systemd-run output:
I1204 19:51:11.033820       1 driver.go:87] Starting watchdog
I1204 19:51:11.033912       1 efs_watch_dog.go:174] Copying /etc/amazon/efs/efs-utils.conf since it doesn't exist
F1204 19:51:11.033962       1 main.go:50] open /etc/amazon/efs/efs-utils.conf: permission denied


webern commented Dec 4, 2020

@farshad-hobsons, yes, my update might not have been clear. I can get the driver to work on Bottlerocket when I edit the pod spec to use /var/amazon/efs on the host instead of /etc/amazon/efs.

We can't simply change the spec this way, though: pre-existing (non-Bottlerocket) mounts whose configs were already written to the old directory would hang if the config directory changed out from under them.

So we're working on a change to the CSI driver's Go code that detects and continues to use the existing directory when present, and otherwise uses a new preferred location that is Bottlerocket-friendly.


webern commented Jan 7, 2021

Update: the fix just merged in kubernetes-sigs/aws-efs-csi-driver#286. I still need to find out how and when the change will be released.


webern commented Jan 28, 2021

Update: the fix is now in the release process: kubernetes-sigs/aws-efs-csi-driver#315


webern commented Mar 1, 2021

This is almost certainly released and done, but I haven't had a chance to re-test yet.


webern commented Mar 12, 2021

@bcressey has verified this. Closing.
