
[EKS] unable to deploy the aws-efs-csi-driver #1111

Closed
fmedery opened this issue Sep 4, 2020 · 11 comments · Fixed by kubernetes-sigs/aws-efs-csi-driver#286
Assignees: webern
Labels: area/kubernetes (K8s including EKS, EKS-A, and including VMW), type/bug (Something isn't working), type/enhancement (New feature or request)

Comments


fmedery commented Sep 4, 2020

I deployed a sample EKS cluster with Bottlerocket nodes using sample-eksctl.yaml, then tried to deploy the aws-efs-csi-driver following the documentation on its GitHub page.

Image I'm using:
1.0.0

❯ kubectl get nodes -o wide
NAME                                          STATUS   ROLES    AGE     VERSION   INTERNAL-IP     EXTERNAL-IP    OS-IMAGE                KERNEL-VERSION   CONTAINER-RUNTIME
ip-192-168-20-34.us-east-2.compute.internal   Ready    <none>   2d23h   v1.17.9   192.168.20.34   3.131.94.184   Bottlerocket OS 1.0.0   5.4.50           containerd://1.3.7+unknown
ip-192-168-36-82.us-east-2.compute.internal   Ready    <none>   2d23h   v1.17.9   192.168.36.82   3.22.217.215   Bottlerocket OS 1.0.0   5.4.50           containerd://1.3.7+unknown
ip-192-168-75-77.us-east-2.compute.internal   Ready    <none>   2d23h   v1.17.9   192.168.75.77   52.14.56.84    Bottlerocket OS 1.0.0   5.4.50           containerd://1.3.7+unknown

What I expected to happen:
The efs-csi-node pods reach the Running state.

What actually happened:

❯ k get pod -n kube-system -l app=efs-csi-node
NAME                 READY   STATUS             RESTARTS   AGE
efs-csi-node-7qkdw   2/3     CrashLoopBackOff   27         116m
efs-csi-node-fbhbq   2/3     CrashLoopBackOff   27         116m
efs-csi-node-sfvdw   2/3     CrashLoopBackOff   27         116m

Logs:

❯ k -n kube-system logs -f efs-csi-node-7qkdw -c efs-plugin
I0904 15:44:54.508728       1 mount_linux.go:163] Cannot run systemd-run, assuming non-systemd OS
I0904 15:44:54.508789       1 mount_linux.go:164] systemd-run failed with: exit status 1
I0904 15:44:54.508798       1 mount_linux.go:165] systemd-run output:
I0904 15:44:54.508969       1 driver.go:87] Starting watchdog
I0904 15:44:54.509054       1 efs_watch_dog.go:174] Copying /etc/amazon/efs/efs-utils.conf since it doesn't exist
F0904 15:44:54.509907       1 main.go:50] open /etc/amazon/efs/efs-utils.conf: permission denied

How to reproduce the problem:

eksctl create cluster -f sample-eksctl.yaml

and when the cluster is ready:

kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.0"

@fmedery fmedery changed the title [EKS] unable to deploy the aws-efs-csi-drive [EKS] unable to deploy the aws-efs-csi-driver Sep 4, 2020

bcressey commented Sep 5, 2020

The issue seems to have appeared in kubernetes-sigs/aws-efs-csi-driver@b3baff82, which added an attempt to persist a key to /etc/amazon on the host.

This is incompatible with Bottlerocket in two ways:

  • /etc is not persistent across node restarts
  • /etc is not writable by unprivileged containers

We end up with errors like this on the host:

[ 88.166848] audit: type=1400 audit(1599324659.394:4): avc: denied { write } for pid=4892 comm="aws-efs-csi-dri" name="efs" dev="tmpfs" ino=15602 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=dir permissive=0

Ideally the EFS CSI driver would persist to a different location.
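As a sketch of the kind of change described above (the volume name and patch structure here are illustrative assumptions, not the driver's actual shipped manifests; verify against the upstream DaemonSet before applying anything like this), the driver's hostPath volume could point at a writable, persistent location such as /var/amazon/efs instead of /etc/amazon/efs:

```yaml
# Hypothetical strategic-merge patch for the efs-csi-node DaemonSet.
# The volume name "efs-utils-config" is assumed for illustration.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: efs-csi-node
  namespace: kube-system
spec:
  template:
    spec:
      volumes:
        - name: efs-utils-config
          hostPath:
            # On Bottlerocket, /etc is a non-persistent tmpfs that
            # unprivileged containers cannot write to; /var is
            # writable and survives reboots.
            path: /var/amazon/efs
            type: DirectoryOrCreate
```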


fmedery commented Sep 9, 2020

Thank you. I opened an issue in the aws-efs-csi-driver repo:
kubernetes-sigs/aws-efs-csi-driver#246


webern commented Nov 24, 2020

I'm reopening this issue and using it as the main tracking issue, since we are starting to see duplicates. It describes the problem crisply and links to the underlying cause, which we need to fix.

@webern webern reopened this Nov 24, 2020
@webern webern self-assigned this Nov 24, 2020
@webern webern added type/bug Something isn't working area/kubernetes K8s including EKS, EKS-A, and including VMW labels Nov 24, 2020

webern commented Nov 25, 2020

I tried a patched version of the EFS CSI driver, essentially the approach in kubernetes-sigs/aws-efs-csi-driver#247, but using a 'konfiguration' as discussed in that PR.

That got me past the avc denial, but then I found:

Nov 25 00:23:10 ip-192-168-9-184.us-west-2.compute.internal kubelet[3122]: E1125
00:23:10.932184    3122 nestedpendingoperations.go:301] Operation for
"{volumeName:kubernetes.io/csi/efs.csi.aws.com^fs-xxxxxxxx podName: nodeName:}"
failed. No retries permitted until 2020-11-25 00:25:12.932131419 +0000 UTC
m=+98140.130189479 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for
volume \"efs-pv\" (UniqueName: \"kubernetes.io/csi/efs.csi.aws.com^fs-xxxxxxxx\") pod
\"app1\" (UID: \"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\") : rpc error: code = Internal
desc = Could not mount \"fs-xxxxxxxx:/\" at \"/var/lib/kubelet
pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volumes/kubernetes.io~csi/efs-pv/mount\":
mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs fs-xxxxxxxx:/ /var/lib/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volumes/kubernetes.io~csi/efs-pv/mount
Output: Traceback (most recent call last):
  File \"/sbin/mount.efs\", line 1537, in <module>
    main()
  File \"/sbin/mount.efs\", line 1517, in main
    bootstrap_logging(config)
  File \"/sbin/mount.efs\", line 1187, in bootstrap_logging
    raw_level = config.get(CONFIG_SECTION, 'logging_level')
  File \"/lib64/python2.7/ConfigParser.py\", line 607, in get
    raise NoSectionError(section)
ConfigParser.NoSectionError: No section: 'mount'

That error is opaque; I can't tell what the underlying issue is. It's possible that my EFS security-group setup is incorrect, or that something else about Bottlerocket is at fault. Not sure.

Edit: I may have had an incorrect security group setup kubernetes-sigs/aws-efs-csi-driver#192 (comment)

@webern webern added the type/enhancement New feature or request label Nov 25, 2020

webern commented Dec 4, 2020

Progress update: the error shown above was likely due to a security-group misconfiguration. Proceeding more carefully, I was able to confirm that changing the directory location allows the EFS CSI driver to work on Bottlerocket.

We have a plan to change the directory location in the CSI driver's code and specs in a way that is backward compatible with (non-Bottlerocket) nodes that may already have EFS mounts. We don't have an ETA yet, but we are actively working on it.


faarshad commented Dec 4, 2020

FYI, I can confirm that when I deploy the CSI driver with version tag v1.0.0, I see no errors on either AL2 (Amazon Linux 2) or Bottlerocket (version 1.0.4); the single DaemonSet works across both types of nodes. But when I use master (c82831cc10a291af18b30e4ca4060b0f53db96eb), the driver works fine on AL2, while on Bottlerocket the csi-driver pods show the following:

# kubectl -nops logs efs-csi-node-twktb -c efs-plugin
I1204 19:51:11.033467       1 mount_linux.go:163] Cannot run systemd-run, assuming non-systemd OS
I1204 19:51:11.033583       1 mount_linux.go:164] systemd-run failed with: exit status 1
I1204 19:51:11.033596       1 mount_linux.go:165] systemd-run output:
I1204 19:51:11.033820       1 driver.go:87] Starting watchdog
I1204 19:51:11.033912       1 efs_watch_dog.go:174] Copying /etc/amazon/efs/efs-utils.conf since it doesn't exist
F1204 19:51:11.033962       1 main.go:50] open /etc/amazon/efs/efs-utils.conf: permission denied


webern commented Dec 4, 2020

@farshad-hobsons, yes, my update might not have been clear. I can get the driver to work on Bottlerocket when I edit the pod spec to use /var/amazon/efs on the host instead of /etc/amazon/efs.

We can't simply change the spec this way, though: pre-existing (non-Bottlerocket) mounts whose configs were already written to the old directory would hang if the config directory changed out from under them.

So we're working on a change to the CSI driver's Go code that detects and continues to use the existing directory when present, and otherwise uses a new preferred location that is Bottlerocket-friendly.


webern commented Jan 7, 2021

Update: the fix just merged in kubernetes-sigs/aws-efs-csi-driver#286. I still need to find out how and when the change will be released.


webern commented Jan 28, 2021

Update: the fix is now in the release process: kubernetes-sigs/aws-efs-csi-driver#315


webern commented Mar 1, 2021

This is almost certainly released and done, but I haven't had a chance to re-test yet.


webern commented Mar 12, 2021

@bcressey has verified this. Closing.
