Upgrade from Chart 2.4.4 #1372

Open
m-parrella opened this issue Jun 10, 2024 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments


m-parrella commented Jun 10, 2024

/kind bug

What happened?

We recently upgraded our EKS cluster to 1.29. We use managed node groups with the amazon-eks-node-1.29-v20240227 AMI, and the EFS CSI Driver 1.5.6 deployed with Helm (chart 2.4.4).

After upgrading the driver from chart 2.4.4 to chart 2.4.5 (or higher), deployments using the EFS StorageClass stopped working correctly: the 'df' command hung on both Pods and nodes. /var/log/messages on the node showed the following error:

Jun 10 15:07:44 ip-XXX-XXX-XXX-XXX kernel: nfs: server 127.0.0.1 not responding, still trying

If we move the Pods that mount EFS volumes to a new node, they run as expected.
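Because 'df' blocks on these mounts, one way to see which volumes on a node are affected is to probe each mount with a timeout instead. The sketch below assumes the usual efs-utils TLS layout, where the NFS source is 127.0.0.1 (the local stunnel endpoint, as in the log line above). The function names are hypothetical, and it has to run as root on the node.

```shell
#!/usr/bin/env bash
# Sketch: find EFS mounts that go through a local TLS tunnel (source
# 127.0.0.1:...) and report whether each one still answers statfs.

# Print the mount point of every NFS mount whose source is 127.0.0.1.
# Reads /proc/mounts unless another file is given (handy for testing).
list_local_nfs_mounts() {
  local mounts_file="${1:-/proc/mounts}"
  awk '$3 ~ /^nfs/ && $1 ~ /^127\.0\.0\.1:/ { print $2 }' "$mounts_file"
}

# Probe each mount with 'stat -f' (statfs, the call df relies on).
# SIGKILL is sent after 5 s because a task blocked on a hard NFS mount
# may not react to SIGTERM.
check_local_nfs_mounts() {
  local mp
  list_local_nfs_mounts "$@" | while read -r mp; do
    if timeout -s KILL 5 stat -f "$mp" >/dev/null 2>&1; then
      echo "ok $mp"
    else
      echo "unresponsive $mp"   # timed out or returned an error
    fi
  done
}
```

Running check_local_nfs_mounts with no arguments reads /proc/mounts; mounts reported as unresponsive are the ones that would make 'df' hang.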

Comparing the two charts, the significant change is the EFS state directory, as noted in the CHANGELOG: its hostPath moved from /var/run/efs-csi-driver to /var/run/efs (new definition below). This leads us to suspect that stunnel cannot resume existing connections after the upgrade.

{
  "hostPath": {
    "path": "/var/run/efs",
    "type": "DirectoryOrCreate"
  },
  "name": "efs-state-dir"
}
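To check this hypothesis on an affected node, one can compare what each state directory holds and which path the running DaemonSet mounts. This is a sketch with hypothetical function names; the jsonpath filter assumes the volume is named efs-state-dir, as in the definition above. If the old directory still has entries while the new one is empty or missing, the upgraded driver presumably cannot see the tunnels created before the upgrade.

```shell
#!/usr/bin/env bash
# Sketch: compare the old and new EFS state directories on a node and
# show which one the efs-csi-node DaemonSet currently mounts.

# Print "<dir> <entry count>" for each directory, or "<dir> missing".
# -H makes find follow a symlinked starting directory (e.g. /var/run/efs).
count_state_entries() {
  local dir n
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      n=$(find -H "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
      echo "$dir $n"
    else
      echo "$dir missing"
    fi
  done
}

# Print the hostPath of the efs-state-dir volume in the running DaemonSet.
current_state_dir() {
  kubectl get daemonset efs-csi-node -n kube-system \
    -o jsonpath='{.spec.template.spec.volumes[?(@.name=="efs-state-dir")].hostPath.path}'
}
```

On the node, count_state_entries /var/run/efs-csi-driver /var/run/efs prints one line per directory; current_state_dir can be run from any machine with kubectl access to the cluster.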

To avoid refreshing the nodes, we have identified two workarounds. The first is to patch the DaemonSet so that the efs-state-dir volume uses the original path, /var/run/efs-csi-driver:

kubectl patch daemonsets -n kube-system efs-csi-node --type json -p='[{"op": "replace", "path": "/spec/template/spec/volumes/3/hostPath/path", "value": "/var/run/efs-csi-driver"}]'
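The command above hard-codes volumes/3, so it could edit the wrong volume if a chart version orders the volumes differently. A more defensive sketch looks the index up by name and prepends a JSON Patch (RFC 6902) "test" operation, so the patch is rejected if the volume at that index is not efs-state-dir. The function names are hypothetical; the namespace and DaemonSet name are taken from the command above.

```shell
#!/usr/bin/env bash
# Sketch: patch the efs-state-dir hostPath without hard-coding its index.

# Print the zero-based position of TARGET among the remaining arguments.
find_volume_index() {
  local target="$1" i=0 name
  shift
  for name in "$@"; do
    if [ "$name" = "$target" ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# Build a JSON Patch: "test" the volume name first, then replace its
# hostPath, so a stale index fails instead of editing another volume.
build_state_dir_patch() {
  local idx="$1" path="$2"
  printf '[{"op":"test","path":"/spec/template/spec/volumes/%s/name","value":"efs-state-dir"},{"op":"replace","path":"/spec/template/spec/volumes/%s/hostPath/path","value":"%s"}]' \
    "$idx" "$idx" "$path"
}

patch_state_dir() {
  local ns="kube-system" ds="efs-csi-node" idx names
  names=$(kubectl get daemonset "$ds" -n "$ns" \
    -o jsonpath='{.spec.template.spec.volumes[*].name}')
  # $names is deliberately unquoted so each volume name is one argument.
  idx=$(find_volume_index efs-state-dir $names) || {
    echo "efs-state-dir volume not found in $ds" >&2
    return 1
  }
  kubectl patch daemonset "$ds" -n "$ns" --type json \
    -p "$(build_state_dir_patch "$idx" /var/run/efs-csi-driver)"
}
```

Calling patch_state_dir with kubectl pointed at the affected cluster applies the same change as the one-liner above.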

The second is to create a symbolic link on each node before the upgrade, so that the new path resolves to the original directory:

[root@ip-XXX-XXX-XXX-XXX /]# ln -s /var/run/efs-csi-driver /var/run/efs
[root@ip-XXX-XXX-XXX-XXX /]# ls -ld /var/run/efs /var/run/efs-csi-driver
lrwxrwxrwx 1 root root  23 Jun 10 18:15 /var/run/efs -> /var/run/efs-csi-driver
drwxr-xr-x 4 root root 160 Jun 10 18:21 /var/run/efs-csi-driver
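The same workaround as a re-runnable sketch: it only creates the link when the new path does not exist yet and the old directory does, so it can be run on every node before the upgrade. The function name is hypothetical; the default paths are the ones from this issue.

```shell
#!/usr/bin/env bash
# Sketch: create the compatibility symlink on a node before upgrading.
# Defaults are the old and new state directories from this issue.
link_state_dir() {
  local old="${1:-/var/run/efs-csi-driver}" new="${2:-/var/run/efs}"
  if [ -L "$new" ]; then
    echo "$new is already a symlink to $(readlink "$new")"
    return 0
  fi
  if [ -e "$new" ]; then
    # Refuse to replace a real directory: it may already hold state
    # written by the upgraded driver.
    echo "$new exists and is not a symlink; leaving it alone" >&2
    return 1
  fi
  if [ ! -d "$old" ]; then
    echo "$old not found; nothing to link" >&2
    return 1
  fi
  ln -s "$old" "$new"
}
```

On systemd-based hosts such as Amazon Linux 2, /var/run is typically a symlink to the tmpfs /run, so the link does not survive a reboot; a rebooted node has no mounts from before the upgrade, so that is presumably harmless.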

Is this the expected behavior? Thanks in advance!

What did you expect to happen?

Existing container volumes should remain operational after the upgrade.

How to reproduce it (as minimally and precisely as possible)?

Upgrade the chart from 2.4.4 to 2.4.5 or higher (we use Helmfile) while Pods with EFS volumes mounted are running on the node.

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 10, 2024
@m-parrella m-parrella changed the title from "Upgrade from Chart 2.4.4 Hungs." to "Upgrade from Chart 2.4.4" Jun 10, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2024
Projects: None yet
Development: No branches or pull requests
3 participants