
Multi Attach Error #477

Closed
jalberto opened this issue Jun 25, 2018 · 53 comments

@jalberto

AKS GA 1.9 in west EU

 Multi-Attach error for volume "pvc-2df1008d-7863-11e8-9ad1-0a58ac1f1349" Volume is already exclusively attached to one node and can't be attached to another
Unable to mount volumes for pod "grafana-67d785cdb5-clv9f_ops(f96e38a9-7866-11e8-9ad1-0a58ac1f1349)": timeout expired waiting for volumes to attach/mount for pod "ops"/"grafana-67d785cdb5-clv9f". list of unattached/unmounted volumes=[storage] 

This is common in acs-engine too, but I didn't expect it in AKS GA

rel:

@derekperkins
Contributor

I'm getting the same error using 1.10.3 in canada-central.

@andyzhangx
Contributor

@jalberto what's your k8s version? Does it eventually work after a retry?

@andyzhangx
Contributor

Could you also share the steps to repro this issue on AKS? Thanks.

@toorog

toorog commented Jun 29, 2018

Same issue for me. It can easily be reproduced with an arbitrary Helm chart that deploys a PVC and the pod using it:

  1. Install the Helm chart
  2. Pod startup fails because the volume is not yet available
  3. Delete the pod (manually) immediately
  4. The new instance of the pod (started by the still-existing deployment) shows the error. The old pod is already gone at this point, so the PVC should not be locked anymore.

After that you can do 3 and 4 over and over again. The only thing that seems to fix the problem immediately is to also delete the PVC.
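
Roughly, as commands (release and pod names are placeholders; any chart whose pod uses a single PVC works, stable/grafana is just an example):

$ helm install stable/grafana --name grafana      # 1. install a chart that creates a PVC
$ kubectl get pods -w                             # 2. watch the pod fail while the volume attaches
$ kubectl delete pod <grafana-pod>                # 3. delete the pod immediately
$ kubectl describe pod <new-grafana-pod>          # 4. the new pod's events show the Multi-Attach error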

@jalberto
Author

jalberto commented Jun 29, 2018

@andyzhangx version 1.10.3, just hit this again with the grafana helm chart, using the default StorageClass.

Why does this issue reappear so often?

@andyzhangx
Contributor

@jalberto how do you repro this issue? I tried AKS 1.10.3 with the grafana helm chart; there is a Multi-Attach error when I drain one node, and after a while it recovers.

Events:
  Type     Reason                  Age               From                               Message
  ----     ------                  ----              ----                               -------
  Normal   Scheduled               1m                default-scheduler                  Successfully assigned measly-mastiff-grafana-f8c797449-8cqx5 to aks-nodepool1-19606344-0
  Warning  FailedAttachVolume      1m                attachdetach-controller            Multi-Attach error for volume "pvc-ff169a18-7c75-11e8-83d3-0a58ac1f0385" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulMountVolume   1m                kubelet, aks-nodepool1-19606344-0  MountVolume.SetUp succeeded for volume "config-volume"
  Normal   SuccessfulMountVolume   1m                kubelet, aks-nodepool1-19606344-0  MountVolume.SetUp succeeded for volume "dashboard-volume"
  Normal   SuccessfulMountVolume   1m                kubelet, aks-nodepool1-19606344-0  MountVolume.SetUp succeeded for volume "default-token-bsgwl"
  Normal   SuccessfulAttachVolume  46s               attachdetach-controller            AttachVolume.Attach succeeded for volume "pvc-ff169a18-7c75-11e8-83d3-0a58ac1f0385"
  Normal   SuccessfulMountVolume   23s               kubelet, aks-nodepool1-19606344-0  MountVolume.SetUp succeeded for volume "pvc-ff169a18-7c75-11e8-83d3-0a58ac1f0385"
  Normal   Pulling                 3s (x3 over 22s)  kubelet, aks-nodepool1-19606344-0  pulling image "grafana/grafana:latest"
  Normal   Pulled                  2s (x3 over 21s)  kubelet, aks-nodepool1-19606344-0  Successfully pulled image "grafana/grafana:latest"
  Normal   Created                 2s (x3 over 21s)  kubelet, aks-nodepool1-19606344-0  Created container
  Normal   Started                 2s (x3 over 20s)  kubelet, aks-nodepool1-19606344-0  Started container

@jalberto
Author

jalberto commented Jul 2, 2018

@andyzhangx how long is "a while" for you? I waited 45 minutes and it had not recovered.

If it helps, I trigger the situation just by upgrading the Helm release.

Now let's say it takes an average of 30 minutes, and imagine this happens during a real incident: I cannot wait 30, 15, or even 5 minutes for a disk to be mounted; that means production downtime.

@andyzhangx
Contributor

@jalberto as you can see in the logs above, "a while" means less than 1 min. How exactly could I repro your issue? Something like this (rough commands are sketched below the list)?

  • Set up an AKS cluster with k8s v1.10.3 and 2 nodes
  • helm install stable/grafana
  • drain a node?
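
Roughly, that sequence as commands (resource group, cluster, and node names are placeholders; Helm v2 syntax):

$ az aks create -g <resource-group> -n <cluster> --node-count 2 --kubernetes-version 1.10.3
$ az aks get-credentials -g <resource-group> -n <cluster>
$ helm install stable/grafana --name grafana
$ kubectl get pods -o wide                                # note which node the grafana pod is on
$ kubectl drain <that-node> --ignore-daemonsets --delete-local-data
$ kubectl describe pod <rescheduled-grafana-pod>          # watch the Multi-Attach warning appear and clear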

@jalberto
Author

jalberto commented Jul 2, 2018 via email

@andyzhangx
Contributor

@jalberto in which region was your AKS cluster created? What are the config changes: deleting the pod and making it reschedule on another node? Could you check in the Azure portal whether that disk is still attached to the original node after a few minutes?
Suppose there are two nodes, node#1 and node#2; in the beginning the pod and azure disk are on node#1, and the pod is then rescheduled to node#2.
So could you provide the following info for both of these two nodes (node#1, node#2):

  1. https://resources.azure.com/subscriptions/{sub-id}/resourceGroups/{MC_resourcegroup_cluster_region}/providers/Microsoft.Compute/virtualMachines/{VM_NAME}
  2. kubectl get no NODE-NAME -o yaml > a.log

And could you also paste kubectl describe po POD-NAME info?
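
If it's easier, roughly the same information can be pulled with the az CLI and kubectl (subscription, resource group, VM, node, and pod names are placeholders):

# data disks attached to each VM (run for both node#1 and node#2)
$ az vm show -g <MC_resourcegroup_cluster_region> -n <VM_NAME> --query "storageProfile.dataDisks" -o json

# node object, including volumesAttached/volumesInUse in its status
$ kubectl get no <NODE-NAME> -o yaml > a.log

# pod events showing the attach/mount failures
$ kubectl describe po <POD-NAME>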

@andyzhangx
Contributor

andyzhangx commented Jul 3, 2018

Hi guys, I could not repro this on my k8s cluster with the grafana or wordpress charts; the Multi-Attach error only lasted about 1 min and then it recovered. One thing to confirm: you are using the azure disk with ReadWriteOnce accessModes, right? (A quick check is sketched below the repro steps.) One azure disk can only be attached to one pod; anything else is not possible by design. My repro steps are as follows:

  • Set up an AKS cluster with k8s v1.10.3 and 2 nodes
  • helm install stable/grafana
  • drain a node, and the azure disk pvc and pod will be scheduled to another node
    BTW, I also tried deleting the pod, but still no repro, since the new pod and pvc are scheduled on the same node; there is no Multi-Attach error when deleting a pod.

anything I am missing?
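
A quick way to confirm the access mode on the claim (claim name and namespace are placeholders):

$ kubectl get pvc <claim-name> -n <namespace> -o jsonpath='{.spec.accessModes[0]}'
# should print ReadWriteOnce for an azure disk PVC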

@jalberto
Author

jalberto commented Jul 3, 2018

Sorry I am out right now, I will provide as much info as I can:

  • using k8s 1.10.3
  • 3 nodes (DS8)
  • I didn't try drain/delete, but an upgrade (for example, changing the image tag version)
  • This is random (as you say, it only happens when the new pod goes to a different node, but the more nodes you have, the more likely that is)

I still think 1-2 minutes of downtime is too much for this kind of operation (mount/umount is mostly instant). The longest time I saw was 45 min, but I usually purge and recreate if it's not fixed after 10 min.

@IainColledge

Using AKS 1.9.6
Location West Europe
4 Nodes

Disk attaches are taking a long time. With ACS I could normally work around this by scaling up by one node, shutting down the nodes in sequence, manually clearing any disks still attached, starting them back up, and finally scaling the cluster down by a node.

When I look at the attached disks of nodes in the portal they are "updating" for a long time.

This basically means any pods have to be kept around, otherwise it takes hours for them to reattach their PVs and start the container.

@vogelinho10

We are also experiencing this issue when updating helm charts and the pod gets recreated on a different node. Downtime is always between 3 and 10 minutes, which is not acceptable in our opinion.

@andyzhangx
Contributor

andyzhangx commented Jul 5, 2018

@vogelinho10 what's your k8s version and node size? Since v1.9.2 and v1.10.0, detaching or attaching one azure disk takes around 30s, so detaching a disk from one node and attaching it to another takes around 1 min. If you have 4 disks, for example, the total time to move those 4 disks from one node to another would be about 4 min.
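
One way to measure this on your own cluster is to look at the attach/detach events; the gap between the FailedAttachVolume warning and the SuccessfulAttachVolume event is roughly the detach-plus-attach time (namespace is a placeholder):

$ kubectl get events -n <namespace> --sort-by=.lastTimestamp | grep -E 'FailedAttachVolume|SuccessfulAttachVolume|FailedMount'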

@vogelinho10

@andyzhangx
AKS 1.9.6
West Europe
3 nodes (DS3 v2)
The pod only has one pvc / disk attached.

@andyzhangx
Contributor

@vogelinho10 could you provide kubectl describe po POD-NAME logs?

@vogelinho10

@andyzhangx yes:

Name:           jenkins-6746b45c94-sgm56
Namespace:      default
Node:           aks-agentpool-17149307-2/10.240.0.6
Start Time:     Thu, 05 Jul 2018 07:43:46 +0200
Labels:         app=jenkins
                chart=jenkins-0.16.3
                component=jenkins-jenkins-master
                heritage=Tiller
                pod-template-hash=2302601750
                release=jenkins
Annotations:    checksum/config=b0e9cd903ca69063fc53c2ebe6fa6500aa10bf35cae93edfc04dc4e276060970
Status:         Running
IP:             10.244.1.94
Controlled By:  ReplicaSet/jenkins-6746b45c94
Init Containers:
  copy-default-config:
    Container ID:  docker://d8e8429d5837bc223bbdaacc4964c7285370fa21a500d11cd0d5510dec5a582a
    Image:         jenkins/jenkins:lts
    Image ID:      docker-pullable://jenkins/jenkins@sha256:30793c79d6f59467bb8b821a8bc96cda24a61397f54658adb25825c68f8d464f
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      /var/jenkins_config/apply_config.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 05 Jul 2018 07:52:59 +0200
      Finished:     Thu, 05 Jul 2018 07:53:14 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/jenkins/ref/secrets/ from secrets-dir (rw)
      /var/jenkins_config from jenkins-config (rw)
      /var/jenkins_home from jenkins-home (rw)
      /var/jenkins_plugins from plugin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gs645 (ro)
Containers:
  jenkins:
    Container ID:  docker://266c2c1de846f82ac2c5fd23d317f76dfd2b47c348c4fd2b3f6914e372673135
    Image:         jenkins/jenkins:lts
    Image ID:      docker-pullable://jenkins/jenkins@sha256:30793c79d6f59467bb8b821a8bc96cda24a61397f54658adb25825c68f8d464f
    Ports:         8080/TCP, 50000/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      --argumentsRealm.passwd.$(ADMIN_USER)=$(ADMIN_PASSWORD)
      --argumentsRealm.roles.$(ADMIN_USER)=admin
    State:          Running
      Started:      Thu, 05 Jul 2018 07:53:25 +0200
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      500m
      memory:   512Mi
    Liveness:   http-get http://:http/login delay=90s timeout=5s period=10s #success=1 #failure=12
    Readiness:  http-get http://:http/login delay=60s timeout=1s period=10s #success=1 #failure=3
    Environment:
      JAVA_OPTS:       -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300
      JENKINS_OPTS:
      ADMIN_PASSWORD:  <set to the key 'jenkins-admin-password' in secret 'jenkins'>  Optional: false
      ADMIN_USER:      <set to the key 'jenkins-admin-user' in secret 'jenkins'>      Optional: false
    Mounts:
      /usr/share/jenkins/ref/plugins/ from plugin-dir (rw)
      /usr/share/jenkins/ref/secrets/ from secrets-dir (rw)
      /var/jenkins_config from jenkins-config (ro)
      /var/jenkins_home from jenkins-home (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gs645 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  jenkins-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      jenkins
    Optional:  false
  plugin-dir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  secrets-dir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  jenkins-home:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  jenkins
    ReadOnly:   false
  default-token-gs645:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gs645
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age                From                               Message
  ----     ------                 ----               ----                               -------
  Normal   Scheduled              31m                default-scheduler                  Successfully assigned jenkins-6746b45c94-sgm56 to aks-agentpool-17149307-2
  Warning  FailedAttachVolume     31m                attachdetach-controller            Multi-Attach error for volume "pvc-97357b96-430b-11e8-a44d-0a58ac1f0a4f" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulMountVolume  31m                kubelet, aks-agentpool-17149307-2  MountVolume.SetUp succeeded for volume "secrets-dir"
  Normal   SuccessfulMountVolume  31m                kubelet, aks-agentpool-17149307-2  MountVolume.SetUp succeeded for volume "plugin-dir"
  Normal   SuccessfulMountVolume  31m                kubelet, aks-agentpool-17149307-2  MountVolume.SetUp succeeded for volume "jenkins-config"
  Normal   SuccessfulMountVolume  31m                kubelet, aks-agentpool-17149307-2  MountVolume.SetUp succeeded for volume "default-token-gs645"
  Warning  FailedMount            24m (x3 over 29m)  kubelet, aks-agentpool-17149307-2  Unable to mount volumes for pod "jenkins-6746b45c94-sgm56_default(62aabcff-8016-11e8-82bc-0a58ac1f1109)": timeout expired waiting for volumes to attach/mount for pod "default"/"jenkins-6746b45c94-sgm56". list of unattached/unmounted volumes=[jenkins-home]
  Normal   SuccessfulMountVolume  23m                kubelet, aks-agentpool-17149307-2  MountVolume.SetUp succeeded for volume "pvc-97357b96-430b-11e8-a44d-0a58ac1f0a4f"
  Normal   Pulling                23m                kubelet, aks-agentpool-17149307-2  pulling image "jenkins/jenkins:lts"
  Normal   Pulled                 22m                kubelet, aks-agentpool-17149307-2  Successfully pulled image "jenkins/jenkins:lts"
  Normal   Created                21m                kubelet, aks-agentpool-17149307-2  Created container
  Normal   Started                21m                kubelet, aks-agentpool-17149307-2  Started container
  Normal   Pulling                21m                kubelet, aks-agentpool-17149307-2  pulling image "jenkins/jenkins:lts"
  Normal   Pulled                 21m                kubelet, aks-agentpool-17149307-2  Successfully pulled image "jenkins/jenkins:lts"
  Normal   Created                21m                kubelet, aks-agentpool-17149307-2  Created container
  Normal   Started                21m                kubelet, aks-agentpool-17149307-2  Started container
  Warning  Unhealthy              20m (x2 over 20m)  kubelet, aks-agentpool-17149307-2  Readiness probe failed: Get http://10.244.1.94:8080/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy              20m                kubelet, aks-agentpool-17149307-2  Readiness probe failed: HTTP probe failed with statuscode: 503

@andyzhangx
Contributor

Thanks for the info. It looks like disk detach/attach in the West Europe region takes a lot more time; I will try that region and attempt to repro it first. Thanks.

@andyzhangx
Contributor

andyzhangx commented Jul 5, 2018

Update:
I tried in West Europe; a single azure disk attachment took 2min25s, which is too much time. By comparison, in the westus2 region it only takes about 30s. @jalberto what's your region, then? (Update: I see, it's also West Europe; that region has a slow disk attachment issue, and a ticket has already been filed.)
I will file a ticket with the azure VM team.
FYI @brendandburns @khenidak @slack

@jalberto
Author

jalberto commented Jul 5, 2018

@andyzhangx westeurope too (that info is in the description of this issue). Thanks for continuing to dig into this :)

On the other hand, about your math of 4 min for 4 disks: if the optimal umount+mount time is 1 min (still too much, TBH), then N disks should take the same amount of time, since the operations should be parallel, not sequential. So 1 disk should take 1 min and 5 disks should take 1 min too.

@IainColledge

Here is one of mine. The events have expired, but it was generally just failing to attach, and you can see it takes about 1:40 per pod to attach a disk:

Name:           teamcityba-2
Namespace:      master
Node:           aks-nodepool1-36127387-0/10.240.0.5
Start Time:     Wed, 04 Jul 2018 22:05:57 +0100
Labels:         application=teamcityba
                application_type=teamcity
                controller-revision-hash=teamcityba-559b6b8f74
                statefulset.kubernetes.io/pod-name=teamcityba-2
Annotations:    <none>
Status:         Running
IP:             10.244.3.77
Controlled By:  StatefulSet/teamcityba
Containers:
  teamcityba:
    Container ID:   docker://eff4f656337ae720ae0d91e14b9e1175243e2a43a8fbb722dc311a11042a0a4a
    Image:          aidockerrgy.azurecr.io/teamcityba:2018.1.411
    Image ID:       docker-pullable://aidockerrgy.azurecr.io/teamcityba@sha256:b2ab2297619edc5447a61b31d62881bcb3019d04bf456ddfe93211272335764d
    Port:           9090/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 04 Jul 2018 23:40:25 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      TC_SERVER_URL:  http://teamcity.master.svc.cluster.local:8111
      APP_NAME:       teamcityba
    Mounts:
      /cacert from cacert (ro)
      /home/teamcityba/buildAgent/conf from teamcityba (rw)
      /var/run/docker.sock from docker-socket (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-g4x2z (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  teamcityba:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  teamcityba-teamcityba-2
    ReadOnly:   false
  docker-socket:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:
  cacert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cacert
    Optional:    false
  default-token-g4x2z:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-g4x2z
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

@andyzhangx
Contributor

@jalberto I can confirm that attaching multiple disks is not parallel, it's sequential. You may try the westus2 region if possible; disk attach/detach is much faster in that region.

@jalberto
Copy link
Author

jalberto commented Jul 5, 2018 via email

@IainColledge

Just did another deploy:

Name:           teamcity-0
Namespace:      master
Node:           aks-nodepool1-36127387-0/10.240.0.5
Start Time:     Thu, 05 Jul 2018 14:39:38 +0100
Labels:         application=teamcity
                application_type=teamcity
                controller-revision-hash=teamcity-f4845cf48
                statefulset.kubernetes.io/pod-name=teamcity-0
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  StatefulSet/teamcity
Containers:
  teamcity:
    Container ID:
    Image:          aidockerrgy.azurecr.io/teamcity:2018.1.367
    Image ID:
    Port:           8111/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /cacert from cacert (ro)
      /var/opt/data from teamcity (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-g4x2z (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  teamcity:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  teamcity-teamcity-0
    ReadOnly:   false
  cacert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cacert
    Optional:    false
  default-token-g4x2z:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-g4x2z
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                From                               Message
  ----     ------       ----               ----                               -------
  Warning  FailedMount  38s (x36 over 1h)  kubelet, aks-nodepool1-36127387-0  Unable to mount volumes for pod "teamcity-0_master(dce9ff91-8058-11e8-8bc3-0a58ac1f040b)": timeout expired waiting for volumes to attach/mount for pod "master"/"teamcity-0". list of unattached/unmounted volumes=[teamcity]

@andyzhangx
Contributor

Looks like the disk attachment time issue has been resolved in the West Europe region:

  • Attaching one disk takes about 30s
Events:
  Type     Reason                  Age              From                               Message
  ----     ------                  ----             ----                               -------
  Warning  FailedScheduling        1m (x4 over 1m)  default-scheduler                  pod has unbound PersistentVolumeClaims (repeated 2 times)
  Normal   Scheduled               1m               default-scheduler                  Successfully assigned wrapping-angelfish-mariadb-0 to k8s-agentpool-10544097-0
  Normal   SuccessfulMountVolume   1m               kubelet, k8s-agentpool-10544097-0  MountVolume.SetUp succeeded for volume "config"
  Normal   SuccessfulMountVolume   1m               kubelet, k8s-agentpool-10544097-0  MountVolume.SetUp succeeded for volume "default-token-chcn2"
  Normal   SuccessfulAttachVolume  36s              attachdetach-controller            AttachVolume.Attach succeeded for volume "pvc-e6818ba3-8117-11e8-a9f5-000d3a3a51f5"
  Normal   SuccessfulMountVolume   5s               kubelet, k8s-agentpool-10544097-0  MountVolume.SetUp succeeded for volume "pvc-e6818ba3-8117-11e8-a9f5-000d3a3a51f5"
  Normal   Pulling                 4s               kubelet, k8s-agentpool-10544097-0  pulling image "docker.io/bitnami/mariadb:10.1.33"
  Normal   Pulled                  1s               kubelet, k8s-agentpool-10544097-0  Successfully pulled image "docker.io/bitnami/mariadb:10.1.33"
  Normal   Created                 1s               kubelet, k8s-agentpool-10544097-0  Created container
  Normal   Started                 0s               kubelet, k8s-agentpool-10544097-0  Started container
  • Draining two disks from one node to another takes about 2 min
Events:
  Type     Reason                  Age   From                               Message
  ----     ------                  ----  ----                               -------
  Normal   Scheduled               3m    default-scheduler                  Successfully assigned wrapping-angelfish-wordpress-855796db6c-xqn8c to k8s-agentpool-10544097-0
  Warning  FailedAttachVolume      3m    attachdetach-controller            Multi-Attach error for volume "pvc-e57a9bd8-8117-11e8-a9f5-000d3a3a51f5" Volume is already used by pod(s) wrapping-angelfish-wordpress-855796db6c-prtnp
  Normal   SuccessfulMountVolume   3m    kubelet, k8s-agentpool-10544097-0  MountVolume.SetUp succeeded for volume "default-token-chcn2"
  Normal   SuccessfulAttachVolume  1m    attachdetach-controller            AttachVolume.Attach succeeded for volume "pvc-e57a9bd8-8117-11e8-a9f5-000d3a3a51f5"
  Warning  FailedMount             1m    kubelet, k8s-agentpool-10544097-0  Unable to mount volumes for pod "wrapping-angelfish-wordpress-855796db6c-xqn8c_default(2deb2d41-8118-11e8-a9f5-000d3a3a51f5)": timeout expired waiting for volumes to attach or mount for pod "default"/"wrapping-angelfish-wordpress-855796db6c-xqn8c". list of unmounted volumes=[wordpress-data]. list of unattached volumes=[wordpress-data default-token-chcn2]
  Normal   SuccessfulMountVolume   52s   kubelet, k8s-agentpool-10544097-0  MountVolume.SetUp succeeded for volume "pvc-e57a9bd8-8117-11e8-a9f5-000d3a3a51f5"
  Normal   Pulling                 47s   kubelet, k8s-agentpool-10544097-0  pulling image "docker.io/bitnami/wordpress:4.9.6"
  Normal   Pulled                  32s   kubelet, k8s-agentpool-10544097-0  Successfully pulled image "docker.io/bitnami/wordpress:4.9.6"
  Normal   Created                 15s   kubelet, k8s-agentpool-10544097-0  Created container
  Normal   Started                 15s   kubelet, k8s-agentpool-10544097-0  Started container

@IainColledge

Yup, back to normal, thanks.

@jalberto
Author

jalberto commented Jul 6, 2018 via email

@andyzhangx
Contributor

Please try again in West Europe, and I will contact support about your concern. Thanks.

@luisdavim

I'm having the same issue in east us.

@peon-pasado-zeitnot

Same here :(

@andyzhangx
Contributor

@Viroos @luisdavim the Multi-Attach error is expected (it also happens on other cloud providers) when rescheduling a pod from one node to another; usually k8s will retry and recover in a few minutes. Just let me know if the issue persists for a long time.

@derekperkins
Contributor

I'm having this issue in canada central. It has been an hour or so, and I've tried deleting the pod a few times to try and get it rescheduled on a matching node.

@andyzhangx
Contributor

@derekperkins could you provide more details (example commands are sketched below the list):

  • your k8s version
  • the output of kubectl describe po POD-NAME
  • the status of that disk: which VM is it attached to?
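
For the disk status check, something like this works; managedBy on the managed disk shows which VM it is currently attached to (resource group, PV, and disk names are placeholders):

$ kubectl version --short
$ kubectl describe po <POD-NAME>
$ kubectl get pv <pv-name> -o jsonpath='{.spec.azureDisk.diskName}'
$ az disk show -g <MC_resourcegroup_cluster_region> -n <disk-name> --query managedBy -o tsv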

@derekperkins
Contributor

derekperkins commented Aug 31, 2018

thanks for the response @andyzhangx. After looking at it, the PV is already attached to the correct node that the pod is running on, so I don't know why it's throwing the multi-attach error.
k8s: 1.11.2

StatefulSet Pod

$ kubectl describe pod -n clickhouse zk-zookeeper-0

Name:               zk-zookeeper-0
Namespace:          clickhouse
Priority:           0
PriorityClassName:  <none>
Node:               aks-nodepool1-39874233-5/10.0.2.47
Start Time:         Thu, 30 Aug 2018 12:27:06 -0600
Labels:             app=zk-zookeeper
                    chart=zookeeper-1.7.0
                    controller-revision-hash=zk-zookeeper-5fb8975954
                    release=zookeeper
                    statefulset.kubernetes.io/pod-name=zk-zookeeper-0
Annotations:        <none>
Status:             Running
IP:                 10.0.2.96
Controlled By:      StatefulSet/zk-zookeeper
Init Containers:
  zookeeper-init:
    Container ID:  docker://7b308bbe4d780f6beeee76d0bf234e111bec2e422cfeb5136697d621d8eb8953
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:cb63aa0641a885f54de20f61d152187419e8f6b159ed11a251a09d115fdff9bd
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      chown -R 1000:1000 /var/lib/zookeeper
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 30 Aug 2018 12:29:23 -0600
      Finished:     Thu, 30 Aug 2018 12:29:23 -0600
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/zookeeper from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-67j28 (ro)
Containers:
  zookeeper:
    Container ID:  docker://ffd1f8e1f61ce157432b8ae5cd57382f1146748d4df6cb73ffd00394fee1782b
    Image:         mirantisworkloads/zookeeper:3.5.3-beta
    Image ID:      docker-pullable://mirantisworkloads/zookeeper@sha256:e6bf6010556796a56c47e90d9bf5aa7e9245353b688025e8d6332dddf449ebf3
    Ports:         2181/TCP, 2888/TCP, 3888/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      entrypoint.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Fri, 31 Aug 2018 13:29:55 -0600
      Finished:     Fri, 31 Aug 2018 13:31:04 -0600
    Ready:          False
    Restart Count:  407
    Requests:
      cpu:      1
      memory:   4Gi
    Liveness:   exec [zkCheck.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Readiness:  exec [zkCheck.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment:
      ZOO_LOG4J_PROP:    INFO,CONSOLE,ROLLINGFILE
      ZK_REPLICAS:       3
      ZK_LOG_LEVEL:      ERROR
      ZK_CLIENT_PORT:    2181
      ZK_SERVER_PORT:    2888
      ZK_ELECTION_PORT:  3888
    Mounts:
      /opt/zookeeper/configmap from zoo-cfg (rw)
      /var/lib/zookeeper from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-67j28 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-zk-zookeeper-0
    ReadOnly:   false
  zoo-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      zk-zookeeper
    Optional:  false
  default-token-67j28:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-67j28
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                  From                               Message
  ----     ------     ----                 ----                               -------
  Warning  Unhealthy  5m (x1220 over 1d)   kubelet, aks-nodepool1-39874233-5  Liveness probe failed:
  Warning  BackOff    29s (x4880 over 1d)  kubelet, aks-nodepool1-39874233-5  Back-off restarting failed container

PVC

$ kubectl describe pvc -n clickhouse datadir-zk-zookeeper-0

Name:          datadir-zk-zookeeper-0
Namespace:     clickhouse
StorageClass:  managed-premium
Status:        Bound
Volume:        pvc-f4b5e4c9-7e86-11e8-bf96-0a58ac1f00a5
Labels:        app=zk-zookeeper
               chart=zookeeper-1.7.0
               release=zookeeper
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
               volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/azure-disk
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      32Gi
Access Modes:  RWO
Events:        <none>

PV

$ kubectl describe pv -n clickhouse pvc-f4b5e4c9-7e86-11e8-bf96-0a58ac1f00a5
Name:            pvc-f4b5e4c9-7e86-11e8-bf96-0a58ac1f00a5
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
                 pv.kubernetes.io/provisioned-by=kubernetes.io/azure-disk
                 volumehelper.VolumeDynamicallyCreatedByKey=azure-disk-dynamic-provisioner
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    managed-premium
Status:          Bound
Claim:           clickhouse/datadir-zk-zookeeper-0
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        32Gi
Node Affinity:   <none>
Message:
Source:
    Type:         AzureDisk (an Azure Data Disk mount on the host and bind mount to the pod)
    DiskName:     kubernetes-dynamic-pvc-f4b5e4c9-7e86-11e8-bf96-0a58ac1f00a5
    DiskURI:      /subscriptions/123/resourceGroups/MC_nozzle_nozzle_canadacentral/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-f4b5e4c9-7e86-11e8-bf96-0a58ac1f00a5
    Kind:         Managed
    FSType:
    CachingMode:  None
    ReadOnly:     false
Events:           <none>

Azure Portal

(screenshot from the Azure portal omitted)

@ghost

ghost commented Nov 21, 2018

I am having the same error as well on uksouth (version 1.11.4). I am quite new to kubernetes, but it looks like the deployment isn't correctly scaling down the old replica set that is attached to the volume.

As a workaround for now I just scale down the old replica set, but it's kind of annoying, as it means I have to keep doing this for every deployment.

@andyzhangx
Contributor

@stevetsang one pvc disk can only be used by one pod, and in some regions, like westeurope, disk attachment is slow, so there will be a few multi-attach errors and finally the disk mount should recover.
Back to your case: scaling up and down won't fix your issue if the pvc disk is only used by one pod.

@ghost

ghost commented Nov 22, 2018

@andyzhangx I am currently using one pvc for one pod. What I am doing with scaling is setting it down to 0 to force the disk to unmount, then scaling back to 1 so it mounts again. Do you recommend I just leave it until it figures out on its own to unmount from the old pod?
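
For reference, that scale-down/scale-up cycle as commands (deployment name and namespace are placeholders):

$ kubectl -n <namespace> scale deployment <name> --replicas=0   # force the old pod down so the disk detaches
$ kubectl -n <namespace> get pods -w                            # wait until the old pod is gone
$ kubectl -n <namespace> scale deployment <name> --replicas=1   # bring it back; the disk re-attaches on the new node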

@andyzhangx
Contributor

@stevetsang could you share the kubectl describe po POD-NAME info? If the multi-attach error only lasts for a few minutes, that would be OK; otherwise there is an issue there.

@ghost

ghost commented Nov 22, 2018

@andyzhangx I have used kubectl describe po POD-NAME and have the following information:

old pod that needs to unmount the volume:

Name:               data-786bb49677-92hpv
Namespace:          velox
Priority:           0
PriorityClassName:  <none>
Node:               aks-nodepool1-42408384-2/10.240.0.4
Start Time:         Wed, 21 Nov 2018 16:56:44 +0000
Labels:             name=data
                    pod-template-hash=3426605233
Annotations:        <none>
Status:             Running
IP:                 10.244.2.174
Controlled By:      ReplicaSet/data-786bb49677
Containers:
  data:
    Container ID:   docker://01330cc69021146865a8fae9450414164fe799a4d225d96ef22d39370db82b77
    Image:          etechsolutions/velox:datavelox_master_20181121.5
    Image ID:       docker-pullable://etechsolutions/velox@sha256:45146f2cc061caf247956c78988ca690586c434cdca526a033e0afd99da98f6d
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 21 Nov 2018 17:11:14 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:80/ok delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:
      NODE_ENV:                      production
    Mounts:
      /usr/git-store from volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6hvj6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-service-disk
    ReadOnly:   false
  default-token-6hvj6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6hvj6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

new pod that needs to mount the volume:

Name:               data-7bdf6cff6f-xgjtf
Namespace:          velox
Priority:           0
PriorityClassName:  <none>
Node:               aks-nodepool1-42408384-0/10.240.0.6
Start Time:         Thu, 22 Nov 2018 13:51:06 +0000
Labels:             name=data
                    pod-template-hash=3689279929
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      ReplicaSet/data-7bdf6cff6f
Containers:
  data:
    Container ID:
    Image:          etechsolutions/velox:datavelox_master_20181122.2
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:80/ok delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:
      NODE_ENV:                      production
    Mounts:
      /usr/git-store from volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6hvj6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-service-disk
    ReadOnly:   false
  default-token-6hvj6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6hvj6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason              Age               From                               Message
  ----     ------              ----              ----                               -------
  Normal   Scheduled           22m               default-scheduler                  Successfully assigned velox/data-7bdf6cff6f-xgjtf to aks-nodepool1-42408384-0
  Warning  FailedAttachVolume  22m               attachdetach-controller            Multi-Attach error for volume "pvc-a146519b-e820-11e8-8716-2e147cf732b6" Volume is already used by pod(s) data-786bb49677-92hpv
  Warning  FailedMount         2m (x9 over 20m)  kubelet, aks-nodepool1-42408384-0  Unable to mount volumes for pod "data-7bdf6cff6f-xgjtf_velox(a8329388-ee5d-11e8-96fa-1adab9cd1b4d)": timeout expired waiting for volumes to attach or mount for pod "velox"/"data-7bdf6cff6f-xgjtf". list of unmounted volumes=[volume]. list of unattached volumes=[volume default-token-6hvj6]

@andyzhangx
Contributor

@stevetsang the error is clear: you cannot use one PVC disk (data-service-disk) for two pods. You may delete the old pod and then let the new pod use that PVC disk.

Multi-Attach error for volume "pvc-a146519b-e820-11e8-8716-2e147cf732b6" Volume is already used by pod(s) data-786bb49677-92hpv

@ghost

ghost commented Nov 22, 2018

@andyzhangx I am using helm upgrade to upgrade the pod to a new release. Should it not be tearing down the old pod and unmounting the volume before bringing up the new pod?

@andyzhangx
Contributor

@stevetsang I don't know the helm upgrade logic; anyway, the old pod is still alive and well...

@ghost

ghost commented Nov 22, 2018

Ah, finally figured it out! By default the deployment strategy is 'RollingUpdate'; because of this, the old pod doesn't get torn down until the new one is up, and because the drive can't be mounted onto the new pod, it gets stuck. I was able to fix this by changing the strategy to 'Recreate'.

Many thanks to @andyzhangx for your help :)
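
For reference, a minimal sketch of that strategy change (deployment name and namespace are placeholders); setting rollingUpdate to null is needed when switching an existing Deployment to Recreate:

$ kubectl -n <namespace> patch deployment <name> -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'

With Recreate, the old pod is deleted (and its disk detached) before the new pod is created, so a single-attach azure disk can move to the new node cleanly.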

@andyzhangx
Contributor

@stevetsang RollingUpdate should be OK in a StatefulSet even when replicas is 1; it will delete the old pod first and then create the new pod. I am not sure why your old pod is not deleted.

@ghost

ghost commented Nov 27, 2018

@andyzhangx I am using a Deployment with replicas set to 1. I'm unsure if your fix kubernetes/kubernetes#71377 would fix my issue when my deployment strategy is set to RollingUpdate. I guess I will give it a try once 1.11.5 is available.

@andyzhangx
Contributor

andyzhangx commented Nov 27, 2018

@ghost

ghost commented Nov 27, 2018

@andyzhangx okies, well I'll just leave it on Recreate. 👍

@AceHack

AceHack commented Dec 17, 2018

@andyzhangx I'm on v1.11.5 and I'm still having the issue in east us.

@andyzhangx
Contributor

andyzhangx commented Dec 18, 2018

FYI: last month I fixed an azure disk attach/detach issue caused by a dirty VM cache, which was introduced in k8s v1.9.2.
This issue could make disk attach/detach take a long time (or fail for a long time), and it's not easy to repro since it only happens with low probability when attaching/detaching multiple disks on a single VM.
Related PR: kubernetes/kubernetes#71495
All details: https://github.com/andyzhangx/demo/blob/master/issues/azuredisk-issues.md#14-azure-disk-attachdetach-failure-mount-issue-io-error

This fix has been verified on ~150 clusters (~750 nodes in total) with a disk attach/detach scenario by one customer running it for more than one month.

k8s version | fixed version
v1.10       | 1.10.12
v1.11       | 1.11.6
v1.12       | 1.12.4
v1.13       | no such issue

Please upgrade to the above k8s versions when they are available on AKS (they are not available yet), thanks.
This fix could be the silver bullet for all of these weird disk attach/detach issues since v1.9.
cc @brendandburns @khenidak
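
To check your current version and which versions AKS currently offers for your cluster (resource group and cluster names are placeholders):

$ kubectl version --short                                        # current control plane / node version
$ az aks get-upgrades -g <resource-group> -n <cluster> -o table  # versions currently available for upgrade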

@andyzhangx
Contributor

I will close this issue; just let me know if you have any questions. Thanks.

@Marusyk

Marusyk commented Aug 22, 2019

@andyzhangx still reproducible in 1.12.8. could you please help?

Multi-Attach error for volume "pvc-6844019c-b9d1-11e9-95d1-8225c29d0220" Volume is already exclusively attached to one node and can't be attached to another
Back-off restarting failed container

on my stateful set with Elasticsearch

@andyzhangx
Contributor

> @andyzhangx still reproducible in 1.12.8. could you please help?
>
> Multi-Attach error for volume "pvc-6844019c-b9d1-11e9-95d1-8225c29d0220" Volume is already exclusively attached to one node and can't be attached to another
> Back-off restarting failed container
>
> on my stateful set with Elasticsearch

If that Multi-Attach error lasts for more than 10 min, it could be due to a detach failure (disk RP busy, controller manager crash, etc.). The workaround is to detach the problematic disk manually.

I just added a disk attach/detach self-healing feature for the azure disk driver: the azure cloud provider will check and detach the disk if it's already attached to another node, which is like self-healing. This PR could fix lots of such disk attachment issues.

19. disk attach/detach self-healing

Issue details:
Disk detach can fail for many reasons (e.g. disk RP busy, controller manager crash, etc.), and attaching a disk to another node will fail if that disk is still attached to the old node; previously the user needed to manually detach the problematic disk. With this fix, the azure cloud provider checks and detaches the disk if it's already attached to another node, which is like self-healing. This PR could fix lots of such disk attachment issues.

Fix

The following PR first checks whether the current disk is already attached to another node; if so, it triggers a dangling-attach error, and the k8s controller detaches the disk first and then performs the attach volume operation.

This PR also fixes a "disk not found" issue when detaching an azure disk, caused by case sensitivity in the disk URI; the error logs look like the following (without this PR):

azure_controller_standard.go:134] detach azure disk: disk  not found, diskURI: /subscriptions/xxx/resourceGroups/andy-mg1160alpha3/providers/Microsoft.Compute/disks/xxx-dynamic-pvc-41a31580-f5b9-4f08-b0ea-0adcba15b6db
k8s version | fixed version
v1.12       | no fix
v1.13       | 1.13.11
v1.14       | 1.14.7
v1.15       | 1.15.4
v1.16       | 1.16.0

Workaround:

Manually detach the problematic disk (a rough az CLI sketch follows).
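
A rough sketch of that manual workaround with the az CLI (resource group, VM, and disk names are placeholders; exact flags can vary slightly between az CLI versions):

# which VM is the disk currently attached to?
$ az disk show -g <MC_resource_group> -n <disk-name> --query managedBy -o tsv
# detach it from that VM so the controller can attach it to the new node
$ az vm disk detach -g <MC_resource_group> --vm-name <vm-name> -n <disk-name>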

@ghost ghost locked as resolved and limited conversation to collaborators Aug 5, 2020