
(3.0.0 3.1.3) AWSBatch Multi node Parallel jobs fail if no EBS defined in cluster

Francesco Giordano edited this page May 4, 2022 · 3 revisions

The issue

When the cluster scheduler is awsbatch and a multi-node parallel (MNP) job is submitted, the job fails if no shared EBS volume is defined in the cluster. MNP jobs require a shared volume to exchange information about the nodes taking part in the multi-node execution.

The MNP job fails with the following error:

Error executing script: Shared directory /NONE does not exist

Affected versions (OSes, schedulers)

  • All versions of ParallelCluster >= 3.0.0 are affected when running a multi-node parallel job on a cluster that has no EBS volume defined

Mitigation

Create the cluster with at least one shared EBS volume defined in its configuration. See https://docs.aws.amazon.com/parallelcluster/latest/ug/SharedStorage-v3.html#SharedStorage-v3-EbsSettings
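
As an illustration, a minimal `SharedStorage` section of the cluster configuration file that defines one shared EBS volume might look like the sketch below. The mount directory, volume name, volume type, and size are assumptions chosen for the example, not values required by the fix; adjust them to your workload.

```yaml
# Minimal sketch of a shared EBS volume in a ParallelCluster 3 configuration.
# MountDir, Name, VolumeType and Size are illustrative values (assumptions).
SharedStorage:
  - MountDir: /shared        # directory where the volume is mounted on the nodes
    Name: shared             # arbitrary name for this storage entry
    StorageType: Ebs         # an EBS volume is what the MNP job needs
    EbsSettings:
      VolumeType: gp3
      Size: 50               # size in GiB
```

With a section like this in place, the MNP job should find a shared directory instead of `/NONE`.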
