
Upgrade the OpenPMIx package on a Slurm cluster managed with AWS ParallelCluster

Jacopo De Amicis edited this page Sep 11, 2023 · 1 revision

Each AWS ParallelCluster release ships with a set of AMIs for the supported operating systems and EC2 platforms. Each AMI contains a software stack, including the OpenPMIx package, that was validated at ParallelCluster release time. This guide describes how to upgrade OpenPMIx on your cluster.

Step 1. Upgrading the OpenPMIx package on the head node

To upgrade OpenPMIx on the head node of your cluster, you cannot rely on upgrading the AMI: as of ParallelCluster 3.7.0, the AMI of the head node cannot be changed with a pcluster update-cluster operation. Instead, follow these steps (this example installs version 4.2.6).
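
The scripts in this guide hard-code 4.2.6 in both the download URL and the source directory name. If you target a different release, it may be easier to derive both from a single variable. The following is a minimal sketch, assuming other OpenPMIx releases follow the same URL pattern as the 4.2.6 link used in this guide; the helper names pmix_tarball_url and pmix_src_dir are hypothetical and not part of ParallelCluster or OpenPMIx.

```bash
#!/bin/bash
# Hypothetical helpers (not part of any tool): build the release URL and the
# source directory name from one version string, following the v4.2.6
# pattern used in this guide. Other releases are ASSUMED to follow it too.
pmix_tarball_url() {
    # $1 = version, e.g. 4.2.6
    echo "https://github.com/openpmix/openpmix/releases/download/v$1/pmix-$1.tar.gz"
}

pmix_src_dir() {
    # $1 = version; the tarball unpacks into pmix-<version>/
    echo "pmix-$1"
}

PMIX_VERSION=4.2.6
echo "Would download: $(pmix_tarball_url "$PMIX_VERSION")"
echo "Would build in: $(pmix_src_dir "$PMIX_VERSION")/"
```

In the install scripts, the wget and cd lines could then use "$(pmix_tarball_url "$PMIX_VERSION")" and "$(pmix_src_dir "$PMIX_VERSION")" instead of the literal values.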

  1. Stop the compute fleet with pcluster update-compute-fleet -n <cluster_name> --status STOP_REQUESTED, and wait for the compute fleet to be stopped.
  2. Verify the installed version of PMIx:
[ec2-user@ip-10-0-0-80 ~]$ srun --mpi=list
MPI plugin types are...
    cray_shasta
    none
    pmi2
    pmix
specific pmix plugin versions available: pmix_v3

[ec2-user@ip-10-0-0-80 ~]$ /opt/pmix/bin/pmix_info
                 Package: PMIx root@ip-172-31-0-53.ec2.internal Distribution
                    PMIX: 3.2.3
                    ...
  3. As root, execute the following script on the head node to install an updated version of PMIx:
#!/bin/bash

# activate python virtual env
source /opt/parallelcluster/pyenv/versions/3.9.16/envs/cookbook_virtualenv/bin/activate

# Go to the pmix local cache folder and uninstall the bundled version
# NOTE: adjust the pmix version in the path to match your pcluster version
cd /etc/chef/local-mode-cache/cache/pmix-3.2.3/
sudo make uninstall

# make sure all files from the previous version are removed
sudo rm -rf /opt/pmix

# download new version
wget https://github.com/openpmix/openpmix/releases/download/v4.2.6/pmix-4.2.6.tar.gz
tar xf pmix-4.2.6.tar.gz

# compile and install it
cd pmix-4.2.6/
./autogen.pl
./configure --prefix=/opt/pmix
CORES=$(grep processor /proc/cpuinfo | wc -l)
make -j $CORES
sudo make install
  4. As root, recompile Slurm by executing the following script on the head node:
#!/bin/bash

# stop slurm daemon
systemctl stop slurmctld

# activate python virtual env
source /opt/parallelcluster/pyenv/versions/3.9.16/envs/cookbook_virtualenv/bin/activate

# Go to the Slurm local cache folder to rebuild and reinstall Slurm
# NOTE: adjust the slurm version in the path to match your pcluster version
cd /etc/chef/local-mode-cache/cache/slurm-slurm-23-02-4-1/

# uninstall it
sudo make uninstall
sudo make clean

# re-install it
./configure --prefix=/opt/slurm --with-pmix=/opt/pmix --with-jwt=/opt/libjwt --enable-slurmrestd
CORES=$(grep processor /proc/cpuinfo | wc -l)
make -j $CORES
sudo make install
sudo make install-contrib

# deactivate python virtual env
deactivate

# start slurm daemon
systemctl start slurmctld
  5. Verify the installed version of PMIx again:
[ec2-user@ip-10-0-0-80 ~]$ srun --mpi=list
MPI plugin types are...
 cray_shasta
 none
 pmi2
 pmix
specific pmix plugin versions available: pmix_v4

[ec2-user@ip-10-0-0-80 ~]$ /opt/pmix/bin/pmix_info
 Package: PMIx ec2-user@ip-10-0-0-80 Distribution
 PMIX: 4.2.6
 ...
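
The before/after check above can also be scripted. The following is a minimal sketch that extracts the version from the "PMIX:" line of pmix_info output and compares it with the expected one. The function names pmix_version_from_info and check_pmix_version are hypothetical, and the parsing assumes the output format shown in this guide.

```bash
#!/bin/bash
# Hypothetical helper: read pmix_info output on stdin and print the version
# taken from the "PMIX: x.y.z" line (format ASSUMED to match the output
# shown in this guide).
pmix_version_from_info() {
    awk '$1 == "PMIX:" { print $2; exit }'
}

# Hypothetical helper: read pmix_info output on stdin and succeed only if
# the reported version equals the expected version given as $1.
check_pmix_version() {
    expected="$1"
    actual="$(pmix_version_from_info)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: PMIx $actual"
    else
        echo "MISMATCH: expected $expected, found ${actual:-none}" >&2
        return 1
    fi
}
```

On the head node this could be used as: /opt/pmix/bin/pmix_info | check_pmix_version 4.2.6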

Step 2. Upgrading the OpenPMIx package on the compute nodes through a custom bootstrap action

To upgrade the OpenPMIx package on the compute nodes, we recommend using a custom bootstrap script.

  1. Create an update-pmix.sh script with the following content:
#!/bin/bash

# remove old version
sudo rm -rf /opt/pmix

# download new version
wget https://github.com/openpmix/openpmix/releases/download/v4.2.6/pmix-4.2.6.tar.gz
tar xf pmix-4.2.6.tar.gz

# compile and install it
cd pmix-4.2.6/
./autogen.pl
./configure --prefix=/opt/pmix
CORES=$(grep processor /proc/cpuinfo | wc -l)
make -j $CORES
sudo make install
  2. Upload your script to an S3 bucket:
aws s3 cp --acl public-read update-pmix.sh s3://<bucket-name>/update-pmix.sh
  3. Add the following configuration to all your queues:
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      [...]
      CustomActions:
        OnNodeConfigured:
          Script: s3://bucket-name/update-pmix.sh
      Iam:
        S3Access:
          - BucketName: bucket-name
            EnableWriteAccess: false
  4. Update your cluster with pcluster update-cluster -n <cluster_name> -c cluster-config.yaml
  5. Restart the compute fleet with pcluster update-compute-fleet -n <cluster_name> --status START_REQUESTED.
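
Both the stop request in Step 1 and the restart above are asynchronous, so it can help to poll the fleet status before moving on. The following is a minimal sketch, assuming that pcluster describe-compute-fleet prints JSON containing a "status" field (for example STOPPED or RUNNING). The function name wait_for_fleet_status and its interval and attempt defaults are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: poll the compute fleet until it reports the wanted
# status. ASSUMES `pcluster describe-compute-fleet` prints JSON that
# contains a line such as:  "status": "RUNNING",
#   $1 = cluster name, $2 = wanted status,
#   $3 = seconds between polls (default 15), $4 = max attempts (default 40)
wait_for_fleet_status() {
    cluster="$1"; wanted="$2"; interval="${3:-15}"; attempts="${4:-40}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Extract the value of the "status" field without requiring jq
        status="$(pcluster describe-compute-fleet -n "$cluster" \
            | sed -n 's/.*"status": *"\([A-Z_]*\)".*/\1/p')"
        echo "compute fleet status: ${status:-unknown}"
        [ "$status" = "$wanted" ] && return 0
        i=$((i + 1))
        sleep "$interval"
    done
    echo "timed out waiting for status $wanted" >&2
    return 1
}
```

For example, wait_for_fleet_status <cluster_name> STOPPED could follow the stop request in Step 1, and wait_for_fleet_status <cluster_name> RUNNING could follow the restart.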