[Question] How to parallelize replicas in REMD simulations? #648
Comments
Based on issue 516 and some of the replex discussion and scripts from @zhang-ivy, it's clear you have to run it with a hostfile and a configfile. You can generate both with clusterutils' build_mpirun_configfile.py. The file contents are very simple and can be reused: just change the hostfile contents to whatever node your job lands on, and you can skip build_mpirun_configfile for subsequent runs. Unless I somehow have 4 separate instances of replica exchange writing to the same file, it appears to be running successfully for me. When I was playing around with it in an interactive session, I had to change the "srun" call in build_mpirun_configfile.py (line 215) to "mpirun" and manually set 2 environment variables to get build_mpirun_configfile.py to run.
The hostfile just contains the node name repeated once per GPU you'll use (assuming you're on a single node), and the configfile launches one worker per CUDA device id (0, 1, 2, 3, or whatever the available device ids are):
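As an illustration (hedged: the node name `node001`, four GPUs, the script name `run_repex.py`, and the exact option spelling are placeholders for your own job and MPI stack), the hostfile could look like:

```
node001
node001
node001
node001
```

and the configfile like:

```
-np 1 -env CUDA_VISIBLE_DEVICES 0 python run_repex.py
-np 1 -env CUDA_VISIBLE_DEVICES 1 python run_repex.py
-np 1 -env CUDA_VISIBLE_DEVICES 2 python run_repex.py
-np 1 -env CUDA_VISIBLE_DEVICES 3 python run_repex.py
```

Setting `CUDA_VISIBLE_DEVICES` per worker is what pins each replica's worker to its own GPU.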
Running it just looks like this:
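An illustrative form of that call (hedged: the launcher name depends on your MPI installation; `-f`/`-configfile` are MPICH/Hydra-style options, and Open MPI spells these differently — the commenter also mentions swapping "srun" for "mpirun" on their cluster):

```
mpiexec.hydra -f hostfile -configfile configfile
```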
I'll follow up if it turns out I was dreaming.
Of course, the Slurm output makes it look like 4 independent processes started, since several print statements are repeated 4 times. Those are simulation setup steps, though, and maybe they're being executed several times while ReplicaExchangeSampler's run() is running as a single coordinated instance, because when I check, for each GPU process id, only one of the GPU processes is accessing and writing to the log files. Seems like a good sign.
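One way to do that kind of check (a hedged sketch, not what the commenter necessarily ran: Linux-only, assumes you know the worker PIDs, and the helper name is made up) is to inspect each worker's open file descriptors under `/proc`:

```python
import os

def open_output_files(pid, suffixes=('.nc', '.log')):
    """List files with the given suffixes that a process currently holds
    open, by reading the symlinks in /proc/<pid>/fd (Linux only)."""
    fd_dir = f'/proc/{pid}/fd'
    found = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:  # fd closed between listdir() and readlink()
            continue
        if target.endswith(suffixes):
            found.append(target)
    return found

# For each worker PID, e.g.: print(pid, open_output_files(pid))
```

If only one worker's PID reports the reporter files open, the other workers are propagating replicas rather than each writing its own copy.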
Hi @felixmusil and @Dan-Burns! I am trying to set up HREX MD with MPI, but the tutorial only shows how to run all replicas on a single core. I would like to run each replica independently on several cores, but with all replicas on the same GPU (we have 1 GPU per node). Do you know how to set up such a run?
Hi,
I would like to run REMD simulations using the `ReplicaExchangeSampler` from this library. I have quite a few replicas for which I would like to run simulations in parallel to reduce the overall runtime. I see that the current implementation uses `mpiplus` to distribute the workload over MPI workers, and it works fine using the standard MPI script call over CPU workers. Is it possible to assign different GPUs to different replicas using the current infrastructure? The best I could achieve is to have several MPI workers that simulate different replicas on the same GPU.

If it's not possible at the moment, how can I set a different `DeviceIndex` on the platforms associated with each `thermodynamic_state`?
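The configfile approach discussed in the comments amounts to a round-robin rank-to-GPU mapping. A minimal sketch of that mapping (hedged: the helper name is hypothetical, not part of openmmtools, and the property name in the comment is only an example of how the index could be handed to a platform):

```python
# Assumption: one MPI worker per replica, n_gpus devices visible on the node.

def device_index_for_rank(rank: int, n_gpus: int) -> int:
    """Map an MPI worker rank to a GPU device index, round-robin."""
    return rank % n_gpus

# e.g. 6 workers sharing a 4-GPU node:
assignments = [device_index_for_rank(r, n_gpus=4) for r in range(6)]
print(assignments)  # → [0, 1, 2, 3, 0, 1]

# The resulting index could then be passed as a platform property when the
# Context is created, e.g. {'DeviceIndex': str(device_index_for_rank(rank,
# n_gpus))}, or exported as CUDA_VISIBLE_DEVICES before the worker starts.
```

Setting `CUDA_VISIBLE_DEVICES` per worker (as the configfile does) is usually simpler than touching platform properties, since each worker then sees exactly one device at index 0.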