Adding particles to pset in MPI run #1193

Closed · claudiofgcardoso opened this issue Jul 8, 2022 · 6 comments

@claudiofgcardoso

Hello all,
I'm trying to run Parcels by executing the pset at every output dt (following the example provided in this tutorial), because I want to conditionally release particles every 5 days, only where there are no existing particles within 3 km of the new ones. Because of the size of the pset I'm trying to do this in parallel, but apparently I'm not able to add new particles to the existing pset; the following error is raised:

File "/gpfs/work/cardoso/scripts/parcels/TATL3_PARCELS_OMZ_conditional.py", line 584, in execute_conditional_release_daily
    execute_conditional_release_daily(fieldset, particles, finish_date, outfile_name, 
  File "/gpfs/home/cardoso/anaconda3/envs/py3_parcels_mpi/lib/python3.10/site-packages/parcels/particleset/particlesetsoa.py", line 195, in __init__
    pset_init = ParticleSet(fieldset=fieldset, pclass=O2Particle, 
  File "/gpfs/home/cardoso/anaconda3/envs/py3_parcels_mpi/lib/python3.10/site-packages/parcels/particleset/particlesetsoa.py", line 195, in __init__
    self._collection = ParticleCollectionSOA(
  File "/gpfs/home/cardoso/anaconda3/envs/py3_parcels_mpi/lib/python3.10/site-packages/parcels/collection/collectionsoa.py", line 112, in __init__
    lon = lon[self._pu_indicators == mpi_rank]
    lon = lon[self._pu_indicators == mpi_rank]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 28044 but corresponding boolean dimension is 2217

Is there a way to close the pset from the parallel computation while keeping the object, so that I can add new particles and then redistribute them across the processes, without using pset.export() or pset.close()?
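For reference, the 3 km check I'm applying looks roughly like the sketch below (simplified and illustrative only; the real script works on an O2Particle class and reads the candidate positions from file, and the flat lat/lon-to-km conversion is only approximate):

```python
# Simplified sketch of the proximity filter used to decide which candidate
# particles to release (illustrative, not the exact code from my script).
import numpy as np
from scipy.spatial import cKDTree

def filter_new_particles(new_lon, new_lat, existing_lon, existing_lat, min_dist_km=3.0):
    """Return indices of candidates farther than min_dist_km from every existing particle."""
    new_lon, new_lat = np.asarray(new_lon), np.asarray(new_lat)
    existing_lon, existing_lat = np.asarray(existing_lon), np.asarray(existing_lat)
    if len(existing_lon) == 0:
        return np.arange(len(new_lon))
    deg_to_km = 111.32  # approximate km per degree of latitude
    existing = np.column_stack([existing_lon * deg_to_km * np.cos(np.deg2rad(existing_lat)),
                                existing_lat * deg_to_km])
    candidates = np.column_stack([new_lon * deg_to_km * np.cos(np.deg2rad(new_lat)),
                                  new_lat * deg_to_km])
    dist, _ = cKDTree(existing).query(candidates, k=1)
    return np.where(dist > min_dist_km)[0]
```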

Cláudio

@erikvansebille
Member

Thanks for reporting, @claudiofgcardoso. I think I've created a quick fix for this bug in #1194; can you check whether it works for you?

Note, though, that because every change in the ParticleSet length now triggers a call to KMeans(), this can become very slow for large ParticleSets with many .add() calls. So be warned...
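To illustrate what I mean by "many .add() calls" (a sketch only, assuming a fieldset, an existing pset and your O2Particle class; release_lon/release_lat stand for the candidate positions of one release): build one ParticleSet per release and add it in a single call, rather than adding particles one by one, so the repartitioning runs once per release.

```python
# Sketch: one batched .add() per release instead of per-particle adds,
# since every change of the ParticleSet length re-triggers the KMeans partitioning.
import numpy as np
from parcels import ParticleSet

# slow: repartitions once per particle
# for lon, lat in zip(release_lon, release_lat):
#     pset.add(ParticleSet(fieldset=fieldset, pclass=O2Particle, lon=[lon], lat=[lat]))

# better: repartitions once per release
pset.add(ParticleSet(fieldset=fieldset, pclass=O2Particle,
                     lon=np.asarray(release_lon), lat=np.asarray(release_lat)))
```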

@claudiofgcardoso
Author

Hi @erikvansebille, many thanks for the quick response and fix! I tested the fix for two case studies (both of them execute and write the pset in a daily loop):
a) particles are added to the pset every 5 days without any condition: here the algorithm runs well!
b) particles are added to the pset every 5 days, but only if the new particles are not within 3 km of the existing particles in the pset: here the algorithm gives an error (sorry for the long report; the error was raised on all 4 procs):

----- Waiting for run to be saved proc 3 ... ---------
----- Waiting for run to be saved proc 1 ... ---------
----- Waiting for run to be saved proc 0 ... ---------
----- Waiting for run to be saved proc 2 ... ---------
Daily execution took 0:02:47.623624
Daily execution took 0:02:47.624248
Daily execution took 0:02:47.626928
Daily execution took 0:02:47.627175
Number of avoided duplicated particles so far --> 710
Number of avoided duplicated particles so far --> 2255
Number of avoided duplicated particles so far --> 9221
Number of avoided duplicated particles so far --> 10566
Traceback (most recent call last):
  File "/gpfs/work/cardoso/scripts/parcels/TATL3_PARCELS_OMZ_conditional.py", line 901, in <module>
    execute_conditional_release_daily(fieldset, particles, finish_date, outfile_name,
  File "/gpfs/work/cardoso/scripts/parcels/TATL3_PARCELS_OMZ_conditional.py", line 630, in execute_conditional_release_daily
    pset_init, _ = initialize_pset(fieldset, particles_instant.iloc[idx_to_add],
  File "/gpfs/work/cardoso/scripts/parcels/TATL3_PARCELS_OMZ_conditional.py", line 497, in initialize_pset
    pset = ParticleSet(fieldset=fieldset, pclass=O2Particle,
  File "/gpfs/home/cardoso/anaconda3/envs/py3_parcels_mpi/lib/python3.10/site-packages/parcels/particleset/particlesetsoa.py", line 195, in __init__
    self._collection = ParticleCollectionSOA(
  File "/gpfs/home/cardoso/anaconda3/envs/py3_parcels_mpi/lib/python3.10/site-packages/parcels/collection/collectionsoa.py", line 112, in __init__
    lon = lon[self._pu_indicators == mpi_rank]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 15596 but corresponding boolean dimension is 14251
IndexError: boolean index did not match indexed array along dimension 0; dimension is 22562 but corresponding boolean dimension is 14251
IndexError: boolean index did not match indexed array along dimension 0; dimension is 24107 but corresponding boolean dimension is 14251
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

I think the problem is the following: to determine which new particles should be added, I need to check which of them are not within 3 km of the existing particles (which are divided over the 4 procs in this case). Because I'm doing this check on the 4 psets independently, the number of new particles to be added differs from proc to proc, which then leads to this error. So, in order to determine the number of new particles correctly, and thus have every proc add the same number of new particles to the pset, I would need access to the full pset. Does this make sense? And if so, is it feasible?
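Something like the sketch below is what I have in mind: gather the positions from all ranks first, so that every proc filters the candidates against the full pset and therefore adds the same particles (untested; it assumes mpi4py, that pset.lon / pset.lat give this rank's arrays, and it reuses the filter_new_particles helper sketched above):

```python
# Untested sketch: gather all particle positions before the 3 km check,
# so each rank computes the same set of particles to add.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

local_lon = np.asarray(pset.lon)   # positions held by this rank
local_lat = np.asarray(pset.lat)

all_lon = np.concatenate(comm.allgather(local_lon))
all_lat = np.concatenate(comm.allgather(local_lat))

# filter against the full, gathered ParticleSet instead of only the local part
idx_to_add = filter_new_particles(new_lon, new_lat, all_lon, all_lat, min_dist_km=3.0)
```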

Cláudio

@erikvansebille
Member

Hi @claudiofgcardoso, good to hear that PR #1194 fixes your simple scenario of ParticleSet.add() in MPI without conditions!

I'm afraid, however, that the more complicated scenario will be difficult to fix on the backend, and unfortunately we don't have the development capacity to work on this in the foreseeable future.

You're very welcome to try to come up with your own fix (Parcels is a community code!), but be aware that the MPI implementation is still a bit rudimentary.

If you do want to give it a try, the code in collectionsoa.py below is probably a good entry point: https://github.com/OceanParcels/parcels/blob/989eb67294ee6b8b4c292cb7371ccf3944fd5ee4/parcels/collection/collectionsoa.py#L101-L120

An alternative is to work around this limitation in your code: use your scenario a), but then immediately remove the unwanted particles again? Or would that defeat the purpose?
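Roughly like this (an untested sketch, assuming your O2Particle class and the filter_new_particles check from above; I think remove_booleanvector / remove_indices are the ParticleSet methods for the removal step, but please double-check the API):

```python
# Untested sketch of the workaround: add all candidate particles first (which
# works with the #1194 fix), then remove the ones that are too close to the
# particles that were already present on this rank.
import numpy as np
from parcels import ParticleSet

existing_ids = set(pset.id)                       # ids of particles already present
pset.add(ParticleSet(fieldset=fieldset, pclass=O2Particle, lon=new_lon, lat=new_lat))

is_new = np.array([pid not in existing_ids for pid in pset.id])
keep = filter_new_particles(pset.lon[is_new], pset.lat[is_new],
                            pset.lon[~is_new], pset.lat[~is_new])

too_close = np.ones(int(is_new.sum()), dtype=bool)
too_close[keep] = False

mask = np.zeros(len(pset), dtype=bool)
mask[np.where(is_new)[0]] = too_close
pset.remove_booleanvector(mask)                   # drops the unwanted new particles
```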

@claudiofgcardoso
Author

Hi @erikvansebille! Unfortunately I don't think I can contribute to Parcels' MPI implementation at the moment, as my knowledge of MPI is very limited. Regarding your suggestion, I doubt it would be effective, because new particles added on proc 0 could pass the check against the existing particles on proc 0 but still be within range of existing particles on other procs... So I think my only option is to run it in serial processing and be patient :)
Many thanks for your help!

@claudiofgcardoso
Author

Hi @erikvansebille, I'm not sure whether this error is related to this issue, but I suppose it is, since I couldn't find any other similar report in the repository... When the MPI run finished, I got the following error during pset.export():

/gpfs/home/cardoso/anaconda3/envs/py3_parcels_mpi/lib/python3.10/site-packages/numpy/lib/arraysetops.py:270: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  ar = np.asanyarray(ar)
Traceback (most recent call last):
  File "/gpfs/work/cardoso/scripts/parcels/TATL3_PARCELS_OMZ_conditional.py", line 898, in <module>
  File "/gpfs/work/cardoso/scripts/parcels/TATL3_PARCELS_OMZ_conditional.py", line 716, in execute_daily_run
  File "/gpfs/home/cardoso/anaconda3/envs/py3_parcels_mpi/lib/python3.10/site-packages/parcels/particlefile/particlefilesoa.py", line 137, in export
    data = self.read_from_npy(global_file_list, len(self.time_written), var)
  File "/gpfs/home/cardoso/anaconda3/envs/py3_parcels_mpi/lib/python3.10/site-packages/parcels/particlefile/particlefilesoa.py", line 73, in read_from_npy
    data = np.nan * np.zeros((self.maxid_written+1, time_steps))
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 455. GiB for an array with shape (2790936, 21892) and data type float64
srun: error: n129: task 0: Killed

Since my run has 1825 timesteps and I ran the simulation with 12 CPUs, this may explain the enormous shape of the array being created (21892 / 1825 ≈ 12). How can I solve this?
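For reference, the requested allocation is consistent with that shape: 2,790,936 × 21,892 elements × 8 bytes (float64) ≈ 455 GiB.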

@claudiofgcardoso
Author

I managed to export the temporary files to a netCDF file using the script developed by @JamiePringle and shared in #1091 (many thanks for this!). Considering that this should be fixed by the output migration from .npy to zarr in #1199, I will close the issue.
