Saving Parcels output directly in zarr format #1199
Did not work because can't append in two dimensions at the same time
Simplifying (and speeding up) the writing to zarr in MPI mode, by letting each processor write to its own file and then combining the files with `xr.merge()` at the end
For easier merging of multiple zarr outputs in MPI mode
In xarray version 2022.6.0. Thanks for noticing, @JamiePringle!
Code to combine the output from an MPI run into a single Zarr file, re-chunk the data, change variable types, and add variables to the zarr file, even for very large output sizes.
Note: the link will not work until this is merged into master
change documentation_MPI.ipynb and documentation_LargeRunsOutput.ipynb to work around slow .to_zarr() of datasets that contain datetime variables.
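The per-processor approach mentioned in the commits above (each MPI rank writes its own output, combined with `xr.merge()` at the end) can be sketched roughly as follows. This is only an illustration, not the code from the PR: the helper `make_rank_output`, the variable name `lon`, and the `trajectory`/`obs` dimension names are assumptions, and small in-memory datasets stand in for the per-rank zarr stores. The explicit `compat` and `join` arguments are a choice made here so the behaviour does not depend on xarray's defaults.

```python
import numpy as np
import xarray as xr


def make_rank_output(traj_ids, n_obs=3):
    """Hypothetical stand-in for the output written by one MPI rank."""
    n = len(traj_ids)
    # Offset the values by the first trajectory id so ranks are distinguishable.
    lon = np.arange(n * n_obs, dtype="float32").reshape(n, n_obs) + 100 * traj_ids[0]
    return xr.Dataset(
        {"lon": (("trajectory", "obs"), lon)},
        coords={"trajectory": traj_ids, "obs": np.arange(n_obs)},
    )


# Each rank holds a disjoint set of trajectories.
rank0 = make_rank_output([0, 1])
rank1 = make_rank_output([2, 3])

# Outer join takes the union of trajectory ids; "no_conflicts" lets the
# NaN padding from one rank be filled by the values from the other.
combined = xr.merge([rank0, rank1], compat="no_conflicts", join="outer")
```

With real output, the in-memory datasets would presumably be replaced by datasets opened from each rank's store, for example with `xr.open_zarr()`.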
Comment: while working on #1247, I get errors that are related to the new ParticleSet writing.
In short: when writing the particle set as NetCDF, the dtypes of some of its attributes are converted incorrectly. This is currently not picked up by any tests, because they set zarr as a dependency, auto-detect zarr, and only test zarr. I can't fix it myself due to time constraints and because I haven't looked at the zarr implementation, but it may be good to have a look at this (@erikvansebille).
With this PR, we implement saving Parcels trajectory output data directly in `zarr` format. With the merging of #1165 in v2.3.1, it was already possible to store the output after conversion to a zarr file; after this PR there is no need for conversion at all.
Advantages are:
`ParticleFile.export()` call
The only major disadvantage so far is that `zarr` folders are not as 'transportable' as `netCDF` files. But it's fairly trivial to convert `zarr` data to a `netcdf` file using `xarray.open_zarr()` and `Dataset.to_netcdf()`.
Developments to do before merging:
`chunks` sizes in the `ParticleFile` object
`ParticleSet.write()` (especially the `collection.toDictionary()` method)