Details of *.npy files in temporary output #1159

Closed
JamiePringle opened this issue Apr 4, 2022 · 2 comments

Comments

@JamiePringle
Collaborator

I am writing Python code to convert the temporary *.npy output files to netCDF with minimal memory overhead, so that the conversion succeeds even if the output file is larger than the computer's memory. Even with the fixes in #1095, my large runs fail. I have some questions. I am working from the current development branch.

In pset_info.npy:

  • What is the purpose of the fields 'var_names_once', 'var_dtypes_once', and 'file_list_once'?
  • Is 'lonlatdepth_dtype' ever inconsistent with 'var_dtypes', and if so which should be preferred?
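One way to explore these fields is to load the file and list its keys. The following is a minimal sketch, assuming pset_info.npy holds a single pickled Python dict (hence `allow_pickle=True` and `.item()`). The helper names `load_pset_info` and `describe` are hypothetical and not part of Parcels.

```python
import numpy as np


def load_pset_info(path):
    """Load a .npy file that is assumed to contain one pickled dict."""
    # np.save wraps a dict in a 0-d object array; .item() unwraps it.
    info = np.load(path, allow_pickle=True).item()
    if not isinstance(info, dict):
        raise TypeError(f"expected a dict in {path}, got {type(info).__name__}")
    return info


def describe(info):
    """Return one 'key: type' line per field, sorted by key."""
    return [f"{key}: {type(value).__name__}" for key, value in sorted(info.items())]
```

Calling `print("\n".join(describe(load_pset_info("out-PZHRJZWC/0/pset_info.npy"))))` would then show which fields are present and what kind of object each one holds. Note that `allow_pickle=True` should only be used on files you trust.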

Finally, are the values in "id" unique to each drifter, or will they be re-used for multiple drifters? I am finding cases where a drifter id is present in only some of the output steps for a given rank. For example, in directory "out-PZHRJZWC/0/", id=33 is present in files 0.npy to 66.npy, except for 63.npy. I am trying to figure out how this could be.
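A gap like this can be located systematically with a small diagnostic. The sketch below is not Parcels code: it assumes each numbered step file (0.npy, 1.npy, ...) holds a pickled dict with an "id" array, and the function name `steps_missing_id` is hypothetical. Only one step file is held in memory at a time.

```python
import glob
import os

import numpy as np


def steps_missing_id(rank_dir, particle_id):
    """Return step numbers, between the first and last step in which
    particle_id appears, where it is absent from rank_dir."""
    present = []
    for path in glob.glob(os.path.join(rank_dir, "*.npy")):
        stem = os.path.splitext(os.path.basename(path))[0]
        if not stem.isdigit():
            continue  # skip non-step files such as pset_info.npy
        # Assumption: each step file is a pickled dict with an "id" entry.
        data = np.load(path, allow_pickle=True).item()
        if np.any(np.asarray(data["id"]) == particle_id):
            present.append(int(stem))
    if not present:
        return []
    seen = set(present)
    # Steps whose file does not exist at all are reported as missing too.
    return [step for step in range(min(present), max(present) + 1) if step not in seen]
```

For the example above, `steps_missing_id("out-PZHRJZWC/0", 33)` would be expected to report step 63. Since MPI runs write one directory per rank, it may be worth running the same check on the other rank directories to see whether the id shows up there for the missing step.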

All runs are made with MPI, so there are multiple directories.

I will share this code when I have these issues fixed.

Thanks,
Jamie

@erikvansebille
Member

Thanks so much for taking up this challenging but very relevant task, @JamiePringle! And sorry that I've been quiet for a few weeks and also didn't respond to the discussion in #1091; I've been inundated with teaching and marking, as this is my annual intense-teaching period.

To quickly answer your questions so you can continue your work:

What is the purpose of the fields 'var_names_once', 'var_dtypes_once', and 'file_list_once'?

These are for variables that are written only once (to_write='once'), for example because they don't change during a simulation. This saves storage, as each particle requires only one stored value instead of a vector.
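To give a feel for the saving, here is a rough NumPy comparison. It is an illustration only, not Parcels code, and the particle and step counts are made up.

```python
import numpy as np

n_particles, n_steps = 1000, 500  # illustrative sizes, not from a real run

# A variable written at every output step: one value per particle per step.
every_step = np.zeros((n_particles, n_steps), dtype=np.float32)

# A variable written once: a single value per particle.
once = np.zeros(n_particles, dtype=np.float32)

# The storage ratio equals the number of output steps.
ratio = every_step.nbytes // once.nbytes
```

A converter would therefore likely need to treat the two kinds of variable differently, for instance as a 2-D array for per-step variables and a 1-D array for write-once variables.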

Is 'lonlatdepth_dtype' ever inconsistent with 'var_dtypes', and if so which should be preferred?

Not sure if they could be inconsistent (I'd hope not!) but I would say that lonlatdepth_dtype has precedence. Perhaps issue a warning if the two are not the same, so we can keep an eye on it?
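The suggested warning could look roughly like this. It is a sketch under assumptions: `info` is the dict loaded from pset_info.npy, 'var_names' and 'var_dtypes' are taken to be parallel sequences, and the position variables are taken to be named "lon", "lat" and "depth". The function name `resolve_position_dtype` is hypothetical.

```python
import warnings

import numpy as np


def resolve_position_dtype(info, position_names=("lon", "lat", "depth")):
    """Return lonlatdepth_dtype, warning if var_dtypes disagrees with it."""
    preferred = np.dtype(info["lonlatdepth_dtype"])
    # Assumption: var_names[i] is described by var_dtypes[i].
    for name, dtype in zip(info["var_names"], info["var_dtypes"]):
        if name in position_names and np.dtype(dtype) != preferred:
            warnings.warn(
                f"var_dtypes gives {np.dtype(dtype)} for '{name}', but "
                f"lonlatdepth_dtype is {preferred}; using lonlatdepth_dtype."
            )
    return preferred
```

Returning `lonlatdepth_dtype` in every case follows the precedence suggested above, while the warning keeps any mismatch visible.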

Finally, are the values in "id" unique to each drifter, or will they be re-used for multiple drifters?

Hmm, @CKehl has a much deeper understanding of this; I think that for ParticleFiles the IDs are unique.

Hope this helps! I'm going back to teaching but should resurface to engage with Parcels development in a week or two.

@erikvansebille
Member

This has now been solved with the implementation of native zarr in #1199.
