Chunking on Output File #1401

Answered by JamiePringle
ah-dinh asked this question in Q&A
Jul 21, 2023 · 1 comment · 1 reply
If you request a chunk size of 10,000 trajectories by 730 observations at the start of the run, it should be respected, and I have not found that chunks of this size slow things down much. The chunks will be re-written as the run progresses, but those writes are often cached by the filesystem. Also, have you tried rechunking after the run? I do not find it slow even at these sizes. I suggest experimenting.
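As a rough sanity check on why chunks of this size are manageable, here is the chunk geometry they imply (a sketch only: the total run size of 50,000 trajectories below is hypothetical, not from this thread; the (10000, 730) chunk shape is the one discussed):

```python
# Back-of-the-envelope check on the chunk geometry discussed above.
import math

n_traj, n_obs = 50_000, 730          # hypothetical total output shape
chunk_traj, chunk_obs = 10_000, 730  # requested chunk shape

chunks_along_traj = math.ceil(n_traj / chunk_traj)   # 5
chunks_along_obs = math.ceil(n_obs / chunk_obs)      # 1
total_chunks = chunks_along_traj * chunks_along_obs  # 5 chunk files per variable

# A float64 chunk of this shape is modest in size:
chunk_bytes = chunk_traj * chunk_obs * 8             # 58,400,000 B, about 56 MiB
```

Because each chunk spans the full observation dimension, a chunk is re-written as new output times are appended, which is why filesystem caching matters here.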

Also, you are likely running on multiple processors? If you look at the documentation on how to deal with large runs and MPI, you will see examples where the data is re-chunked while merging the multiple output zarr stores (one per process) into a single zarr store. This is a relatively efficient process (e.g. about 25 minutes for a 250 GB output).

Jamie

Answer selected by ah-dinh