Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option: output sparse binary dose file instead of 3ddose in DOSXYZnrc #405

Merged
merged 3 commits into from
Jul 25, 2022

Conversation

marenaud
Copy link
Contributor

@marenaud marenaud commented Jan 13, 2018

Sparse binary 3ddose output

Just throwing this pull request up here to see if there is any interest. It was a fairly simple change to make.

Use case

As part of my research, I use DOSXYZnrc to generate electron beamlets for mixed electron-photon optimisation. A full set of electron beamlets for 5 gantry angles for a single chest wall patient (15,000 beamlets total) with 3mm^3 voxels takes up 2.1 TB of disk space when they're stored as 3ddose files. Just the process of converting beamlets to a nicer format takes a long time. When stored as sparse binary files, the same set of beamlets takes ~14 GB. Being able to output directly to binary dose saves tons of space and time.

The format

The format is similar the 3ddose format, except that voxels with 0 dose are omitted. The binary file is constructed as:

  • x_num_voxels, y_num_voxels, z_num_voxels (3 * sizeof(i32) bytes)
  • Voxel coordinates in x direction ((x_num_voxels + 1) * sizeof(f32) bytes)
  • Voxel coordinates in y direction ((y_num_voxels + 1) * sizeof(f32) bytes)
  • Voxel coordinates in z direction ((z_num_voxels + 1) * sizeof(f32) bytes)
  • Number of nonzero voxels (1 * sizeof(i32) bytes)
  • 0-indexed linearized voxel indices (num_nonzero * sizeof(i32) bytes)
  • Non-zero dose values (num_nonzero * sizeof(f32) bytes)
  • Uncert values for non-zero voxels (num_nonzero * sizeof(f32) bytes)

The format was chosen to allow codes reading in the binary dose to avoid having to read in uncertainties if they're not needed. Dose/uncert values are stored as floats to save space since there's no accumulation of floating point errors to worry about as this is a storage format.

I added a flag to toggle between 3ddose/binary dose output in the input file. Default value is 3ddose and the input file is kept backwards compatible for people who don't specify the flag.

Downsides

  • The binary files aren't as portable as text files, though this is becoming less and less true as little-endian architecture seem to have won the endianness war?
  • To output the binary dose file efficiently, I added an array to hold indices of nonzero voxels in the write_dose routine, adds $MAXDOSE * sizeof(i32) bytes to RAM usage of DOSXYZnrc. This can be avoided but the routine to output the binary dose becomes less efficient (need to do more passes through the dose arrays).
  • Uses access='stream' in fortran to write C-style binary files. Apparently this was added with fortran2003 so there could be compatibility issues with older compilers.
  • May not really be needed by anyone else?

Let me know if there is interest to include this in the official distribution. I can fill in the gaps in the documentation and add a checkbox to the GUI. Otherwise I'll just keep this on my branch.

Cheers,
Marc

@randlet
Copy link
Contributor

randlet commented Jan 15, 2018

Looks pretty cool Marc! Did you investigate how well your 3ddose files compress? In egs_brachy we have an option to output gzipped 3ddose (and egsdat) files (via gzstream) instead of text. IIRC the gzipped files were roughly an order of magnitude smaller so not as impressive as your gains here. If you have lots of zero dose voxels though you may get better compression than our typical case.

@marenaud
Copy link
Contributor Author

marenaud commented Jan 15, 2018

Hey Randy!
Quick test with a 200x60x200 phantom (unrealistic situation, 2MeV electrons with a large phantom, just took the first input file I could find).

-rw------- 1 marc users 120M Jan 15 11:45 minidos_test.3ddose
-rw------- 1 marc users 343K Jan 15 11:47 minidos_test.3ddose.tar.gz
-rw------- 1 marc users  25K Jan 15 11:46 minidos_test.bindos
-rw------- 1 marc users  19K Jan 15 11:50 minidos_test.bindos.tar.gz

Not too bad! For my specific use case, I'd still want to use a binary file because everytime I do an optimisation run, I need to read in those 15000 dose files. As you can imagine, 3ddose files are extremely inefficient to parse compared to binary files.

edit: This is another test with realistic 6 MeV/20MeV 1x1 cm^2 beamlets on a 120x53x100 phantom (so, small end of the phantoms I deal with)

-rw------- 1 marc users  32M Jan 15 12:06 test_6MeV.3ddose
-rw------- 1 marc users 1.5M Jan 15 12:09 test_6MeV.3ddose.tar.gz
-rw------- 1 marc users 706K Jan 15 12:01 test_6MeV.bindos
-rw------- 1 marc users 506K Jan 15 12:09 test_6MeV.bindos.tar.gz

-rw------- 1 marc users  32M Jan 15 12:09 test_20MeV.3ddose
-rw------- 1 marc users 1.7M Jan 15 12:10 test_20MeV.3ddose.tar.gz
-rw------- 1 marc users 838K Jan 15 12:03 test_20MeV.bindos
-rw------- 1 marc users 600K Jan 15 12:10 test_20MeV.bindos.tar.gz

@randlet
Copy link
Contributor

randlet commented Jan 15, 2018

Thanks for posting your numbers! Your technique is still a pretty nice gain over just gzipping.

I wonder if you could make it more generally applicable by storing start and end indices (or block sizes) for blocks of non zero doses instead of storing the individual voxel indices directly Something like:

  1. x_num_voxels, y_num_voxels, z_num_voxels
  2. Voxel coordinates in x direction
  3. Voxel coordinates in y direction
  4. Voxel coordinates in z direction
    5) 0-indexed linearized voxel indices
  5. List of pairs of start and end indices (or size of block) for contiguous non zero block of doses
  6. Non-zero dose values
  7. Uncert values for non-zero voxels

Worst case for both methods (all voxels non zero for yours, every other voxel non zero for mine) stores Nvox indices so I guess it would be fairly dependent on how the zero doses are distributed throughout your dose array.

@rtownson
Copy link
Collaborator

I think this option for sparse binary output is long overdue! I've coded a similar solution for my own purposes before, due to the recombination of 3ddose files to be the slowest part of certain highly parallel simulations.

I think Randle's suggestion of storing pairs makes sense, since dose usually occurs in contiguous regions.

@marenaud
Copy link
Contributor Author

marenaud commented Jan 15, 2018

It would be interesting to profile the different methods vs their savings in space.

Doing blocks of indices would only save space on the indices, so you'd be optimising the 1/3 of the file size that can be optimised (ignoring the header). The cost would be an increase in complexity for writing/reading those files. Having 3 neat blocks means you can read the entire dose content of the file with 3 freads (or a single fread since they're all 4 bytes but then there's more work to do in the background). Since the index-pair format would implicitly optimise the format for storage over write/load performance, I'd be especially curious about the difference between a gzipped version of the flat indices array format vs the index-block format.

Another thought is that while dose does indeed occur in contiguous regions physically, it doesn't necessarily occur in contiguous memory regions. Linearized indices are only "optimised" for a beam incident normal to the xy plane (though my bias is showing there, brachy dose distributions would have different sparseness properties :P).

@marenaud
Copy link
Contributor Author

I'm sitting down and implementing the "RandyDos" file format now, will have some benchmarks soon!

@marenaud
Copy link
Contributor Author

marenaud commented Jan 16, 2018

OK so I implemented the randydos file format in another branch.

Here's some numbers for a monoenergetic 16 MeV electron beam incident on a the xy plane of a 128x128x128 phantom

105M Jan 15 21:12 dummy.3ddose
2.9M Jan 15 21:17 dummy.bindos
2.2M Jan 15 22:27 dummy.randydos

5.5M Jan 15 22:41 dummy.3ddose.tar.gz
2.0M Jan 15 22:41 dummy.bindos.tar.gz
1.8M Jan 15 22:41 dummy.randydos.tar.gz

randydos is 2.1% the file size of 3ddose, and bindos is 2.8%

I then made a python class which deserializes each of these files into the same format. I print some statistics for (basic) validation

Loading dummy.3ddose
Number of nonzero dose voxels: 249672
Average dose of nonzero dose voxels: 1.20855807419e-10
Average uncertainty of nonzero dose voxels: 0.20378229854

Loading dummy.bindos
Number of nonzero dose voxels: 249672
Average dose of nonzero dose voxels: 1.20855807421e-10
Average uncertainty of nonzero dose voxels: 0.203782298532

Loading dummy.randydos
Number of nonzero dose voxels: 249672
Average dose of nonzero dose voxels: 1.20855807421e-10
Average uncertainty of nonzero dose voxels: 0.203782298532

For the randydos, the number of index pairs was 33862, so 67724 ints to store instead of 249672, a nice reduction.

then I went into ipython to see their load times

In [5]: %timeit Dose.from_3ddose("dummy.3ddose")
1 loop, best of 3: 914 ms per loop

In [7]: %timeit Dose.from_bindos("dummy.bindos")
100 loops, best of 3: 7.07 ms per loop

In [8]: %timeit Dose.from_randydos("dummy.randydos")
10 loops, best of 3: 34.6 ms per loop

This is far from scientific though since I can't guarantee that my from_randydos method is the most efficient way to deserialize a randydos. I'm not used to reading binary files from Python either so who knows how else this could be done. See this gist for the code I wrote for each file type.

@randlet
Copy link
Contributor

randlet commented Jan 16, 2018

Nice work!

voxel_indices2 = []
for vstart, vend in zip(blocks[::2], blocks[1::2]):
    voxel_indices2.extend(range(vstart, vend))

voxel_indices3 = [vi for vstart, vend in zip(blocks[::2], blocks[1::2]) for vi in range(vstart, vend)]

Either of those solutions for getting your voxel indices would be a bit more idiomatic in Python (may or may not be quicker, vi3 is probably quickest) but you could also just fill your doses array rather than creating the intermediate voxel indices array:

# not tested :p

doses = np.zeros(num_voxels[0] * num_voxels[1] * num_voxels[2])
uncerts = np.zeros(num_voxels[0] * num_voxels[1] * num_voxels[2])

processed = 0
for block_start, block_end in zip(blocks[::2], blocks[1::2]):
    block_size = block_end - block_start
    doses[block_start:block_end] = nonzero_doses[processed:processed + block_size]
    uncerts[block_start:block_end] = nonzero_doses[processed:processed + block_size]
    processed += block_size

That said, you need to be cautious extrapolating the performance of Python to (Fortran|C|C++|etc)!

@marenaud
Copy link
Contributor Author

marenaud commented Jan 16, 2018

I'll test the performance, my logic was to use as much of numpy as possible over python loops because of the speed difference (only chose python because of how simple it is to profile), but that could be premature optimisation! Thanks for the snippets :D

edit:

Filling the dose arrays directly:

10 loops, best of 3: 28.9 ms per loop

vi3:

10 loops, best of 3: 33 ms per loop

@marenaud
Copy link
Contributor Author

marenaud commented Jan 16, 2018

C++11 version that I rapidly cooked up, don't mind the code quality :P

Loading dummy.3ddose
Time to load 3ddose: 439 ms

Loading dummy.bindos
Time to load bindos: 3 ms

Loading dummy.randydos
Time to load randydos: 3 ms

edit: forgot to compile with -O3

I think it more or less settles the question. We've only tested with one dose in the optimal orientation, but performance is identical so they can't be too far apart in the worst case. It's probably worth it to go with randydos, I'll amend the PR later today.

@randlet
Copy link
Contributor

randlet commented Jan 16, 2018

Thanks for running all the numbers...that'll be a nice addition to EGSnrc :)

@marenaud
Copy link
Contributor Author

marenaud commented Jan 16, 2018

I've updated the pull request and added some documentation / GUI input. I have absolutely no idea what I'm doing for the GUI, but I just copied whatever was there for ihowfarless and it seems to work nicely. Oh and I don't have all the packages to compile the .tex document so I don't know if it compiles after my changes.

Some things left to discuss would probably be:

  • Choice of 0-indexed indices over 1-indexed.
  • I kept num_nonzero in the format even though it's redundant as it can be calculated from the block sizes. I liked having it as a sanity check.

edit: not sure why CI is failing, would be nice to see the log but it's just blank.

Copy link
Collaborator

@rtownson rtownson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as expected, backwards compatible. Just needs some minor changes as noted.

@ftessier
Copy link
Member

Deferring this to Release 2022. This ought to be merged in develop as soon as possible after Release 2021. @marenaud, Are there any updates at your end: have you or others been using this without issue since implementation?

@ftessier ftessier modified the milestones: Release 2021, Release 2022 Mar 25, 2021
@marenaud marenaud requested a review from a team as a code owner April 15, 2021 13:28
@marenaud
Copy link
Contributor Author

Deferring this to Release 2022. This ought to be merged in develop as soon as possible after Release 2021. @marenaud, Are there any updates at your end: have you or others been using this without issue since implementation?

Sorry for the late reply. I've used it since implementation with no issues, but it's been a good amount of time since I've generated my last bindos file :(

@ftessier
Copy link
Member

Ok thanks for the update. We will go ahead and merge this in develop after the 2021 release tomorrow!

@ftessier
Copy link
Member

Polished up the commits, empty git diff with original branch.

@ftessier
Copy link
Member

Rebased on develop.

@ftessier
Copy link
Member

ftessier commented Jul 4, 2022

Rebased on develop for merging in release 2022.

@ftessier
Copy link
Member

ftessier commented Jul 4, 2022

@mainegra please review when you have a chance.

@ftessier
Copy link
Member

Added author Marc-Andre Renaud in files that were modified to implement the .bindos format. Rebased on develop.

Change the binary format to the one suggested by Randle Taylor, and add
information about the sparse matrix binary dose format in the DOSXYZnrc
documentation.
Add an option to select the dose output format in the gui. Just
emulating what was done for the howfarless option.
@ftessier
Copy link
Member

Add the contributor name in the same commit where the file is mofidied, and squashed the small commit to tweak the dosxyznrc gui menu item strings.

@ftessier ftessier merged commit ab8758b into nrc-cnrc:develop Jul 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants