
Reducing memory errors in dict_to_netcdf #1095

Merged · 10 commits · Oct 25, 2021

Conversation

erikvansebille (Member) commented:

This PR is a rewrite of read_from_npy so that it has a smaller memory footprint.

Previously, an array was created with the total number of unique timesteps, which could become very large when lots of particles were deleted at different times. Excess rows and columns were later deleted. This has now been changed so that the initial data array is much smaller and less likely to run into out-of-memory errors.

This PR is complementary to #1092, which would still be needed for very large files.
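
As a rough illustration of the allocation change described above (a minimal sketch; all sizes and variable names below are hypothetical and not taken from `read_from_npy`):

```python
import numpy as np

# Hypothetical sizes, for illustration only.
n_particles = 100_000
n_unique_timesteps = 5_000                          # inflated when particles are deleted at many different times
rng = np.random.default_rng(0)
timesteps_per_particle = rng.integers(1, 100, size=n_particles)

# Old approach (sketch): allocate one array spanning every unique timestep,
# then delete the excess rows and columns afterwards.
# data = np.full((n_particles, n_unique_timesteps), np.nan)   # ~4 GB of float64

# New approach (sketch): size the second dimension by the longest
# per-particle record only, so the initial allocation stays small.
max_timesteps = int(timesteps_per_particle.max())
data = np.full((n_particles, max_timesteps), np.nan)           # ~80 MB of float64
```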

erikvansebille and others added 7 commits October 18, 2021 16:57
… files

Previously, an array was created with the total number of unique timesteps, which could become very large when lots of particles were deleted at different times. Excess rows and columns were later deleted. This has now been changed so that the initial data array is much smaller and less likely to run into out-of-memory errors.
As it is not needed anymore in the new `ParticleFile.read_from_npy()` implementation
As it's needed for NodeList ParticleSet
@CKehl (Contributor) left a comment


The changes will create some headaches for other open PRs, especially the ones concerning non-indexable particle sets.

That said: it does seem to work as you intend it to work, and I understand the motivation behind the change. So, yeah: fine with this implementation. Can be merged.

@@ -32,7 +32,9 @@ def _to_write_particles(pd, time):
     """We don't want to write a particle that is not started yet.
     Particle will be written if particle.time is between time-dt/2 and time+dt (/2)
     """
-    return [i for i, p in enumerate(pd) if time - np.abs(p.dt/2) <= p.time < time + np.abs(p.dt) and np.isfinite(p.id)]
+    return [i for i, p in enumerate(pd) if (((time - np.abs(p.dt/2) <= p.time < time + np.abs(p.dt))
+                                             or (np.isnan(p.dt) and np.equal(time, p.time)))
Contributor


Why this part? Where does that come from? How could that condition occur?

-            & np.greater_equal(time + np.abs(pd['dt'] / 2), pd['time'], where=np.isfinite(pd['time']))
+    return ((np.less_equal(time - np.abs(pd['dt']/2), pd['time'], where=np.isfinite(pd['time']))
+             & np.greater_equal(time + np.abs(pd['dt'] / 2), pd['time'], where=np.isfinite(pd['time']))
+             | ((np.isnan(pd['dt'])) & np.equal(time, pd['time'], where=np.isfinite(pd['time']))))
Contributor


Why this part? Where does that come from? How could that condition occur?

Member Author


This is because `particle.dt` is not yet set before a call to `pset.execute()`, so previously we couldn't write particles before execution. Because of a bug in the unit tests (now fixed), this wasn't picked up.
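
A minimal sketch of the condition the new code adds, assuming a bare particle stand-in (the `P` class and `should_write` helper below are hypothetical, for illustration only; the real check is the `_to_write_particles` list comprehension in the diff above):

```python
import numpy as np

class P:
    """Hypothetical stand-in for a particle, for illustration only."""
    def __init__(self, dt, time, pid):
        self.dt, self.time, self.id = dt, time, pid

def should_write(p, time):
    # Within the usual dt window around the output time ...
    in_window = time - np.abs(p.dt / 2) <= p.time < time + np.abs(p.dt)
    # ... or not yet executed: dt is NaN before pset.execute() has run,
    # so such particles are written only at their own release time.
    not_started = np.isnan(p.dt) and np.equal(time, p.time)
    return (in_window or not_started) and np.isfinite(p.id)

print(should_write(P(dt=np.nan, time=0.0, pid=1), time=0.0))  # True: writable before execution
print(should_write(P(dt=60.0, time=0.0, pid=2), time=0.0))    # True: within the dt window
```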

-        data = np.nan * np.zeros((self.maxid_written+1, time_steps))
-        time_index = np.zeros(self.maxid_written+1, dtype=np.int64)
-        t_ind_used = np.zeros(time_steps, dtype=np.int64)
+        maxtime_steps = max(time_steps.values()) if time_steps.keys() else 0
Contributor


I think the variable name here is misleading: is it really the max 'timestep', or rather the max 'time' itself?

Member Author


Good point, I've now renamed them to `n_timesteps` and `max_timesteps`.
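
As a tiny sketch of the distinction being discussed (the dict contents below are made up; the idea is that the dict maps a particle id to the number of timesteps written for that particle):

```python
# Hypothetical contents, for illustration only.
n_timesteps = {101: 3, 102: 7, 103: 5}   # particle id -> number of written timesteps
max_timesteps = max(n_timesteps.values()) if n_timesteps.keys() else 0
print(max_timesteps)                      # 7: a count of timesteps, not a time value
```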

+        time_index = np.zeros(len(time_steps))
+        id_index = {}
+        count = 0
+        for i in sorted(time_steps.keys()):
Contributor


Sorting the field here can potentially be quite time-consuming, especially with millions of elements. But if your approach requires it: OK.

Member Author


Well, we only do this once, in `ParticleFile.export()`. If we want reproducibility of the output file (i.e. the particle IDs increase with row number), then it's important.
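
A minimal sketch of why the keys are sorted before row numbers are assigned (names follow the diff above; the dict contents are made up):

```python
# Hypothetical per-particle timestep counts, keyed by particle id.
time_steps = {12: 6, 3: 2, 7: 4}

# Assign each particle a row in the output array in order of increasing id,
# so the same input always produces the same row layout.
id_index = {}
count = 0
for i in sorted(time_steps.keys()):
    id_index[i] = count
    count += 1
print(id_index)   # {3: 0, 7: 1, 12: 2}
```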

Contributor


OK, I see that. Obviously, for ordered-collection psets I would not do that sorting (as the collection is pre-sorted). That said, for AoS and SoA, one may indeed need that trick.
