
Python vs Matlab speed differences #57

Closed
kamwal opened this issue May 28, 2020 · 10 comments

@kamwal
Contributor

kamwal commented May 28, 2020

I found relatively large differences (~40 seconds per profile) in the speed of the update salinity mapping calculations. This could be related to the capability of my computer, or there may be something that could be improved in the Python code.
Has anyone seen similar behaviour with the Python code?

WMO num: 1901847
WMO boxes: wmo_boxes_ctd.mat
Reference data: CTD_for_DMQC_2019V01
Code evaluation times for Matlab and Python:
Matlab 1901847.docx
Python 1901847.docx

Config_file:

```
% ===============================
%
% Objective Mapping Parameters
%

% max number of historical casts used in objective mapping
CONFIG_MAX_CASTS=300

% 1=use PV constraint, 0=don't use PV constraint, in objective mapping
MAP_USE_PV=0

% 1=use SAF separation criteria, 0=don't use SAF separation criteria, in objective mapping
MAP_USE_SAF=0

% spatial decorrelation scales, in degrees
MAPSCALE_LONGITUDE_LARGE=8
MAPSCALE_LONGITUDE_SMALL=4
MAPSCALE_LATITUDE_LARGE=4
MAPSCALE_LATITUDE_SMALL=2

% cross-isobath scales, dimensionless, see BS(2005)
MAPSCALE_PHI_LARGE=0.1
MAPSCALE_PHI_SMALL=0.02

% temporal decorrelation scale, in years
MAPSCALE_AGE=5
MAPSCALE_AGE_LARGE=20

% exclude the top xxx dbar of the water column
MAP_P_EXCLUDE=100

% only use historical data that are within +/- yyy dbar from float data
MAP_P_DELTA=200
```

@cabanesc

Hi Kamila,

I noticed the same: longer run times with the Python code. In the Matlab version, most of the time is spent reading the climatology files (the update_salinity_mapping stage). I suspect it is the same with the Python code. Also, the Python code currently reads .mat files, which may not be optimal?

@edsmall-bodc
Collaborator

@cabanesc That's correct, this isn't optimal. We made this decision initially to make it very simple to compare the outputs with the Matlab version. It's a simple change, and @gmaze and I have already started discussing what we should use instead, perhaps even allowing users to choose which file formats to use.

Beyond that, I already have some places in the back of my mind that may be acting as bottlenecks. This is going to be a wider investigation task, as @kamwal has already labelled it. Once we have completed the investigation, we will create separate tasks for each problem area and then implement fixes and changes.

For now, what I would recommend is that UATs use this thread to post floats and parameter settings that seem to be excessively slow, so that we have examples that we can reproduce and test.

@edsmall-bodc
Collaborator

I've been doing some general experimentation, and I think a lot of the slowdown is most likely coming from the use of the stack and append functions for forming arrays.

Most of the time, when we are generating a large array, we know how big the final array is going to be. Because of this, we can create an NxM array in memory, fill it with placeholder values, and then overwrite those values with the correct ones as we work through the array. This means there is only one memory-allocation step, and it happens right at the beginning.

However, sometimes we don't know how large our final array is going to be, so instead we have to append columns or rows as we go through our analysis. This is much slower, because every time we append to an array we actually allocate a whole new block of memory for a slightly larger array, copy the old data and the new data into it, and then delete the old array. The bigger the array, the more time-consuming this is, which is why we are seeing slowdown for analyses that have a lot of profiles (the matrices slowly grow larger over time).

To add rows/columns to a matrix I have been using the numpy "stack" and "append" functions (or variants of them, such as vstack and hstack).

Basically, appending to a matrix is slow, but pre-allocating memory is fast (a rough sketch of the two patterns follows the list below). There must be better ways to do this, by either:

a) finding out ahead of time how large our final matrix will be, or
b) using better functionality from another library?
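
A minimal sketch of the contrast between the two patterns, with made-up sizes rather than anything taken from pyowc:

```python
import numpy as np

n_profiles, n_levels = 300, 1000

# Pattern 1: grow the array by appending one column per profile (slow).
# Every np.hstack call allocates a new, larger array and copies everything across.
grown = np.empty((n_levels, 0))
for _ in range(n_profiles):
    new_col = np.random.rand(n_levels, 1)       # stand-in for one profile's data
    grown = np.hstack((grown, new_col))

# Pattern 2: allocate the final array once, then fill it in place (fast).
prealloc = np.full((n_levels, n_profiles), np.nan)
for i in range(n_profiles):
    prealloc[:, i] = np.random.rand(n_levels)   # stand-in for one profile's data
```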

@gmaze, Do you have any recommendations on functions or techniques that could provide a speed up here?

It's possible that, after investigating this further, we will find we can actually preallocate the memory, but we would have to look a little deeper. For example, if a user highlights that they want 300 historical climatology points, we know that the mapped salinity matrix will be Nx300. Is there a way we can find out N right from the get-go? Perhaps after the find_besthist stage?

Any comments are, of course, welcome!

@edsmall-bodc
Collaborator

edsmall-bodc commented Jun 4, 2020

As an example, we have to continually add NaNs to all of the climatological profiles so that each column is the same length. I can't think of a way around that for this specific example, but I just wanted to show what I mean.
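
A hedged sketch of that NaN-padding step, using made-up profile data rather than the real climatology arrays:

```python
import numpy as np

# Ragged "profiles" of different lengths (illustrative data only).
profiles = [np.random.rand(np.random.randint(50, 200)) for _ in range(10)]

max_len = max(len(p) for p in profiles)
padded = np.full((max_len, len(profiles)), np.nan)  # one NaN-filled block, m x n
for j, p in enumerate(profiles):
    padded[:len(p), j] = p  # shorter profiles keep trailing NaNs in their column
```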

I'm planning on using a Python profiler to analyse the code at the end of this month (or at the end of testing); a small example is included after the list below. This will reveal things like

  • How long each function takes to run
  • How many times a function is called

I think we might even be able to add it to the CI pipeline to make sure certain functions continue to operate at an acceptable speed.
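
A small sketch of that kind of profiling with the standard-library cProfile module; the update_salinity_mapping shown here is just a stub standing in for the real routine:

```python
import cProfile
import pstats

def update_salinity_mapping():
    # placeholder for the real pyowc routine
    sum(i * i for i in range(10_000))

cProfile.run("update_salinity_mapping()", "profile.out")

stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(20)  # per-function times and call counts
```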

@edsmall-bodc
Collaborator

@gmaze I will post the results from the profiler on this thread, if that suits you? I have a couple of profiles that are particularly slow, so I will run the same profiles through the Matlab version so we can compare results.

The problem is in update salinity mapping. In all the testing so far, the calibration section has been fast.

@gmaze
Member

gmaze commented Jul 9, 2020

> I've been doing some general experimentation, and I think a lot of the slowdown is most likely coming from the use of the stack and append functions for forming arrays.

> pre-allocating memory is fast

Yes, with the current loop design, we can't expect any significant changes before being able to pre-allocate arrays in memory.

However, we could also try some parallelization of the loop over profiles, gathering a set of arrays for each profile, and then simply sorting and concatenating/merging the arrays at the end of the process.
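
A rough sketch of that idea, assuming a hypothetical per-profile function map_one_profile() that returns one column of results (not an existing pyowc function):

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def map_one_profile(profile_index):
    # placeholder for the per-profile mapping work
    return np.random.rand(100)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        columns = list(pool.map(map_one_profile, range(150)))
    # pool.map preserves input order, so the final merge is a single concatenation
    mapped = np.column_stack(columns)
```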

@apswong

apswong commented Jul 9, 2020

> For example, if a user highlights that they want 300 historical climatology points, we know that the mapped salinity matrix will be Nx300. Is there a way we can find out N right from the get-go? Perhaps after the find_besthist stage?

If CONFIG_MAX_CASTS = n = the number of historical profiles used in each mapping, then you'd need a matrix of m x n to put together la_grid_sal, etc., for input into the mapping routine, where m = the maximum length amongst the n historical profiles. You can determine what m is in find_besthist, then pre-assign the matrix in retr_region instead of appending rows in a loop (a rough sketch follows below).

Not sure if this will help or not.
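
A minimal sketch of that suggestion under those assumptions; the lengths array stands in for information coming out of find_besthist, and the fill loop stands in for retr_region:

```python
import numpy as np

n = 300                                        # CONFIG_MAX_CASTS
lengths = np.random.randint(50, 200, size=n)   # per-profile lengths, known after find_besthist
m = int(lengths.max())                         # longest selected historical profile

la_grid_sal = np.full((m, n), np.nan)          # pre-assign the m x n matrix once
for j, length in enumerate(lengths):
    la_grid_sal[:length, j] = np.random.rand(length)  # stand-in for one profile's salinity values
```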

@kamwal
Contributor Author

kamwal commented Aug 6, 2020

One approach to improving the speed of the calculations is to replace the current mechanism of running multiple floats in sequence with running multiple floats in parallel (a sketch is included after the timings below).
Our speed comparison between the two methods showed that, when running 4 floats with around 150 profiles each, the parallel method is about 1.4x quicker than running the floats in sequence.
Total elapsed time:

  • 4 floats run in sequence: 33646.36 seconds

  • 4 floats run parallel: 21228.26 seconds

Moreover, increasing the number of floats used in the parallel calculations should lead to a larger reduction in total calculation time compared with running them in sequence.
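
A hedged sketch of what running several floats in parallel could look like; run_one_float is a hypothetical wrapper around the full per-float run, and only WMO 1901847 comes from this thread:

```python
from multiprocessing import Pool

def run_one_float(wmo):
    # placeholder for a full per-float OWC run (mapping + calibration)
    print(f"processing float {wmo}")

if __name__ == "__main__":
    wmo_numbers = ["1901847", "float_2", "float_3", "float_4"]  # placeholders apart from 1901847
    with Pool(processes=4) as pool:
        pool.map(run_one_float, wmo_numbers)
```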

The second speed improvement between the Matlab and Python code will come from pre-allocating memory during the calculations. This work is still ongoing.

@kamwal kamwal transferred this issue from euroargodev/User-Acceptance-Test-Python-version-of-the-OWC-tool Feb 11, 2021
@kamwal
Contributor Author

kamwal commented Apr 28, 2021

The recent work on update salinity mapping to improve the speed of the pyowc code is described in more detail in #37.

@kamwal
Contributor Author

kamwal commented Oct 25, 2022

This ticket can be closed.
The speed of the Python code has been significantly improved.

@kamwal kamwal closed this as completed Oct 25, 2022