Publishing on Pypi #23

gmaze · 2020-05-14T16:48:09Z

For standard users of the software, they should not need to clone the repository to be able to run it, but rather simply install it using pip or conda.
This implies 2 things:

a proper setup.py and packaging
a serious refactoring of the code to manage options and configuration parameters outside of the software code (to be honest, I'd like this to be my 1st contribution to the code)

This is also to let you known that when this will be ready to publish euroargodev python softwares on pypi, I created a euroargodev account at https://test.pypi.org/user/euroargodev/ and https://www.pypi.org/user/euroargodev/
(don't be fulled by the picture, this is retrieved automatically from my email but will change soon, so that it is the EA logo).

edsmall-bodc · 2020-05-15T08:43:54Z

The packaging should be relatively straight forward.

The refactor may be slightly more tricky, though your current codes to fetch argo data will be a big help.

Currently the user has to define many things for the OWC toolbox to run:

For the update salinity mapping, they need to define:

Which data set they want to use from each WMO box (argo/CTD/bottle data) [though we have discussed how the WMO boxes could be phased out!]
The float name they want to analyse
The maximum number of historical casts they want to use during analysis
Whether or not they would like to use potential vorticity
Whether or not they would like to use special constraints around the subantarctic front
The latitude and longitude scales used for decorrelation (4 values)
The cross isobath scales (2 values)
The age scales (2 values)
The exclusion depth
The age exclusion parameter

Many of these parameters have a default value, but I know operators change them regularly

That brings variables for salinity mapping to 15, which is somewhat tricky to manage in a usable way. Either we will need to group some of these variables together (into classes or dictionaries?), OR we can split the update salinity mapping routine up into separate routines that can be called?

For calculate piecewise fit (which I have not pushed to master yet, as I'm still working on some bugs) users need the output from the salinity mapping routine, plus:

Number of breaks in the fitting line
(Optional) where to place these breaks
(Optional) the boundaries for potential temperature (two values)
(Optional) The boundaries for pressure (two values)

You can see an example of the set up here: https://github.com/ArgoDMQC/matlab_owc/blob/master/matlab_codes/set_calseries.m

Teddyzander · 2020-05-15T10:13:17Z

Could we write a class that has setters and getters for these values, and then we can just pass in the class to an update_salinity_mapping routine? Or just have the salinity mapping routine be a method in the class, of course.

Class could contain an object containing all of the argo_profiles fetched from the argo_fetcher routine, variables for the update salinity mapping route, and the update salinity mapping method..?

gmaze · 2020-05-15T10:26:52Z

Discussion moves toward another topic here, I suggest to continue here euroargodev/User-Acceptance-Test-Python-version-of-the-OWC-tool#7

gmaze · 2020-07-06T11:00:44Z

Dear all,
In order to package the software correctly, we need to make a decision with regard to third party libraries used by py_owc, especially for plotting functionalities, to be included.
This means that when a new user will install py_owc, it will have to install other stuff as well. In same cases, users already have these (eg numpy), but in others, this may not be the case (eg geopandas).
The lesser "requirements" we have, the easier it is to install and debug for specific user working environments.

Solution 1

py_owc does the mathematics of calibration, users handle visualization.

In order to run and compute the calibration, we don't need to plot anything. py_owc still provide plotting features, but does not make it a requirement for installation.
So, third party libraries are limited to math stuff.
pros:

light weight
fast and simple
near 100% success of install for the vast majority of users

cons:

users have to manage additional software installation to use all py_owc features.

Solution 2

py_owc does the mathematics of calibration and shows users the results for decision making.

We consider that plotting is a key feature or step of the analysis.
Third party libraries for plotting are required for the software to run.
pros:

complete experience for users, all included

cons:

more complex software management and sustainability
more issues during initial install step for users

Please provide comments or choose which solution you promote 1 vs 2

apswong · 2020-07-06T11:29:10Z

The plots are essential. I don't understand the difference between (1) and (2). One way or another the user will have to produce the plots. Does (1) make the plots optional?

gmaze · 2020-07-06T12:35:44Z

The plots are essential. I don't understand the difference between (1) and (2).

When we want to distribute a python software, we need to explicitly define a list of third party libraries, supposedly required for the software to run. Solution 1 does not include plotting 3rd party lib., Solution 2 does.

One way or another the user will have to produce the plots.

Solution 1 makes sure that users will be able to do the OWC maths, we don't care how user will to plots
Solution 2 makes sure that users will be able to do the OWC maths and use py_owc plotting functions (even if users won't use it in practice).

Does (1) make the plots optional?

No, in both cases, py_owc includes plotting functions.

This is like a radio in a car, it's an optional feature.
We are the car manufacturer, we need decide if our own design of the radio will be optional or not.
If the radio is "optional", the buyer can decide to have it or not. But in this later case, buyer can still install its own radio system.
In any case, we manufacturer builds the car and design a radio to come with it.

We need to decide if plotting capabilities comes by default (solution 2) or if it's an optional feature (solution 1).

apswong · 2020-07-06T13:36:20Z

OK, I see - this is more of a software packaging dilemma. (1) is the more modular approach, which may be better for software management.

gmaze · 2020-07-06T14:28:30Z

While investigating #3 , I have discovered that pip does not use underscores in package names (they are discouraged in PEP8).

So, even if we work with py_owc internally, once published, users will have to install it with py-owc:

pip install py-owc

I dislike this idea and would prefer to work everywhere with the same name.

So, new proposition, what about a simple software named:

pyowc

I think we still do the link with matlab_owc, and yet a fully embrace the python way.

gmaze · 2020-07-07T07:34:10Z

ok, let's go everywhere (but not repo name) with pyowc

edsmall-bodc · 2020-07-07T12:32:38Z

@apswong @gmaze I understand the conflict here - the plotting section is not needed to create the mapped salinity, nor is it needed to calibrate the salinity of floats.

There are two crucial points to make though:

The plots are needed for our DMQC operators to make well informed decisions about whether or not to accept the calibration. Without the plots, no decision can be reached, and therefore the outputs are somewhat useless.
One of the things we have pushed for with this is to try and create a CONSISTENT and UNIFIED approach to the decision making process, and that includes documentation. Part of this plan is that plot outputs from DMQC operators should be consistent with each other in an attempt to ensure a more consistent analysis. Therefore, we should not be implying to the operators that it is up to them to how things are displayed. We also want our operators to share their possible improvements (via github issues/pull requests), and that includes the plotting routines. I worry that, if we make the plotting routine seem optional, when they make meaningful changes it won't be shared with the rest of the community.

I realise that this could make pyowc a little more clunky to install, and more challenging to maintain, but it seems to me that, if we want this consistency, then it really does need to be included.

Interested to hear what @matdon17 and @kamwal have to say here too.

gmaze · 2020-07-07T13:08:22Z

re-opening this issue because we didn't decided solution 1 vs 2

ps: yes @edsmall-bodc you're right, your point 2 is very good

matdon17 · 2020-07-07T18:28:27Z

What @edsmall-bodc has said reflects my position. I do see there being the potential for pyowc being a module in a wider ecosystem of Argo QC tools though, and could be seen as a module of that wider landscape. Although quite what the shape that might take is one for the future.

The only thing which has just occurred to me is whether we treat the trajectory plot alone as a module - as that is the one that seems to bring the most headaches with setup and dependencies (?). I am not convinced in either direction though.

edsmall-bodc · 2020-07-08T11:27:19Z

I like this idea as well - the trajectory plot is probably the least useful plot in terms of decision making (though, if I am wrong someone is free to correct me on this) and is probably the plot that causes the biggest headache. We could make the plot optional in installation, but provide some basic instructions/suggestions on how to plot trajectory.

There is also the fact that the trajectory plot is probably the plot most likely to be able to be used by other software that might have nothing to do with salinity calibration, so it would make sense to have a stand alone trajectory plotting routine that every argo tool could use..? Or we add it as a feature in argopy? If you are fetching argo data, it is likely you might want to plot the locations, perhaps?

gmaze · 2020-07-08T12:04:54Z

Plotting the trajectory is definitely a generic plot
We also have one in argopy that produces a plot like this:

see here: https://argopy.readthedocs.io/en/latest/visualisation.html#trajectories-from-an-index

this is clearly a very simple plot and yet that needs a lot of tuning to suit all possible floats.
I would keep it here as it is at this time.
We'll need to talk later about common visualization tools

edsmall-bodc · 2020-07-08T12:25:26Z

Something I think we should consider adding at some point is something that fetches topology data from somewhere like GEBCO for the area of interest. This would mean the plot would contain very accurate and detailed bathymetry and coastlines, which is probably important in complex regions, near sandbars and islands etc.

The topology data that I am currently using isn't very precise purely because storing all that data locally would take up a huge amount of space.

gmaze · 2020-07-08T12:28:02Z

Yes, I also had this in argopy :https://github.com/euroargodev/argopy/blob/6273ddef6c22c1676aec56d9689612dbfb8ab902/argopy/utilities.py#L451
for the ETOPO bathy
but this is not reliable

gmaze transferred this issue from euroargodev/User-Acceptance-Test-Python-version-of-the-OWC-tool Jul 6, 2020

gmaze mentioned this issue Jul 6, 2020

Repository and package name euroargodev/User-Acceptance-Test-Python-version-of-the-OWC-tool#17

Closed

gmaze closed this as completed Jul 7, 2020

gmaze mentioned this issue Jul 7, 2020

Managing calibration and other software options and parameters #29

Open

gmaze reopened this Jul 7, 2020

kamwal added Priority_high priority-high Top priority for next release and removed Priority_high labels Oct 25, 2022

gmaze pinned this issue Dec 8, 2022

gmaze added this to the 1st release on Pypi milestone Dec 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Publishing on Pypi #23

Publishing on Pypi #23

gmaze commented May 14, 2020

edsmall-bodc commented May 15, 2020

Teddyzander commented May 15, 2020

gmaze commented May 15, 2020

gmaze commented Jul 6, 2020 •

edited

Loading

apswong commented Jul 6, 2020

gmaze commented Jul 6, 2020

apswong commented Jul 6, 2020

gmaze commented Jul 6, 2020

gmaze commented Jul 7, 2020

edsmall-bodc commented Jul 7, 2020

gmaze commented Jul 7, 2020

matdon17 commented Jul 7, 2020

edsmall-bodc commented Jul 8, 2020

gmaze commented Jul 8, 2020

edsmall-bodc commented Jul 8, 2020

gmaze commented Jul 8, 2020

Publishing on Pypi #23

Publishing on Pypi #23

Comments

gmaze commented May 14, 2020

edsmall-bodc commented May 15, 2020

Teddyzander commented May 15, 2020

gmaze commented May 15, 2020

gmaze commented Jul 6, 2020 • edited Loading

Solution 1

Solution 2

apswong commented Jul 6, 2020

gmaze commented Jul 6, 2020

apswong commented Jul 6, 2020

gmaze commented Jul 6, 2020

pyowc

gmaze commented Jul 7, 2020

edsmall-bodc commented Jul 7, 2020

gmaze commented Jul 7, 2020

matdon17 commented Jul 7, 2020

edsmall-bodc commented Jul 8, 2020

gmaze commented Jul 8, 2020

edsmall-bodc commented Jul 8, 2020

gmaze commented Jul 8, 2020

gmaze commented Jul 6, 2020 •

edited

Loading