Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Publishing on Pypi #23

Open
gmaze opened this issue May 14, 2020 · 16 comments
Open

Publishing on Pypi #23

gmaze opened this issue May 14, 2020 · 16 comments
Labels
priority-high Top priority for next release

Comments

@gmaze
Copy link
Member

gmaze commented May 14, 2020

For standard users of the software, they should not need to clone the repository to be able to run it, but rather simply install it using pip or conda.
This implies 2 things:

  • a proper setup.py and packaging
  • a serious refactoring of the code to manage options and configuration parameters outside of the software code (to be honest, I'd like this to be my 1st contribution to the code)

This is also to let you known that when this will be ready to publish euroargodev python softwares on pypi, I created a euroargodev account at https://test.pypi.org/user/euroargodev/ and https://www.pypi.org/user/euroargodev/
(don't be fulled by the picture, this is retrieved automatically from my email but will change soon, so that it is the EA logo).

@edsmall-bodc
Copy link
Collaborator

The packaging should be relatively straight forward.

The refactor may be slightly more tricky, though your current codes to fetch argo data will be a big help.

Currently the user has to define many things for the OWC toolbox to run:

For the update salinity mapping, they need to define:

  • Which data set they want to use from each WMO box (argo/CTD/bottle data) [though we have discussed how the WMO boxes could be phased out!]
  • The float name they want to analyse
  • The maximum number of historical casts they want to use during analysis
  • Whether or not they would like to use potential vorticity
  • Whether or not they would like to use special constraints around the subantarctic front
  • The latitude and longitude scales used for decorrelation (4 values)
  • The cross isobath scales (2 values)
  • The age scales (2 values)
  • The exclusion depth
  • The age exclusion parameter

Many of these parameters have a default value, but I know operators change them regularly

That brings variables for salinity mapping to 15, which is somewhat tricky to manage in a usable way. Either we will need to group some of these variables together (into classes or dictionaries?), OR we can split the update salinity mapping routine up into separate routines that can be called?

For calculate piecewise fit (which I have not pushed to master yet, as I'm still working on some bugs) users need the output from the salinity mapping routine, plus:

  • Number of breaks in the fitting line
  • (Optional) where to place these breaks
  • (Optional) the boundaries for potential temperature (two values)
  • (Optional) The boundaries for pressure (two values)

You can see an example of the set up here: https://github.com/ArgoDMQC/matlab_owc/blob/master/matlab_codes/set_calseries.m

@Teddyzander
Copy link
Collaborator

Could we write a class that has setters and getters for these values, and then we can just pass in the class to an update_salinity_mapping routine? Or just have the salinity mapping routine be a method in the class, of course.

Class could contain an object containing all of the argo_profiles fetched from the argo_fetcher routine, variables for the update salinity mapping route, and the update salinity mapping method..?

@gmaze
Copy link
Member Author

gmaze commented May 15, 2020

Discussion moves toward another topic here, I suggest to continue here euroargodev/User-Acceptance-Test-Python-version-of-the-OWC-tool#7

@gmaze
Copy link
Member Author

gmaze commented Jul 6, 2020

Dear all,
In order to package the software correctly, we need to make a decision with regard to third party libraries used by py_owc, especially for plotting functionalities, to be included.
This means that when a new user will install py_owc, it will have to install other stuff as well. In same cases, users already have these (eg numpy), but in others, this may not be the case (eg geopandas).
The lesser "requirements" we have, the easier it is to install and debug for specific user working environments.

Solution 1

py_owc does the mathematics of calibration, users handle visualization.

In order to run and compute the calibration, we don't need to plot anything. py_owc still provide plotting features, but does not make it a requirement for installation.
So, third party libraries are limited to math stuff.
pros:

  • light weight
  • fast and simple
  • near 100% success of install for the vast majority of users

cons:

  • users have to manage additional software installation to use all py_owc features.

Solution 2

py_owc does the mathematics of calibration and shows users the results for decision making.

We consider that plotting is a key feature or step of the analysis.
Third party libraries for plotting are required for the software to run.
pros:

  • complete experience for users, all included

cons:

  • more complex software management and sustainability
  • more issues during initial install step for users

Please provide comments or choose which solution you promote 1 vs 2

@apswong
Copy link

apswong commented Jul 6, 2020

The plots are essential. I don't understand the difference between (1) and (2). One way or another the user will have to produce the plots. Does (1) make the plots optional?

@gmaze
Copy link
Member Author

gmaze commented Jul 6, 2020

The plots are essential. I don't understand the difference between (1) and (2).

When we want to distribute a python software, we need to explicitly define a list of third party libraries, supposedly required for the software to run. Solution 1 does not include plotting 3rd party lib., Solution 2 does.

One way or another the user will have to produce the plots.

  • Solution 1 makes sure that users will be able to do the OWC maths, we don't care how user will to plots
  • Solution 2 makes sure that users will be able to do the OWC maths and use py_owc plotting functions (even if users won't use it in practice).

Does (1) make the plots optional?

No, in both cases, py_owc includes plotting functions.

This is like a radio in a car, it's an optional feature.
We are the car manufacturer, we need decide if our own design of the radio will be optional or not.
If the radio is "optional", the buyer can decide to have it or not. But in this later case, buyer can still install its own radio system.
In any case, we manufacturer builds the car and design a radio to come with it.

We need to decide if plotting capabilities comes by default (solution 2) or if it's an optional feature (solution 1).

@apswong
Copy link

apswong commented Jul 6, 2020

OK, I see - this is more of a software packaging dilemma. (1) is the more modular approach, which may be better for software management.

@gmaze gmaze transferred this issue from euroargodev/User-Acceptance-Test-Python-version-of-the-OWC-tool Jul 6, 2020
@gmaze
Copy link
Member Author

gmaze commented Jul 6, 2020

While investigating #3 , I have discovered that pip does not use underscores in package names (they are discouraged in PEP8).

So, even if we work with py_owc internally, once published, users will have to install it with py-owc:

pip install py-owc

I dislike this idea and would prefer to work everywhere with the same name.

So, new proposition, what about a simple software named:

pyowc

I think we still do the link with matlab_owc, and yet a fully embrace the python way.

@gmaze
Copy link
Member Author

gmaze commented Jul 7, 2020

ok, let's go everywhere (but not repo name) with pyowc

@edsmall-bodc
Copy link
Collaborator

@apswong @gmaze I understand the conflict here - the plotting section is not needed to create the mapped salinity, nor is it needed to calibrate the salinity of floats.

There are two crucial points to make though:

  1. The plots are needed for our DMQC operators to make well informed decisions about whether or not to accept the calibration. Without the plots, no decision can be reached, and therefore the outputs are somewhat useless.

  2. One of the things we have pushed for with this is to try and create a CONSISTENT and UNIFIED approach to the decision making process, and that includes documentation. Part of this plan is that plot outputs from DMQC operators should be consistent with each other in an attempt to ensure a more consistent analysis. Therefore, we should not be implying to the operators that it is up to them to how things are displayed. We also want our operators to share their possible improvements (via github issues/pull requests), and that includes the plotting routines. I worry that, if we make the plotting routine seem optional, when they make meaningful changes it won't be shared with the rest of the community.

I realise that this could make pyowc a little more clunky to install, and more challenging to maintain, but it seems to me that, if we want this consistency, then it really does need to be included.

Interested to hear what @matdon17 and @kamwal have to say here too.

@gmaze
Copy link
Member Author

gmaze commented Jul 7, 2020

re-opening this issue because we didn't decided solution 1 vs 2

ps: yes @edsmall-bodc you're right, your point 2 is very good

@gmaze gmaze reopened this Jul 7, 2020
@matdon17
Copy link
Collaborator

matdon17 commented Jul 7, 2020

What @edsmall-bodc has said reflects my position. I do see there being the potential for pyowc being a module in a wider ecosystem of Argo QC tools though, and could be seen as a module of that wider landscape. Although quite what the shape that might take is one for the future.

The only thing which has just occurred to me is whether we treat the trajectory plot alone as a module - as that is the one that seems to bring the most headaches with setup and dependencies (?). I am not convinced in either direction though.

@edsmall-bodc
Copy link
Collaborator

I like this idea as well - the trajectory plot is probably the least useful plot in terms of decision making (though, if I am wrong someone is free to correct me on this) and is probably the plot that causes the biggest headache. We could make the plot optional in installation, but provide some basic instructions/suggestions on how to plot trajectory.

There is also the fact that the trajectory plot is probably the plot most likely to be able to be used by other software that might have nothing to do with salinity calibration, so it would make sense to have a stand alone trajectory plotting routine that every argo tool could use..? Or we add it as a feature in argopy? If you are fetching argo data, it is likely you might want to plot the locations, perhaps?

@gmaze
Copy link
Member Author

gmaze commented Jul 8, 2020

Plotting the trajectory is definitely a generic plot
We also have one in argopy that produces a plot like this:

see here: https://argopy.readthedocs.io/en/latest/visualisation.html#trajectories-from-an-index

this is clearly a very simple plot and yet that needs a lot of tuning to suit all possible floats.
I would keep it here as it is at this time.
We'll need to talk later about common visualization tools

@edsmall-bodc
Copy link
Collaborator

Something I think we should consider adding at some point is something that fetches topology data from somewhere like GEBCO for the area of interest. This would mean the plot would contain very accurate and detailed bathymetry and coastlines, which is probably important in complex regions, near sandbars and islands etc.

The topology data that I am currently using isn't very precise purely because storing all that data locally would take up a huge amount of space.

@gmaze
Copy link
Member Author

gmaze commented Jul 8, 2020

Yes, I also had this in argopy :https://github.com/euroargodev/argopy/blob/6273ddef6c22c1676aec56d9689612dbfb8ab902/argopy/utilities.py#L451
for the ETOPO bathy
but this is not reliable

@kamwal kamwal added Priority_high priority-high Top priority for next release and removed Priority_high labels Oct 25, 2022
@gmaze gmaze pinned this issue Dec 8, 2022
@gmaze gmaze added this to the 1st release on Pypi milestone Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority-high Top priority for next release
Projects
None yet
Development

No branches or pull requests

6 participants