Create r-py and py-r images (R-based images with Python and Julia, and Python-based images with R) #53

Open
Robinlovelace opened this issue Sep 18, 2024 · 16 comments · Fixed by #54 or #56

Comments

@Robinlovelace
Contributor

No description provided.

@Robinlovelace
Contributor Author

Planning to use the install_python.sh and install_julia.sh scripts from rocker: https://github.com/rocker-org/rocker-versioned2/tree/master/scripts

@Robinlovelace Robinlovelace changed the title Create r-py and py-r images (R-based images with Python and Python-based images with R) Create r-py and py-r images (R-based images with Python and Julia, and Python-based images with R) Sep 18, 2024
@martinfleis

Also see this https://github.com/darribas/gds_env which has both Python and R spatial ecosystems.

@Robinlovelace
Contributor Author

Looks good, but the thinking here is to have an R-based image with Python (and Julia) and a Python-based image with R, building on the well-maintained rocker project. On the topic of maintenance, I just opened an issue after trying to run the example and not finding the flavour descriptions: darribas/gds_env#90

@martinfleis

The gds_env stacks are described at https://darribas.org/gds_env/stacks/. The great thing about that container is that it first pulls Python stack from conda-forge and then builds R stack from source against the same versions of GEOS, GDAL and PROJ. If you pull R stuff from CRAN and Python stuff from PyPI or elsewhere, you will likely end up with different versions of these dependencies.
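To make the version-drift concern concrete, here is a minimal sketch of a check that compares the GEOS, GDAL and PROJ versions seen by the R side and the Python side of one image. The function name `find_mismatches` and the example version numbers are made up for illustration; how the versions are collected in each language is left out and would depend on the packages installed.

```python
def find_mismatches(r_versions, py_versions):
    """Return {library: (r_version, py_version)} for libraries whose versions differ.

    Only libraries reported by both stacks are compared.
    """
    mismatches = {}
    for lib in sorted(set(r_versions) & set(py_versions)):
        if r_versions[lib] != py_versions[lib]:
            mismatches[lib] = (r_versions[lib], py_versions[lib])
    return mismatches


# Hypothetical values, for illustration only (not measured from any real image).
r_versions = {"GEOS": "3.12.2", "GDAL": "3.9.2", "PROJ": "9.5.0"}
py_versions = {"GEOS": "3.12.2", "GDAL": "3.8.4", "PROJ": "9.5.0"}

print(find_mismatches(r_versions, py_versions))
# {'GDAL': ('3.9.2', '3.8.4')}
```

An empty result would suggest both stacks link against the same library versions, which is the property described above for gds_env.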

@Robinlovelace
Contributor Author

👍 to sharing deps where possible. Do you have info on gds image sizes?

@Robinlovelace Robinlovelace linked a pull request Sep 19, 2024 that will close this issue
@martinfleis

gds_py is 1.14 GB when compressed, gds that also includes R is 3 GB.

@Robinlovelace
Contributor Author

> gds_py is 1.14 GB when compressed, gds that also includes R is 3 GB.

The micromamba image here is 300 MB. I'm currently exploring pixi to install all deps efficiently: #54

@mdsumner

> The gds_env stacks are described at https://darribas.org/gds_env/stacks/. The great thing about that container is that it first pulls Python stack from conda-forge and then builds R stack from source against the same versions of GEOS, GDAL and PROJ. If you pull R stuff from CRAN and Python stuff from PyPI or elsewhere, you will likely end up with different versions of these dependencies.

oh thank goodness, this is the first time I've heard this desired by anyone else. I have a docker image for R and Python aligned to the daily build of GDAL, but my fu is not excellent and I'm running into problems a few months later.

@Robinlovelace Robinlovelace reopened this Sep 20, 2024
@Robinlovelace
Contributor Author

Not quite as done as I would have liked.

Daily GDAL is next level. @mdsumner, could you share examples? See #40 for lots of other examples.

@mdsumner

mdsumner commented Sep 20, 2024

My crufty Dockerfiles basically take from rocker and from the CI builds done by GDAL itself. I wanted "layering", but the fact is I use my monolithic R and Python image every day now (there's an issue with numpy and sometimes pyproj, but nothing stopping me from working). Also, my Python should use environments, but it's just another thing to learn, as ever.

Note that @cboettig pursued getting the GDAL builder images published here, but that is not considered desirable. Of course anyone can just go and do that, but while I've played around with multi-stage builds, I'm certainly not very adept yet.

OSGeo/gdal#9824

The things I wanted that aren't provided by the otherwise excellent rocker are:

  • libs that are common across all Python and R packages (python and pak pull in binaries, sometimes private to each package)
  • fold in a particular bleeding edge library version (prior to geospatial step, probably)
  • wire up osgeo.gdal to actually work from a GDAL build
  • install a huge list of Python packages and R packages

With all the Python packages I just had to pick through the right order to avoid anything bringing its own PROJ, GEOS or GDAL, and I drew the line at compiling NetCDF/HDF5/HDF4, but consistency there isn't so important to me anyway.
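One way to stop Python packages from bringing their own PROJ, GEOS or GDAL is to tell pip not to use prebuilt wheels for them, so they compile against the libraries already in the image. This is not necessarily what was done for the image described above. The sketch below only builds the command; the helper name `pip_source_install_cmd` and the package list are illustrative, and a real build would also need compilers and the matching development headers present.

```python
import sys


def pip_source_install_cmd(packages):
    """Build a pip command that installs `packages` from source.

    `--no-binary` takes a comma-separated list of package names for which
    pip must not use prebuilt wheels (wheels may bundle their own copies
    of native libraries).
    """
    no_binary = ",".join(packages)
    return [sys.executable, "-m", "pip", "install", "--no-binary", no_binary, *packages]


# Illustrative package list (an assumption, not taken from the thread).
cmd = pip_source_install_cmd(["pyproj", "shapely"])
print(" ".join(cmd[1:]))
# -m pip install --no-binary pyproj,shapely pyproj shapely
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` inside the image build, one group of packages at a time if install order matters.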

But I can't see how to do all that and also keep rocker cleanly layered as it is now. I was advised during Posit conf: "why not take the higher level rocker, and then just clobber the libs and installs you want on that". I haven't tried that yet; I thought it would be "bad practice", but apparently not.

@Robinlovelace
Contributor Author

Not to mention the GeoJulia stack, although that's different: according to @evetion, Julia packages will never link to system versions, with each package apparently downloading its own binaries. That is good for reproducibility but perhaps not so good for image sizes or for the ability to quickly test different versions of GDAL etc. Your list looks good. I agree Rocker is rock solid, so I will look to continue building on that (continuing experiments with pixi and more) and keep an eye on GDAL-based builds.

@mdsumner

mdsumner commented Sep 20, 2024

fwiw, I probably don't need bleeding edge GDAL every day, and I see Pangeo docker is now at GDAL 3.9.0, which is excellent (and a bit surprising). I should probably tone it down and align to the latest release; if I need the really latest GDAL, it's not hard to build, and I do that anyway because of pending PRs (it's a ten minute turnaround at most to check out one of Even Rouault's PRs and test a fix, as I found yesterday).

Just a long-winded way of saying it's time I reviewed how I tap in here.

@Robinlovelace
Contributor Author

Conda seems to have GDAL 3.9.2, as per the reprex below:

docker run -it ghcr.io/geocompx/docker:pixi-r bash
root@5cac6168a483:/# R

R version 4.4.1 (2024-06-14) -- "Race for Your Life"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-conda-linux-gnu

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(sf)
Linking to GEOS 3.12.2, GDAL 3.9.2, PROJ 9.5.0; sf_use_s2() is TRUE
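
The "Linking to ..." line that sf prints on load can also be read programmatically, for example when checking an image in CI. The following is a rough sketch that assumes the message keeps the format shown above; the helper names `parse_sf_linking` and `version_tuple` are made up for this example.

```python
import re


def parse_sf_linking(message):
    """Extract GEOS, GDAL and PROJ versions from sf's startup message."""
    pairs = re.findall(r"\b(GEOS|GDAL|PROJ)\s+(\d+(?:\.\d+)*)", message)
    return dict(pairs)


def version_tuple(version):
    """Turn '3.9.2' into (3, 9, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


line = "Linking to GEOS 3.12.2, GDAL 3.9.2, PROJ 9.5.0; sf_use_s2() is TRUE"
versions = parse_sf_linking(line)
print(versions)
# {'GEOS': '3.12.2', 'GDAL': '3.9.2', 'PROJ': '9.5.0'}

# Is this GDAL at least as new as the 3.9.0 mentioned earlier in the thread?
print(version_tuple(versions["GDAL"]) >= (3, 9, 0))
# True
```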

@Robinlovelace
Contributor Author

See discussion here: prefix-dev/pixi#2088 (comment)
