Using pip without venv or conda env will stop working soon #670

Closed
yuvipanda opened this issue Jun 21, 2023 · 17 comments
Labels: pre-built images, rocker scripts

Comments

@yuvipanda
Contributor

https://peps.python.org/pep-0668/ was adopted 2 years ago, and it's finally rolling around to distros downstream. The upshot is that just using pip with system python will no longer work. With the recently released ubuntu:23.04 (or debian:bookworm), if you try this:

FROM ubuntu:23.04
RUN apt-get update && apt-get -y install python3 python3-pip
RUN pip install flask

You'll get the following error:

 > [3/3] RUN pip install flask:
#6 4.003 error: externally-managed-environment
#6 4.003
#6 4.003 × This environment is externally managed
#6 4.003 ╰─> To install Python packages system-wide, try apt install
#6 4.003     python3-xyz, where xyz is the package you are trying to
#6 4.003     install.
#6 4.003     
#6 4.003     If you wish to install a non-Debian-packaged Python package,
#6 4.003     create a virtual environment using python3 -m venv path/to/venv.
#6 4.003     Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
#6 4.003     sure you have python3-full installed.
#6 4.003     
#6 4.003     If you wish to install a non-Debian packaged Python application,
#6 4.003     it may be easiest to use pipx install xyz, which will manage a
#6 4.003     virtual environment for you. Make sure you have pipx installed.
#6 4.003     
#6 4.003     See /usr/share/doc/python3.11/README.venv for more information.
#6 4.003 
#6 4.003 note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
#6 4.003 hint: See PEP 668 for the detailed specification.
------

https://pythonspeed.com/articles/externally-managed-environment-pep-668/ has more useful information.
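
For context, PEP 668 has the distro place a marker file named EXTERNALLY-MANAGED in the interpreter's standard-library directory, and pip refuses to install when it finds that file outside a virtual environment. Below is a minimal shell sketch of the marker check. The function name `is_externally_managed` is made up for illustration, and it only looks for the file (it does not replicate pip's virtual-environment test).

```shell
# Hypothetical helper (not part of pip or rocker): succeed if the given
# standard-library directory contains the PEP 668 marker file.
is_externally_managed() {
  [ -f "$1/EXTERNALLY-MANAGED" ]
}

# Usage sketch, assuming python3 is installed:
#   stdlib_dir="$(python3 -c 'import sysconfig; print(sysconfig.get_path("stdlib"))')"
#   if is_externally_managed "$stdlib_dir"; then
#     echo "system-wide pip installs will be refused; use a venv instead"
#   fi
```

On an image where the marker is present, this is a cheap way for a build script to decide up front whether a plain `pip install` is going to fail.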

The install_jupyter.sh script currently does this:

python3 -m pip install --no-cache-dir jupyter-rsession-proxy notebook jupyterlab

and hence will stop working once we bump to the newer version of Ubuntu.

There are two options going forward to prevent breaking:

  1. Switch to using a venv, and install stuff inside that.
  2. Switch to getting python from conda, via mambaforge. This also has the additional benefit of supporting arbitrary python versions without needing to change base image. I <3 and have a lot of experience with apt, but have eventually accepted that for datascience images, this is the way to go.

Am happy to put some effort into this, whichever option y'all wanna go with!

@benz0li
Contributor

benz0li commented Jun 21, 2023

> https://peps.python.org/pep-0668/ was adopted 2 years ago, and it's finally rolling around to distros downstream. The upshot is that just using pip with system python will no longer work.

One may override this using --break-system-packages or set environment variable PIP_BREAK_SYSTEM_PACKAGES=1.

Cross-references:

@benz0li
Contributor

benz0li commented Jun 21, 2023

That is not the recommended way, though. See python/cpython#102134 (comment) for alternatives.

@benz0li
Contributor

benz0li commented Jun 21, 2023

>   1. Switch to using a venv, and install stuff inside that.
>   2. Switch to getting python from conda, via mambaforge. This also has the additional benefit of supporting arbitrary python versions without needing to change base image. I <3 and have a lot of experience with apt, but have eventually accepted that for datascience images, this is the way to go.

  3. Install and manage your own version of Python separate from the one bundled with your operating system.

@eitsupi
Member

eitsupi commented Jun 21, 2023

Hi, thank you for opening this!

I ran into this in another repository (rocker-org/devcontainer-features#162) and it is a real headache.
I feel this error is excessive, as there is little need to worry about breaking the Python environment within a Docker container. (If it breaks, simply rebuild the container.)

The main problem I am aware of is that when you install Python with pyenv or similar, it is generally associated with a user. On the other hand, the images in this repository expect both root and non-root users to be able to run the same commands.......
I am reluctant to write and maintain long scripts to solve this.

So I vote to use PIP_BREAK_SYSTEM_PACKAGES.

@eitsupi added the rocker scripts and pre-built images labels Jun 21, 2023
@benz0li
Contributor

benz0li commented Jun 21, 2023

(For many reasons) I prefer a separate Python installation from the one bundled with the OS.

That is why I am creating customised builds of the official Python Docker Hub Image: https://gitlab.b-data.ch/python/psi

Docker images: https://gitlab.b-data.ch/python/psi/container_registry

  • Multi-arch: linux/amd64, linux/arm64/v8
  • Python versions: latest, the last two older
  • Base images: Debian (slim): stable, oldstable; Ubuntu: current LTS, former LTS

Usage:

ARG BASE_IMAGE=ubuntu
ARG BASE_IMAGE_TAG=22.04

ARG PYTHON_VERSION=3.11.4

FROM glcr.b-data.ch/python/psi/${PYTHON_VERSION}/${BASE_IMAGE}:${BASE_IMAGE_TAG} as psi

FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG}

ARG DEBIAN_FRONTEND=noninteractive

ARG BASE_IMAGE
ARG BASE_IMAGE_TAG

ARG PYTHON_VERSION

ENV BASE_IMAGE=${BASE_IMAGE}:${BASE_IMAGE_TAG} \
    PYTHON_VERSION=${PYTHON_VERSION}

COPY --from=psi /usr/local /usr/local

RUN apt-get update \
  ## Python: Runtime dependencies
  && apt-get -y install --no-install-recommends \
    ca-certificates \
    netbase \
    tzdata \
  ## Clean up
  && rm -rf /var/lib/apt/lists/*

@yuvipanda
Contributor Author

> The main problem I am aware of is that when you install Python with pyenv or similar, it is generally associated with a user. On the other hand, the images in this repository expect both root and non-root users to be able to run the same commands.......
> I am reluctant to write and maintain long scripts to solve this.

This is true for pyenv, but pyenv is a completely different project solving completely different issues than a virtualenv. I agree pyenv should not be used here, as it adds unnecessary complexity with no benefit.

A virtualenv ships built in with python (as part of the venv module), and virtualenvs can be owned by root without any issues. This is the method we're going to be taking in other Jupyter project based images. It is the smallest change from the current status quo, and it avoids a flag that both Python and the distro recommend against using. I can provide a PR if interested.

@eitsupi
Member

eitsupi commented Jun 21, 2023

Hmmm, I guess we expect to be able to use jupyter commands everywhere here, but is venv, which requires activation, a solution?

@yuvipanda
Contributor Author

You don't need activation to use a venv, just setting PATH is enough. Here's a super simple example Dockerfile that helps:

FROM ubuntu:23.04
RUN apt-get update && apt-get -y install python3 python3-pip python3-venv

# Setup virtualenv in this path
ENV VIRTUAL_ENV=/opt/venv
# Put the bin/ dir of the venv before other items in path, so `python` refers to this
# No activation script is needed, just this PATH setting is enough
ENV PATH=${VIRTUAL_ENV}/bin:${PATH}

RUN python3 -m venv ${VIRTUAL_ENV}

RUN pip install jupyterlab

Note two things here:

  1. The last line's pip worked, and installed things into the venv correctly. No activation was needed.
  2. When a container is run with this image, you can simply invoke jupyter lab or whatever command, and it just works. The fact it's in a venv is completely transparent to the end user. The venv is also owned by root, but can be owned by whichever user you see fit.

The activate script is mostly a convenience method helpful when managing multiple virtual environments in a desktop setting. The only part of it that affects us is setting PATH: https://github.com/python/cpython/blob/c01da2896ab92ba7193bcd6ae56908c5c7277e75/Lib/venv/scripts/common/activate#L52.
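
The PATH-only approach described above can be sketched in plain shell. This is an illustration under assumptions, not the rocker implementation: the function name `use_venv` is hypothetical, and it only does the two things discussed here (export VIRTUAL_ENV and put the venv's bin/ directory first on PATH).

```shell
# Hypothetical helper: make executables in DIR/bin win PATH lookup,
# without sourcing the venv's activate script.
use_venv() {
  VIRTUAL_ENV="$1"
  PATH="${VIRTUAL_ENV}/bin:${PATH}"
  export VIRTUAL_ENV PATH
}

# Usage sketch, assuming a venv already exists at /opt/venv:
#   use_venv /opt/venv
#   command -v python    # would resolve inside /opt/venv/bin
```

Because command lookup walks PATH from left to right, anything installed into the venv's bin/ directory shadows the system copy for every later command in the same shell.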

@cboettig
Member

@yuvipanda thanks very much for this. I agree with @eitsupi's point that it feels weird to worry about clobbering system python in a container setting, but overall I think it makes sense to align python use in Rocker with idiomatic use in other 'data science container' ecosystems (including ones you maintain!).

Python development is clearly going this way, and we already see friction with tools like RStudio's reticulate which has a similar reticence about system python because of course it can't assume it's in a containerized environment.

Putting /opt/venv/bin on PATH before default PATH, rather than 'activating' the venv, seems clever but I worry about complications here (among them, RStudio doesn't inherit system env vars by default, for reasons I've never completely understood, so we need to make sure this propagates to Renviron too). I think we'd also want to make /opt/venv user-writable? Otherwise the default (single) user won't be able to install into the default venv.

I do like that you're using /opt/venv here because I guess we're old-school in rocker but we've always set up a standard shared-library design, and we have users run our containers in single user, multi-user, and root-user modes for various use cases. That said, it seems some tools (looking at you, reticulate) like to put virtualenvs (and every other library, binary, etc) at the user home level instead. We might want to be setting the reticulate env var (is it WORKON_HOME? it's been a while) to /opt/venv then as well...
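
The Renviron concern raised above could be handled roughly as follows. This is a sketch under assumptions: the function name `write_python_env_to_renviron` is hypothetical, and the location of the Renviron file is something each image would need to confirm.

```shell
# Hypothetical helper: append the current PATH and VIRTUAL_ENV to an
# Renviron-style file so that R sessions started by RStudio see them.
write_python_env_to_renviron() {
  renviron_file="$1"
  {
    echo "PATH=${PATH}"
    echo "VIRTUAL_ENV=${VIRTUAL_ENV}"
  } >> "${renviron_file}"
}

# Usage sketch; the Renviron.site location is an assumption about the image:
#   write_python_env_to_renviron "${R_HOME}/etc/Renviron.site"
```

The values are written out fully expanded, so the file records whatever PATH was in effect at build time.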

@yuvipanda
Contributor Author

Thanks, @cboettig! I've currently just come to accept the benefits of using conda to get python, to fully decouple it from system python and get different versions. In my example on how to inherit from rocker images, I use mambaforge to get python from conda-forge (https://jupyterhub-image.guide/rocker.html#step-3-construct-your-dockerfile-to-add-python), and it has worked surprisingly well. Despite being a massive fan of apt, and wishing for less fragmentation in the packaging communities, I do still recommend mamba+conda-forge as the 'right' solution for ease of maintenance of Docker images involving scientific python. See berkeley-dsep-infra/datahub#2934 (comment) for a detailed example, particularly involving geo based packages.

However, if you don't want to use conda-forge here, I'll happily implement the venv solution as well. LMK which way you'd like to go, and I'll provide a PR.

@benz0li
Contributor

benz0li commented Jun 22, 2023

IMHO Python-wise, a separate Python installation is the cleanest solution.

Conda/Mamba is just another unnecessary dependency.

@eitsupi
Member

eitsupi commented Jun 23, 2023

@yuvipanda Thank you for the detailed explanation!
However, I do not want to update PATH (Same as #538 (comment)).

Having said that, I actually do not use Python installed by this script, so I will leave the final decision to @cboettig.
(Again, my personal preference is to use PIP_BREAK_SYSTEM_PACKAGES=1.)

For users who want to use Python in earnest, I recommend that they install their favorite version of Python with micromamba or rye, not the one installed by this script......

@eddelbuettel
Member

eddelbuettel commented Jun 23, 2023

I am really far from being a typical Python user (and not that much of a Python user in the first place) but if I were to vote here it would be to continue with what has been done and rely on the package manager's python as much as we can ... while also avoiding venv. And almost surely avoid Conda/Mamba. We start off Debian/Ubuntu (respectively) for a reason. And we build R solutions here that generally do not require that much Python, so the pain from mismatched versions etc. is hopefully not too large. Then again I don't build Jupyter-based approaches so maybe it is all much worse than I imagine.

@yuvipanda
Contributor Author

The only thing I would recommend against is PIP_BREAK_SYSTEM_PACKAGES :) I think the python community chose the word 'break' in there for a good reason! I'm happy to drop my suggestion for conda / mamba though. I think continuing to use the python from apt + venv is the least headache way to go.

> For users who want to use Python in earnest, I recommend that they install their favorite version of Python with micromamba or rye, not the one installed by this script......

Yep that makes sense!

@cboettig
Member

cboettig commented Jul 4, 2023

ok, shall we go ahead with apt-based install of python3-venv and a shared default venv of VIRTUAL_ENV=/opt/venv?

@eitsupi
Member

eitsupi commented Jul 7, 2023

> ok, shall we go ahead with apt-based install of python3-venv and a shared default venv of VIRTUAL_ENV=/opt/venv?

Does this mean that the staff group will own it? (Perhaps it should.)
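
One possible way to make a shared venv group-owned and group-writable is sketched below. This is only an illustration of the ownership question raised here, not something the thread settled on: the function name `share_with_group` is hypothetical, and the group name and path in the usage note are taken from this discussion.

```shell
# Hypothetical helper: hand a directory tree to a group and let its
# members read and write it. Capital X adds group execute only to
# directories and to files that are already executable.
share_with_group() {
  dir="$1"
  group="$2"
  chgrp -R "${group}" "${dir}"
  chmod -R g+rwX "${dir}"
}

# Usage sketch (run as root during an image build):
#   share_with_group /opt/venv staff
```

With permissions like these, any user in the group could run pip against the shared venv at runtime without needing root.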

@eitsupi eitsupi mentioned this issue Oct 30, 2023
6 tasks
yuvipanda added a commit to yuvipanda/rocker-versioned2 that referenced this issue Nov 2, 2023
- Continues to use python and venv from Ubuntu LTS repositories,
  so they are supported as with everything else that is gotten
  from apt (see rocker-org#670 (comment))
- Doesn't currently change any permissions, so present behavior
  is preserved. However, in the future, we should probably change
  ownership so end users can install packages in there at runtime
  (see rocker-org#670 (comment))
- Sets the VIRTUAL_ENV environment variable to path of the venv
  we create. This is what the `source activate` script does,
  and Reticulate also looks for this to discover which
  python to use (see point 4 of https://rstudio.github.io/reticulate/articles/versions.html#order-of-discovery)
- Sets up PATH appropriately, so python and python3 refer to
  what is in our venv. This, along with the previous step, ensures
  same behavior as users typing `source ${VIRTUAL_ENV}/bin/activate`
  without actually having to do that, preserving end user behavioral
  semantics.
- Remove the explicit symlink of python3 -> python, as venv handles
  this automatically.

Decisions to be made:

- Where do we set the appropriate env variables (VIRTUAL_ENV and
  PATH)? They need to be set for `install_python.sh` to work
  correctly. I've set them in the binder image for now, but it
  should probably be set on a more base image. This is a no-op if
  `install_python.sh` is not called anywhere.

Ref rocker-org#670
yuvipanda added a commit to yuvipanda/rocker-versioned2 that referenced this issue Nov 6, 2023
eitsupi pushed a commit that referenced this issue Dec 21, 2023
- Continues to use python and venv from Ubuntu LTS repositories, so they
are supported as with everything else that is gotten from apt (see
#670 (comment))
- Doesn't currently change any permissions, so present behavior is
preserved. However, in the future, we should probably change ownership
so end users can install packages in there at runtime (see
#670 (comment))
- Sets the VIRTUAL_ENV environment variable to path of the venv we
create. This is what the `source activate` script does, and Reticulate
also looks for this to discover which python to use (see point 4 of
https://rstudio.github.io/reticulate/articles/versions.html#order-of-discovery)
- Sets up PATH appropriately, so python and python3 refer to what is in
our venv. This, along with the previous step, ensures same behavior as
users typing `source ${VIRTUAL_ENV}/bin/activate` without actually
having to do that, preserving end user behavioral semantics. See
#670 (comment)
- RStudio is also told about new `PATH` and `VIRTUAL_ENV`, using the
same pattern as `install_texlive.sh`
- Remove the explicit symlink of python3 -> python, as venv handles this
automatically.
- `install_python.sh` now needs to be `source`d, following same pattern
as `install_texlive.sh`

Decisions to be made:

- Where do we set the appropriate env variables (VIRTUAL_ENV and PATH)?
They need to be set for `install_python.sh` to work correctly. I've set
them in the binder image for now, but it should probably be set on a
more base image. This is a no-op if `install_python.sh` is not called
anywhere.

TODO:
- [x] Update `NEWS` (can be done once everything else is finalized)

Ref #670

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
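
The commit message above notes that `install_python.sh` now needs to be sourced. The likely reason is that environment variables exported by a script run as a child process disappear when that process exits, whereas a sourced script changes the calling shell. A small self-contained demonstration, using a hypothetical stand-in script named set_env.sh rather than the real rocker script:

```shell
# Write a tiny stand-in for a script that exports environment variables.
demo_dir="$(mktemp -d)"
cat > "${demo_dir}/set_env.sh" <<'EOF'
export VIRTUAL_ENV=/opt/venv
export PATH="${VIRTUAL_ENV}/bin:${PATH}"
EOF

unset VIRTUAL_ENV

# Executed as a child process: the exports vanish when the child exits.
sh "${demo_dir}/set_env.sh"
after_exec="${VIRTUAL_ENV:-unset}"

# Sourced into the current shell: the exports persist for later commands.
. "${demo_dir}/set_env.sh"
after_source="${VIRTUAL_ENV:-unset}"

echo "after executing: ${after_exec}"
echo "after sourcing:  ${after_source}"
rm -rf "${demo_dir}"
```

Only the sourced run leaves VIRTUAL_ENV set, which is why later build steps can see the venv after sourcing the script.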
@yuvipanda
Contributor Author

I think this can now be closed, given #718 has been merged.

Thanks for the thoughtful review, everyone!
