Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIP dependency support #1

Merged
merged 14 commits into from
Sep 15, 2017
Merged

PIP dependency support #1

merged 14 commits into from
Sep 15, 2017

Conversation

mattmoor
Copy link
Contributor

@mattmoor mattmoor commented Sep 5, 2017

This is a proposal for some PIP rules that are heavily based on a prototype I wrote a month or two back. Very open to suggestions here, but it seems to work across a number of examples I've tried so far.

This is broken off of my prototype repository, and there are a handful of TODOs left to resolve.
Elaborate on why the TODO to rewrite {pip,whl}.sh as Python is harder than it looks.
Copy link

@damienmg damienmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big remarks, except adding the bazel ci configuration.

@@ -0,0 +1,39 @@
# Compiled Object files
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have all those parts? emacs exclude belongs to the ~/.gitignore, all the bin files should not be generated by bazel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just my standard .gitignore file I copy from repo to repo. I never touch it again.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repo .gitignore are supposed to be repo specific, so not includes things like *~...

Copy link

@duggelz duggelz Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I advice I got, didn't like, but eventually learned to appreciate the wisdom of, is to create a ~/.gitignore_global with all the editor autosave filenames, etc (things that are particular to me, not to the repo I'm working on), and do git config --global core.excludesfile ~/.gitignore_global

Otherwise, you're like "What part of this Python program creates Fortran executables?!"

@@ -0,0 +1,27 @@
sudo: required
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add .ci/rules_python.json for CI. see https://github.com/bazelbuild/continuous-integration/blob/master/docs/owner.md#customizing-a-project

Seeing the build, you probably just want:

[
    {
          "variation": "",
           "configurations": [
                {"node": "linux-x86_64"},
                {"node": "ubuntu_16.04-x86_64"},
                {"node": "darwin-x86_64"}
           ]
    }
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

README.md Outdated
which can be used directly in `deps=[]` to reference an imported `py_library`.

```python
load("@my_deps//:requirements.txt", "packages")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably meant requirements.bzl here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not updated? Maybe you just want to delete that part all in profit of the skydoc generated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, I think this bug was in two places. thanks for catching it again :)

python/pip.sh Outdated
@@ -0,0 +1,84 @@
#!/bin/bash -e
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment for this file: the file already complex, why not in python instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See line 17. I was halfway through rewriting one of these scripts in Python before I realized: I can't invoke a py_binary during WORKSPACE, I'd have to publish a PAR file. I immediately git stashed and rewrote my TODO instead. :(

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:( I actually do things without py_binary for simple python_script (I know that is a shame) using simply python <script.py>. Anyway up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that is true. I want to be able to factor libraries and have a clean separation of unit tests etc. I'll experiment with my stashed changes with this in mind.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it's ok to use shell here, not Python. But you actually need Python inside the script to parse json, so yeah.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For historical reasons that are still sometimes relevant, don't put the -e in the shebang line, instead add set -euo pipefail around line 16

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only remaining shell script now does this (I copied it from you :D)

python/whl.sh Outdated
@@ -0,0 +1,82 @@
#!/bin/bash -e

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, bash can become quickly hard to read...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@mattmoor
Copy link
Contributor Author

mattmoor commented Sep 6, 2017

@duggelz Do you have any suggestions on the best way to download and invoke pip without using get-pip.py to install it as a system-wide dependency? This seems to be why the OSX build is failing.

@mattmoor
Copy link
Contributor Author

mattmoor commented Sep 6, 2017

@sebgoa You'd asked for this at one point, if you are still interested I'd love your feedback.

README.md Outdated
See Bazel core [documentation](https://docs.bazel.build/versions/master/be/python.html#py_test).

<a name="pip_import"></a>
## pip_import
Copy link

@duggelz duggelz Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a thing called skydoc that will turn structured docstrings into Bazel documentation. However, it's kind of basic and a bit, well, not beautiful. https://google.github.io/subpar/subpar.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll TAL at skydoc for this repo, but in rules_docker and rules_k8s I'm blocked on this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've switched over to skydoc for the docs here (based on how google/subpar does it). Feedback welcome :)

@mattmoor mattmoor force-pushed the master branch 5 times, most recently from 76a4bcb to 91f1f9b Compare September 8, 2017 15:08
@mattmoor
Copy link
Contributor Author

mattmoor commented Sep 8, 2017

I moved whl.sh into a little Python script that I can invoke directly, but also reasonably unit test.

I have not moved pip.sh yet, and thinking about it left me with an idea. Considering:

  • Shelling out to pip from Python is a bit silly,
  • @damienmg's trick won't work if I have a pip library dependency,
  • I still haven't solved the pip fetching problem.

What if we build / publish a PAR (similar to the process rules_docker uses), which links the pip library, and removes the need to get-pip.py somehow? WDYT?

I still haven't found a clean process for doing this from a single repo (google/containerregistry and bazelbuild/rules_docker are separate repos), but that seems like a smaller problem to solve.

@mattmoor
Copy link
Contributor Author

I found this, which may be relevant to why certs aren't working properly from pip in a PAR file. It is similar to the trick we do for httplib2 in google/containerregistry tools.

This migrates the logic from pip.sh into piptool.py, which should improve portability by removing the bash dependency.

This also has the beginnings of wrapping piptool as a closed redistributable that doesn't rely on a system-installed copy of PIP, but instead uses these rules to pull pip into a PAR bundle.  Besides needing to work out the details of releasing and redistributing the PAR, we have two unresolved issues:
 * When bundled as a PAR (vs. py_binary), piptool seems to pick up the system-installed version of pip.
 * When bundled as a PAR, piptool sometimes sees cert issues resolving requirements (similar to what we see with httplib2).

def test_version(self):
parts = pip.__version__.split('.')
self.assertEqual(3, len(parts))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.assertEqual(pip.__version__, '9.0.1')? You might be overthinking this :) There's no standard format for __version__, as you discovered, sometimes it's a tuple, sometimes a string, sometimes it doesn't exist.

# being invoked as a raw .py file. Once bundled, we should be able
# to remove this fallback on a stub implementation of Wheel.
try:
from python.whl import Wheel
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like you're really playing with fire calling your package python.

move python tools under a top-level rules_python package

simplify version_test
@mattmoor
Copy link
Contributor Author

mattmoor commented Sep 12, 2017

@duggelz I think that's a very good question, and I think it breaks down where you have:

A and B have a_install() and b_install() functions that are also called in c/WORKSPACE

As you are clearly driving at, we have 3 sets of requirements that are being merged after the constraints have been solved, which I think is a losing proposition.

I think that my expectation is that instead of a_install and b_install we would pip_import a blended requirements.txt from A, B, C, and just abc_install.

This story is only half-baked because:

  1. We don't have a rule for combining @A//:requirements.txt and @B//:requirements.txt (but I suspect this is pretty easy).
  2. A, B, and C may each references this collection of dependencies with different repository names (but I believe this is doable by aliasing repositories with bind). At one point, with this exact problem in mind, I debated removing users capacity to name the repository itself, which would effectively guarantee a single resolution step, but I thought this was a bit extreme (e.g. what if I want to write an invoked tool that has its own requirements.txt, that's ok).

I think the crux of this issue is that WORKSPACE currently serves two roles (I hint at this above):

  1. Resolving dependencies based on fuzzy constraints (incl. transitive resolution).
  2. Fetching dependencies based on concrete specification.

The analogs to this outside of Bazel are things like [frozen] requirements.txt or package[-lock].json.

I think the guidance should be for users not to mix independently resolved external dependencies, but I'm not 100% sure if/how we can enforce it. Perhaps we can do this through Bazel's constraint system, but I have not looked any further at how we might express this.

Is what I'm describing clear? Does my logic make sense?

@duggelz
Copy link

duggelz commented Sep 15, 2017

@mattmoor I agree with what you wrote. Some additional points:

  1. Bazel has a native.existing_rules that make fixing my points 1 and 2 easy.
  2. Looking at various repository rules, there is a nice omit_foo pattern that allows C to override A and B as desired.
  3. The conflict between "fuzzy constraints" and "concrete specification" is real. Python aims to address this with Pipfile vs Pipfile.lock, but meanwhile we muddle through.

I don't think we need to make a tool that tries to merge A and B automatically or anything. I think it's reasonable to say that C must provide a requirements.txt that satisfies every need that A and B have.

After reading some discussions about Java/Maven, I do think that we should put the names in a canonical format that we document and agree on. Specifically, I think we should have the literal prefix pypi_ on names that would otherwise be very short and ambigous. E.g. @pypi__package_name_here//:package_name or @pypi__package_name_here__version_here//:package_name

`whl_library` rules generated by `pip_import` are now named as:
```
   pypi__{distribution}_{version}
```

Substituting illegal characters (e.g. `-`, `.`) with underscores.
@mattmoor
Copy link
Contributor Author

@duggelz Given that things are hidden behind package(...) in the expected usage today, I think this is completely fine. The main change is that we wouldn't namespace packages based on the enclosing pip_import, which I think is fine.

I've gone with pypi__{distribution}_{version}, used existing_rules to avoid multiple imports, but each pip_import will still exposes the library through package() even if it didn't import it. I included a section in the README.md covering canonical naming.

Add a script for updating the tools and document this and the docs script.
@mattmoor
Copy link
Contributor Author

@duggelz Hooray, green CI :)

@mattmoor mattmoor changed the title [WIP] Proposal for PIP rules PIP dependency support Sep 15, 2017
@mattmoor mattmoor merged commit 7f4cc92 into bazelbuild:master Sep 15, 2017
scele referenced this pull request in scele/rules_python Mar 12, 2018
Make pip_import more verbose
mdelaney pushed a commit to tubular/rules_python that referenced this pull request Mar 11, 2019
dhalperi added a commit to dhalperi/rules_python that referenced this pull request Mar 2, 2020
This looks like it's left over from before the switch to buildkite.
The Travis tests are not run in precommit now, and I can't even tell
if they were run in PR bazelbuild#1.

This causes pre-commit errors in forks if folks have Travis
enabled in other projects.
@dhalperi dhalperi mentioned this pull request Mar 2, 2020
lberki pushed a commit that referenced this pull request Mar 3, 2020
This looks like it's left over from before the switch to buildkite.
The Travis tests are not run in precommit now, and I can't even tell
if they were run in PR #1.

This causes pre-commit errors in forks if folks have Travis
enabled in other projects.
alexeagle pushed a commit to alexeagle/rules_python that referenced this pull request Aug 19, 2020
meastham referenced this pull request in tecton-ai/rules_python Aug 14, 2021
Add extra_index_urls parameter to pip_parse
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants