For a script that depends on google-cloud re-builds are taking 70s #38

Open
macrael opened this issue Jun 28, 2017 · 10 comments

Comments

@macrael

macrael commented Jun 28, 2017

I don't know whether the pex rule is re-downloading it every time or what, but it makes development pretty much impossible. Is this expected? Shouldn't it be cached in some way?

Here's what my rule looks like:

pex_binary(
    name = "update-dns",
    main = "update_dns.py",
    reqs = [
        "enum34==1.1.6",
        "google-cloud==0.26.0"
    ],
    zip_safe = False,
    deps = [":update-dns-lib"],
)

py_library(
    name = "update-dns-lib",
    srcs = ["update_dns.py"],
)

If I make a change to update_dns.py and then re-run the script with bazel, it takes 70 seconds or so. Reruns without any changes to the file are quick.

@macrael
Author

macrael commented Jul 5, 2017

Any thoughts on this? Could it be related to building the binary itself, not caching the library? Is there a way I can figure out what is happening for those 70 seconds?

@Victorystick

Take a look at bazelbuild/rules_python#1. They just added support for pip dependencies, which are cached, making rebuilds pretty snappy. Perhaps you could try that to speed up builds? I think you'd need some changes to this repo for that to work, though. I haven't quite sorted it out myself yet. 😑

@benley
Owner

benley commented Sep 18, 2017

Oh shoot, I should have responded to this months ago. The issue you're seeing with rules_pex is that pex's own caching is disabled, because it's incompatible with bazel's concurrency: if bazel happens to run two instances of pex at the same time, pex's cache ends up corrupted and builds become inconsistent.

If you are able to use egg or wheel dependencies instead of having pex resolve requirements from pypi, it should be much faster.
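The thread doesn't show pex's cache code, so the following is only a generic illustration of the concurrency problem described above, not how pex or rules_pex work. A download cache shared by parallel processes stays consistent if each writer builds its entry under a temporary name and then atomically renames it into place, so no reader ever sees a half-written file. `atomic_write` and `cached_fetch` are hypothetical helpers.

```python
import os
import tempfile
from typing import Callable


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so that no reader ever sees a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as path so that
    # os.replace is a single atomic rename (on POSIX).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def cached_fetch(cache_dir: str, key: str, fetch: Callable[[], bytes]) -> bytes:
    """Return the cached bytes for key, calling fetch() and caching on a miss.

    Two processes that miss at the same moment may both call fetch(), but each
    rename installs a complete file, so the entry is never left corrupted.
    """
    path = os.path.join(cache_dir, key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = fetch()
    atomic_write(path, data)
    return data
```

The trade-off is that concurrent misses may duplicate a download, which is wasted work but never produces an inconsistent cache.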

@macrael
Author

macrael commented Sep 26, 2017 via email

@macrael
Author

macrael commented Sep 27, 2017

I spent today trying to make this work, but unfortunately it seems that pex doesn't support manylinux1 wheels (pex-tool/pex#281), which is what most packages seem to distribute on PyPI. As far as I can tell, that means I can't make this work on my own.

For reference, I tried to get it so that I could import grpc. See the wheels here: https://pypi.python.org/pypi/grpcio/1.6.0; the only available wheels are manylinux1 ones.
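Wheel compatibility is encoded in the filename. Per PEP 427 the name is `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, so the grpcio file used below targets only CPython 3.5 on manylinux1 x86_64. A minimal sketch for pulling out those tags; `parse_wheel_filename` and `is_manylinux1` here are illustrative helpers, not part of pex.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into the fields defined by PEP 427."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(dist, version, build, py, abi, plat)


def is_manylinux1(filename: str) -> bool:
    """True if any platform tag (tags may be joined with '.') is manylinux1."""
    plat = parse_wheel_filename(filename).platform_tag
    return any(p.startswith("manylinux1_") for p in plat.split("."))
```

For real tooling, the `packaging` library's `packaging.utils.parse_wheel_filename` handles name normalization and validation more thoroughly.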

Before I realized that this might be the problem, I got this far:

WORKSPACE

http_file(
    name = "pypi_grpcio",
    urls = ["https://pypi.python.org/packages/d0/67/cccd0e58d169cc7077425b296056b553acee7a8fe45ad8e52dce2fe66ab7/grpcio-1.6.0-cp35-cp35m-manylinux1_x86_64.whl"],
)

... repeat for setuptools, protobuf, and six

BUILD

pex_binary(
    name = "smoketest",
    interpreter = "/usr/bin/python3",
    main = "smoketest.py",
    pex_use_wheels = True,
    eggs = [
        "@pypi_setuptools//file",
        "@pypi_six//file",
        "@pypi_protobuf//file",
        "@pypi_grpcio//file",
    ],
)

And the error I got when I ran it:

Failed to execute PEX file, missing compatible dependencies for:
protobuf
grpcio

Does bazel use multiple threads by default? It's a real hindrance to have to rebuild all reqs from source on every build; it makes local development essentially impossible.

I also tried to use the new rules_python, but found it impossible to get them to use python3. Any pointers there would also be appreciated.

Bazel can be so frustrating!

Is it possible to create a different build rule that builds these, then import that so as to create my own cache?

@benley
Owner

benley commented Sep 27, 2017

I think the way to do it is probably comparable to what rules_python does: have a rule for each external dependency that builds a pex containing just that dep (and maybe its transitive deps?), and add a way to roll several pex archives together into a final pex_binary that includes all of them. That way you would end up with a working cache. Sometime last year I spent a few hours trying to hack that together, but I couldn't figure out how to combine pexes in a sane way. If you know of a way to do that, please share :-)
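Why per-dependency pexes would give a working cache can be shown with a toy model of content-addressed cache keys. This is a hypothetical sketch, not Bazel's actual action-key computation: each dependency pex's key depends only on its own pinned requirement, while the final binary's key depends on the script source plus the dependency keys.

```python
import hashlib
from typing import Dict, List


def digest(*parts: str) -> str:
    """Stable content hash over a sequence of strings."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") != ("a", "bc")
    return h.hexdigest()


def dep_pex_keys(reqs: List[str]) -> Dict[str, str]:
    """One cache key per requirement; it depends only on that pin."""
    return {req: digest("dep-pex", req) for req in reqs}


def binary_key(source: str, dep_keys: Dict[str, str]) -> str:
    """The final pex depends on the script source plus every dep pex key."""
    return digest("binary", source, *sorted(dep_keys.values()))
```

Editing update_dns.py changes only `binary_key`; the per-requirement keys stay identical, so the expensive dependency pexes would be cache hits and only the final merge step reruns. When a single action covers both the source and all reqs, as with the pex_binary rule earlier in the thread, every edit forces all requirements to be resolved again.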

@benley
Owner

benley commented Sep 27, 2017

As for grpc, the best solution is likely to be grpc/grpc#8079 if they ever manage to implement it.

@evanj
Contributor

evanj commented Dec 11, 2017

I have a disgusting hack to pex to make the manylinux1 wheels work. See my comment on pex-tool/pex#281. With this, I have a very hacky tool that generates the WORKSPACE rules to depend on a set of requirements.txt dependencies. I'm hoping to release these tools as open source once I get the whole thing actually working. However, it's also possible that the work on https://github.com/bazelbuild/rules_python will "catch up" and actually work for real applications, making this unnecessary.

@macrael
Author

macrael commented Dec 11, 2017

For now (while also waiting for rules_python to catch up and actually support dependencies and python3 at the same time), I've written a script that reads a single requirements.txt file, creates a venv, and builds all the dependencies locally. That avoids using any of the cached manylinux1 wheels and produces wheels that work on the Linux host we are targeting.
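macrael's script isn't shown, so this is only a hypothetical reconstruction of the build step it describes. It assumes a Python 3 interpreter (for `-m venv`) and a POSIX venv layout. `pip wheel --no-binary :all:` tells pip to skip prebuilt wheels such as manylinux1 and compile each package from its sdist on the current host.

```python
import os
import subprocess
import sys
from typing import List


def wheel_build_commands(
    venv_dir: str,
    requirements_txt: str,
    wheel_dir: str,
    no_binary: str = ":all:",
) -> List[List[str]]:
    """argv lists that create a venv and build every requirement as a wheel.

    no_binary=":all:" makes pip ignore prebuilt wheels and build from source;
    a comma-separated list of package names restricts that to those packages.
    """
    pip = os.path.join(venv_dir, "bin", "pip")  # POSIX venv layout assumed
    return [
        [sys.executable, "-m", "venv", venv_dir],
        [pip, "wheel", "--no-binary", no_binary,
         "-r", requirements_txt, "-w", wheel_dir],
    ]


def build_wheels(venv_dir: str, requirements_txt: str, wheel_dir: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for argv in wheel_build_commands(venv_dir, requirements_txt, wheel_dir):
        subprocess.check_call(argv)
```

Keeping command construction separate from execution makes the argv easy to inspect or log before anything slow runs.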

Then we upload all those built wheels to our Google Cloud Storage and generate a set of WORKSPACE entries that reference them. Finally, the script generates some BUILD file entries that create a filegroup for each top-level dependency, which can be passed in as wheels to the pex_binary rule. It is ugly, but it works.
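A sketch of the generation step described above, under stated assumptions: the repository naming scheme (`pypi_<dist>`), the helper names, and the URLs are hypothetical. The `http_file` stanza mirrors the grpcio WORKSPACE example earlier in the thread, and each wheel is referenced through the `@<repo>//file` label that `http_file` exposes.

```python
import re
from typing import List


def repo_name(wheel_filename: str) -> str:
    """Derive a repository name such as 'pypi_grpcio' from a wheel filename."""
    dist = wheel_filename.split("-", 1)[0]
    return "pypi_" + re.sub(r"[^A-Za-z0-9_]", "_", dist).lower()


def workspace_entry(wheel_filename: str, url: str) -> str:
    """An http_file stanza in the same shape as the grpcio example."""
    return (
        "http_file(\n"
        f'    name = "{repo_name(wheel_filename)}",\n'
        f'    urls = ["{url}"],\n'
        ")\n"
    )


def filegroup_entry(name: str, wheel_filenames: List[str]) -> str:
    """A BUILD filegroup bundling one top-level dependency's wheels."""
    labels = "".join(
        f'        "@{repo_name(w)}//file",\n' for w in wheel_filenames
    )
    return (
        "filegroup(\n"
        f'    name = "{name}",\n'
        "    srcs = [\n"
        f"{labels}"
        "    ],\n"
        ")\n"
    )
```

The generated filegroup label could then go in the `eggs` list of a `pex_binary`, as in the smoketest BUILD file above. Wheels built from source on the host carry a `linux_x86_64` platform tag rather than `manylinux1_x86_64`.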

It could definitely stand to be turned into a proper library but I'm hesitant to do that when it seems like the canonical python rules are slowly getting to be usable.

@evanj
Contributor

evanj commented Dec 11, 2017

Amazing, this sounds basically identical to what I am doing, apart from the minor manylinux1 difference. So far it seems to be working, although the time to create pexes with lots of dependencies is understandably huge, so I may end up needing to make some changes to support PEX_PATH linking, at least for tests.
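PEX_PATH is an environment variable that pex reads at runtime. It lists other pex files, written the same way as `$PATH`, whose packages are merged into the running pex's environment. That lets a thin application pex reuse separately built dependency pexes instead of bundling everything into one archive. A minimal sketch of launching a pex that way; the helper names and file paths are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Optional


def pex_path_env(dep_pexes: List[str],
                 base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with PEX_PATH extended by dep_pexes."""
    env = dict(os.environ if base is None else base)
    # Keep any PEX_PATH entries that are already set, then append ours.
    existing = [p for p in env.get("PEX_PATH", "").split(os.pathsep) if p]
    env["PEX_PATH"] = os.pathsep.join(existing + list(dep_pexes))
    return env


def run_with_deps(main_pex: str, dep_pexes: List[str], *args: str) -> int:
    """Run an executable main_pex with dep_pexes' packages importable."""
    return subprocess.call([main_pex, *args], env=pex_path_env(dep_pexes))
```

For example, `run_with_deps("update-dns.pex", ["deps/google-cloud.pex"])` would start the script with the dependency pex merged in, so only the small application pex needs rebuilding when the source changes.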
