
Need a method to build native libraries for pip imports #48

Open
hwright opened this issue Dec 4, 2017 · 15 comments

@hwright
Contributor

hwright commented Dec 4, 2017

This is probably a long-term / aspirational wish, though I'd be happy to be shown otherwise.

Packages installed via pip often depend on native libraries. As a random example, https://pypi.python.org/pypi/cairocffi is a set of Python bindings around the native cairo library. The bindings themselves don't include the native library, but they do require it to be installed on the system in a way that setuptools can find when compiling native code to build the relevant wheel.

It seems like requiring a system-installed library (which could change between builds) would violate hermeticity. We should instead provide a way to specify the native libraries a given pip import depends on as Bazel targets, which can themselves be built.

This is likely related to bazelbuild/bazel#1475
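To make the idea concrete, here's a rough sketch of what such an interface could look like. This is purely hypothetical - the `native_deps` attribute below doesn't exist in any pip rules today, and `@cairo//:cairo` stands in for a Bazel-built cairo target:

```starlark
# Hypothetical sketch only: pip rules have no native_deps attribute today.
# The idea is that the wheel build for cairocffi would compile and link
# against a hermetic, Bazel-built cairo instead of a system-installed one.
pip_import(
    name = "pip_deps",
    requirements = "//:requirements.txt",
    native_deps = {
        # pip package name -> cc_library target providing headers and libs
        "cairocffi": "@cairo//:cairo",
    },
)
```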

@excavador

#61

@duggelz

duggelz commented Feb 16, 2018

Some discussion here: https://groups.google.com/forum/#!topic/bazel-sig-python/ESnkuJ4udkc

alexeagle pushed a commit to alexeagle/rules_python that referenced this issue Aug 19, 2020
@1e100

1e100 commented Nov 2, 2022

Another scenario I'm struggling with: building *.so's for PyTorch. PyTorch installed through rules_python works just fine. I can also compile programs against libtorch (installed via a workspace declaration) and they work OK. What doesn't work is compiling PyTorch extensions and then using them directly with the Bazel-installed PyTorch - you get all sorts of undefined symbol errors.

If I use libtorch as a dep for the extension (which works fine for C++ code), it seems unable to find the libs of transitive deps unless I provide LD_LIBRARY_PATH separately. I did verify that the libs themselves are where you'd expect them, and ldd shows nothing as "missing".

If I set things up such that the libs that I'd normally get from libtorch are taken directly from PyTorch's site-packages/torch/lib, things work OK.

So I need a way to get at both the headers and *.so's that are bundled inside the PyTorch wheel installed by Bazel. Currently, to the best of my knowledge, these are not exposed. There should be an option to expose them though.

@yijianli-autra

@1e100 I ran into the same issue - did you find any solution?

@1e100

1e100 commented Nov 8, 2022

I'm going to patch rules_python to optionally expose the include and lib subdirectories, globbing all headers in the former case and all *.so's in the latter. It'll only be for PyTorch though, which is super ugly. I'm not sure what a general-purpose solution would look like here. Normally you don't want to spill the guts of a lib like that; you want to expose the bare minimum necessary.

@1e100

1e100 commented Nov 10, 2022

@yijianli-autra, after some amount of hacking I was able to accomplish what I wanted without patching, in part because rules_python supports injecting build rules into pip packages via the annotations attribute on pip_parse. Basically, this option lets you inject whatever rules you like into whatever pip packages you like, and get at their internal files, headers, and *.so's.

Here's what I did:

  1. In your WORKSPACE, add something like the below to your pip_parse declaration:

```starlark
load("@rules_python//python:pip.bzl", "package_annotation", "pip_parse")

PIP_ANNOTATIONS = {
    "torch": package_annotation(
        additive_build_content = """\
cc_library(
    name = "libtorch",
    hdrs = glob([
        "site-packages/torch/include/torch/**/*.h",
        "site-packages/torch/include/c10/**/*.h",
        "site-packages/torch/include/ATen/**/*.h",
        "site-packages/torch/include/caffe2/**/*.h",
    ]),
    srcs = [
        "site-packages/torch/lib/libc10.so",
        "site-packages/torch/lib/libtorch.so",
        "site-packages/torch/lib/libtorch_cpu.so",
        "site-packages/torch/lib/libcaffe2_nvrtc.so",
    ],
    includes = [
        "site-packages/torch/include",
        "site-packages/torch/include/torch/csrc/api/include",
    ],
    deps = [
        "@pybind11",
        "@local_config_cuda//cuda",
    ],
    linkopts = ["-ltorch", "-ltorch_cpu", "-lc10", "-lcaffe2_nvrtc"],
    copts = ["-D_GLIBCXX_USE_CXX11_ABI=0"],
)
""",
    ),
}

pip_parse(
    name = "pypi",
    annotations = PIP_ANNOTATIONS,
    requirements_lock = "//:requirements.txt",
)

load("@pypi//:requirements.bzl", "install_deps")

install_deps()
```

Some of the stuff in the rule may be unnecessary - this is a result of a lot of stumbling in the dark and trying to compile. CUDA is provided by TensorFlow's workspace rules, so hypothetically with some elbow grease this might also work for ROCm (AMD) as well.

  2. In your library's BUILD file:

```starlark
cc_library(
    name = "cpu",
    srcs = [
        "dcn_v2_cpu.cpp",
        "dcn_v2_im2col_cpu.cpp",
        "dcn_v2_psroi_pooling_cpu.cpp",
    ],
    hdrs = [
        "dcn_v2_im2col_cpu.h",
        "vision.h",
    ],
    deps = ["@pypi_torch//:libtorch"],
    copts = ["-D_GLIBCXX_USE_CXX11_ABI=0"],
)
```
  3. *.so's are binaries, not libraries, so it'll need a cc_binary rule as well:

```starlark
cc_binary(
    name = "dcn_v2.so",
    srcs = [
        "dcn_v2.h",
        "vision.cpp",
    ],
    linkshared = True,
    linkstatic = False,
    linkopts = ["-Wl,-rpath,$$ORIGIN"],
    deps = [
        "//third_party/dcn_v2/src/cpu",
        "@pypi_torch//:libtorch",
    ],
    copts = ["-D_GLIBCXX_USE_CXX11_ABI=0"],
)
```

Again, some of this might not be necessary.

  4. And then you have to load it into Python, so in your py_library rule you go:

```starlark
py_library(
    name = "dcn_v2",
    srcs = glob(
        ["*.py"],
        exclude = ["**/*_test.py"],
    ),
    deps = [
        requirement("torch"),
    ],
    data = ["//third_party/dcn_v2/src:dcn_v2.so"],
)
```

Obviously modulo your own paths and library names.

  5. And finally, in your Python library:

```python
torch.ops.load_library("third_party/dcn_v2/src/dcn_v2.so")
dcn_v2 = torch.ops.dcn_v2
```

@1e100

1e100 commented Nov 10, 2022

Note that I haven't yet gotten this to work with the actual CUDA libs, but not because it's failing - I simply haven't gotten to it yet. The CPU extension builds, loads, and passes the tests.

@shahms

shahms commented Aug 7, 2023

Is there any idea of how to accommodate this in a bzlmod world as well? Without the flat repository namespace, just adding additional BUILD files becomes less viable for sharing dependencies across package managers. Say, using a multi-language repository and sharing a common zlib dependency between PIP-installed Python and C++.

@chrislovecnm
Collaborator

@shahms can you elaborate more?

@shahms

shahms commented Aug 7, 2023

I'm not sure which aspect you want elaboration on, but consider a repository which contains both C++ and Python and uses protocol buffers from both. There is a PIP protobuf package for the Python side, but this package needs to invoke protoc (written in C++) to generate the Python code and this binary must generally be kept in sync with the C++ protobuf library in use.

There are a few different (conceptual) ways of stitching this together, but they all need coordination between rules_python, pip and Bazel. When bzlmod is involved you can't easily just drop a BUILD file into the pip repository using additive_build_content_file due to the repo mapping involved, since that pip repository won't have access to the "main" repositories. You might be able to get away with it by hard-coding the "canonical" repo names in the overlayed BUILD file, but that's incredibly fragile and fiddly. There's also the issue of ensuring that pip uses that dependency rather than trying to fetch it itself.
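For concreteness, the fragile hard-coding workaround mentioned above would look roughly like this overlaid BUILD file. The canonical repo name below is made up, and that's exactly the problem - it changes whenever the module version or repo mapping does:

```starlark
# BUILD content injected into the pip repo via additive_build_content_file.
# "@@protobuf~21.7" is a canonical (mangled) Bzlmod repo name; hard-coding
# it breaks as soon as the module version or mapping changes.
cc_library(
    name = "shared_protobuf",
    deps = ["@@protobuf~21.7//:protobuf"],
)
```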

@meteorcloudy
Member

Is it possible to use toolchain resolution to solve this problem? @fmeum @Wyverald

@fmeum
Contributor

fmeum commented Aug 8, 2023

@meteorcloudy I think that relying on toolchain resolution for this would be overkill. The same patch that one would apply in a WORKSPACE setup should also work with Bzlmod.

For that, we may need to make a "reverse use_repo", let's call it make_visible_to, available in MODULE.bazel that can be used to inject a repo mapping entry into an extension (limited to the root module's MODULE.bazel file for non-isolated extension usages to prevent name clashes).

@tyler-french recently approached me with a similar situation: He needed to patch a Go repository managed by the go_deps extension to add a dependency on Tink. If the root module adds a dep on Tink via bazel_dep(name = "tink", ...), then make_visible_to(go_deps, com_github_google_tink = "tink") could make this repository visible to repositories generated by go_deps as com_github_google_tink.
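Written out as a MODULE.bazel snippet, the proposal might look like this. It is purely hypothetical - no make_visible_to function exists in Bazel today, and the version number is illustrative:

```starlark
bazel_dep(name = "tink", version = "1.7.0")  # version is illustrative

go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")

# Hypothetical "reverse use_repo": expose the root module's "tink" repo to
# repositories generated by go_deps, under the name the patched Go
# repository expects.
make_visible_to(go_deps, com_github_google_tink = "tink")
```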

@meteorcloudy
Member

For that, we may need to make a "reverse use_repo", let's call it make_visible_to

Hmm, I'm not sure how difficult it is to implement this. Can we somehow just pass a resolved canonical label through module extension tags?

@fmeum
Contributor

fmeum commented Aug 8, 2023

We can pass a canonical label to an arbitrary target visible from the root module into the extension via tags, but that has some serious usability problems:

  • Users would need to make up a target to pass in the repo name (go_deps.additional_repo(name = "com_github_google_tink", label = Label("@tink//:does_not_need_to_exist"))).
  • From within the content of a patch or BUILD file, it isn't clear how to reference that label. Textual replacement of @com_github_google_tink with "@@" + tink_label.workspace_name is an option, but seems fragile.
  • Every extension with the ability to patch or add BUILD file contents would need to build in this functionality, which could result in fragmentation.
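The tag-based option from the first bullet, spelled out (again hypothetical - go_deps has no additional_repo tag class today):

```starlark
# Hypothetical tag class: the target does not need to exist; it only
# serves to smuggle the resolved repo name into the extension.
go_deps.additional_repo(
    name = "com_github_google_tink",
    label = Label("@tink//:does_not_need_to_exist"),
)
```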

@davidstanke davidstanke removed their assignment Aug 8, 2023
@shahms

shahms commented Aug 25, 2023

@fmeum I ran into a similar situation with Go as well, specifically with https://github.com/google/brotli/tree/master/go, where the root module depends on the C brotli library. In that case it was easy enough to patch the Go library's BUILD file to use the canonical name from the root module, as that dependency wasn't using modules yet.

Something like make_visible_to seems like a potential avenue here.

I think the fundamental issue is around bridging Bazel-managed module namespaces and pip, go, cargo, etc. While the rules in question will need some way of consuming those dependencies from Bazel, Bazel will need to provide a mechanism of making them available. But that all assumes (so far) that the root module already has a direct dependency on that shared dependency which it can make available.

It seems like this is bazelbuild/bazel#19301
