Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove usage of the NumPy C API #41

Closed
vyasr opened this issue Apr 18, 2024 · 9 comments
Closed

Remove usage of the NumPy C API #41

vyasr opened this issue Apr 18, 2024 · 9 comments

Comments

@vyasr
Copy link
Contributor

vyasr commented Apr 18, 2024

RAPIDS currently makes use of the NumPy C API in a handful of places, generally in Cython code. The NumPy C API is generally quite good and has remained stable, making it easy to work with. However, it does introduce additional build and packaging complexity that would be nice to avoid. With minimal changes to RAPIDS code, we should be able to remove numpy as a build dependency entirely, which may simplify our builds and also saves us from needing to rebuild packages at all when numpy 2.0 is released. If we were getting a lot of value out of the C API the calculus might be different, but in practice our usage of it is very minimal and can generally be avoided. I propose that we expend a little bit of development effort to stop relying on the NumPy C API altogether. This will help us on two fronts: 1) we'll more easily be able to support multiple major versions of NumPy (see #38) since we only have to worry about Python compatibility, not C compatibility; and more importantly 2) we won't have to worry about NumPy C APIs when considering if we can use the Python limited API to produce a single package across Python versions (will open a separate issue for that next). The latter is the more important piece here, since as of this writing the numpy C API is not compatible with the Python limited API based on the author's current experimentation.

The changes required basically boil down to two things:

  1. cudf/cuspatial: cudf and cuspatial both use the C API transitively only, via pyarrow. There is no direct usage of the C API. Therefore, the cudf/cuspatial piece of this issue will be addressed when [FEA] Reduce arrow library dependencies in cudf cudf#15193 is completed.
  2. ucxx: ucxx uses the C API to expose host buffers to other APIs. This usage should be possible to remove by directly implementing the buffer protocol on a custom object. It will require a bit of extra work, but should be easy to maintain going forward.
@rgommers
Copy link

The latter is the more important piece here, since as of this writing the numpy C API is not compatible with the Python limited API based on the author's current experimentation.

This is correct, NumPy uses too much of the CPython C API to make supporting the limited API feasible any time soon.

@vyasr
Copy link
Contributor Author

vyasr commented Apr 19, 2024

Thanks for confirming @rgommers! RAPIDS would definitely benefit from producing abi3 wheels, and since we don't really deal with host data buffers there's not really much reason for us to need the numpy C API, so I think it's worth a small investment from us to remove numpy as a build dep.

@seberg
Copy link

seberg commented Apr 22, 2024

While it is correct that NumPy will keep using the full API, of course. That isn't a limitation on downstream. The NumPy headers use the limited API for only a few things and Matti added a test for that. Specifically:

  • Exposing the internals of bool and string scalars (because they subclass the non-limited Python classes).
  • Creating (working with is fine) new DTypes with the new API (because of bugs and limitations in Python, eventually this will go away).

Yes, limited API supported isn't tested or used really, but problems shouldn't be confined to macros, so since including the headers works (and is tested) I wouldn't expect further problems.

EDIT: I forgot to add that the necessary disabling of the above two bullets was missing from the headers before NumPy 2.0 meaning that you must compile with NumPy 2. But this shouldn't be a limitation in practice.


That doesn't mean I don't agree with the sentiment: I think a lot or even all of NumPy C-API use is probably simply unnecessary and fewer dependencies are great.

@vyasr
Copy link
Contributor Author

vyasr commented Apr 23, 2024

Thanks for clarifying that. Yeah, we only need to care about the numpy headers on our end. I'll update this thread based on how much effort I see removing usage of the NumPy C API taking us.

@vyasr
Copy link
Contributor Author

vyasr commented Apr 26, 2024

Here's the ucxx removal. Not sure I got everything right, but I don't think we need a whole lot more than what's in there.

rapids-bot bot pushed a commit to rapidsai/ucxx that referenced this issue May 13, 2024
#225)

Currently ucxx makes use of numpy's C API exactly once, in order to expose host buffers to Python. It does so by leveraging the fact that numpy arrays can be constructed directly from a pointer using the C API. However, we can produce essentially an equivalent result by using numpy's Python object on a representation of the same pointer in terms of a custom object that implements the buffer protocol. This adds about 20 lines of Cython code in exchange for dropping numpy as a build dependency entirely, which is a nice maintenance benefit (see also rapidsai/build-planning#41).

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Sebastian Berg (https://github.com/seberg)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #225
@jakirkham
Copy link
Member

Given the UCXX piece has been addressed and the remaining work is in cuDF ( rapidsai/cudf#15193 ), should we go ahead and close this out?

@vyasr
Copy link
Contributor Author

vyasr commented Jul 15, 2024

Sure

@vyasr vyasr closed this as completed Jul 15, 2024
@jakirkham
Copy link
Member

There was some NumPy C API usage in cuCIM via pybind11

This has now been removed with PR: rapidsai/cucim#751

@vyasr
Copy link
Contributor Author

vyasr commented Aug 28, 2024

With rapidsai/cudf#16640 the cudf transitive usage via pyarrow has also been removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants