NumPy 2.0 support #38

Open
jakirkham opened this issue Apr 17, 2024 · 17 comments

@jakirkham
Member

jakirkham commented Apr 17, 2024

NumPy 2.0 is coming out soon (numpy/numpy#24300). NumPy 2.0.0rc1 packages for conda and wheels came out 2 weeks ago (numpy/numpy#24300 (comment)).

Ecosystem support for NumPy 2.0 is being tracked in issue: numpy/numpy#26191

Also conda-forge is discussing how to support NumPy 2.0: conda-forge/conda-forge.github.io#1997

When building against NumPy 2.0, the default settings produce packages that are compatible with both NumPy 1 and 2. By default, NumPy targets the oldest NumPy version that was built for the Python version being targeted.


Developed the following list by installing RAPIDS 24.04 and inspecting which packages use NumPy. Specifically, ran the commands below:

```
conda install -n base conda-tree -y
conda create -n rapids-24.04 -c rapidsai -c conda-forge -c nvidia rapids=24.04 python=3.11 cuda-version=12.2 -y
conda tree -n rapids-24.04 whoneeds numpy
```

This generated a list of dependencies. Some of these were RAPIDS packages themselves, so removed those from the list. Also dropped some indirect dependencies of RAPIDS. Admittedly this can get a little subjective, but tried to capture a sufficiently complete, though not overly detailed, picture.
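A minimal sketch of the filtering step, assuming the dependent package names have already been pulled out of the `conda tree` output into a list. The `RAPIDS_PACKAGES` set and the `non_rapids_dependents` helper are hypothetical, and the set lists only a few RAPIDS packages for illustration:

```python
# Hypothetical, partial set of RAPIDS package names (the real list is longer).
RAPIDS_PACKAGES = {"cudf", "cuml", "cugraph", "cuspatial", "cuxfilter", "rmm", "rapids"}


def non_rapids_dependents(dependents, rapids_packages=RAPIDS_PACKAGES):
    """Return the sorted, de-duplicated dependents that are not RAPIDS packages."""
    return sorted({name for name in dependents if name not in rapids_packages})
```

For example, `non_rapids_dependents(["cudf", "pandas", "cupy", "cuml"])` keeps only `cupy` and `pandas`. Dropping indirect dependencies is still a manual judgment call.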


From this, have built the table below

Some versions have question marks if...

  • The release is uncertain
  • There may be some lingering issues
  • It may have been fixed against an older NumPy 2, but hasn't been retested with RC1

Blank entries mean no information is known about those fields at this time

| Package | Supported | Released | Version | Upstream issue/PR |
|---|---|---|---|---|
| Arrow | Y | Y | 16.0.0 | apache/arrow#39532 |
| Bokeh | Y | Y | 3.4.1 | bokeh/bokeh#13835 |
| Branca | Y | Y | 0.7.2 | python-visualization/branca#163 |
| CuPy | Y | Y | 13.2.0 | cupy/cupy#8306 |
| Dask | Y | Y | 2024.5.1 | dask/dask#11066 |
| Datashader | Y | Y | 0.16.2 | holoviz/datashader#1324 |
| folium | Y | Y? | 0.16.0? | python-visualization/folium#1937 |
| GDAL | Y | Y | 3.9.0 | OSGeo/gdal#9751 |
| HoloViews | Y | Y | 1.19.0 | holoviz/holoviews#5979 & holoviz/holoviews#6238 |
| Hypothesis | Y | Y | 6.100.2 | HypothesisWorks/hypothesis#3955 |
| imagecodecs | Y | Y | 2024.6.1 | cgohlke/imagecodecs#100 |
| imageio | Y | Y | 2.34.2 | imageio/imageio#1077 |
| mapclassify | Y | Y | 2.6.1? | pysal/mapclassify#188 |
| Matplotlib | Y | Y | 3.8.4 | matplotlib/matplotlib#26778 |
| Numba | Y | Y | 0.60.0 | numba/numba#9544 |
| Pandas | Y | Y | 2.2.2 | pandas-dev/pandas#55519 |
| PyTorch | Y | Y | 2.3.0 | pytorch/pytorch#107302 |
| PyWavelets | Y | Y | 1.6.0 | PyWavelets/pywt#731 |
| scikit-image | Y | Y | 0.23.1 | scikit-image/scikit-image#7282 |
| scikit-learn | Y | Y | 1.4.2 | scikit-learn/scikit-learn#27075 |
| SciPy | Y | Y | 1.13.0 | scipy/scipy#20375 |
| Shapely | Y | Y | 2.0.4? | shapely/shapely#1972 |
| TensorFlow | | | | tensorflow/tensorflow#67291 |
| tifffile | Y | Y | 2024.4.24 | cgohlke/tifffile#252 |
| treelite | Y | Y | 4.2.1 | dmlc/treelite#560 |
| Xarray | Y | Y | 2024.06.0 | pydata/xarray#8844 |
| XGBoost | Y | Y | 2.1.0 | dmlc/xgboost#10221 |

Note to editors: Also attaching the CSV file used to generate this table (as editing Markdown tables can be tricky 😅). Would suggest making any changes to the CSV file locally (with Excel or another tool). Then you can use prettytable (available on PyPI & conda-forge) to generate Markdown with the code below. The resulting content can be copy-pasted above. You can drag and drop the CSV file into this textbox to attach it.

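prettytable code:

```python
import prettytable

# newline="" is what the csv module recommends for files it reads.
with open("rapids_numpy_pkgs.csv", "r", newline="") as fh:
    # An explicit delimiter (spelled "delimiter") skips CSV dialect sniffing.
    t = prettytable.from_csv(fh, delimiter=",")
    t.set_style(prettytable.MARKDOWN)
with open("rapids_numpy_pkgs.md", "w") as fh2:
    fh2.write(str(t))
```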
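Along the same lines, a sketch for listing the uncertain entries (question marks or blanks, per the legend above) straight from the CSV. It assumes the CSV header matches the table columns; `uncertain_packages` is a hypothetical helper, not part of prettytable:

```python
import csv

# Status columns, assumed to match the CSV header used for the table.
STATUS_COLUMNS = ("Supported", "Released", "Version")


def uncertain_packages(lines):
    """Yield packages whose status fields are blank or end with '?'.

    `lines` can be an open file or any iterable of CSV lines.
    """
    for row in csv.DictReader(lines):
        for column in STATUS_COLUMNS:
            # DictReader fills missing trailing fields with None.
            value = (row.get(column) or "").strip()
            if not value or value.endswith("?"):
                yield row["Package"]
                break
```

Usage: `with open("rapids_numpy_pkgs.csv", newline="") as fh: print(list(uncertain_packages(fh)))`.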
@jameslamb
Member

Thanks @jakirkham! I think this is a great approach.

I looked through this list of dependencies today and can't think of any others or a different approach to identify them. And I checked the statuses of all the not-yet-released ones and don't see any changes.

@jakirkham
Member Author

jakirkham commented May 16, 2024

Went through the project list again earlier today and just now.

The main change is that the GDAL release went out.

Also, Numba RCs are available.

Dask may work with NumPy 2, but this needs reconfirmation.

Added a better issue link for TensorFlow.

Also tried to separate whether upstream has fixes (like Dask or XGBoost) from whether those fixes are released. Hopefully that gives a bit more visibility into the state of NumPy 2 support.

@rgommers

rgommers commented Jun 2, 2024

dask 2024.5.1, datashader 0.16.2, imagecodecs 2024.6.1, and treelite 4.2.1 all have numpy 2.0-compatible releases out now.

@vyasr
Contributor

vyasr commented Jun 3, 2024

Thanks Ralf!

@jakirkham
Member Author

Thanks for the reminder! 🙏

Have refreshed the table above

@rgommers

Numba 0.60.0, Xarray 2024.6.0, and CuPy 13.2.0 were all released. So looks like things are mostly good here (a few rough edges left).
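A rough way to check whether an environment already has at least these releases. This is a sketch: it treats the reported versions as minimums, assumes the keys match installed distribution names (CuPy wheels, for instance, are published under CUDA-specific names such as `cupy-cuda12x`), and uses the third-party `packaging` library for version comparison. `too_old_for_numpy2` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

# Releases reported above as NumPy 2 compatible, treated here as minimums.
# Keys are assumed distribution names (CuPy wheels use names like "cupy-cuda12x").
MIN_NUMPY2_VERSIONS = {
    "numba": "0.60.0",
    "xarray": "2024.6.0",
    "cupy": "13.2.0",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def too_old_for_numpy2(min_versions, get_version=installed_version):
    """Return {name: installed_version} for installed packages below their minimum."""
    too_old = {}
    for name, minimum in min_versions.items():
        installed = get_version(name)
        if installed is not None and Version(installed) < Version(minimum):
            too_old[name] = installed
    return too_old
```

Calling `too_old_for_numpy2(MIN_NUMPY2_VERSIONS)` reports only packages that are installed but older than the listed release; missing packages are skipped.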

@rgommers

Imageio and XGBoost were both released as well.

@FirefoxMetzger

You can cross ImageIO off the list: we now support NumPy v2.0 as of ImageIO v2.34.2.

@jakirkham
Member Author

Thanks all! 🙏

Have refreshed the list

Looks like we are down to TensorFlow, which is only needed in some cases. So I think it makes sense to start doing this work at this point.

@seberg

seberg commented Jul 3, 2024

It seems hdbscan is a dependency of cuml. I will push them to fix that; they probably just need to redo their wheel build (for now they have just added a numpy<2 pin instead).
(Turns out there are some other issues, although they don't seem NumPy 2 related.)

xref scikit-learn-contrib/hdbscan#644 (fix, but remaining issue)
xref scikit-learn-contrib/hdbscan#642 (issue)

@vyasr
Contributor

vyasr commented Jul 15, 2024

IIRC hdbscan is only a test dependency, JFYI.

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Jul 15, 2024
This splits the non-API changes out of gh-15897. The Scalar API change is required for the tests to pass with NumPy 2, but almost all of the changes here should be relatively straightforward on their own.

(I will add inline comments.)

---

This PR does not fix integer comparisons; there are currently no tests that run into these.

xref: rapidsai/build-planning#38

Authors:
  - Sebastian Berg (https://github.com/seberg)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #16141
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Jul 15, 2024
…#16140)

This aligns with NumPy, which deprecated this a while ago and now raises an error on NumPy 2, for example for `Scalar(-1, dtype=np.uint8)`.

Since it aligns with NumPy, earlier NumPy versions give their DeprecationWarning instead of an error.

This (or similar handling) is required for compatibility with NumPy 2/pandas, since by default an operation must be rejected when values are out of bounds: for example, in `uint8_series + 1000`, the 1000 should not be silently cast to a `uint8`.
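A minimal sketch of the bounds check described above, written with plain NumPy arrays rather than cuDF. `checked_scalar` and `add_python_int` are hypothetical helpers, not cuDF API; they reject out-of-bounds Python integers explicitly, so they behave the same on NumPy 1.x and 2.x:

```python
import numpy as np


def checked_scalar(value, dtype):
    """Convert a Python int to an integer scalar of `dtype`, rejecting out-of-bounds values."""
    info = np.iinfo(dtype)
    if not info.min <= value <= info.max:
        # Explicit check, so NumPy 1.x also errors instead of only warning.
        raise OverflowError(f"Python integer {value} out of bounds for {np.dtype(dtype)}")
    return np.dtype(dtype).type(value)


def add_python_int(arr, value):
    """Add a Python int to an integer array without silently casting or upcasting it."""
    return arr + checked_scalar(value, arr.dtype)
```

With this, `checked_scalar(-1, np.uint8)` and `add_python_int(np.array([1], dtype=np.uint8), 1000)` both raise `OverflowError`, while in-range values keep the array's `uint8` dtype.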

---

Split from gh-15897

xref: rapidsai/build-planning#38

Authors:
  - Sebastian Berg (https://github.com/seberg)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #16140
@jameslamb
Member

Just making connections here... those hdbscan changes will also have the side benefit of replacing a pip install git+... (that ends up in every devcontainer using cuml) with wheels from PyPI 😁

rapidsai/cuml#5906 (comment)

rapids-bot bot pushed a commit to rapidsai/cuml that referenced this issue Jul 28, 2024
This applies some smaller NumPy 2 related fixes. With the (in progress) CuPy 13.2 fixups, the single-GPU test suite seems to be doing mostly fine. There is a single test remaining:
```
test_simpl_set.py::test_simplicial_set_embedding
```
is failing with:
```
(Pdb) cp.asarray(cu_embedding)
array([[23067.518, 23067.518],
       [17334.559, 17334.559],
       [22713.598, 22713.598],
       ...,
       [23238.438, 23238.438],
       [25416.912, 25416.912],
       [19748.943, 19748.943]], dtype=float32)
```
being completely different from the reference:
```
array([[5.330462 , 4.3419437],
       [4.1822557, 5.6225405],
       [5.200859 , 4.530094 ],
       ...,
       [4.852359 , 5.0026293],
       [5.361374 , 4.1475334],
       [4.0259256, 5.7187223]], dtype=float32)
```
I am not sure why that might be. I will prod it a bit more, but it may need someone who knows the methods to have a look.

One wrinkle is that hdbscan is not yet released for NumPy 2, but I guess it is still required even though sklearn has its own version?
(Probably, not a big issue, but my fixups scikit-learn-contrib/hdbscan#644 run into some issue even though it doesn't seem NumPy 2 related.)

xref: rapidsai/build-planning#38

Authors:
  - Sebastian Berg (https://github.com/seberg)
  - https://github.com/jakirkham
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5954
@jakirkham
Member Author

Regarding HDBSCAN, we updated to 0.8.38.post1 yesterday in PR: rapidsai/cuml#5906

This supports NumPy 2 and, as James pointed out, switches to installing prebuilt binaries

@seberg

seberg commented Aug 19, 2024

With CuPy 13.3 on the way soon, we should be able to start unpinning NumPy again. I have started this process with the following PRs. Note that these are drafts for now, since many projects require CuPy for their tests to pass.

However, we will also need a bit of time to unpin the dependency for things like rmm and ucx(x|-py), which have no issues with doing so but are dependencies of all the others:

@jameslamb
Member

@jakirkham @seberg for all the libraries that were blocked waiting on cupy 13.3.0, do we want a `!=13.2.0` on their cupy dependencies? To minimize the risk of users running into the issues described in rapidsai/ucx-py#1064 (comment)?
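For reference, a sketch of what such an exclusion looks like as a version specifier, checked with the third-party `packaging` library. The `>=12.0.0` lower bound is a placeholder, not the actual RAPIDS pin:

```python
from packaging.specifiers import SpecifierSet

# Placeholder lower bound; "!=13.2.0" is the exclusion under discussion.
CUPY_SPEC = SpecifierSet(">=12.0.0,!=13.2.0")

for candidate in ("13.1.0", "13.2.0", "13.3.0"):
    # Prints True, False, True respectively: only 13.2.0 is excluded.
    print(candidate, candidate in CUPY_SPEC)
```

In a requirements list this would read as, e.g., `cupy-cuda12x>=12.0.0,!=13.2.0`; every other release in range stays installable.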

Or do you think that's too aggressive?

@jakirkham
Member Author

jakirkham commented Aug 22, 2024

The 3 of us discussed James' question offline. We concluded:

If a user simply installs CuPy, they will get NumPy 2.1 + CuPy 13.3.0, which works.

A user may not run into the issues we have seen with CuPy (and this is true for some RAPIDS libraries as well). For clarity, these are the 2 issues:

Also, a user may be combining CuPy 13.2.0 with NumPy 1, which also works.

So we plan to leave it as-is. That said, we can always update this based on user feedback.

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Aug 24, 2024
Part of issue: rapidsai/build-planning#38

Start building `cudf` with `numpy` version `2.0`. This remains compatible with `numpy` versions `1.x` and `2.x`, and allows us to test building with `numpy` version `2.0` (and make sure we catch any issues that show up). Also relaxes the `numpy` `1.x` pin. Pulls in the RDFG changes that are rolling out for broader RAPIDS NumPy 2 support.

Authors:
  - https://github.com/jakirkham
  - Sebastian Berg (https://github.com/seberg)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Ray Douglass (https://github.com/raydouglass)
  - James Lamb (https://github.com/jameslamb)

URL: #16300
@mmccarty mmccarty modified the milestones: RAPIDS v24.10, v24.10 Aug 27, 2024
@jameslamb
Member

We're pretty close to being able to install all of RAPIDS alongside numpy >=2!

@hcho3 and I looked through the open issues today. Here's the dependency graph of remaining work:

```mermaid
---
title: NumPy 2.0 dependencies
---
flowchart TD
    A[fmt/spdlog] --> B[cuspatial]
    B --> C[cuxfilter]
    B --> D[rapids]
    E[rapids-xgboost] --> D
    F[cugraph] --> G[cugraph-gnn]
    F --> D
    G -- (optional for 24.10) --> H[cugraph-pg]
```
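The same graph can be turned into batches of work that can proceed in parallel, using the standard library's `graphlib` (Python 3.9+). The `BLOCKED_ON` mapping is a hand transcription of the flowchart (with the optional cugraph-pg edge included), and `work_batches` is a hypothetical helper:

```python
from graphlib import TopologicalSorter

# Each key maps to the items it is blocked on, transcribed from the flowchart.
BLOCKED_ON = {
    "cuspatial": {"fmt/spdlog"},
    "cuxfilter": {"cuspatial"},
    "rapids": {"cuspatial", "rapids-xgboost", "cugraph"},
    "cugraph-gnn": {"cugraph"},
    "cugraph-pg": {"cugraph-gnn"},  # optional for 24.10
}


def work_batches(graph):
    """Group items into batches; everything within a batch can proceed in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(work_batches(BLOCKED_ON))
# [['cugraph', 'fmt/spdlog', 'rapids-xgboost'], ['cugraph-gnn', 'cuspatial'], ['cugraph-pg', 'cuxfilter', 'rapids']]
```

Batching by readiness rather than producing a single linear order makes it easier to see which items can be picked up at the same time.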

Represented in link-to-ticket form:

And then finally, after that:

I'll start an internal chat thread to try coordinating the work.


7 participants