Add proof of concept for NVRTC support #1170

Merged

mborland merged 8 commits into develop from nvrtc on Aug 8, 2024
Conversation

@mborland (Member) commented Aug 6, 2024

Development and completed CUDA runs can be found here: cppalliance/cuda-math#8

This one was requested by @izaid for CuPy usage. Since NVRTC has extremely strict rules, it seems beneficial to me to alias libcu++ wherever we can for this level of support. We're going to have to alias everything in std:: to point to either cuda::std:: or thrust::, or roll our own implementations for things that don't exist, such as the already completed implementation of <limits>. I'm also not quite sure how to pass relative paths to nvrtcCompileProgram, so for now I have everything as hard-coded full paths. Because this is runtime compilation, compilation alone tells us less here than it does with NVCC, but everything is still run first through the cuda-math repo.
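For context, a minimal sketch of the NVRTC flow in question (the paths and kernel below are placeholders, not the ones used in this PR): include search paths go in as -I options to nvrtcCompileProgram, and a relative -I path resolves against the process's working directory at runtime, which is why full paths are the safe default.

```cpp
#include <nvrtc.h>
#include <cstdio>

// Minimal NVRTC sketch: the kernel source is compiled at runtime, so any
// Boost.Math headers it includes must be findable through -I options.
int main()
{
    const char* src =
        "#include <boost/math/special_functions/gamma.hpp>\n"
        "extern \"C\" __global__ void k(double* out)"
        "{ out[0] = boost::math::tgamma(5.0); }\n";

    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "test.cu", 0, nullptr, nullptr);

    // NVRTC has no notion of a source directory, so a relative -I path
    // resolves against the CWD at runtime; the path below is a placeholder.
    const char* opts[] = { "-I/full/path/to/boost-root", "--std=c++14" };
    nvrtcResult res = nvrtcCompileProgram(prog, 2, opts);

    // On failure the compile log (nvrtcGetProgramLog) holds the errors;
    // on success nvrtcGetPTX yields PTX to load with the CUDA driver API.
    std::printf("compile %s\n", res == NVRTC_SUCCESS ? "succeeded" : "failed");
    nvrtcDestroyProgram(&prog);
}
```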

Any thoughts, @jzmaddock, @ckormanyos, @NAThompson?

@izaid commented Aug 6, 2024

Thanks for setting this up @mborland! And ccing @steppi here.

Yes, NVRTC is quite strict, but unfortunately it's used in various places so we would need to support it. We went through this as well, and our solution indeed was just to alias the relevant parts of std. It definitely wasn't a complete implementation, but we did copy-and-paste a lot of <limits>, etc.
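For illustration, that kind of copy-and-paste shim looks roughly like this (a hypothetical sketch with a made-up gpu_compat namespace, not SciPy's actual code):

```cpp
// Hypothetical sketch of a pared-down <limits> shim of the kind described
// above; not SciPy's (or Boost's) actual code. Only the members the math
// code actually calls get implemented.
namespace gpu_compat {

template <class T> struct numeric_limits;

template <> struct numeric_limits<double>
{
    static constexpr bool is_specialized = true;
    // Parenthesized (min)/(max) dodge any min/max macros.
    __host__ __device__ static constexpr double epsilon() { return 2.2204460492503131e-16; }
    __host__ __device__ static constexpr double (min)()   { return 2.2250738585072014e-308; }
    __host__ __device__ static constexpr double (max)()   { return 1.7976931348623157e+308; }
};

} // namespace gpu_compat
```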

@mborland (Member, Author) commented Aug 6, 2024

@izaid, @steppi, @dschmitz89 Do any of you want or need ROCm support while we're going back through this anyway?

edit: To me it doesn't seem especially valuable, since translations from SYCL or CUDA code are available. It looks like additional maintenance without much benefit.

@izaid commented Aug 6, 2024

I don't think we need ROCm support; CUDA is the big win.

codecov bot commented Aug 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.08%. Comparing base (ab09ece) to head (135208b).

@@             Coverage Diff             @@
##           develop    #1170      +/-   ##
===========================================
+ Coverage    94.06%   94.08%   +0.01%     
===========================================
  Files          780      780              
  Lines        65797    65797              
===========================================
+ Hits         61892    61904      +12     
+ Misses        3905     3893      -12     
Files                                            Coverage Δ
include/boost/math/special_functions/gamma.hpp   91.75% <ø> (ø)

... and 1 file with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jzmaddock (Collaborator) commented

We're going to have to alias everything in std:: to point to either cuda::std:: or thrust::

Since our calls are all unqualified, can we not just change BOOST_MATH_STD_USING to point to the appropriate namespace?
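For readers outside the project: BOOST_MATH_STD_USING expands to a block of using-declarations so that subsequent unqualified calls resolve correctly. The suggestion amounts to something like the following simplified sketch (the real macro in boost/math/tools/config.hpp imports many more names, and device code would also need __device__ markup, omitted here):

```cpp
// Simplified sketch of the suggestion; the real BOOST_MATH_STD_USING in
// boost/math/tools/config.hpp imports many more names than shown here.
#ifdef __CUDACC_RTC__
  // Under NVRTC the C math functions already live in the global
  // namespace, so the macro can simply expand to nothing.
  #define BOOST_MATH_STD_USING
#else
  #include <cmath>
  #define BOOST_MATH_STD_USING \
     using std::abs; using std::fabs; using std::sqrt; \
     using std::exp; using std::log;  using std::pow;
#endif

template <class T>
T hypot_naive(T x, T y)
{
   BOOST_MATH_STD_USING  // unqualified 'sqrt' now resolves in either mode
   return sqrt(x * x + y * y);
}
```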

@steppi commented Aug 7, 2024

Since our calls are all unqualified, can we not just change BOOST_MATH_STD_USING to point to the appropriate namespace?

@izaid and I wanted to do the equivalent of this in SciPy. The hitch was that to get things working in CuPy (which uses NVRTC) we needed a patchwork of different namespaces: thrust:: for complex arithmetic, mostly cuda::std::, sometimes just :: from cuda_runtime.h, and I think we even had to copy and paste some things from the standard library and add the CUDA markup.
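Concretely, that patchwork looks something like this hypothetical sketch (illustrative only, not SciPy's code):

```cpp
// Hypothetical sketch of the namespace patchwork described above;
// illustrative only, not SciPy's actual code.
#include <cuda/std/type_traits>   // utilities from libcu++
#include <thrust/complex.h>       // complex arithmetic from Thrust

template <class T>
__host__ __device__ T abs_plus_eps(thrust::complex<T> z)
{
    static_assert(cuda::std::is_floating_point<T>::value, "real type required");
    // 'sqrt' from cuda_runtime.h lives in the global namespace on device:
    return sqrt(z.real() * z.real() + z.imag() * z.imag()) + T(1e-6);
}
```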

@mborland (Member, Author) commented Aug 7, 2024

We're going to have to alias everything in std:: to point to either cuda::std:: or thrust::

Since our calls are all unqualified, can we not just change BOOST_MATH_STD_USING to point to the appropriate namespace?

The equivalent of BOOST_MATH_STD_USING is easy, since all the mathematical functions require no special headers and are all in the global namespace. The things we will need from thrust:: are containers, and we already have aliases for some of them, like tuple: https://github.com/boostorg/math/blob/develop/include/boost/math/tools/tuple.hpp#L14. Utilities such as a version of <type_traits> are in cuda::std:: (see https://nvidia.github.io/cccl/libcudacxx/standard_api/utility_library/type_traits.html). My thought is to create a bunch of alias headers in boost/math/tools, like boost::math::tuple, so that if you call, say, boost::math::is_same_v, it resolves to either std::is_same_v or cuda::std::is_same_v depending on context. These could all later be extracted and put in a central place. @jzmaddock, what do you think about landing all these kinds of context-aware aliases in Config? I thought it might be out of scope, but wanted your thoughts. I'd rather not propose a Boost.GPU_Compat library.
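The proposed alias headers could look roughly like this (a sketch of the idea with a hypothetical boost/math/tools/type_traits.hpp; not code from this PR):

```cpp
// Hypothetical boost/math/tools/type_traits.hpp along the lines proposed
// above; not code from this PR.
#ifdef __CUDACC_RTC__

#include <cuda/std/type_traits>

namespace boost { namespace math {
template <class T, class U>
inline constexpr bool is_same_v = cuda::std::is_same_v<T, U>;
}} // namespace boost::math

#else

#include <type_traits>

namespace boost { namespace math {
template <class T, class U>
inline constexpr bool is_same_v = std::is_same_v<T, U>;
}} // namespace boost::math

#endif

// Callers write boost::math::is_same_v<T, double> and get the right
// implementation for host or NVRTC compilation automatically.
```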

@jzmaddock (Collaborator) commented

My thought is to create a bunch of alias headers in boost/math/tools, like boost::math::tuple, so that if you call, say, boost::math::is_same_v, it resolves to either std::is_same_v or cuda::std::is_same_v depending on context. These could all later be extracted and put in a central place. @jzmaddock, what do you think about landing all these kinds of context-aware aliases in Config? I thought it might be out of scope, but wanted your thoughts. I'd rather not propose a Boost.GPU_Compat library.

It's not unreasonable to push this to Config: I'm sure other libraries could make use of this stuff too... but do we want to support this in standalone mode? It might be worth at least raising this on the mailing list?

@mborland (Member, Author) commented Aug 7, 2024

It's not unreasonable to push this to Config: I'm sure other libraries could make use of this stuff too... but do we want to support this in standalone mode? It might be worth at least raising this on the mailing list?

I would keep the original copies here in math so we can continue supporting standalone mode, because that is what the Python packages depend on. I'll post on the ML.

@mborland (Member, Author) commented Aug 7, 2024

Looks like the ML is in favor of everything going in its own library. Pending any other comments, I think this is a good limited proof that we can make it work. I'll keep building out the basic functions not provided by CUDA (e.g. fpclassify) and see if we can get a simple distribution working.
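For a sense of what building out those basic functions involves, here is an illustrative sketch of an NVRTC-friendly fpclassify (hypothetical result codes and logic; not the implementation that later landed):

```cpp
// Illustrative sketch only, not the implementation that later landed in
// Boost.Math. NVRTC exposes isnan/isinf in the global namespace, so a
// device-friendly fpclassify can be built on top of them.
#define BOOST_MATH_FP_NAN       0   // hypothetical result codes; real code
#define BOOST_MATH_FP_INFINITE  1   // would mirror FP_NAN, FP_INFINITE, ...
#define BOOST_MATH_FP_ZERO      2
#define BOOST_MATH_FP_SUBNORMAL 3
#define BOOST_MATH_FP_NORMAL    4

__host__ __device__ inline int fpclassify(double t)
{
    if (isnan(t)) return BOOST_MATH_FP_NAN;
    if (isinf(t)) return BOOST_MATH_FP_INFINITE;
    const double a = t < 0 ? -t : t;
    if (a == 0) return BOOST_MATH_FP_ZERO;
    if (a < 2.2250738585072014e-308) return BOOST_MATH_FP_SUBNORMAL;  // below DBL_MIN
    return BOOST_MATH_FP_NORMAL;
}
```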

@mborland merged commit 2a4351e into develop Aug 8, 2024
79 checks passed
@mborland deleted the nvrtc branch August 8, 2024