GPU Batch 5 #1167

mborland · 2024-07-30T20:56:14Z

Adds GPU support to: sqrt1pm1, erf, erfc, erf_inv, erfc_inv, powm1, lgamma, tgamma, and associated gamma functions.

Removes recursion from lgamma and tgamma by adding dispatch functions to handle the reflection case.
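
As a rough illustration of the dispatch idea (a sketch only, not the code in this PR): the names `tgamma_positive_core` and `tgamma_dispatch` below are hypothetical, and `std::tgamma` stands in for the real positive-argument kernel. The point is that the reflection branch calls a separate, non-recursive function rather than calling itself.

```cpp
#include <cmath>

// Hypothetical stand-in for the non-recursive kernel (assumption: the real
// kernel would be a hand-written series/Lanczos evaluation, not std::tgamma).
inline double tgamma_positive_core(double x)
{
    return std::tgamma(x);
}

// Hypothetical dispatch function: it handles the reflection case and then
// forwards to the core. It never calls itself, so there is no recursion.
inline double tgamma_dispatch(double x)
{
    const double pi = 3.141592653589793238462643383279502884;
    if (x < 0 && std::floor(x) != x)
    {
        // Reflection: Gamma(x) = pi / (sin(pi * x) * Gamma(1 - x)), and 1 - x > 0.
        return pi / (std::sin(pi * x) * tgamma_positive_core(1 - x));
    }
    // Non-negative arguments (and the poles) go straight to the core.
    return tgamma_positive_core(x);
}
```

With this shape, `tgamma_dispatch(-0.5)` evaluates the core once at `1.5` and applies the reflection formula, which is the kind of call graph a platform without recursion support can accept.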

codecov · 2024-07-31T19:36:33Z

Codecov Report

Attention: Patch coverage is 93.67089% with 10 lines in your changes missing coverage. Please review.

Project coverage is 94.08%. Comparing base (ef3892c) to head (445e36a).

| Files | Patch % | Lines |
|---|---|---|
| include/boost/math/special_functions/powm1.hpp | 72.22% | 5 Missing ⚠️ |
| include/boost/math/special_functions/gamma.hpp | 94.59% | 4 Missing ⚠️ |
| include/boost/math/special_functions/erf.hpp | 94.11% | 1 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #1167   +/-   ##
========================================
  Coverage    94.07%   94.08%           
========================================
  Files          780      780           
  Lines        65764    65797   +33     
========================================
+ Hits         61867    61904   +37     
+ Misses        3897     3893    -4

| Files | Coverage Δ |
|---|---|
| include/boost/math/special_functions/beta.hpp | `100.00% <ø> (ø)` |
| ...de/boost/math/special_functions/detail/erf_inv.hpp | `100.00% <100.00%> (ø)` |
| ...ost/math/special_functions/detail/igamma_large.hpp | `33.11% <100.00%> (ø)` |
| ...ost/math/special_functions/detail/lgamma_small.hpp | `59.90% <100.00%> (+0.54%)` ⬆️ |
| ...h/special_functions/detail/unchecked_factorial.hpp | `100.00% <100.00%> (ø)` |
| include/boost/math/special_functions/sign.hpp | `100.00% <100.00%> (ø)` |
| include/boost/math/special_functions/sqrt1pm1.hpp | `100.00% <100.00%> (ø)` |
| include/boost/math/tools/fraction.hpp | `91.04% <100.00%> (+0.56%)` ⬆️ |
| include/boost/math/tools/precision.hpp | `100.00% <100.00%> (ø)` |
| test/test_erf.cpp | `95.00% <ø> (ø)` |
... and 7 more

... and 3 files with indirect coverage changes

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

mborland · 2024-07-31T19:48:58Z

@jzmaddock Can you see any reason why Windows platforms would not like the diffs for erfc_inv and tgamma? The former just changes the dispatching method from pointers to std::integral_constant to references, and the latter adds a dispatching function. I would assume that if there were something fundamentally wrong with the way dispatching is occurring, it would hit other platforms too. Both are throwing on numeric overflow.
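
For anyone following along, here is a minimal sketch of the two tag-dispatch styles being compared. It is not the actual erfc_inv code: the tag aliases and function names are hypothetical, and the returned integers only show which overload was picked.

```cpp
#include <type_traits>

// Hypothetical precision tags in the style of std::integral_constant dispatch.
using tag53 = std::integral_constant<int, 53>;
using tag64 = std::integral_constant<int, 64>;

// Pointer style: the tag arrives as a null pointer; only its type matters.
inline int select_by_pointer(double, const tag53*) { return 53; }
inline int select_by_pointer(double, const tag64*) { return 64; }

// Reference style: the tag arrives as a const reference to a tag object.
inline int select_by_reference(double, const tag53&) { return 53; }
inline int select_by_reference(double, const tag64&) { return 64; }

template <class Tag>
int dispatch_old(double x)
{
    // Overload resolution is driven by the pointer's static type.
    return select_by_pointer(x, static_cast<const Tag*>(nullptr));
}

template <class Tag>
int dispatch_new(double x)
{
    // Overload resolution is driven by the type of the temporary tag object.
    return select_by_reference(x, Tag());
}
```

Both `dispatch_old<tag64>(1.0)` and `dispatch_new<tag64>(1.0)` select the 64-bit overload, so on the host the two styles should be interchangeable in principle.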

dschmitz89 · 2024-08-05T07:18:52Z

@mborland: this looks awesome. Could you give a little high-level overview of what the goals of these GPU efforts in Boost.Math are? SciPy is doing something similar at the moment, and we are also using some special functions from you guys. CC @steppi in case he did not see these recent GPU PRs.

steppi · 2024-08-05T12:35:29Z

This is really awesome to see! Thanks for the ping, @dschmitz89. In scipy/scipy#20785 (comment), you had said:

> From a stats maintenance perspective, I would be sad to see scipy dropping boost due to array API concerns. Their implementations of special functions for statistical purposes are a great resource for us, not mentioning their responsiveness to our issues.

and I'd replied something to the effect that if Boost also supported GPU execution for special functions, then we could keep using Boost's special functions in SciPy without having to maintain separate implementations for other array libraries. @jzmaddock thought that was reasonable, and now here we are. I'll play around with getting Boost special functions working in CuPy; it looks like we'll be able to use them directly there, and also use them internally in the xsf special function library @izaid and I are working on.

mborland · 2024-08-05T12:51:59Z

> @mborland: this looks awesome. Could you give a little high-level overview of what the goals of these GPU efforts in Boost.Math are? SciPy is doing something similar at the moment, and we are also using some special functions from you guys.

I think Albert's summary is pretty good. @jzmaddock has a quite old PR adding CUDA support here: #127, so I have been extracting from and expanding on that. Development happens in this repo: https://github.com/cppalliance/cuda-math, since my employer can only directly add GPU-enabled CI there, and the changes are then cherry-picked onto Boost proper. Two questions for you two, @dschmitz89 and @steppi:

1. Is there a specific prioritization of functions or distributions that would benefit you?
2. Are you looking for CUDA only, or are there other platforms that you need or want covered? We have been adding both CUDA and SYCL support so far. SYCL has some more significant limitations, such as no recursion. So far it has been relatively easy to work around by adding dispatching functions, but there will be functions that need to be completely rewritten or will remain unsupported, such as anything involving root finding, so I'd rather punt on rewrites until there is real demand.
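
To make "adding GPU support" concrete, here is a minimal sketch of a function written so that one definition can serve host and device builds. The macro `SKETCH_GPU_ENABLED` and the function `sqrt1pm1_sketch` are hypothetical names for illustration, not the library's own, and the sketch has only been reasoned about for host compilation.

```cpp
#include <cmath>

// Hypothetical marker macro (an assumption for this sketch): expands to
// CUDA's __host__ __device__ under NVCC and to nothing on other compilers.
#if defined(__CUDACC__)
#  define SKETCH_GPU_ENABLED __host__ __device__
#else
#  define SKETCH_GPU_ENABLED
#endif

// Computes sqrt(1 + x) - 1 while avoiding cancellation for small |x|.
// No recursion, no exceptions, no dynamic allocation.
SKETCH_GPU_ENABLED inline double sqrt1pm1_sketch(double x)
{
    if (std::fabs(x) > 0.75)
    {
        // Far from zero the direct formula is accurate enough.
        return std::sqrt(1 + x) - 1;
    }
    // sqrt(1 + x) - 1 == exp(log(1 + x) / 2) - 1, evaluated with
    // log1p/expm1 so small inputs keep their significant digits.
    return std::expm1(std::log1p(x) / 2);
}
```

A function of this shape needs nothing beyond the marker; the harder cases are the ones mentioned above, where the algorithm itself relies on recursion or root finding.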

izaid · 2024-08-05T13:09:32Z

I think this is great too! I'll let @steppi answer any specific prioritization of functions, but the ones you've done so far look good to me. I think we really only care about CUDA for now, but others may think differently.

@mborland I wanted to throw in one point though. Are you testing these using just nvcc, or nvrtc as well? nvrtc is CUDA's runtime (JIT) compiler, and is what CuPy uses for instance. It has more stringent requirements...

steppi · 2024-08-05T13:10:06Z

> 2. anything involving root finding

I’m trying to figure out root finding that will work well on GPU now. I’ll keep you posted on my progress.

mborland · 2024-08-05T13:29:58Z

> @mborland I wanted to throw in one point though. Are you testing these using just nvcc, or nvrtc as well? nvrtc is CUDA's runtime (JIT) compiler, and is what CuPy uses for instance. It has more stringent requirements...

Right now it's just NVCC but we can look into adding NVRTC as well. Tracked here: cppalliance/cuda-math#7.

> I’m trying to figure out root finding that will work well on GPU now. I’ll keep you posted on my progress.

Cool. Thank you.

dschmitz89 · 2024-08-05T14:51:20Z

Nice that the conversation is starting. :) On the SciPy side I would defer to @steppi for all technical details as I don't have expertise in CUDA or special function computations.

mborland · 2024-08-06T13:01:00Z

> Nice that the conversation is starting. :) On the SciPy side I would defer to @steppi for all technical details as I don't have expertise in CUDA or special function computations.

Feel free to open further issues/discussion here or downstream at: https://github.com/cppalliance/cuda-math. Either way it will get seen.

The only CI failure is from Codecov, so I am going to merge this one in.

mborland mentioned this pull request Jul 30, 2024

GPU Batch 5 cppalliance/cuda-math#6

Merged

mborland force-pushed the cuda_5 branch from cb739a1 to a0c9289 Compare July 31, 2024 15:38

mborland added 20 commits August 5, 2024 10:19

Add GPU support to sqrt1pm1

b9f6405

Add GPU support to continued fractions and remove recursion

aa5e7e6

Add GPU support to erf and erfc

51a49ad

Add SYCL erf testing

5ab4837

Add sqrt1pm1 to fwd

086ba3c

Add CUDA erf and erfc testing

d5b142c

Replace integral constant pointers with references

00e0aa0

Make erf_inv and erfc_inv GPU enabled

623b1fc

Add erf_inv and erfc_inv CUDA testing

9d2f737

Replace igamma_temme_large integral constant pointer with reference

e469215

Add GPU support to igamma_temme_large

4aaa9b7

Add GPU support to lgamma_small_imp

4f2cbbd

Add additional factorial overloads for GPU times

0a15df8

Add GPU support to lgamma_small

4d51329

Fix signbit GPU support

540c715

Support GPU with root epsilon functions

2bfb2b5

Add GPU support to gamma functions

10e7e1b

Add GPU markers to math_fwd for gamma functions

80f0fff

Add CUDA testing for tgamma and lgamma

9f7215d

Disable factorial check on GPU platform

e9ccdcf

mborland added 13 commits August 5, 2024 10:19

Add SYCL gamma testing

5ad3866

Remove extra overloads

b67f1fd

Remove recursion from tgamma

18ef483

Remove recursion in lgamma

dc792c9

Ignore CUDA warning about GNU force inline

1fcc01a

Ignore literal range warnings

0fafaf2

Ignore mapairy literal range warnings

82d56ac

Make powm1 GPU compatible

49cc43c

Fix GCC warnings

635bd32

Revert if constexpr change

c7e16f7

Fix typo

24a0922

Fix comparison

faf541e

Use tag type idiom

87048d9

mborland force-pushed the cuda_5 branch from f29b7cc to 91479dc Compare August 5, 2024 14:19

Reorder overflow checks

445e36a

mborland force-pushed the cuda_5 branch from 91479dc to 445e36a Compare August 5, 2024 19:29

mborland merged commit ab09ece into develop Aug 6, 2024
77 of 78 checks passed

mborland deleted the cuda_5 branch August 6, 2024 13:01
