uint256: optimize div-related functions by reducing bounds check #168

AaronChen0 · 2024-05-15T15:55:25Z

Since functions addTo and subMulTo are in a for loop inside udivremKnuth, reducing bounds check in them should result in an observable performance boost.

len(y) > 0 should be guaranteed by those respective non-zero checks for divisors inside Div, Mod, AddMod, etc.

Test

go test ./...

ok  	github.com/holiman/uint256	1.375s

Benchmark

go test -run - -bench 'Mod|Div' -count 10 -timeout 40m >/tmp/old
go test -run - -bench 'Mod|Div' -count 10 -timeout 40m >/tmp/new
benchstat old new | grep -v big

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: AMD Ryzen 7 7735H with Radeon Graphics         
                                 │     old     │                 new                 │
                                 │   sec/op    │   sec/op     vs base                │
Div/small/uint256-16               3.121n ± 1%   3.087n ± 1%   -1.07% (p=0.020 n=10)
Div/mod64/uint256-16               21.66n ± 1%   21.41n ± 2%   -1.15% (p=0.001 n=10)
Div/mod128/uint256-16              40.81n ± 0%   37.78n ± 2%   -7.42% (p=0.000 n=10)
Div/mod192/uint256-16              36.58n ± 1%   34.38n ± 1%   -5.99% (p=0.000 n=10)
Div/mod256/uint256-16              28.12n ± 1%   26.75n ± 1%   -4.86% (p=0.000 n=10)
Mod/small/uint256-16               4.300n ± 3%   4.075n ± 2%   -5.23% (p=0.000 n=10)
Mod/mod64/uint256-16               27.47n ± 1%   27.69n ± 2%        ~ (p=0.085 n=10)
Mod/mod128/uint256-16              42.11n ± 2%   39.26n ± 1%   -6.77% (p=0.000 n=10)
Mod/mod192/uint256-16              37.63n ± 2%   35.48n ± 2%   -5.70% (p=0.000 n=10)
Mod/mod256/uint256-16              29.85n ± 1%   28.09n ± 1%   -5.91% (p=0.000 n=10)
AddMod/small/uint256-16            6.307n ± 1%   6.368n ± 0%   +0.97% (p=0.011 n=10)
AddMod/mod64/uint256-16            10.02n ± 2%   10.04n ± 0%        ~ (p=0.422 n=10)
AddMod/mod128/uint256-16           19.04n ± 1%   18.72n ± 1%   -1.68% (p=0.001 n=10)
AddMod/mod192/uint256-16           20.96n ± 1%   20.24n ± 1%   -3.41% (p=0.000 n=10)
AddMod/mod256/uint256-16           6.428n ± 1%   6.479n ± 1%   +0.80% (p=0.011 n=10)
MulMod/small/uint256-16            17.71n ± 1%   17.55n ± 2%        ~ (p=0.078 n=10)
MulMod/mod64/uint256-16            36.68n ± 2%   36.83n ± 1%        ~ (p=0.123 n=10)
MulMod/mod128/uint256-16           57.66n ± 1%   55.88n ± 1%   -3.08% (p=0.000 n=10)
MulMod/mod192/uint256-16           73.38n ± 2%   69.39n ± 1%   -5.44% (p=0.000 n=10)
MulMod/mod256/uint256-16           86.87n ± 1%   87.06n ± 0%        ~ (p=0.404 n=10)
MulMod/mod256/uint256r-16          38.22n ± 2%   37.80n ± 1%   -1.12% (p=0.023 n=10)
SDiv/large/uint256-16              37.59n ± 1%   34.93n ± 1%   -7.08% (p=0.000 n=10)
MulDivOverflow/small/uint256-16    22.60n ± 1%   22.38n ± 2%   -1.00% (p=0.001 n=10)
MulDivOverflow/div64/uint256-16    27.58n ± 1%   27.07n ± 2%   -1.83% (p=0.002 n=10)
MulDivOverflow/div128/uint256-16   43.95n ± 2%   40.23n ± 2%   -8.44% (p=0.000 n=10)
MulDivOverflow/div192/uint256-16   56.92n ± 2%   52.31n ± 2%   -8.09% (p=0.000 n=10)
MulDivOverflow/div256/uint256-16   74.00n ± 1%   65.19n ± 2%  -11.90% (p=0.000 n=10)
geomean                            48.88n        47.08n        -3.68%

codecov · 2024-05-15T16:03:32Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (66a4528) to head (7aa34b6).

Additional details and impacted files

@@            Coverage Diff            @@
##            master      #168   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines         1626      1628    +2     
=========================================
+ Hits          1626      1628    +2

holiman

Whether it makes any positive improvement or not is a but unclear, I think, because the benchmarks are inherently a bit flaky.
But I'm fairly certain that it can't hurt, and probably a slight improvement, so why not.

Thanks!

AaronChen0 · 2024-05-16T08:31:37Z

Using godbolt(https://go.godbolt.org/z/4nGefdYrM), I do see the bounds check is moved outside the loop. And the loop is inside another loop, so this PR should help.

If you benchmark using an idle computer (don't benchmark with background jobs running), you definitely would see an upto 7% boost, several ns slashed off.

If the percent number in 37.59n ± 1% is large in benchmark result, that would suggest your computer is busy with other background tasks during benchmarking. Try killing them before benchmarking. You could get a more accurate result.

I have another PR doing loop unrolling which combined with this PR could have an upto 17% boost for div-related functions.

AaronChen0 · 2024-05-16T08:47:49Z

Should I add loop unrolling in this PR to make the performance boost more obvious, or in another PR?

holiman · 2024-05-16T09:06:00Z

Should I add loop unrolling in this PR to make the performance boost more obvious, or in another PR?

Let's do that in a separate PR.

uint256: optimize div-related functions by reducing bounds check

7aa34b6

holiman approved these changes May 16, 2024

View reviewed changes

holiman merged commit 11a325c into holiman:master May 16, 2024
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

uint256: optimize div-related functions by reducing bounds check #168

uint256: optimize div-related functions by reducing bounds check #168

AaronChen0 commented May 15, 2024 •

edited

Loading

codecov bot commented May 15, 2024

holiman left a comment

AaronChen0 commented May 16, 2024 •

edited

Loading

AaronChen0 commented May 16, 2024 •

edited

Loading

holiman commented May 16, 2024

uint256: optimize div-related functions by reducing bounds check #168

uint256: optimize div-related functions by reducing bounds check #168

Conversation

AaronChen0 commented May 15, 2024 • edited Loading

Test

Benchmark

codecov bot commented May 15, 2024

Codecov Report

holiman left a comment

Choose a reason for hiding this comment

AaronChen0 commented May 16, 2024 • edited Loading

AaronChen0 commented May 16, 2024 • edited Loading

holiman commented May 16, 2024

AaronChen0 commented May 15, 2024 •

edited

Loading

AaronChen0 commented May 16, 2024 •

edited

Loading

AaronChen0 commented May 16, 2024 •

edited

Loading