secp256k1: Reduce scalar base mult copies. #2898

davecgh · 2022-03-09T07:12:48Z

~~This is rebased on #2888~~.

Profiling shows that around 7.5% of the time in scalar base multiplication is attributed to `duffcopy`. On closer examination, this results from two things: the range statement making copies of the bytes, and the need to construct a Jacobian point from the individual field values stored in the in-memory byte points table.
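As a rough illustration of the copy semantics described above, the following sketch contrasts a range value variable with an indexed loop that takes a pointer to each entry. The `point` type and both function names are hypothetical and are not taken from the package. Whether the compiler actually emits a bulk copy for a given loop depends on the type size and on optimizations, so this only shows the language-level behavior.

```go
package main

import "fmt"

// point is a hypothetical stand-in for a multi-word curve point.
type point struct{ x, y, z [10]uint32 }

// touchViaRange increments through the range value variable.  Each p is a
// copy of the table entry, so the table itself is left unchanged.
func touchViaRange(table *[4]point) uint32 {
	var last uint32
	for _, p := range table {
		p.x[0]++
		last = p.x[0]
	}
	return last
}

// touchViaIndex increments through a pointer to the entry, so no copy of the
// entry is made and the change lands in the table.
func touchViaIndex(table *[4]point) uint32 {
	var last uint32
	for i := 0; i < len(table); i++ {
		p := &table[i]
		p.x[0]++
		last = p.x[0]
	}
	return last
}

func main() {
	var table [4]point
	touchViaRange(&table)
	fmt.Println(table[0].x[0]) // still 0: only the copies were modified
	touchViaIndex(&table)
	fmt.Println(table[0].x[0]) // now 1: the entry was modified in place
}
```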

This optimizes the function to avoid that as follows:

- Perform the conversion to Jacobian once when the affine byte table is decompressed from the stored values
- Make use of those Jacobian points directly
- Use an indexed for loop instead of a range over the bytes
- Perform the calculation using the result variable directly instead of via a local variable that is copied to the result
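The four changes listed above can be sketched side by side. This is a minimal illustration under stated assumptions, not the code from this PR: every type and function name here is hypothetical, the 32x256 table shape is assumed, and `addNonConst` is a toy placeholder that adds limbs rather than performing real elliptic curve addition. It only serves to compare the two loop shapes.

```go
package main

import "fmt"

// fieldVal, affinePoint and jacobianPoint are hypothetical stand-ins.
type fieldVal [10]uint32
type affinePoint struct{ x, y fieldVal }
type jacobianPoint struct{ x, y, z fieldVal }

// addNonConst is a placeholder, NOT real curve arithmetic: it adds the
// first limb of each coordinate so both loop shapes can be compared.
func addNonConst(p1, p2, result *jacobianPoint) {
	result.x[0] = p1.x[0] + p2.x[0]
	result.y[0] = p1.y[0] + p2.y[0]
	result.z[0] = p1.z[0] + p2.z[0]
}

// buildTables fills an affine table and, once up front, the matching
// Jacobian table (z = 1), mirroring "convert when decompressing".
func buildTables() (*[32][256]affinePoint, *[32][256]jacobianPoint) {
	affine := new([32][256]affinePoint)
	jac := new([32][256]jacobianPoint)
	for i := range affine {
		for j := range affine[i] {
			a := &affine[i][j]
			a.x[0] = uint32(i*256 + j)
			a.y[0] = uint32(i + j)
			jp := &jac[i][j]
			jp.x, jp.y = a.x, a.y
			jp.z[0] = 1
		}
	}
	return affine, jac
}

// scalarBaseMultOld: range over the bytes, rebuild a Jacobian point per
// iteration, accumulate in a local, then copy the local to the result.
func scalarBaseMultOld(k [32]byte, table *[32][256]affinePoint, result *jacobianPoint) {
	var q, pt jacobianPoint
	for i, byteVal := range k {
		p := table[i][byteVal] // copies the affine entry
		pt.x = p.x
		pt.y = p.y
		pt.z = fieldVal{1}
		addNonConst(&q, &pt, &q)
	}
	*result = q
}

// scalarBaseMultNew: indexed loop, table entries used by pointer, and the
// sum accumulated directly in the result.
func scalarBaseMultNew(k *[32]byte, table *[32][256]jacobianPoint, result *jacobianPoint) {
	*result = jacobianPoint{} // toy identity element
	for i := 0; i < len(k); i++ {
		addNonConst(result, &table[i][k[i]], result)
	}
}

func main() {
	affine, jac := buildTables()
	k := [32]byte{1, 2, 3}
	var oldRes, newRes jacobianPoint
	scalarBaseMultOld(k, affine, &oldRes)
	scalarBaseMultNew(&k, jac, &newRes)
	fmt.Println(oldRes == newRes) // both shapes give the same toy result
}
```

The trade-off in this sketch is memory for time: the precomputed Jacobian table stores an extra coordinate per entry so that the hot loop does no per-iteration conversion.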

The following benchmark results show the speedup is in line with the expected gains per the profiling results:

name                     old time/op   new time/op    delta
------------------------------------------------------------------------------
ScalarBaseMultNonConst   24.1µs ±22%   22.5µs ± 2%   -6.97%  (p=0.000 n=98+96)

rstaudt2 · 2022-03-13T15:08:15Z

The PR description says this is rebased on #2888, but it looks like it is just based off master currently.

davecgh · 2022-03-13T17:28:39Z

Ah, I guess I rebased it over master in the last round of updates. It's rebased over #2888 now as intended.

davecgh added the optimization label Mar 9, 2022

davecgh added this to the 1.8.0 milestone Mar 9, 2022

davecgh force-pushed the secp256k1_optimize_scalarbasemult branch from 2bbfb34 to 9fcf7d6 March 10, 2022 17:05

davecgh force-pushed the secp256k1_optimize_scalarbasemult branch from 9fcf7d6 to 4037827 March 13, 2022 17:27

rstaudt2 approved these changes Mar 14, 2022

davecgh force-pushed the secp256k1_optimize_scalarbasemult branch from 4037827 to 9f550ed March 14, 2022 19:41

JoeGruffins approved these changes Mar 17, 2022

matheusd approved these changes Mar 18, 2022

davecgh force-pushed the secp256k1_optimize_scalarbasemult branch from 9f550ed to aae0128 March 18, 2022 00:32

davecgh merged commit aae0128 into decred:master Mar 18, 2022

davecgh deleted the secp256k1_optimize_scalarbasemult branch March 18, 2022 00:36
