results: evaluate against "On Fast Calculation of Addition Chains for Isogeny-Based Cryptography" #56

mmcloughlin · 2020-05-25T06:42:28Z

See how addchain performs on target exponents from "On Fast Calculation of Addition Chains for Isogeny-Based Cryptography":

http://faculty.eng.fau.edu/azarderakhsh/files/2016/11/Inscrypt2016.pdf

The text was updated successfully, but these errors were encountered:

mmcloughlin · 2020-05-25T06:51:39Z

mmcloughlin · 2020-05-25T07:08:45Z

For p₅₁₂ the best result from addchain is:

addchain: best: opt(dictionary(sliding_window(16),continued_fractions(binary)))
addchain: total: 586	doubles: 	498 adds: 88

This is worse by 2 in total length, and worse by a lot on total number of squares. So it could definitely be worthwhile investigating their technique and implementing it.

Nik-U · 2020-12-06T00:10:08Z

I decided to take a look at this. The "smooth isogeny primes" used in the paper have the property that the lower half is mostly '1's, and the upper half appears random. The technique described in Algorithm 4 is essentially:

Find good sliding windows for the upper half.
Choose a width c and fill the lower half with windows containing c '1's.
Find a good addition sequence to generate all of the windows.

This procedure is essentially just the dictionary method with a Hybrid decomposer having the maximum run length set to c. So why is the performance so much worse than the paper?

After carefully examining Figure 2 and comparing it to the output of the hybrid dictionary, the difference is entirely localized to the upper half of the prime (which "appears random"). The SlidingWindow decomposer generates windows that look very different from the ones in Figure 2.

Unfortunately, as far as I can tell, the paper doesn't actually explain the method it uses for choosing its windows. Section 4.1 and Algorithm 4 are almost entirely preoccupied with explaining its SequenceAlgorithm and how to choose c. They simply say to use a maximum window size above 7 bits, and leave it at that. The introduction of the windowing method in Section 2.2 cites four papers, none of which give any clues (with one possible exception, which I mention at the end).

So in the absence of information, I just made up a heuristic that generates windows that more closely resemble the ones shown in Figure 2. I augmented the sliding window technique with two optional rules:

A window cannot contain more than Z '0' bits.
The "shortening rule": If a maximum-length (i.e., K-bit) window contains at least one '0' bit and the window is immediately followed by a '1', then shrink the window to exclude all trailing sequential '1's. For example, if K = 4 and this feature is enabled, then 11011100 would be split as [11]0[111]00 instead of [1101][11]00.

With K=8, Z=3, and the shortening rule enabled, SlidingWindow gets pretty close to Figure 2, but does not reproduce it entirely. This method cannot explain why Figure 2 begins with an oversized window, nor can it explain why 100001 appears in the third row, or why some windows are split differently (e.g., [11011]00[1011011] instead of [11011001]0[11011]).

Nonetheless, these rules result in three improved results:

p256_scalar is improved from +2 to +0
p384_scalar is improved from +1 to +0
isop512_field (p512 from the paper - 2) is improved from +2 to -3

It makes sense that these techniques would help with P-256 and P-384, because they share a similar binary structure, albeit reversed (long '1' chains in the upper half, and randomness in the lower half). The solution for isop512_field has 83 adds and 498 doubles. This is more adds than in the paper, but it actually narrowly beats the paper by its own weighting metric ("S = 0.8M" in Section 2.3). The SlidingWindow heuristics are never able to reach 75 adds, even with similar parameters; the new best result comes from a completely different parameter shape with T = 20 and K = 16.

I made the changes to try out this method in Nik-U@9d14747. If this approach seems reasonable then I can submit a pull request, but I wanted to check with you first to make sure this is an acceptable architecture. The change adds two optional fields to SlidingWindow to implement the heuristics above. It should be backward compatible, except of course if anyone was initializing the struct without named fields. The change also adds prime.SmoothIsogeny to capture the prime shape in the paper and add it to the results set.

The change adds decomposers with the new techniques into the Ensemble. This increases the number of ChainAlgorithms to 615, which is significantly more than before. Unfortunately, it looks like all of the new parameter shapes tie for optimality for at least one result, so it's not clear where to prune if the increased running time is a problem.

The only potential clue for how the paper might have generated its windows is reference 17: Improved Techniques for Fast Exponentiation. Section 5.2 contains an "unsigned fractional windows" algorithm that seems to be unlike the currently available decomposers. It is buried under some unintuitive notation, but essentially it boils down to: construct sliding windows with a given max length from LSB to MSB, but if a w+1 length window contains a value greater than 2^w + m for a parameter m, then create a window excluding the 2^w and handle that in a subsequent iteration. That technique might be worth investigating as another type of decomposer.

Nik-U · 2020-12-06T02:38:35Z

I took a look at reference 17 mentioned at the end of my previous comment. Section 5.2 is essentially a right-to-left sliding window method that allows the window to expand by an extra bit sometimes. For example, with w = 8 for an 8-bit window, the window is allowed to expand to 9 bits if the resulting 9-bit value is no greater than 2^w + m. In practice, this means that the window can sometimes be extra large if it contains a lot of zeroes, but otherwise it is a standard right-to-left sliding window algorithm.

Sadly, this does not explain where the upper windows in Figure 2 of the isogeny prime paper came from, because they certainly do not take this shape (e.g., all windows in question are w+1 = 8 bits or smaller, there is a window with value 243, but the algorithm chose not to construct a window with value 179). For now at least, the technique remains a mystery.

mmcloughlin · 2020-12-07T09:27:31Z

@Nik-U I have a very busy week or two coming up so I don't know if I can give this full attention, but I just wanted to acknowledge your messages and say thanks for your work. These are really great results and I would definitely like to incorporate your work into addchain. Just a couple of quick thoughts.

Your code looks good overall. My only architectural question is whether we should add another decomposer type rather than adding fields to SlidingWindow? The boolean S seems especially odd. Maybe it's okay, but this is just my initial gut reaction.

I also found the paper to be lacking clarity on some key points, so I emailed the authors back in May asking if they could share the code. Unfortunately, I didn't get a response. I notice that you seem to be associated with CrySP at Waterloo along with David Jao. Can you ask him for the code?

Nik-U · 2020-12-07T19:41:56Z

@Nik-U I have a very busy week or two coming up so I don't know if I can give this full attention, but I just wanted to acknowledge your messages and say thanks for your work.

No worries! I'll submit the pull request when it's ready, but there's no need for rush processing. 🙂

Your code looks good overall. My only architectural question is whether we should add another decomposer type rather than adding fields to SlidingWindow? The boolean S seems especially odd. Maybe it's okay, but this is just my initial gut reaction.

I agree that a new Decomposer is a better design. However, I think it should be accompanied by a change to Hybrid that takes a nested Decomposer as a parameter field, instead of K. That way we can avoid having a different Hybrid for each sliding window approach. If left blank, it can construct a SlidingWindow by default.

The new SlidingWindow heuristics do not have a name, so I'm thinking of calling it ShortSlidingWindow. Z would be specific to that Decomposer, since having Z != K doesn't help the results without also using the shortening rule.

Since submitting the issue, I've also found that implementing a "right-to-left" sliding window method improves the results even further:

p256_scalar is further improved from +0 (with the shortening patch) to -1
p2519_field is improved from 263 to 261

So I think that the best way to do this would be:

Make Hybrid take a Decomposer to use after eliminating runs
SlidingWindow remains as is, add ShortSlidingWindow with Z and the new shortening rule
Add RTLSlidingWindow and RTLShortSlidingWindow as right-to-left versions of the above
All of the sliding window implementations can share two implementations internally (one for each direction)

Adding the right-to-left versions further increases the size of the Ensemble to 815, even when being conservative about the Decomposers. Personally, I'd rather have the library spend more time to come up with a better solution, but maybe not all users have the same trade-off. After this issue is addressed, it may be worthwhile to consider having multiple presets in the ensemble package based on how much time the user is willing to invest. That can go in a separate issue, though.

Can you ask him for the code?

I'll see if the code or method is available around here somewhere, though I suspect that the windows might have been hand-generated.

Nik-U · 2020-12-08T18:51:29Z

Can you ask him for the code?

I'll see if the code or method is available around here somewhere, though I suspect that the windows might have been hand-generated.

I've received confirmation that the upper windows for the prime were manually selected, so no algorithm is available. Luckily the two heuristics can get us quite close.

Krijn-math · 2021-03-04T13:29:18Z

To add to this discussion: we've recently used addchain to compute addition chains for isogeny primes, but for slightly different shaped primes (CSIDH primes) and slightly different exponents. The preprint is available here: https://eprint.iacr.org/2021/259

Basically, we use primes of the form p := 2^k prod_i ell_i - 1, where the ell_i are small odd primes. (Often in isogeny-based crypto one takes the smallest n = 74 primes for the ell_i, and sometimes even n = 221 or n = 306.) We computed addition chains for the exponents:

exp_2 := (p+1) / 4             (equivalent to taking square root)
exp_3 := (2*p-1) / 3           (equivalent to taking third root)
exp_4 := (p+1) / 8             (equivalent to taking fourth root)
exp_5 := (3*p-2) / 5           (equivalent to taking fifth root)
exp_7 := (4*p-3) / 7           (equivalent to taking seventh root)
exp_9 := (5*p-4) / 9           (equivalent to taking nineth root)

In general, we want to find addition chains for exponents that are equivalent to taking the N-th root for N = ell_i for some i, which speeds up cryptographical computations. Now, these exponents exp_i are all quite interesting, because if the prime is very large (say 4096 bits or so) then p+1 has this big cofactor 2^k, which implies that p itself has a lower half all 1's (similar to the SIDH-primes). Therefore, something similar holds for these exp_i, some of them will have a lower half of 0's (such as exp_2 and exp_4), some of them will have a lower half of 1's (exp_i for i = 3, 5, 7) and exp_9 has a lower half of ...010101...

It seems to me that similar methods as from this paper should be applicable for fast calculations of addition chains for N-th roots for CSIDH primes!

mmcloughlin · 2021-04-06T05:32:00Z

Awesome, thanks for the summary @Krijn-math. Apologies if this is already in the paper but I'm curious if you have best known hand-optimized addition chains for these exponents that we can compare against?

Krijn-math · 2021-04-08T08:35:16Z

@mmcloughlin We don't have those. For a moment I considered doing so, but the structure of these primes doesn't make that too much fun...

Nik-U mentioned this issue Jun 8, 2021

alg/dict: SlidingWindow: max zeroes and shortening #110

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

results: evaluate against "On Fast Calculation of Addition Chains for Isogeny-Based Cryptography" #56

results: evaluate against "On Fast Calculation of Addition Chains for Isogeny-Based Cryptography" #56

mmcloughlin commented May 25, 2020

mmcloughlin commented May 25, 2020

mmcloughlin commented May 25, 2020 •

edited

Loading

Nik-U commented Dec 6, 2020

Nik-U commented Dec 6, 2020

mmcloughlin commented Dec 7, 2020

Nik-U commented Dec 7, 2020

Nik-U commented Dec 8, 2020

Krijn-math commented Mar 4, 2021

mmcloughlin commented Apr 6, 2021

Krijn-math commented Apr 8, 2021

results: evaluate against "On Fast Calculation of Addition Chains for Isogeny-Based Cryptography" #56

results: evaluate against "On Fast Calculation of Addition Chains for Isogeny-Based Cryptography" #56

Comments

mmcloughlin commented May 25, 2020

mmcloughlin commented May 25, 2020

mmcloughlin commented May 25, 2020 • edited Loading

Nik-U commented Dec 6, 2020

Nik-U commented Dec 6, 2020

mmcloughlin commented Dec 7, 2020

Nik-U commented Dec 7, 2020

Nik-U commented Dec 8, 2020

Krijn-math commented Mar 4, 2021

mmcloughlin commented Apr 6, 2021

Krijn-math commented Apr 8, 2021

mmcloughlin commented May 25, 2020 •

edited

Loading