Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

results: evaluate against "On Fast Calculation of Addition Chains for Isogeny-Based Cryptography" #56

Open
mmcloughlin opened this issue May 25, 2020 · 10 comments

Comments

@mmcloughlin
Copy link
Owner

See how addchain performs on target exponents from "On Fast Calculation of Addition Chains for Isogeny-Based Cryptography":

http://faculty.eng.fau.edu/azarderakhsh/files/2016/11/Inscrypt2016.pdf

@mmcloughlin
Copy link
Owner Author

Screenshot from 2020-05-24 23-51-15

@mmcloughlin
Copy link
Owner Author

mmcloughlin commented May 25, 2020

For p512 the best result from addchain is:

addchain: best: opt(dictionary(sliding_window(16),continued_fractions(binary)))
addchain: total: 586	doubles: 	498 adds: 88

This is worse by 2 in total length, and worse by a lot on total number of squares. So it could definitely be worthwhile investigating their technique and implementing it.

@Nik-U
Copy link

Nik-U commented Dec 6, 2020

I decided to take a look at this. The "smooth isogeny primes" used in the paper have the property that the lower half is mostly '1's, and the upper half appears random. The technique described in Algorithm 4 is essentially:

  1. Find good sliding windows for the upper half.
  2. Choose a width c and fill the lower half with windows containing c '1's.
  3. Find a good addition sequence to generate all of the windows.

This procedure is essentially just the dictionary method with a Hybrid decomposer having the maximum run length set to c. So why is the performance so much worse than the paper?

After carefully examining Figure 2 and comparing it to the output of the hybrid dictionary, the difference is entirely localized to the upper half of the prime (which "appears random"). The SlidingWindow decomposer generates windows that look very different from the ones in Figure 2.

Unfortunately, as far as I can tell, the paper doesn't actually explain the method it uses for choosing its windows. Section 4.1 and Algorithm 4 are almost entirely preoccupied with explaining its SequenceAlgorithm and how to choose c. They simply say to use a maximum window size above 7 bits, and leave it at that. The introduction of the windowing method in Section 2.2 cites four papers, none of which give any clues (with one possible exception, which I mention at the end).

So in the absence of information, I just made up a heuristic that generates windows that more closely resemble the ones shown in Figure 2. I augmented the sliding window technique with two optional rules:

  • A window cannot contain more than Z '0' bits.
  • The "shortening rule": If a maximum-length (i.e., K-bit) window contains at least one '0' bit and the window is immediately followed by a '1', then shrink the window to exclude all trailing sequential '1's. For example, if K = 4 and this feature is enabled, then 11011100 would be split as [11]0[111]00 instead of [1101][11]00.

With K=8, Z=3, and the shortening rule enabled, SlidingWindow gets pretty close to Figure 2, but does not reproduce it entirely. This method cannot explain why Figure 2 begins with an oversized window, nor can it explain why 100001 appears in the third row, or why some windows are split differently (e.g., [11011]00[1011011] instead of [11011001]0[11011]).

Nonetheless, these rules result in three improved results:

  • p256_scalar is improved from +2 to +0
  • p384_scalar is improved from +1 to +0
  • isop512_field (p512 from the paper - 2) is improved from +2 to -3

It makes sense that these techniques would help with P-256 and P-384, because they share a similar binary structure, albeit reversed (long '1' chains in the upper half, and randomness in the lower half). The solution for isop512_field has 83 adds and 498 doubles. This is more adds than in the paper, but it actually narrowly beats the paper by its own weighting metric ("S = 0.8M" in Section 2.3). The SlidingWindow heuristics are never able to reach 75 adds, even with similar parameters; the new best result comes from a completely different parameter shape with T = 20 and K = 16.

I made the changes to try out this method in Nik-U@9d14747. If this approach seems reasonable then I can submit a pull request, but I wanted to check with you first to make sure this is an acceptable architecture. The change adds two optional fields to SlidingWindow to implement the heuristics above. It should be backward compatible, except of course if anyone was initializing the struct without named fields. The change also adds prime.SmoothIsogeny to capture the prime shape in the paper and add it to the results set.

The change adds decomposers with the new techniques into the Ensemble. This increases the number of ChainAlgorithms to 615, which is significantly more than before. Unfortunately, it looks like all of the new parameter shapes tie for optimality for at least one result, so it's not clear where to prune if the increased running time is a problem.

The only potential clue for how the paper might have generated its windows is reference 17: Improved Techniques for Fast Exponentiation. Section 5.2 contains an "unsigned fractional windows" algorithm that seems to be unlike the currently available decomposers. It is buried under some unintuitive notation, but essentially it boils down to: construct sliding windows with a given max length from LSB to MSB, but if a w+1 length window contains a value greater than 2^w + m for a parameter m, then create a window excluding the 2^w and handle that in a subsequent iteration. That technique might be worth investigating as another type of decomposer.

@Nik-U
Copy link

Nik-U commented Dec 6, 2020

I took a look at reference 17 mentioned at the end of my previous comment. Section 5.2 is essentially a right-to-left sliding window method that allows the window to expand by an extra bit sometimes. For example, with w = 8 for an 8-bit window, the window is allowed to expand to 9 bits if the resulting 9-bit value is no greater than 2^w + m. In practice, this means that the window can sometimes be extra large if it contains a lot of zeroes, but otherwise it is a standard right-to-left sliding window algorithm.

Sadly, this does not explain where the upper windows in Figure 2 of the isogeny prime paper came from, because they certainly do not take this shape (e.g., all windows in question are w+1 = 8 bits or smaller, there is a window with value 243, but the algorithm chose not to construct a window with value 179). For now at least, the technique remains a mystery.

@mmcloughlin
Copy link
Owner Author

@Nik-U I have a very busy week or two coming up so I don't know if I can give this full attention, but I just wanted to acknowledge your messages and say thanks for your work. These are really great results and I would definitely like to incorporate your work into addchain. Just a couple of quick thoughts.

Your code looks good overall. My only architectural question is whether we should add another decomposer type rather than adding fields to SlidingWindow? The boolean S seems especially odd. Maybe it's okay, but this is just my initial gut reaction.

I also found the paper to be lacking clarity on some key points, so I emailed the authors back in May asking if they could share the code. Unfortunately, I didn't get a response. I notice that you seem to be associated with CrySP at Waterloo along with David Jao. Can you ask him for the code?

@Nik-U
Copy link

Nik-U commented Dec 7, 2020

@Nik-U I have a very busy week or two coming up so I don't know if I can give this full attention, but I just wanted to acknowledge your messages and say thanks for your work.

No worries! I'll submit the pull request when it's ready, but there's no need for rush processing. 🙂

Your code looks good overall. My only architectural question is whether we should add another decomposer type rather than adding fields to SlidingWindow? The boolean S seems especially odd. Maybe it's okay, but this is just my initial gut reaction.

I agree that a new Decomposer is a better design. However, I think it should be accompanied by a change to Hybrid that takes a nested Decomposer as a parameter field, instead of K. That way we can avoid having a different Hybrid for each sliding window approach. If left blank, it can construct a SlidingWindow by default.

The new SlidingWindow heuristics do not have a name, so I'm thinking of calling it ShortSlidingWindow. Z would be specific to that Decomposer, since having Z != K doesn't help the results without also using the shortening rule.

Since submitting the issue, I've also found that implementing a "right-to-left" sliding window method improves the results even further:

  • p256_scalar is further improved from +0 (with the shortening patch) to -1
  • p2519_field is improved from 263 to 261

So I think that the best way to do this would be:

  • Make Hybrid take a Decomposer to use after eliminating runs
  • SlidingWindow remains as is, add ShortSlidingWindow with Z and the new shortening rule
  • Add RTLSlidingWindow and RTLShortSlidingWindow as right-to-left versions of the above
  • All of the sliding window implementations can share two implementations internally (one for each direction)

Adding the right-to-left versions further increases the size of the Ensemble to 815, even when being conservative about the Decomposers. Personally, I'd rather have the library spend more time to come up with a better solution, but maybe not all users have the same trade-off. After this issue is addressed, it may be worthwhile to consider having multiple presets in the ensemble package based on how much time the user is willing to invest. That can go in a separate issue, though.

Can you ask him for the code?

I'll see if the code or method is available around here somewhere, though I suspect that the windows might have been hand-generated.

@Nik-U
Copy link

Nik-U commented Dec 8, 2020

Can you ask him for the code?

I'll see if the code or method is available around here somewhere, though I suspect that the windows might have been hand-generated.

I've received confirmation that the upper windows for the prime were manually selected, so no algorithm is available. Luckily the two heuristics can get us quite close.

@Krijn-math
Copy link

To add to this discussion: we've recently used addchain to compute addition chains for isogeny primes, but for slightly different shaped primes (CSIDH primes) and slightly different exponents. The preprint is available here: https://eprint.iacr.org/2021/259

Basically, we use primes of the form p := 2^k prod_i ell_i - 1, where the ell_i are small odd primes. (Often in isogeny-based crypto one takes the smallest n = 74 primes for the ell_i, and sometimes even n = 221 or n = 306.) We computed addition chains for the exponents:

exp_2 := (p+1) / 4             (equivalent to taking square root)
exp_3 := (2*p-1) / 3           (equivalent to taking third root)
exp_4 := (p+1) / 8             (equivalent to taking fourth root)
exp_5 := (3*p-2) / 5           (equivalent to taking fifth root)
exp_7 := (4*p-3) / 7           (equivalent to taking seventh root)
exp_9 := (5*p-4) / 9           (equivalent to taking nineth root)

In general, we want to find addition chains for exponents that are equivalent to taking the N-th root for N = ell_i for some i, which speeds up cryptographical computations. Now, these exponents exp_i are all quite interesting, because if the prime is very large (say 4096 bits or so) then p+1 has this big cofactor 2^k, which implies that p itself has a lower half all 1's (similar to the SIDH-primes). Therefore, something similar holds for these exp_i, some of them will have a lower half of 0's (such as exp_2 and exp_4), some of them will have a lower half of 1's (exp_i for i = 3, 5, 7) and exp_9 has a lower half of ...010101...

It seems to me that similar methods as from this paper should be applicable for fast calculations of addition chains for N-th roots for CSIDH primes!

@mmcloughlin
Copy link
Owner Author

Awesome, thanks for the summary @Krijn-math. Apologies if this is already in the paper but I'm curious if you have best known hand-optimized addition chains for these exponents that we can compare against?

@Krijn-math
Copy link

@mmcloughlin We don't have those. For a moment I considered doing so, but the structure of these primes doesn't make that too much fun...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants