Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add MP_SMALL_STACK_SIZE option #538

Merged
merged 11 commits into from
Apr 4, 2024
Merged

add MP_SMALL_STACK_SIZE option #538

merged 11 commits into from
Apr 4, 2024

Conversation

sjaeckel
Copy link
Member

@sjaeckel sjaeckel commented Oct 27, 2022

This adds an option to use a heap-buffer for the usually stack-based
MP_WARRAY-sized temporary buffers.

Per default it will reserve a single buffer, which can be modified

  • at compile-time via the MP_WARRAY_NUM define
  • at run-time by calling mp_warray_init()

The internal structure can only be created once. If one wants to modify
the maximum number of elements, the entire structure has to be free'd
by calling mp_warray_free().

In case one wants to use this option with multiple threads, one shall
use the mp_warray_init() function and pass appropriate locking functions.

Timings

op / algo ECC RSA
decrypt_key ecc_decrypt_key rsa_decrypt_key
encrypt_key ecc_encrypt_key rsa_encrypt_key
sign_hash ecc_sign_hash rsa_sign_hash
verify_hash ecc_verify_hash rsa_verify_hash

Related to: #441 #447 #511

@sjaeckel sjaeckel force-pushed the small-stacksize branch 11 times, most recently from d577ff8 to 75d8a23 Compare October 28, 2022 12:38
@sjaeckel sjaeckel marked this pull request as ready for review October 28, 2022 12:42
@sjaeckel
Copy link
Member Author

@jlaitine could you maybe have a look and tell me whether that's an acceptable solution for you?

Copy link
Contributor

@czurnieden czurnieden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found nothing except some typos.

But the effect on the ECC computations is interesting!

.github/workflows/main.yml Outdated Show resolved Hide resolved
doc/bn.tex Outdated Show resolved Hide resolved
doc/bn.tex Outdated Show resolved Hide resolved
doc/bn.tex Outdated Show resolved Hide resolved
doc/bn.tex Show resolved Hide resolved
doc/bn.tex Outdated Show resolved Hide resolved
s_mp_warray_get.c Outdated Show resolved Hide resolved
@sjaeckel sjaeckel self-assigned this Apr 18, 2023
@czurnieden
Copy link
Contributor

What happened to this PR? Did you change your mind? I find it quite interesting to have a choice between stack and heap for this. On the other side: it breaks our general thread-safety and needs locks.
But I'm still supportive.

@sjaeckel sjaeckel force-pushed the small-stacksize branch 2 times, most recently from 838b736 to f18c997 Compare March 11, 2024 14:32
@sjaeckel
Copy link
Member Author

What happened to this PR? Did you change your mind?

No, I just didn't have the passion to follow up :)

On the other side: it breaks our general thread-safety and needs locks.

That's indeed a thing. I'm not sure whether it'd make sense to replace the locks by taking care of that ourselves and using lock-free/atomic operations under the hood!?

@sjaeckel sjaeckel force-pushed the small-stacksize branch 9 times, most recently from 3e1a7ce to 887b799 Compare March 14, 2024 14:40
@czurnieden
Copy link
Contributor

Sorry, took a bit longer to check.

Better like that!?

Yes, looks reasonable.

Caveat: I cannot test the Windows stuff now: my last PC here with Windows (what is the plural of Windows?) let the magic smoke out, need a visit to Ebay before going on.

PS: I am aware of my TODO list, it's not forgotten!

@sjaeckel
Copy link
Member Author

what is the plural of Windows?

My SO just explained to me that the plural only works with the complete name! Microsoft Windows -> Microsofts Windows

my TODO list

No need to hurry, we have time.

test the Windows stuff

I'll see what I can do by extending CI.

@czurnieden
Copy link
Contributor

My SO just explained to me that the plural only works with the complete name! Microsoft Windows -> Microsofts Windows

Ah, ok, my thanks to your SO!

I'll see what I can do by extending CI.

That would be really nice!

No need to hurry, we have time.

You have, but I'm an overweight old fart already! ;-)

My main problem is the error I got with the base2->base10 conversion with a single round of NR to compute the next divisor at a very large (close to MP_MAX_DIGIT_COUNT with MP_64BIT ) number. Sadly a randomly generated large number and it hasn't been logged what it was (Full HD. Yes, I'm an idiot). Haven't been able to repeat the error and am trying to compute the accumulated error of the NR rounds now manually. Fiddly.

We need to be able to do at most 22 rounds with n_0 = 1000 and the individual absolute error is about 2^(-34). But it is not floating point, it is integer math, so what is the accumulated error at the end? Does it even accumulate?

Was the aformentioned single error just a random bitflip? An error elsewhere?

*hng*!

@sjaeckel sjaeckel force-pushed the small-stacksize branch 2 times, most recently from 2723f09 to d6dca63 Compare March 27, 2024 07:58
@sjaeckel
Copy link
Member Author

Ah, ok, my thanks to your SO!

You're very welcome ;-)

I'll see what I can do by extending CI.

wow, what a ride ... In between I was not sure whether I'd simply disable the feature for MSVC ... but now it seems to work^TM ...

This adds an option to use a heap-buffer for the usually stack-based
`MP_WARRAY`-sized temporary buffers.

Per default it will reserve a single buffer, which can be modified
* at compile-time via the `MP_WARRAY_NUM` define
* at run-time by calling `mp_warray_init()`

The internal structure can only be created once. If one wants to modify
the maximum number of elements, the entire structure has to be free'd
by calling `mp_warray_free()`.

In case one wants to use this option with multiple threads, one shall
use the `mp_warray_init()` function and pass appropriate locking functions.

Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
`s_warray_init()` and `s_warray_free()` are not safe and MUST NOT be called
from multiple threads.

This also removes `MP_WARRAY_NUM`, since automatic initialization will not
be safe for more than one thread.

Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
To be able to support this for MSVC as well, we have to move this into a
separate private API function.

Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
Output gets garbeled a bit, but we only care for the result which is
`Tests OK/NOP/FAIL: 50/0/0`.

Add `-Wno-incomplete-setjmp-declaration` since `clang-10` shipping with
Ubuntu 20.04 seems broken... and `-Wno-unknown-warning-option` since
`clang-8` doesn't know about this warning...

Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
... and fix some MSVC related (and other) things.

Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
@sjaeckel
Copy link
Member Author

@jlaitine can you please check if this works for you?

Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
@sjaeckel
Copy link
Member Author

sjaeckel commented Apr 3, 2024

In between I was not sure whether I'd simply disable the feature for MSVC ... but now it seems to work^TM ...

aaaand in the end I disabled it for MSVC since IMO the most elegant solution is via TLS and MSVC seems to have a problem with TLS in DLLs. I'm not 100% sure what's going wrong, but since it was going fine with the static library [0] I suspect that we'd have to implement what's described in [1] and I'm simply not willing to do that, nor would I accept a PR that implements it.

If I'm mistaken there and someone with more MSVC experience wants to chime in and provide an easy fix, I'll happily accept that :)

[0] https://ci.appveyor.com/project/libtom/libtommath/builds/49507598
[1] https://learn.microsoft.com/en-us/windows/win32/dlls/using-thread-local-storage-in-a-dynamic-link-library

@sjaeckel sjaeckel requested a review from czurnieden April 3, 2024 15:47
Copy link
Contributor

@czurnieden czurnieden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bearhug?

@czurnieden
Copy link
Contributor

https://learn.microsoft.com/en-us/windows/win32/dlls/using-thread-local-storage-in-a-dynamic-link-library

So that's the superlativ of "unnecessary complicated", isn't it?

@sjaeckel sjaeckel merged commit a9a11d2 into develop Apr 4, 2024
83 checks passed
@sjaeckel sjaeckel deleted the small-stacksize branch April 4, 2024 07:47
@jlaitine
Copy link
Contributor

jlaitine commented Apr 8, 2024

@jlaitine can you please check if this works for you?

Hi sorry for the very long response time, I have been busy with some other things. I see that this already got merged, so I tested the development branch directly and this change works perfectly!

I still have a small compilation error, in PX4, which is compiled with "-Wall". Not related to this PR. I created anoter PR out of that: #578

Thanks again,
Jukka

@sjaeckel
Copy link
Member Author

sjaeckel commented Apr 8, 2024

Thanks a lot for the feedback! Glad to hear that it works for you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants