Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSA key generation time seems to have doubled #174

Closed
davidbernard04 opened this issue Jun 15, 2022 · 12 comments
Closed

RSA key generation time seems to have doubled #174

davidbernard04 opened this issue Jun 15, 2022 · 12 comments

Comments

@davidbernard04
Copy link

Hi! We are using dropbearkey in embedded systems, and noticed that the average delay for generating a RSA key of 2048 bits has doubled with 2022.82 version, compared to our previous 2018.76 version.

For example on these two low-power platforms:

  • MIPS @ 150MHz: Now takes around 80 sec instead of 40 sec
  • ARMv5 @ 456MHz: Now takes around 40 sec instead of 20 sec

It is of course an average, because the generation time varies greatly. Note that above numbers are based on 40 keys generated (20 keys for each version, alternately generated). I know it's not a tremendous amount of tests, but seems enough to see a trend.

Were you aware of such impact?

@mkj
Copy link
Owner

mkj commented Jun 15, 2022

I wasn't aware of it, though haven't benchmarked.
libtommath was updated in 2020.79 which seems the most likely change.
Update LibTomMath to 1.2.0

update ltm to 1.1.0 and enable FIPS 186.4 compliant key-generation
(The "FIPS 186.4 compliant" change should have actually made 2048 RSA faster, 6 rounds rather than 8)

There are also a few changes to dbrandom.c though those seem less likely to change speed.

@davidbernard04
Copy link
Author

About FIPS compliancy, it is actually 12 rounds rather than 8, since primes size for a 2048-bit RSA key are 1024 bits. But I've made a few test loops by hardcoding 8 rounds on the 2022.82 version, and it has no significant impact (some loops are actually worst, lol).

I've made another run of test with 80 keys generated on our ARMv5 platform, and that's in line with previous results:

  • 2018.76: 15.9 sec average for 80 keys
  • 2022.82: 38.1 sec average for 40 keys (with default 12 rounds)
  • 2022.82: 33.6 sec average for 40 keys (with hardcoded 8 rounds)

So I guess that's really the LibTomMath upgrade that affected the generation time.

Any idea what we can do to improve? Otherwise I suppose we'll have to increase our generation timeout on our low-power platforms. Well, that's the cost of progress. ;)

@mkj
Copy link
Owner

mkj commented Jun 15, 2022

Ah, the CFLAGS order changed in libtommath's own makefile. Previously its own -O3 came after Dropbear's default -Os, in 2022.82 it builds with -Os. Could you try the patch below? I see 30kB extra binary size on x64 though, will have to have a look at that.

--- a/libtommath/makefile_include.mk
+++ b/libtommath/makefile_include.mk
@@ -104,7 +104,7 @@ LIBTOOLFLAGS += -no-undefined
 endif
 
 # add in the standard FLAGS
-LTM_CFLAGS += $(CFLAGS)
+LTM_CFLAGS := $(CFLAGS) $(LTM_CFLAGS)
 LTM_LFLAGS += $(LFLAGS)
 LTM_LDFLAGS += $(LDFLAGS)
 LTM_LIBTOOLFLAGS += $(LIBTOOLFLAGS)

For reference
2018.76 (from ps aux so may be a bit different)
-D _FORTIFY_SOURCE=2 -D DROPBEAR_SERVER -D DROPBEAR_CLIENT ../../libtommath/bn_mp_fwrite.c -quiet -dumpbase bn_mp_fwrite.c -dumpbase-ext .c -mtune=generic -march=x86-64 -g -Os -O3 -Wno-pointer-sign -Wall -Wsign-compare -Wextra -Wshadow -Wsystem-headers -Wdeclaration-after-statement -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wpointer-arith -fno-strict-overflow -fPIE -fstack-protector-strong -funroll-loops -fomit-frame-pointer -fasynchronous-unwind-tables -fstack-protector-strong -Wformat-security -fstack-clash-protection -fcf-protection

2022.82
-Wall -Wsign-compare -Wextra -Wshadow -Wdeclaration-after-statement -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wpointer-arith -Wsystem-headers -O3 -funroll-loops -fomit-frame-pointer -Os -W -Wall -Wno-pointer-sign -fno-strict-overflow -fPIE -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Werror -ggdb -I../../libtommath -I../libtomcrypt/src/headers/ -I../../libtommath/../libtomcrypt/src/headers/ -I../ -I../../libtommath/../ -Wno-deprecated -I../libtomcrypt/src/headers/ -DLOCALOPTIONS_H_EXISTS -I. -I.. -DDROPBEAR_SERVER -DDROPBEAR_CLIENT ../../libtommath/bn_mp_div_2.c -o bn_mp_div_2.o

@davidbernard04
Copy link
Author

Nice catch! I've tried with your patch and there is an improvement:

  • 2022.82: 35 sec average (-Os)
  • 2022.82: 25 sec average (-O3)

Although still not as fast as the 2018 version:

  • 2018.76: 16 sec average (-O3)

About binary size, there is an extra of ~20 KB on each dropbear and dropbearkey executables between -Os and -O3 for the 2022.82 version (ARM platform).

FYI on our side, we might choose to stick with the -Os flag to save some space (we have some platforms in dire need of free space, lol).

@mkj
Copy link
Owner

mkj commented Jun 18, 2022

OK right. When I get a chance I'll see if I can compare perf of the releases (and rummage for some older boards to try).
On x64 here the make timing; ./timing benchmark of standalone libtommath is the same between the versions.

@davidbernard04
Copy link
Author

@mkj On second thought, I personally prefer the makefile_include.mk before your patch above. This way, we can control the optimization flag for everything (dropbear, libtomcrypto and libtommath) without having one of the lib that forces a -O3 regardless of the CFLAGS we specified.

@mkj
Copy link
Owner

mkj commented Jul 18, 2022

I think having a default of -O3 for the crypto/maths libraries makes sense (that's where performance matters), but -Os is a better default for non-performance code. I guess there could be defaults for each of them which can be overridden with environment variables - LIBTOM_CFLAGS=-O3 from Dropbear's configure script perhaps.

@mkj mkj added the speed label Jul 31, 2022
@sjaeckel
Copy link
Contributor

sjaeckel commented Oct 4, 2022

@davidbernard04 that's caused by internal changes of libtommath where previously the prime verification simply wasn't as good as it is now... and this comes at a speed penalty.

tl;dr - if you are only using RSA you can set the compile-time flag LTM_USE_ONLY_MR of libtommath (I hope you use the bundled version :) this makes life here easier for once), which will disable the LR test and you should be fine to go. Please let me know how your performance numbers evolve, since there were some other changes in this area.


longer story...

2018.76 still used ltm 1.0.1 which always only did t deterministic Miller-Rabin Tests for primality testing. (That's definitively not what you want).
Since ltm 1.1.0 we run t rounds of the MR test against random numbers + a Lucas-Selfridge test. That's how FIPS 186-4 expects how primality testing is done ideally. Before it was simply not according to "the standard"^TM.

After having a quick look into FIPS 186-4 I just realized that it states in Appendix C.3 that for RSA keys there's no requirement for a LR test, only for DSA keys.

... maybe we overshot a bit on the paranoia scale? Maybe, but I prefer to offer a secure default and users can then still decide whether performance is more important to them.


@mkj:

please feel free to tag me in such issues, I don't watch dropbear that closely and only stumbled over this by chance.

I'll take this back to the project and check whether we can find a compatible way where LR tests can be enabled at compile time, but disabled at runtime. Currently that's not possible.

I'd say you can safely turn on the LTM_USE_ONLY_MR switch if DSA key generation is disabled.
For configurations that require both, RSA and DSA key generation at runtime, I don't have a good solution ready.
Do you even still have DSA key generation enabled per default?

One possibility would be to always enable LTM_USE_ONLY_MR and use the number of MR iterations of column 'M-R Tests Only' of Table C.1 in FIPS 186-4 for DSA and the numbers returned by mp_prime_rabin_miller_trials() for RSA...

@davidbernard04
Copy link
Author

@sjaeckel Thanks a lot for your feedback! I've made some tests on the 2022.82 version with the LTM_USE_ONLY_MR define (on our ARMv5 platform), with 80 keys generated for each test below.

For -Os size-optimized build:

  • 36 sec (no define)
  • 26 sec (with the MR define)

For -O3 speed-optimized build:

  • 24 sec (no define)
  • 20 sec (with the MR define)

So it does improve performance for RSA keys generation!

But our app do generate both RSA and ECDSA keys, so unless you or @mkj implement a way for LR tests to be dynamic at runtime (i.e. enabled only for DSA keys), it is preferable that we don't use the LTM_USE_ONLY_MR define.

mkj added a commit that referenced this issue Oct 25, 2022
This was the default prior to 2022.82 and makes a significant
difference to performance. Perhaps at a later time
this could be made more configurable.

Discussion in #174
@mkj mkj added the regression label Nov 9, 2022
@mkj
Copy link
Owner

mkj commented Nov 10, 2022

I've now added a LTM_CFLAGS argument to configure - it defaults to -O3 -funroll-loops like before, but you can set
./configure LTM_CFLAGS=-Os
54a90dd

I'm also setting LTM_USE_ONLY_MR only when DROPBEAR_DSS is enabled (and have set DROPBEAR_DSS disabled by default,
it's on the way out).
0e70732

@sjaeckel just to confirm, ecdsa doesn't seem to need primality testing at all does it?

@mkj mkj closed this as completed Nov 10, 2022
@mkj
Copy link
Owner

mkj commented Nov 10, 2022

That should read "setting LTM_USE_ONLY_MR only when DROPBEAR_DSS is disabled".
And I've noticed the check was wrong, fixed 960d374

@sjaeckel
Copy link
Contributor

@sjaeckel just to confirm, ecdsa doesn't seem to need primality testing at all does it?

Correct, primality testing is only required for RSA and DSA.

freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Nov 15, 2022
Features and Changes:
  Note >> for compatibility/configuration changes

- >> Disable DROPBEAR_DSS by default
  It is only 1024 bit and uses sha1, most distros disable it by default already.

- Added DROPBEAR_RSA_SHA1 option to allow disabling sha1 rsa signatures.
  >> RSA with sha1 will be disabled in a future release (rsa keys will continue
  to work OK, with sha256 signatures used instead).

- Add option for requiring both password and pubkey (-t)
  Patch from Jackkal

- Add 'no-touch-required' and 'verify-required' options for sk keys
  Patch from Egor Duda

  - >> DROPBEAR_SK_KEYS config option now replaces separate DROPBEAR_SK_ECDSA
  and DROPBEAR_SK_ED25519 options.

- Add 'permitopen' option for authorized_keys to restrict forwarded ports
  Patch from Tuomas Haikarainen

- >> Added LTM_CFLAGS configure argument to set flags for building
  bundled libtommath. This also restores the previous arguments used
  in 2020.81 (-O3 -funroll-loops). That gives a big speedup for RSA
  key generation, which regressed in 2022.82.
  There is a tradeoff with code size, so -Os can be used if required.
  mkj/dropbear#174
  Reported by David Bernard

- Add '-z' flag to disable setting QoS traffic class. This may be necessary
  to work with broken networks or network drivers, exposed after changes to use
  AF21 in 2022.82
  mkj/dropbear#193
  Reported by yuhongwei380, patch from Petr Štetiar

- Allow overriding user shells with COMPAT_USER_SHELLS
  Based on a patch from Matt Robinson

- Improve permission error message
  Patch from k-kurematsu

- >> Remove HMAC_MD5 entirely

Regression fixes from 2022.82:

- Fix X11 build

- Fix build warning

- Fix compilation when disabling pubkey authentication
  Patch from MaxMougg

- Fix MAX_UNAUTH_CLIENTS regression
  Reported by ptpt52

- Avoid using slower prime testing in bundled libtomcrypt when DSS is disabled
  mkj/dropbear#174
  Suggested by Steffen Jaeckel

- Fix Dropbear plugin support
  mkj/dropbear#194
  Reported by Struan Bartlett

Other fixes:

- Fix long standing incorrect compression size check. Dropbear
  (client or server) would erroneously exit with
  "bad packet, oversized decompressed"
  when receiving a compressed packet of exactly the maximum size.

- Fix missing setsid() removed in 2020.79
  mkj/dropbear#180
  Reported and debugged by m5jt and David Bernard

- Try keyboard-interactive auth before password, in dbclient.
  This was unintentionally changed back in 2013
  mkj/dropbear#190
  Patch from Michele Giacomoli

- Drain the terminal when reading the fingerprint confirmation response
  mkj/dropbear#191
  Patch from Michele Giacomoli

- Fix utx wtmp variable typo. This has been wrong for a long time but
  only recently became a problem when wtmp was detected.
  mkj/dropbear#189
  Patch from Michele Giacomoli

- Improve configure test for hardening options.
  Fixes building on AIX
  mkj/dropbear#158

- Fix debian/dropbear.init newline
  From wulei-student

Infrastructure:

- Test off-by-default compile options
- Set -Wundef to catch typos in #if statements
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Dec 20, 2023
Include terrapin fix and bump PKGREVISION to make clear this
is not 2022.83.

2022.83 - 14 November 2022

Features and Changes:
  Note >> for compatibility/configuration changes

- >> Disable DROPBEAR_DSS by default
  It is only 1024 bit and uses sha1, most distros disable it by default already.

- Added DROPBEAR_RSA_SHA1 option to allow disabling sha1 rsa signatures.
  >> RSA with sha1 will be disabled in a future release (rsa keys will continue
  to work OK, with sha256 signatures used instead).

- Add option for requiring both password and pubkey (-t)
  Patch from Jackkal

- Add 'no-touch-required' and 'verify-required' options for sk keys
  Patch from Egor Duda

  - >> DROPBEAR_SK_KEYS config option now replaces separate DROPBEAR_SK_ECDSA
  and DROPBEAR_SK_ED25519 options.

- Add 'permitopen' option for authorized_keys to restrict forwarded ports
  Patch from Tuomas Haikarainen

- >> Added LTM_CFLAGS configure argument to set flags for building
  bundled libtommath. This also restores the previous arguments used
  in 2020.81 (-O3 -funroll-loops). That gives a big speedup for RSA
  key generation, which regressed in 2022.82.
  There is a tradeoff with code size, so -Os can be used if required.
  mkj/dropbear#174
  Reported by David Bernard

- Add '-z' flag to disable setting QoS traffic class. This may be necessary
  to work with broken networks or network drivers, exposed after changes to use
  AF21 in 2022.82
  mkj/dropbear#193
  Reported by yuhongwei380, patch from Petr Štetiar

- Allow overriding user shells with COMPAT_USER_SHELLS
  Based on a patch from Matt Robinson

- Improve permission error message
  Patch from k-kurematsu

- >> Remove HMAC_MD5 entirely

Regression fixes from 2022.82:

- Fix X11 build

- Fix build warning

- Fix compilation when disabling pubkey authentication
  Patch from MaxMougg

- Fix MAX_UNAUTH_CLIENTS regression
  Reported by ptpt52

- Avoid using slower prime testing in bundled libtomcrypt when DSS is disabled
  mkj/dropbear#174
  Suggested by Steffen Jaeckel

- Fix Dropbear plugin support
  mkj/dropbear#194
  Reported by Struan Bartlett

Other fixes:

- Fix long standing incorrect compression size check. Dropbear
  (client or server) would erroneously exit with
  "bad packet, oversized decompressed"
  when receiving a compressed packet of exactly the maximum size.

- Fix missing setsid() removed in 2020.79
  mkj/dropbear#180
  Reported and debugged by m5jt and David Bernard

- Try keyboard-interactive auth before password, in dbclient.
  This was unintentionally changed back in 2013
  mkj/dropbear#190
  Patch from Michele Giacomoli

- Drain the terminal when reading the fingerprint confirmation response
  mkj/dropbear#191
  Patch from Michele Giacomoli

- Fix utx wtmp variable typo. This has been wrong for a long time but
  only recently became a problem when wtmp was detected.
  mkj/dropbear#189
  Patch from Michele Giacomoli

- Improve configure test for hardening options.
  Fixes building on AIX
  mkj/dropbear#158

- Fix debian/dropbear.init newline
  From wulei-student

Infrastructure:

- Test off-by-default compile options

- Set -Wundef to catch typos in #if statements


2022.82 - 1 April 2022

Features and Changes:
  Note >> for compatibility/configuration changes

- Implemented OpenSSH format private key handling for dropbearconvert.
  Keys can be read in OpenSSH format or the old PEM format.
  >> Keys are now written in OpenSSH format rather than PEM.
  ED25519 support is now correct. DSS keys are still PEM format.

- Use SHA256 for key fingerprints

- >> Reworked -v verbose printing, specifying multiple times will increase
  verbosity. -vvvv is equivalent to the old DEBUG_TRACE -v level, it
  can be configured at compile time in localoptions.h (see default_options.h)
  Lower -v options can be used to check connection progress or algorithm
  negotiation.
  Thanks to Hans Harder for the implementation

  localoptions.h DEBUG_TRACE should be set to 4 for the same result as the
  previous DEBUG_TRACE 1.

- Added server support for U2F/FIDO keys (ecdsa-sk and ed25519-sk) in
  authorized_keys. no-touch-required option isn't allowed yet.
  Thanks to Egor Duda for the implementation

- autoconf output (configure script etc) is now committed to version control.
  >> It isn't necessary to run "autoconf" any more on a checkout.

- sha1 will be omitted from the build if KEX/signing/MAC algorithms don't
  require it. Instead sha256 is used for random number generation.
  See sysoptions.h to see which algorithms require which hashes.

- Set SSH_PUBKEYINFO environment variable based on the authorized_keys
  entry used for auth. The first word of the comment after the key is used
  (must only have characters a-z A-Z 0-9 .,_-+@)
  Patch from Hans Harder, modified by Matt Johnston

- Let dbclient multihop mode be used with '-J'.
  Patch from Hans Harder

- Allow home-directory relative paths ~/path for various settings
  and command line options.
  *_PRIV_FILENAME DROPBEAR_PIDFILE SFTPSERVER_PATH MOTD_FILENAME
  Thanks to Begley Brothers Inc

  >> The default DROPBEAR_DEFAULT_CLI_AUTHKEY has now changed, it now needs
  a tilde prefix.

- LANG environment variable is carried over from the Dropbear server process
  From Maxim Kochetkov

- Add /usr/sbin and /sbin to $PATH when logging in as root.
  Patch from Raphaël Hertzog
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=903403

- Added client option "-o DisableTrivialAuth". It disallows a server immediately
  giving successful authentication (without presenting any password/pubkey prompt).
  This avoids a UI confusion issue where it may appear that the user is accepting
  a SSH agent prompt from their local machine, but are actually accepting a prompt
  sent immediately by the remote server.
  CVE-2021-36369 though the description there is a bit confused. It only applies
  to Dropbear as a client.
  Thanks to Manfred Kaiser from Austrian MilCERT

- Add -q client option to hide remote banner, from Hans Harder

- Add -e option to pass all server environment variables to child processes.
  This should be used with caution.
  Patch from Roland Vollgraf (github #118)

- >> Use DSCP for QoS traffic classes. Priority (tty) traffic is now set to
  AF21 "interactive". Previously TOS classes were used, they are not used by
  modern traffic classifiers. Non-tty traffic is left at default priority.

- >> Disable dh-group1 key exchange by default. It has been disabled server
  side by default since 2018.

- >> Removed Twofish cipher

Fixes:

- Fix flushing channel data when pty was allocated (github #85)
  Data wasn't completely transmitted at channel close.
  Reported and initial patch thanks to Yousong Zhou

- Dropbear now re-executes itself rather than just forking for each connection
  (only on Linux). This allows ASLR to randomise address space for each
  connection as a security mitigation. It should not have any visible impact
  - if there are any performance impacts in the wild please report it.

- Check authorized_keys permissions as the user, fixes NFS squash root.
  Patch from Chris Dragan (github #107)

- A missing home directory is now non-fatal, starting in / instead

- Fixed IPv6 [address]:port parsing for dbclient -b
  Reported by Fabio Molinari

- Improve error logging so that they are logged on the server rather than being
  sent to the client over the connection

- Max window size is increased to 10MB, more graceful fallback if it's invalid.

- Fix correctness of Dropbear's handling of global requests.
  Patch from Dirkjan Bussink

- Fix some small bugs found by fuzzers, null pointer dereference crash and leaks
  (post authentication)

- $HOME variable is used before /etc/passwd when expanding paths such as
  ~/.ssh/id_dropbear (for the client). Patch from Matt Robinson

- C89 build fixes from Guillaume Picquet

Infrastructure:

- Improvements to fuzzers. Added post-auth fuzzer, and a mutator that can
  handle the structure of SSH packet streams. Added cifuzz to run on commits
  and pull requests.
  Thanks to OSS-Fuzz for the tools/clusters and reward funding.

- Dropbear source tarballs generated by release.sh are now reproducible from a
  Git or Mercurial checkout, they will be identical on any system. Tested
  on ubuntu and macos.

- Added some integration testing using pytest. Currently this has tests
  for various channel handling edge cases, ASLR fork randomisation,
  dropbearconvert, and SSH_PUBKEYINFO

- Set up github actions. This runs the pytest suite and other checks.
  - build matrix includes c89, dropbearmulti, bundled libtom, macos, DEBUG_TRACE
  - test for configure script regeneration
  - build a tarball for external reproducibility
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants