Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Argon2 descriptions #23319

Merged
merged 1 commit into from
Dec 21, 2020
Merged

Fix Argon2 descriptions #23319

merged 1 commit into from
Dec 21, 2020

Conversation

MichaIng
Copy link
Member

@MichaIng MichaIng commented Oct 9, 2020

The threads option for the password_hash function does not define the maximum allowed number of CPU threads to be used by the hashing algorithm but the exact number of threads that is used.

Similarly the memory_cost option for the password_hash function does not define the maximum allowed memory to be used by the hashing algorithm, but the exact amount of memory that is used by the hashing table. The minimum value is 8 KiB per thread: ad60619#diff-8d8bf62389a253975b815495978348f9

Most importantly, the time_cost option for the password_hash function does not define the allowed time in seconds, but the number of iterations for the hash function!

If the minimum values are understood, the minimum values are used instead: ad60619#diff-8d8bf62389a253975b815495978348f9


For reference: #19023 (comment)

That is something that is misunderstood in very most documentations, including the PHP function documentation: https://www.php.net/manual/en/function.password-hash.php
And the defaults variable documentation: https://www.php.net/manual/en/password.constants.php
And even on Wikipedia and some other sources it is at least described in a way that can be misunderstood.

Only the RFC itself describes it correctly: https://wiki.php.net/rfc/argon2_password_hash
This includes the fact that time_cost means number of times the hash algorithm iterates and the fact that those numbers are no maxima but the exactly used threads, memory and iterations. The result and the ability to reproduce and compare hashes relies on the fact that those three numbers are exactly what they are.

I hope that some others can verify this, at best we check back with RFC author and based on that start to offer corrections for PHP docs and other places.


Related documentation change: nextcloud/documentation#5828

The threads option for the password_hash function does not define the maximum allowed number of CPU threads to be used by the hashing algorithm but the exact number of threads that is used.

Similarly the memory_cost option for the password_hash function does not define the maximum allowed memory to be used by the hashing algorithm, but the exact amount of memory that is used by the hashing table. The minimum value is 8 KiB per thread.

The time_cost option for the password_hash function does not define the allowed time in seconds, but the number of iterations for the hash function.

If the minimum values are understood, the minimum values are used instead.

Signed-off-by: MichaIng <micha@dietpi.com>
@MichaIng MichaIng added 3. to review Waiting for reviews pending documentation This pull request needs an associated documentation update labels Oct 9, 2020
@MichaIng MichaIng requested a review from blizzz October 9, 2020 18:39
@faily-bot
Copy link

faily-bot bot commented Oct 9, 2020

🤖 beep boop beep 🤖

Here are the logs for the failed build:

Status of 33918: failure

sqlite

Show full log
There were 2 warnings:

1) Test\Files\ViewTest::testRenameFailDeleteTargetKeepSource
Trying to configure method "writeStream" which cannot be configured because it does not exist, has not been specified, is final, or is static

2) Test\Files\ViewTest::testCopyFailDeleteTargetKeepSource
Trying to configure method "writeStream" which cannot be configured because it does not exist, has not been specified, is final, or is static

--

There was 1 failure:

1) TrashbinTest::testExpireOldFilesShared
Failed asserting that 3 is identical to 2.

/drone/src/apps/files_trashbin/tests/TrashbinTest.php:304
/drone/src/apps/files_trashbin/tests/TrashbinTest.php:287

mariadb10.1-php7.3

mariadb10.4-php7.4

mysql8.0-php7.4

mysql5.6-php7.3

postgres9-php7.3

Show full log
There were 2 warnings:

1) Test\Files\ViewTest::testRenameFailDeleteTargetKeepSource
Trying to configure method "writeStream" which cannot be configured because it does not exist, has not been specified, is final, or is static

2) Test\Files\ViewTest::testCopyFailDeleteTargetKeepSource
Trying to configure method "writeStream" which cannot be configured because it does not exist, has not been specified, is final, or is static

--

There was 1 failure:

1) TrashbinTest::testExpireOldFilesShared
Failed asserting that 3 is identical to 2.

/drone/src/apps/files_trashbin/tests/TrashbinTest.php:304
/drone/src/apps/files_trashbin/tests/TrashbinTest.php:287

postgres11-php7.4

@blizzz
Copy link
Member

blizzz commented Oct 14, 2020

@charlesportwoodii I understand you are the author of the PHP RFC on Argon2 https://wiki.php.net/rfc/argon2_password_hash

I take that the descriptions especially on the cost factors from the RFC are correct. However, there are some doubts on the Internet whether all factors are the maximum or the exact value, and specifically on time whether it is about the actual time or iterations. Maybe you can confirm that what the RFC states is indeed matching with the reality? :)

@charlesportwoodii
Copy link

charlesportwoodii commented Oct 14, 2020

@blizzz: @MichaIng assessment, and the description in the original RFC is accurate. The values correspond to the C libs t_cost, m_cost, and parallelism arguments. The t_cost factor is poorly named in both the library and PHP, but kept consistent for consistency's sake.

Argon2 Parameter Password Hash Option Name Description
t_cost time_cost Number of hash iterations
m_cost memory_cost Absolute memory that should be consumed
parallelism threads Exactly number of parallel threads to use

From the RFC:

All three values are integers. The memory cost represents the number of KiB that should be consumed during hashing. The default value is 1<<10, or 1024 KiB, or 1 MiB. The argon2 spec recommends setting the memory cost to a power of 2 when changing.

The time cost represents the number of times the hash algorithm will be run. And the thread parameter indicates the number of CPU threads that will be used during hashing.

In practice:

\password_hash('hello, world', PASSWORD_ARGON2ID, ['time_cost' =>  3, 'm_cost' => 1 << 18, 'threads' => 4]);

Will do Argon2ID with 3 iterations, consume 262144 KiB (268 MegaBytes), and use 4 CPU threads. You can observe this in top/htop or any other process monitoring tool pretty easily. As you can imagine this is a pretty fantastic way to kill your server, especially if you're processing 1000s of logins per minute.

The PHP Password hash defaults are pretty low to compensate for this, particularly to ensure that PHP can run on pretty much any platform, which is why it's critical to have a test script running on your end platform to target the 1000ms+ hashing time for Argon2id, otherwise you should just be using Bcrypt, which has better hardness at sub 500ms.

My recommendation for password hashing is:

  1. If you must use Argon2, use Argon2id and target a hashing time of 1000ms+
  2. In every other case, use bcrypt and target 500ns

In both cases, adjust the cost factors until you reach the desired timing.

The PHP doc team isn't very involved in the RFC process - they normally write the docs after the RFC is approved and closer to the release window for new features so it's likely the descriptions were carried over wrong from the RFC. I'd advise making a PR to the PHP Wiki to update the description. They're not wrong in that it is the maximum values that will be used but I can certainly see how that could be misleading to think it that implies "up to 1020 KiB" rather than 'it will use exactly 1024 KiB"

The only change I'd make to this PR in particular is that the memory cost must be a power of 2, hence why all the examples use bitwise shifting to calculate the value rather than plopping static KiB numbers in.

@MichaIng
Copy link
Member Author

MichaIng commented Oct 14, 2020

Many thanks for your clarification and recommendations.

I'm happy that my thoughts were not totally dumb at this topic and things indeed are as I thought make sense 😄.

Regarding that memory cost needs to be a power of 2, is that true for the size in KiB or for the size in bytes, since using 33 seems to work just fine:
EDIT: Ah this is always true for both 😄, however 33 KiB still works fine, so probably the PHP function internally changes it to the nearest power of 2?

2020-10-14 17:19:06 root@micha:/var/log# php -r 'echo password_hash("hello, world", PASSWORD_ARGON2ID, ["time_cost" =>  1, "memory_cost" => 33, "threads" => 4]) . "\n";'
$argon2id$v=19$m=33,t=1,p=4$WEk1ZzJBQ0NRZi53cTNjSw$NmOwksN0f3dk0KMWkBdi/MTUc9UMWY09Mdct/RzUNDM

Most interesting is the recommendation to use hashing times > 1 second. For me this would be just the upper limit for a seamless user experience.

@blizzz
This would be a reasonable check for the admin panel: Run a test hashing based on current settings and if it takes < 1 second, show a warning to recommend increasing memory cost (if possible on the system and use case) or time cost. Based on the current hashing time, even a specific recommendation for time cost could be printed, i.e. the lowest integer that would lead to >= 1 second when assuming linear increase (which is pretty much true). But an issue is that we cannot really know about the current system load that may have affected the hashing time.
EDIT: Although on a strong machine this recommendation would mean something very different compared to a weak machine, since it is not about the absolut strength of the hash but only compared to using bcrypt with the same hashing time (meaning "time" here 😉). While enabling parallelism decreases hashing time by moreless the factor of additional cores involved, it does not decrease the hashes strength of course. So probably the below is the better thing to do and recommend to switch hashing algorithm or adjust cost settings when Argon2 + <1s or bcrypt + >1s is used.

And probably there should be an option to disable Argon2 (fallback to bcrypt) if short times are wanted. Although I'm still not sure which tasks really include this hasher, login + changing user password, as well authentication tokens, i.e. all sort of client connections, other tasks?

@charlesportwoodii
Copy link

@MichaIng The 2^n KiB recommendation comes from the C lib. The underlying C library doesn't validate the input for 2^n though, hence the recommendation to use bit-wise shifting to calculate the value since you guarantee at 2^n value.

The 1s comes from the Argon2 authors https://twitter.com/terahashcorp/status/1155129705034653698. 1000ms should be the lower bound for any Argon2id hashing, if you're targeting less than that Bcrypt is objectively better in terms of hashing strength. If page load time for login is an important, just use Bcrypt with a 500ms target time and call it a day. It's a stronger and faster choice at 500ms

Paragonie has a refiner for determining optimal costs based upon your system assuming single-threaded applications (threads = 1 which sodium limits you to anyways if you're using the sodium implementation): https://github.com/paragonie/argon2-refiner which I think is pretty decent if you set the target time to 1025ms with 25ms of tolerance should get you good values for a given platform - it just takes a bit to run and you probably need to tune the memory to 2^n afterwards.

@blizzz
Copy link
Member

blizzz commented Oct 19, 2020

Thank you, Charles, for the insights!

Most interesting is the recommendation to use hashing times > 1 second. For me this would be just the upper limit for a seamless user experience.

Seconded! The >=1000ms is an absolute recommendation, so more threads/CPUs do not help here, I understand. In this case, switching back to bcrypt as default seems to be reasonable.

We have a hasher PHP API which is used in various places where passwords are necessary, e.g. Talk rooms, sharing, local users but there is also a use case around LDAP connection handling (though we can solve this differently).

@rullzer you'll want to know about this as well ^

@MichaIng
Copy link
Member Author

Reading through Twitter discussion basically underlines the recommendation to not use Argon2 for password hashing, but bcrypt instead. I ran a few tests with default PHP values on very different machines for Argon2 (m=65536,t=4,p=1) vs bcrypt hashing times:

  • RPi2: ~6 seconds vs ~0.7 seconds
  • Our VPS and my testing VM, both: ~0.7s vs ~0.15s

Both of course not representative but just to mention that in both cases Argon2 is not a good choice, as on RPi2 it is too slow for a nice user experience and on the VPS+VM it is too fast to have benefit over bcrypt. And there is no good way to automatically apply better settings, aside of doing a bunch of tests and prompt an admin panel warning, as those highly depend on the system, available CPU cores and performance, available memory (i.e. what is expected to be used by PHP) and tests even on very current server load, making all this a bid unreliable if not the admin runs a set of tests to optimise the options.

What I do not yet understand is why the >=1s recommendation is absolut when taking into account parallelism. I mean 4 threads can do one iteration through the same array four times faster. Of course it means that someone who wants to break the hash as well can test a string four times faster. But understanding the depth of hashing algorithms and their strength is going OT here 😄.
Sodium is btw not used but the native password hashing function: https://www.php.net/manual/en/book.password.php
So parallelism should work, if not manually via affinity, although I did not measure it within Nextcloud, only via PHP CLI.


Regarding the options descriptions. I'm not sure about the effect of using a memory cost value that is no power of 2, since the function returns a hash, but I can add the information that a power of two should be used. On hasher side, we could switch to bit-wise values or, to not break existing configs, override an invalid value with the nearest (or next lower) power of two + print a warning to logs, or such.

@MichaIng
Copy link
Member Author

I felt free to give this a milestone (+ vote for backport to next Nextcloud 20) and bump it a bid, since it is basically a security-related topic. Based on wrong information admins do wrong choices, which is horrible in case of password hashing. Switching back to bcrypt as default is another topic, but since we have Argon2 support now, at least the docs and comments about it should be correct, so that admins can do better choices.

This was referenced Dec 14, 2020
MichaIng added a commit to nextcloud/documentation that referenced this pull request Dec 18, 2020
Companion commit for: nextcloud/server#23319

Signed-off-by: MichaIng <micha@dietpi.com>
@MichaIng MichaIng removed the pending documentation This pull request needs an associated documentation update label Dec 18, 2020
@MichaIng
Copy link
Member Author

I opened a PR to change the documentation accordingly: nextcloud/documentation#5828

@rullzer rullzer merged commit 114b472 into master Dec 21, 2020
@rullzer rullzer deleted the fix/argon2-descriptions branch December 21, 2020 08:24
@rullzer
Copy link
Member

rullzer commented Dec 21, 2020

/backport to stable20

@MichaIng
Copy link
Member Author

/backport to stable19

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3. to review Waiting for reviews security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants