Replace mod_growth_policy with multiplication-based one #1
Thank you for the idea. How do you calculate the […]? I know there is the "fast range" from Lemire that uses a multiplication+shift to map a value x into the range [0, n). Instead of:

```cpp
uint32_t reduce(uint32_t x, uint32_t N) {
    return x % N;
}
```

he proposes to use:

```cpp
uint32_t reduce(uint32_t x, uint32_t N) {
    return ((uint64_t) x * (uint64_t) N) >> 32;
}
```

But the problem is that hashes that are close together will end up in the same bucket. That's not too much of a problem with a good hash function, but libstdc++ uses an identity hash function for integers.
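To make the clustering concrete, here is a small sketch (not from the thread) comparing the two reductions on consecutive inputs; the function names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Lemire's "fast range": maps x into [0, N) with one multiply and one shift.
uint32_t reduce_fastrange(uint32_t x, uint32_t N) {
    return (uint32_t)(((uint64_t)x * (uint64_t)N) >> 32);
}

// Classic modulo reduction.
uint32_t reduce_mod(uint32_t x, uint32_t N) {
    return x % N;
}
```

With N = 16, `reduce_fastrange` maps both 1000 and 1001 to bucket 0, while `reduce_mod` sends them to buckets 8 and 9; this is exactly the clustering concern when the hash is the identity.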
Yeah, there are many variations of this idea. Lemire's method economizes one shift, but requires a "wide multiplication" that may be a tiny bit more expensive than a "narrow multiplication". Although on a 64-bit CPU it may be implemented in 64-bit registers with the same MUL+SHIFT sequence as my code.

Now about distributions. Indeed, Lemire's method is equivalent to computing […]. It's easy to see that with a full-width 32-bit multiplication constant, every bit of output will depend on every bit of input. Well, almost. Actually, the problem is exactly opposite to Lemire's hash: with my formula, the lower bits of the output don't depend on the higher bits of the input. Which may be "good enough" for most, but not all, practical purposes. If we need a fairer hash, we can use a wide multiplication and then add the higher and lower words of the result:

```cpp
res = u64(x) * u64(N);
return u32(res >> 32) + u32(res);
```

For an even fairer hash, we can repeat it again and/or add some more rol/shift/add operations. Just to let you know: for an arbitrary constant N, the cost of computing the exact `x % N` […].

Alternatively, we can go with CRC hashing. It's a very interesting math construction. Now, how to compute it? SSE 4.2 CPUs (i.e. 90% of current end-user computers) have the `CRC32` instruction. For more portable code, CRC can be computed using tables. I.e., since CRC(N) is just the remainder of some binary polynomial corresponding to the binary digits of N, we can split N into shorter polynomials, get the remainder of each sub-polynomial from a table, and then sum them up. Usually four 256-entry (4 * 8 bits) tables are used, although we can use three tables instead, of 1024 + 2048 + 2048 entries (10+11+11 bits):

```cpp
b0 = x >> 24;
b1 = (x >> 16) & 0xFF;
b2 = (x >> 8) & 0xFF;
b3 = x & 0xFF;
return table0[b0] ^ table1[b1] ^ table2[b2] ^ table3[b3];
```

This requires 4 MOVZX operations, 3 LOAD+XOR and 1 plain LOAD. Still looks faster than the exact `x % N`.
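A runnable sketch of the four-table scheme, with the table generation included (the table and function names here are illustrative, not from the thread). Each `T[k]` entry pre-computes the effect of pushing a byte through k additional zero-byte CRC steps, so a 32-bit value is hashed with four independent lookups:

```cpp
#include <cassert>
#include <cstdint>

static uint32_t T[4][256];

// Build the classic reflected CRC-32 byte table (T[0]), then derive
// T[k][i] = result of pushing T[k-1][i] through one extra zero-byte step.
static void init_crc_tables() {
    const uint32_t poly = 0xEDB88320u;  // reflected CRC-32 polynomial
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int k = 0; k < 8; ++k)
            c = (c & 1) ? (c >> 1) ^ poly : (c >> 1);
        T[0][i] = c;
    }
    for (int k = 1; k < 4; ++k)
        for (uint32_t i = 0; i < 256; ++i)
            T[k][i] = (T[k - 1][i] >> 8) ^ T[0][T[k - 1][i] & 0xFF];
}

// Reference: byte-at-a-time CRC over the 4 little-endian bytes of x.
static uint32_t crc32_bytewise(uint32_t x) {
    uint32_t crc = 0;
    for (int i = 0; i < 4; ++i) {
        crc = (crc >> 8) ^ T[0][(crc ^ x) & 0xFF];
        x >>= 8;
    }
    return crc;
}

// Slicing-by-4: four independent lookups, one per byte. The byte that is
// processed first (the lowest) needs the most zero-byte steps, hence T[3].
static uint32_t crc32_slice4(uint32_t x) {
    return T[3][ x        & 0xFF]
         ^ T[2][(x >>  8) & 0xFF]
         ^ T[1][(x >> 16) & 0xFF]
         ^ T[0][(x >> 24) & 0xFF];
}
```

The four lookups have no dependency chain between them, which is what makes this faster in practice than the byte-at-a-time loop.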
The idea seems interesting. I will see if I can add a new growth policy based on it and do some tests when I have more time (a bit busy with the end of the year). If you have time, don't hesitate to make a pull request; the interface to implement a growth policy is quite simple (see https://github.com/Tessil/robin-map#growth-policy).
What constant should be used for CONST (in 32 and 64 bits)? Do you have any good values in mind? You said in your first message that CONST should be even.
It's a typo, of course it should be odd. It shouldn't be even because in this case the multiplication is not invertible modulo 2^32, so some input bits are simply lost. I usually use 1234567891 as a 31-bit prime number. Even better, you can borrow some constants from xxHash32 or xxHash64 - I know that Cyan is super-careful about details, so his numbers should be better than mine (i.e. have a ~50/50 split between 1 and 0 bits). Use prime.cpp to generate more primes - f.e. you can start with some randomly-generated number having a fair split between 0 and 1 bits and then find the closest prime.
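The prime.cpp mentioned above is not shown in the thread; a minimal stand-in for the procedure it describes (trial-division primality, walk to the nearest prime, check the bit split) could look like:

```cpp
#include <cassert>
#include <cstdint>

// Trial-division primality test; fine for one-off 32-bit constants.
bool is_prime(uint32_t n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (uint32_t d = 3; (uint64_t)d * d <= n; d += 2)
        if (n % d == 0) return false;
    return true;
}

// Smallest prime >= n.
uint32_t next_prime(uint32_t n) {
    if (n <= 2) return 2;
    if (n % 2 == 0) ++n;
    while (!is_prime(n)) n += 2;
    return n;
}

// Number of 1 bits, to check for a fair ~50/50 split (ideally near 16 of 32).
int popcount32(uint32_t x) {
    int c = 0;
    for (; x; x &= x - 1) ++c;
    return c;
}
```

Start from a random odd number whose `popcount32` is near 16 and call `next_prime` on it to get a candidate constant.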
You are absolutely right that it's easy to implement other growth policies, it can be done in a few minutes. The real work is to benchmark the outcome, and to try various modifications of the basic idea. So, if you prefer, I can write several policies implementing variations of these ideas, but benchmarking them will be on you. BTW, do you have test(s) measuring the fairness of a policy (rather than raw speed)? SMHasher tests should be (almost) directly applicable for this task.
No, the benchmarks I did were mainly geared toward comparing the speed of collision resolution in hash tables (I didn't check probe lengths and cache misses formally). I didn't bother too much regarding the hash reductions and their distribution, as the ones I use are fairly common.
Overall, I think that the library should provide the following policies: […]

Since all policies come in pairs, it will be great to combine them, i.e. via a template […]. I believe that the default hashing policy should be the most reliable (fair) one, rather than the fastest one. This follows the "least surprise" principle.
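As a sketch of what one such policy could look like: the class name `mul_shift_growth_policy` and the multiplier are hypothetical, but the interface follows the growth-policy contract described in the robin-map README (a constructor taking the minimum bucket count by reference, `bucket_for_hash`, `next_bucket_count`, `max_bucket_count` and `clear`):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical multiply-shift growth policy: the bucket count is a power of
// two, and the bucket index is taken from the HIGH bits of hash * kMult.
class mul_shift_growth_policy {
public:
    explicit mul_shift_growth_policy(std::size_t& min_bucket_count_in_out) {
        if (min_bucket_count_in_out > max_bucket_count())
            throw std::length_error("bucket count too large");
        // Round the requested count up to a power of two.
        std::size_t count = 1;
        unsigned log2 = 0;
        while (count < min_bucket_count_in_out) { count <<= 1; ++log2; }
        min_bucket_count_in_out = count;
        m_log2 = log2;
    }

    std::size_t bucket_for_hash(std::size_t hash) const noexcept {
        if (m_log2 == 0) return 0;  // a single bucket, avoid a 64-bit shift
        return static_cast<std::size_t>(
            (static_cast<std::uint64_t>(hash) * kMult) >> (64 - m_log2));
    }

    std::size_t next_bucket_count() const {
        return std::size_t(1) << (m_log2 + 1);
    }

    std::size_t max_bucket_count() const {
        return std::size_t(1) << 62;
    }

    void clear() noexcept { m_log2 = 0; }

private:
    // Odd prime multiplier borrowed from xxHash64 (PRIME64_1).
    static constexpr std::uint64_t kMult = 11400714785074694791ULL;
    unsigned m_log2 = 0;
};
```

A policy like this would plug into the map as a template parameter, in the same way as the built-in `tsl::rh::power_of_two_growth_policy`.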
I'll close the issue and keep the power-of-two policy as the default policy for now. In the end I didn't really have the time to test the different alternatives extensively. If someone makes a comparison of different growth policies, feel free to post the link here; I'd be curious to read it.
Well-known technique to mix the bits up and then extract some:

```
index = (hash * CONST) >> SHIFT
```

CONST should be ~~even~~ odd and preferably a prime number, constant through the set/map life. The value of SHIFT is determined by the current bucket count. An extra bonus is that on hash table growth/shrink, table entries are projected into almost the same place, allowing f.e. an incremental, multi-threaded implementation of the resizing operation. Taking into account that a shift requires 1 CPU cycle and a multiplication 3 cycles, but a division 20+ cycles, MUL-based hashing may be much faster than DIV-based hashing.
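A minimal sketch of the technique (the constant here is Knuth's multiplicative-hashing prime, an assumption rather than a value from the thread), including the resize-stability property mentioned above:

```cpp
#include <cstdint>

// Multiply-shift bucket index: mix the bits, then keep the top ones.
// 2654435761 is Knuth's odd multiplicative-hashing prime (~2^32 / phi).
// Assumes 1 <= log2_buckets <= 31.
uint32_t bucket(uint32_t hash, unsigned log2_buckets) {
    const uint32_t kMult = 2654435761u;
    return (hash * kMult) >> (32 - log2_buckets);
}
```

When the table doubles (log2_buckets + 1), each old bucket splits into two adjacent new ones: `bucket(h, k) == bucket(h, k + 1) >> 1` for every hash, so entries land almost where they were; this locality is what makes an incremental or multi-threaded resize feasible.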