Replace mod_growth_policy with multiplication-based one #1
Thank you for the idea. How do you calculate the […]? I know there is the "fast range" from Lemire that uses a multiplication+shift to map a value x into the range [0, n). Instead of:

```cpp
uint32_t reduce(uint32_t x, uint32_t N) {
    return x % N;
}
```

he proposes to use:

```cpp
uint32_t reduce(uint32_t x, uint32_t N) {
    return ((uint64_t) x * (uint64_t) N) >> 32;
}
```

But the problem is that hashes that are close together will end up in the same bucket. That's not too much of a problem with a good hash function, but libstdc++ uses an identity hash function for integers.
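To make the clustering concrete, here is a small sketch (not from the thread) comparing the two reductions on consecutive inputs; the function names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Lemire's "fast range": maps x into [0, N) with one multiply and one shift.
uint32_t reduce_fastrange(uint32_t x, uint32_t N) {
    return (uint32_t)(((uint64_t)x * (uint64_t)N) >> 32);
}

// Classic modulo reduction.
uint32_t reduce_mod(uint32_t x, uint32_t N) {
    return x % N;
}
```

With N = 16, `reduce_fastrange` maps both 1000 and 1001 to bucket 0, while `reduce_mod` sends them to buckets 8 and 9; this is exactly the clustering concern when the hash is the identity.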
Yeah, there are many variations of this idea. Lemire's method economizes one shift, but requires a "wide multiplication" that may be a tiny bit more expensive than a "narrow multiplication". Although on a 64-bit CPU it may be implemented in 64-bit registers with the same MUL+SHIFT sequence as my code.

Now about distributions. Indeed, Lemire's method is equivalent to computing […]. It's easy to see that with a full-width 32-bit multiplication constant, every bit of output will depend on every bit of input. Well, almost. Actually, the problem is exactly opposite to Lemire's hash: with my formula, the lower bits of the output don't depend on the higher bits of the input. Which may be "good enough" for most, but not all, practical purposes. If we need a fairer hash, we can use a wide multiplication and then add the higher and lower words of the result:

```cpp
res = u64(x) * u64(N);
return u32(res >> 32) + u32(res);
```

For an even fairer hash, we can repeat it again and/or add some more rol/shift/add operations. Just to let you know: for an arbitrary constant N, the cost of computing the exact `x % N` […].

Alternatively, we can go with CRC hashing. It's a very interesting math construction. Now, how to compute it? SSE 4.2 CPUs (i.e. 90% of current end-user computers) have the `CRC32` instruction. For more portable code, CRC can be computed using tables. I.e., since CRC(N) is just the remainder of some binary polynomial corresponding to the binary digits of N, we can split N into shorter polynomials, get the remainder of each sub-polynomial from a table, and then sum them up. Usually four 256-entry (4 * 8 bits) tables are used, although we can use three tables instead, of 1024 + 2048 + 2048 entries (10+11+11 bits):

```cpp
b0 = x >> 24;
b1 = (x >> 16) & 0xFF;
b2 = (x >> 8) & 0xFF;
b3 = x & 0xFF;
return table0[b0] ^ table1[b1] ^ table2[b2] ^ table3[b3];
```

This requires 4 MOVZX operations, 3 LOAD+XOR and 1 plain LOAD. Still looks faster than the exact `x % N`.
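A runnable sketch of the four-table scheme, with the table generation included (the table and function names here are illustrative, not from the thread). Each `T[k]` entry pre-computes the effect of pushing a byte through k additional zero-byte CRC steps, so a 32-bit value is hashed with four independent lookups:

```cpp
#include <cassert>
#include <cstdint>

static uint32_t T[4][256];

// Build the classic reflected CRC-32 byte table (T[0]), then derive
// T[k][i] = result of pushing T[k-1][i] through one extra zero-byte step.
static void init_crc_tables() {
    const uint32_t poly = 0xEDB88320u;  // reflected CRC-32 polynomial
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int k = 0; k < 8; ++k)
            c = (c & 1) ? (c >> 1) ^ poly : (c >> 1);
        T[0][i] = c;
    }
    for (int k = 1; k < 4; ++k)
        for (uint32_t i = 0; i < 256; ++i)
            T[k][i] = (T[k - 1][i] >> 8) ^ T[0][T[k - 1][i] & 0xFF];
}

// Reference: byte-at-a-time CRC over the 4 little-endian bytes of x.
static uint32_t crc32_bytewise(uint32_t x) {
    uint32_t crc = 0;
    for (int i = 0; i < 4; ++i) {
        crc = (crc >> 8) ^ T[0][(crc ^ x) & 0xFF];
        x >>= 8;
    }
    return crc;
}

// Slicing-by-4: four independent lookups, one per byte. The byte that is
// processed first (the lowest) needs the most zero-byte steps, hence T[3].
static uint32_t crc32_slice4(uint32_t x) {
    return T[3][ x        & 0xFF]
         ^ T[2][(x >>  8) & 0xFF]
         ^ T[1][(x >> 16) & 0xFF]
         ^ T[0][(x >> 24) & 0xFF];
}
```

The four lookups have no dependency chain between them, which is what makes this faster in practice than the byte-at-a-time loop.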
The idea seems interesting. I will see if I can add a new growth policy based on it and do some tests when I have more time (a bit busy with the end of the year). If you have time, don't hesitate to make a pull request; the interface to implement a growth policy is quite simple (see https://github.com/Tessil/robin-map#growth-policy).
What constant should be used for CONST (in 32 and 64 bits)? Do you have any good values in mind? You said in your first message that CONST should be even.
It's a typo, of course it should be odd. It shouldn't be even because in this case the multiplication is not invertible modulo 2^32, so some input bits are simply lost. I usually use 1234567891 as a 31-bit prime number. Even better, you can borrow some constants from xxHash32 or xxHash64 - I know that Cyan is super-careful about details, so his numbers should be better than mine (i.e. have a ~50/50 split between 1 and 0 bits). Use prime.cpp to generate more primes - f.e. you can start with some randomly-generated number having a fair split between 0 and 1 bits and then find the closest prime.
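The prime.cpp mentioned above is not shown in the thread; a minimal stand-in for the procedure it describes (trial-division primality, walk to the nearest prime, check the bit split) could look like:

```cpp
#include <cassert>
#include <cstdint>

// Trial-division primality test; fine for one-off 32-bit constants.
bool is_prime(uint32_t n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (uint32_t d = 3; (uint64_t)d * d <= n; d += 2)
        if (n % d == 0) return false;
    return true;
}

// Smallest prime >= n.
uint32_t next_prime(uint32_t n) {
    if (n <= 2) return 2;
    if (n % 2 == 0) ++n;
    while (!is_prime(n)) n += 2;
    return n;
}

// Number of 1 bits, to check for a fair ~50/50 split (ideally near 16 of 32).
int popcount32(uint32_t x) {
    int c = 0;
    for (; x; x &= x - 1) ++c;
    return c;
}
```

Start from a random odd number whose `popcount32` is near 16 and call `next_prime` on it to get a candidate constant.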
You are absolutely right that it's easy to implement other growth policies, it can be done in a few minutes. The real work is to benchmark the outcome, and to try various modifications of the basic idea. So, if you prefer, I can write several policies implementing variations of these ideas, but benchmarking them will be on you. BTW, do you have test(s) measuring the fairness of a policy (rather than raw speed)? SMHasher tests should be (almost) directly applicable for this task.
No, the benchmarks I did were mainly geared toward comparing the speed of collision resolution in hash tables (I didn't check probe lengths and cache misses formally). I didn't bother too much regarding the hash reductions and their distribution, as the ones I use are fairly common.
Overall, I think that the library should provide the following policies: […]

Since all policies come in pairs, it will be great to combine them, i.e. via a template […]. I believe that the default hashing policy should be the most reliable (fair) one, rather than the fastest one. This follows the "least surprise" principle.
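As a sketch of what one such policy could look like: the class name `mul_shift_growth_policy` and the multiplier are hypothetical, but the interface follows the growth-policy contract described in the robin-map README (a constructor taking the minimum bucket count by reference, `bucket_for_hash`, `next_bucket_count`, `max_bucket_count` and `clear`):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical multiply-shift growth policy: the bucket count is a power of
// two, and the bucket index is taken from the HIGH bits of hash * kMult.
class mul_shift_growth_policy {
public:
    explicit mul_shift_growth_policy(std::size_t& min_bucket_count_in_out) {
        if (min_bucket_count_in_out > max_bucket_count())
            throw std::length_error("bucket count too large");
        // Round the requested count up to a power of two.
        std::size_t count = 1;
        unsigned log2 = 0;
        while (count < min_bucket_count_in_out) { count <<= 1; ++log2; }
        min_bucket_count_in_out = count;
        m_log2 = log2;
    }

    std::size_t bucket_for_hash(std::size_t hash) const noexcept {
        if (m_log2 == 0) return 0;  // a single bucket, avoid a 64-bit shift
        return static_cast<std::size_t>(
            (static_cast<std::uint64_t>(hash) * kMult) >> (64 - m_log2));
    }

    std::size_t next_bucket_count() const {
        return std::size_t(1) << (m_log2 + 1);
    }

    std::size_t max_bucket_count() const {
        return std::size_t(1) << 62;
    }

    void clear() noexcept { m_log2 = 0; }

private:
    // Odd prime multiplier borrowed from xxHash64 (PRIME64_1).
    static constexpr std::uint64_t kMult = 11400714785074694791ULL;
    unsigned m_log2 = 0;
};
```

A policy like this would plug into the map as a template parameter, in the same way as the built-in `tsl::rh::power_of_two_growth_policy`.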
I'll close the issue and keep the power-of-two policy as the default policy for now. In the end I didn't really have the time to test the different alternatives extensively. If someone makes a comparison of different growth policies, feel free to post the link here; I'd be curious to read it.
Well-known technique to mix the bits up and then extract some:

```
index = (hash * CONST) >> SHIFT
```

CONST should be ~~even~~ odd and preferably a prime number, constant through the set/map life. The value of SHIFT is determined by the current bucket count. An extra bonus is that on hash table growth/shrink, table entries are projected into almost the same place, allowing f.e. an incremental, multi-threaded implementation of the resizing operation. Taking into account that a shift requires 1 CPU cycle and a multiplication 3 cycles, but a division 20+ cycles, MUL-based hashing may be much faster than DIV-based hashing.
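A minimal sketch of the technique (the constant here is Knuth's multiplicative-hashing prime, an assumption rather than a value from the thread), including the resize-stability property mentioned above:

```cpp
#include <cstdint>

// Multiply-shift bucket index: mix the bits, then keep the top ones.
// 2654435761 is Knuth's odd multiplicative-hashing prime (~2^32 / phi).
// Assumes 1 <= log2_buckets <= 31.
uint32_t bucket(uint32_t hash, unsigned log2_buckets) {
    const uint32_t kMult = 2654435761u;
    return (hash * kMult) >> (32 - log2_buckets);
}
```

When the table doubles (log2_buckets + 1), each old bucket splits into two adjacent new ones: `bucket(h, k) == bucket(h, k + 1) >> 1` for every hash, so entries land almost where they were; this locality is what makes an incremental or multi-threaded resize feasible.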