Hash Map Insert Stuck in an infinite loop #52
Thank you for the report. At first glance I have trouble understanding how this could happen. I'll try to investigate more when I have some time. In the meantime, could you try the latest version and define the TSL_DEBUG macro?
I really can't find how this could happen. I'll try to investigate more as it's a worrying bug. Please let me know if you can eventually reproduce the problem in a self-contained example. Thanks.
I have experienced the issue multiple times in the past, which effectively prevents us from using this map implementation. I thought I had filed an issue about this, but it looks like I haven't. It would be great if this were resolved.
Thank you, I'll take a more in-depth look then, as it's really worrying if you also had the problem. I have tried to insert the keys mentioned above in every way possible, plus some extra ones, and I can't reproduce the problem. If any of you eventually has a reproducible self-contained example, it'd be immensely useful.
I can reproduce it, but it is practically impossible to share the code (it's for our embeddable columnar store, which is not OSS). Certain SQL queries would cause this; I tried every sanitizer and nothing really came up (so it's almost certainly not a corruption). I had this issue with other programs (our application server, some other utilities, etc.) but I unfortunately can't share those codebases either. It is almost certainly not a matter of some memory corruption/buffer overrun issue, though.
Thank you very much for the useful info. I'll look into it in more depth then, as I currently can't understand how this could happen.
In our tests, we also saw a different stuck stack. Here is some gdb information:
Since the
We also tried to upgrade to the latest version (v1.0.1), but we can still see the above stuck stacks in our tests.
Hi Tessil, I am on the same team as Song. For us the bug occurs rarely, and there is always an infinite loop. We suspected that some memory corruption on our side might mess with the robin map's memory and cause the infinite loop, and we have already spent two weeks investigating this. What is specific to our case is that, in a highly multithreaded scenario, each thread creates, inserts into, and destroys the map. We do guard the operations on the map with mutexes, as the data structure itself is not thread-safe. So now I wonder: do your tests also cover this scenario where we create a map, insert elements, and delete the map in a loop, possibly from lots of threads (each thread creates, inserts into, and destroys its own robin map)? Maybe there is an issue in the deletion part, which may not be heavily tested? At this point we have started replacing the robin map with the STL unordered map to see whether we observe any more hangs :(
For what it's worth, in the cases where it fails, we don't erase any KVs; it's all insertions, and it occurs while inserting elements.
Also, we experimented with many different hash map implementations. None of them failed, so it's perhaps safe to suggest that the issue is specific to this implementation.
Thanks all for the information. I was not able to reproduce the problem, but I looked carefully through the implementation. I couldn't find a problem in the insertion logic of the code, but I noticed that the library can invoke some undefined behaviour with C++17. P0137R1 introduced a change in the object model in C++17 which now requires std::launder when reinterpreting the storage pointer. I have fixed the problem with commit 4abcc97; could you try these latest changes? It may not be the reason for the bug you encounter, but it would be tremendously useful for me if you could at least try, as I can't reproduce the problem myself. I also added some extra assertions, so running the code with assertions enabled could give more information. Sorry for the inconvenience.
@markpapadakis Do you think you can try this patch and confirm whether it fixes your problem? Right now I am in the middle of replacing it and benchmarking an alternative. Also, for some weird reason, for us I need to let the server run with a constant load for a week or more before I notice a query stuck in an infinite loop inside the robin map.
@gabrieltanase42: I just tried this commit and it still fails.
I am with the same team as Gabi. We were not using C++17 when we observed the problem. We have only been able to reproduce this in multi-day tests of the integrated software stack. We lack a clean reproducer based on a simple set of inputs and some sequence of their presentation. We do use a lock to guard access to the map from multiple threads in the code where we are observing the problem. We have only observed this for relatively small maps with numerically ordered keys which should be presented more or less in order (there might be some races involved, but there is a guarding lock). We have also been using the robin map for several years in hash joins and have never observed an issue with that code. However, unless there is something very odd about the case where the map is behind a lock, we have to assume that we can encounter the same issue when building a hash index for a join.
So I reviewed all the code of the library to check for any potential problem, and I couldn't find anything really suspicious. One part that could be problematic in a pre-C++11 multi-threaded program is the static initialization in https://github.com/Tessil/robin-map/blob/master/include/tsl/robin_hash.h#L1590, but it's thread-safe in C++11 and later, and this static variable is never modified once initialized. There are no other non-const shared variables. Another slight doubt I have is the
I have not been able to reproduce the bug in a non-threaded program. I'll try to run some tests in a multi-threaded one, as it seems to be a common premise both @szhang0119 and @markpapadakis have, and maybe the
It'd be good to eventually test the map by replacing https://github.com/Tessil/robin-map/blob/master/include/tsl/robin_hash.h#L1591 with
Tried to reproduce the bug in a multi-threaded test program that does insertions, erasures and map creations, but no luck unfortunately.
I work on the same team as Gabriel. It's interesting to me (although I don't know what it means) that in the second bug report, from 13 days ago, the static
Since the comparison involving
I don't have answers to any of these questions, unfortunately.
Yes, it's really strange. How did it end up reaching a value of
To be sure, are all multi-threaded accesses and writes protected by a mutex? Even move, swap, copy and creation?
IIRC it always failed in MT programs for us, and it was always about maps protected with either an std::mutex or an std::shared_mutex. As I mentioned in a previous comment, no other hash map implementation I tried failed.
So I tried to reproduce the bug with the following program, which has multiple threads inserting, erasing and reading random values in multiple maps in parallel, but to no avail. I also tried to use sequential values instead of random ones, but it still works fine. Both with clang 13.0 and g++ 11.3. If you could eventually try to reproduce the problem with it:

#include <tsl/robin_map.h>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <random>
#include <shared_mutex>
#include <thread>
const std::size_t nb_iterations = 100000000000;
const std::size_t nb_maps = 4;
std::vector<std::shared_mutex> mutexes(nb_maps);
std::vector<tsl::robin_map<std::uint64_t, int>> maps(nb_maps);
std::mutex stdout_mutex;
thread_local std::random_device rd;
thread_local std::mt19937 gen(rd());
void test() {
std::uniform_int_distribution<std::size_t> map_choice(0, 3);
std::uniform_int_distribution<std::uint64_t> value_gen(0, 10000000);
std::uniform_int_distribution<int> should_insert(0, 3);
std::uniform_int_distribution<int> should_erase(0, 7);
std::uniform_int_distribution<int> should_reset(0, 10000000);
std::uint64_t total = 0;
for (std::size_t i = 0; i < nb_iterations; i++) {
const std::uint64_t insert_val = value_gen(gen);
const std::uint64_t erase_val = value_gen(gen);
const std::uint64_t read_val = value_gen(gen);
const bool insert = should_insert(gen) == 0;
const bool erase = should_erase(gen) == 0;
const bool reset = should_reset(gen) == 0;
if (insert) {
const std::size_t imap = map_choice(gen);
std::unique_lock<std::shared_mutex> lock(mutexes[imap]);
maps[imap].insert({insert_val, 1});
}
if (erase) {
const std::size_t imap = map_choice(gen);
std::unique_lock<std::shared_mutex> lock(mutexes[imap]);
maps[imap].erase(erase_val);
}
{
const std::size_t imap = map_choice(gen);
std::shared_lock<std::shared_mutex> lock(mutexes[imap]);
auto it = maps[imap].find(read_val);
if (it != maps[imap].end()) {
total += 1;
}
}
if (reset) {
const std::size_t imap = map_choice(gen);
std::unique_lock<std::shared_mutex> lock(mutexes[imap]);
tsl::robin_map<std::uint64_t, int> empty;
maps[imap].swap(empty);
total = 0;
}
if (i % 10000000ull == 0) {
std::unique_lock<std::mutex> lock(stdout_mutex);
std::cout << i << ": " << total << std::endl;
}
}
}
int main(int, char**) {
std::size_t nthreads = 16;
std::vector<std::thread> threads(nthreads);
for (std::size_t i = 0; i < threads.size(); i++) {
threads[i] = std::thread(test);
}
for (std::size_t i = 0; i < threads.size(); i++) {
threads[i].join();
}
}
I also tried to reproduce with something simple but no luck. Here is my test in case it may help somebody:
So I tried to reproduce it again with both scripts, ran them for 24 hours with random values and a few hours with sequential values using 32 threads, and couldn't reproduce the problem. I re-read the code again and can't find where it could come from, outside of a potential race condition or memory corruption. Does the bug happen after moving, swapping, copying a map, serialization/deserialization or anything like that? If anyone encounters the problem, has more information or a way to reproduce, please let me know. One way to help would be to enable assertions and define TSL_DEBUG. For now I have marked the issue as cannot-reproduce, as there isn't much more I can do.
I can reproduce it now in my test (gcc only). Test code: https://github.com/ktprime/emhash/blob/master/bench/btest.cpp (download my emhash repo and run the test with gcc 7.5).

tsl_robin_map: Consecutive insert: 538 ms (s=0, size=2000000)
Program received signal SIGABRT, Aborted.
The error seems unrelated (not an infinite loop) and looks more like a lack of memory when rehashing (hence the SIGABRT).
It's a 32GB server. Could a bad hash function cause many rehashes?
It could be due to the hash function or the growth policy.
You can thus try with another one.
@Tessil I just ran this service/program again (it fails consistently, as described in other comments). I switched from a CityHash variant to xxHash, and even tried a different growth policy, but it still fails (effectively, it rehashes indefinitely until it runs out of memory):

template <typename KT, typename VT>
using agg_fast_map = tsl::robin_map<KT, VT
    , std::hash<KT>
    , std::equal_to<KT>
    , std::allocator<std::pair<KT, VT>>
    , not std::is_arithmetic_v<KT>
    , tsl::rh::mod_growth_policy<>>;
Thanks. So I checked and added a try/catch on the
With the power-of-two growth there's a bucket that has to manage 456(!) collisions. With a prime growth, the maximum is 57 collisions for one bucket. The problem is that the robin-hood collision resolution keeps track of the distance from the ideal bucket in a variable which has a limited size and can't store unlimited distances. So when the distance reaches DIST_FROM_IDEAL_BUCKET_LIMIT, the map has to rehash. I'll try with CityHash and xxHash; I'm not sure how they behave with integers. Note though that, from my understanding, this problem has nothing to do with the infinite-loop issue and is a kind of expected behaviour with a bad hash function (I checked the cause, and it's effectively because the distance limit is reached).
Tessil, today we were able to create a test that reproduces the infinite loop in a deterministic way: every time we run a benchmark, the program hangs (spinning in the loop below). We flipped again from robin map to std map and things work perfectly. In this situation, however, the robin map has a relatively large number of elements. Looking at the previous comments, I see that you mention some potential issues and that the user should try different hash functions, etc. We could probably try these, but in my mind we could still hit this unfortunate situation where the code spins forever in this method:

template
}

I asked permission from my team, and if you have time one of these days we could have an interactive session where we look at the code inside a debugger. In this case the map is from uint64_t to bool (tsl::robin_map<gui_t, bool>); the hash function is the default one.
So after I enabled asserts, the program runs for about 30 minutes inside that loop and then asserts the following:

TestSuites: /home/igtanase/brazil-pkg-cache/packages/TessilRobinMap/TessilRobinMap-1.0.x.344.0/AL2_x86_64/DEV.STD.PTHREAD/build/include/tsl/robin_hash.h:251: tsl::detail_robin_hash::bucket_entry<ValueType, StoreHash>::value_type& tsl::detail_robin_hash::bucket_entry<ValueTyp

That corresponds to the following function:
The bad hash issue is different and is unrelated to the problem you mention. It occurs when the collision chain keeps being higher than DIST_FROM_IDEAL_BUCKET_LIMIT. It seems
Sorry, I would prefer to avoid looking into closed-source code for legal reasons. Could you try to compile the code with the clang address and undefined sanitizers (and eventually the thread and memory ones)? Or just run it through Valgrind for a first pass? What kind of operations are done on the map in the benchmark? Is the map moved or copied? Is it eventually serialized/deserialized? As I see a mention of
I reopened the ticket.
I'll keep digging more. Here are a few more details. We have the code split across multiple .so libraries. The robin_map is allocated in one .so and is passed as an argument to a function implemented in a different .so. The reason I mention this is that at one point I tried to replace robin with the abseil hash map, and that effort failed for this exact reason (abseil/abseil-cpp#1193, abseil/abseil-cpp#834). With regard to multithreading: this scenario is much simpler. The map is accessed by one thread only. I am doing two operations: find and insert(key, value). I ran with the address sanitizer flag enabled; it didn't print anything.
Thanks for the info. Do you know the stacktrace of the failing assert in
Do you think you can prepare a print_info function for me that I can invoke inside that tsl_rh_assert()?
The problem is that some of the critical info may be in the
As the bug occurs with single-threaded insertions and finds only (no deletions, map copy/move/swap, serialization or anything else) and is deterministic, would it eventually be possible to have a compressed text file with all the inserted keys and values?
I just realized we are also using clear() to erase the map. Essentially it is a BFS algorithm, and we are using two maps: one is the set of input vertices. During the kernel in question we traverse this input map and compute an output map with new vertices. We need the map to make sure we won't visit a vertex twice. After the kernel is over, the output map becomes the new input map and the old input map is erased; then we invoke the kernel again.
stack:
The version of the code is 1.0.1, but it is prior to you adding those std::launder calls.
I also print the state of the hash map:
So most likely the insert before this find reached the 32768 size (power of two) and the resizing may have messed up the internals. Here are the variables in the last find_impl:
Thank you for all the info. I'll take a deeper look tomorrow.
I think I got the trace of operations that exposes the issue. You can replay the trace with something like this:
You need to use '#' as the separator. The values in each line look like this:

E1228 22:53:38.103422 37107 spmspvKernels.h:276] %%%0x7fa0c523b280#insert#11529778670445002752#1#1

That is: the name of the operation (clear, find, insert, reassign), followed by the key (uint64_t), followed by the value (always 1 in this case), followed by the size of the map before the operation is performed. There is some info before the operation name, but ignore that. You can unzip it with: gzip -d log.gz
Tomorrow I'll try to reproduce this outside our framework, with the latest code release.
Standalone test also hangs, so we have a reproducer :)
…LIMIT during insertion (fix issue #52) During insertion a check was done on dist_from_ideal_bucket to be sure it doesn't become bigger than DIST_FROM_IDEAL_BUCKET_LIMIT, but this was only done during the robin swap. A check should also be done beforehand if we find an empty bucket, otherwise the variable could overflow and lead to bugs. This commit adds this check.
Thank you very much for the example and your help. I was able to successfully reproduce the bug and there was effectively a problem in the code; I'm really sorry for that. I created a PR with the fix, if you could try it out. I still have to add some tests before merging it. The problem was in the end due to a long collision chain caused by a poor hash function, which the code didn't manage properly. I did check for potential overflows of dist_from_ideal_bucket. Could you check if the PR fixes your issue? I would recommend, though, to change the hash function, as unfortunately GCC and clang use an identity function for integer types in std::hash. I will try to create a new release once the fix is merged. Thanks again for your help and patience.
Thx for your help with this. I'll definitely check your fix in a minute. I guess our luck (Christmas present) is that we found a trace to reproduce it. Now I am hoping that this fix also fixes other people's issues. @markpapadakis can you also give it a try with the new branch that @Tessil mentions (fix_issue_52)? @Tessil we use the hash map in multiple scenarios and we have our own hash functions for different situations. Our first thought was also to change the hash function when we bumped into this latest bug, but that would only have masked the problem until the next crash. Here is a link to a hashing function that will work better than std::hash: https://web.archive.org/web/20071223173210/http://www.concentric.net/~Ttwang/tech/inthash.htm
|
Some initial testing shows that the code is not crashing anymore. Next week I'll run more in-depth tests.
Yes, that was really useful. I did consider the
For the robin-map v2 I'm eventually considering to just saturate the distance once it reaches the maximum of distance_type:

distance_type next_distance(distance_type distance) const noexcept {
// To optimize
if (distance == std::numeric_limits<distance_type>::max()) {
return distance;
}
return distance + 1;
}
template <class K>
const_iterator find_impl(const K& key, std::size_t hash) const {
...
while (dist_from_ideal_bucket <= m_buckets[ibucket].dist_from_ideal_bucket()) {
if (compare_keys(KeySelect()(m_buckets[ibucket].value()), key)) {
return const_iterator(m_buckets + ibucket);
}
ibucket = next_bucket(ibucket);
dist_from_ideal_bucket = next_distance(dist_from_ideal_bucket);
}
...
}

During the insertion and erasure robin swap, we would have to pay attention to use the non-saturated real distance (which would mean potentially rehashing some values) instead of using the stored one. It would avoid the excessive resizes in case of catastrophic collision chains, but performance will probably be reduced a bit. I need to check the performance impact and whether it's worth it for such a corner case (if this case occurs, it means the hash function is quite poor).
I merged the PR. Don't hesitate to let me know if you still encounter some problems even after this fix. I will create a new release in the next few days and close the PR if everything is working right.
I created the new release with the bugfix. I strongly recommend updating to this new version. I will close the issue for now; let me know if the bug still occurs and we can then reopen it. Note that you may still end up with
Hi, Tessil,
I tried the new version on my previous testcase, and I still got the OOM. The OOM occurs with a simple hash function and 136875 distinct input value pairs; however, when I use std::unordered_map or a different hash function, there is no OOM. I attached the testcase to this email.
#include <iostream>
#include <unordered_map>
#include "robin_map.h"

struct KeyEQ {
  bool operator()(const std::pair<uint32_t, uint32_t> &r1,
                  const std::pair<uint32_t, uint32_t> &r2) const {
    if (r1.first == r2.first && r1.second == r2.second) return true;
    else return false;
  }
};

struct KeyHasher {
  std::size_t operator()(const std::pair<uint32_t, uint32_t> &key) const {
    using std::hash;
    using std::size_t;
    size_t res = 17;
    res = 31 * res + std::hash<int32_t>()(key.first);
    res = 31 * res + std::hash<int32_t>()(key.second);
    return res;
  }
};

struct KeyHasher2 {
  std::size_t operator()(const std::pair<uint32_t, uint32_t> &key) const {
    using std::hash;
    using std::size_t;
    size_t res = 17;
    res = res + std::hash<int32_t>()(key.first);
    res = res + std::hash<int32_t>()(key.second);
    res = res * 2654435761;
    return res;
  }
};

tsl::robin_map<std::pair<uint32_t, uint32_t>, uint32_t, KeyHasher, KeyEQ> robin_map9;     // OOM occurs!!!
//tsl::robin_map<std::pair<uint32_t, uint32_t>, uint32_t, KeyHasher2, KeyEQ> robin_map9; // OK
std::unordered_map<std::pair<uint32_t, uint32_t>, uint32_t, KeyHasher, KeyEQ> hash_map9;     // OK
//std::unordered_map<std::pair<uint32_t, uint32_t>, uint32_t, KeyHasher2, KeyEQ> hash_map9; // OK

static void testCollision()
{
  size_t hash_total0 = 0;
  size_t hash_max_collisions0 = 0;
  size_t robin_total0 = 0;
  size_t robin_max_collisions0 = 0;
  hash_map9.clear();
  robin_map9.clear();
  for (unsigned int i = 2450816; i < 2450816 + 1825; i++) {
    for (unsigned int j = 1; j < 76; j++) {
      std::pair<uint32_t, uint32_t> p = {i, j};
      if (hash_map9.find(p) != hash_map9.end()) {
        hash_map9[p]++;
        hash_total0++;
        if (hash_map9[p] > hash_max_collisions0) hash_max_collisions0 = hash_map9[p];
      } else {
        hash_map9.insert({p, 1});
      }
      if (robin_map9.find(p) != robin_map9.end()) {
        robin_map9[p]++;
        robin_total0++;
        if (robin_map9[p] > robin_max_collisions0) robin_max_collisions0 = robin_map9[p];
      } else {
        robin_map9.insert({p, 1});
      }
    }
  }
  std::cout << "hash_map9 without applying any MASK!!! Total collisions: " << hash_total0
            << ", Max bucket collisions: " << hash_max_collisions0 << "\n";
  std::cout << "robin_map9 without applying any MASK!!! Total collisions: " << robin_total0
            << ", Max bucket collisions: " << robin_max_collisions0 << "\n";
  std::cout << "std::unordered_map table size: " << hash_map9.size()
            << ", max_size: " << hash_map9.max_size()
            << ", bucket count: " << hash_map9.bucket_count()
            << ", max_bucket_count: " << hash_map9.max_bucket_count()
            << ", load_factor: " << hash_map9.load_factor()
            << ", max_load_factor: " << hash_map9.max_load_factor() << "\n";
  std::cout << "robin_map9 table size: " << robin_map9.size()
            << ", max_size: " << robin_map9.max_size()
            << ", bucket count: " << robin_map9.bucket_count()
            << ", max_bucket_count: " << robin_map9.max_bucket_count()
            << ", load_factor: " << robin_map9.load_factor()
            << ", max_load_factor: " << robin_map9.max_load_factor() << "\n";
  return;
}

int main()
{
  testCollision();
}
On Thu, Jan 5, 2023 at 3:04 PM Thibaut Goetghebuer-Planchon wrote:
Closed #52 as completed.
See #52 (comment). To avoid overflowing the dist_from_ideal_bucket, the map is rehashed when it reaches the DIST_FROM_IDEAL_BUCKET_LIMIT limit (8192).

A multiplicative hash is not recommended for open-addressing hash tables as it may lead to a very high number of collisions depending on the input distribution. The KeyHasher you use will only output odd hashes for any even input.

Changing the hash function is the best course of action here, as even if I implement #52 (comment) you'll have terrible performance with the current hash function. If you have a dist_from_ideal_bucket of 8192 for example (or more) for one bucket, it means that the map has to go through 8192 buckets before finding the value in this bucket.
Thanks for the reply.
Changes: https://github.com/Tessil/robin-map/releases/tag/v1.2.0
Changes: Tessil/robin-map@784245b...d37a410
* Tessil/robin-map@d37a410: Update CMake tsl-robin-map to v1.2.1
* Tessil/robin-map@68ff732: Keep rehashing if dist_from_ideal_bucket is > DIST_FROM_IDEAL_BUCKET_LIMIT during insertion (fix issue Tessil/robin-map#52)
* Tessil/robin-map@6775231: Disable CMake install rule if robin_map is used as subproject (Tessil/robin-map#60)
* Tessil/robin-map@57c9b65: Replace std::aligned_storage, deprecated since C++23, by alignas (Tessil/robin-map#61)
* Tessil/robin-map@d3131e4: Raise DIST_FROM_IDEAL_BUCKET_LIMIT to 8192
* Tessil/robin-map@f8e0f67: Add assertion to make sure that static_empty_bucket_ptr is empty
* Tessil/robin-map@ac1e3d8: Add some extra assertions for clarity and ease of debug
* Tessil/robin-map@f1e7457: Clear and shrink the moved hash table in the move operator to be coherent with the move constructor
* Tessil/robin-map@4abcc97: When using C++17, std::launder the reinterpreted pointer from std::aligned_storage to adapt to the change of object model introduced in P0137R1. Fix potential undefined behaviour.
* Tessil/robin-map@c77f80b: Update link to Conan package
* Tessil/robin-map@c7595ba: Apply clang-format --style=Google
* Tessil/robin-map@37e94dc: When exceptions are disabled, only print the error message when defined(TSL_DEBUG) instead of !defined(NDEBUG)
* Tessil/robin-map@59a3b7d: Fix test_extreme_bucket_count_value_construction test on some platforms
* Tessil/robin-map@0c3c858: Check that bucket_count doesn't exceed max_bucket_count() after the constructor initialization

The robin-map fast map implementation is used in our p/invoke dispatch mechanism (38aa561). The new version is a recommended upgrade. It doesn't appear to contain fixes that affect us, but it's better to be safe than sorry, right? :)
Hi there,
We used robin-map v0.6.1, and we observed the robin map stuck in an infinite loop in the insert_value_impl method.
We created a tsl::robin_map<uint64_t, DataBlock*> and inserted monotonically increasing keys like 0xc000000000001, 0xc000000000002, etc. We do not have concurrent modifications on the robin map. We observed this issue on both Intel and ARM machines, but ARM machines showed it more often. However, we do not have a good reproducer, and we have only observed this issue in our production fleet.
Could anyone help us debug a bit to see if anything pops out? Any suggestions would be appreciated!
Thanks very much!
Here is some debug information we gathered.
The robin map has 8 buckets, but the number of elements is only 5 (although all 8 keys look valid to us). However, we observed a really large m_load_threshold = 18446744069414584320, and all 8 buckets show m_dist_from_ideal_bucket = 32767. We suspect that m_load_threshold somehow overflowed, so the robin map would not expand its size, and hence there are no empty buckets left in the map.
Here is the list of keys in the map: