-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Performance]Add concurrent cpu id hashmap #5241
Conversation
To trigger regression tests:
|
tests/cpp/test_cpu_id_hash_map.cc
Outdated
void ConstructRandomSet(size_t size, size_t range, | ||
std::vector<IdType>& id_vec) { | ||
id_vec.resize(size); | ||
std::srand(42); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why choose 42
instead of std::time(nullptr)
or std::random_device
or dgl/random.h
? std::rand()
returns int
and is it ok to cast to int64_t
? will value that larger than int32_t
be generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use std::time(nullptr)
instead and static_cast
will make sure the overflow wounldn't happen.
tests/cpp/test_cpu_id_hash_map.cc
Outdated
IdType* unique_id_data = unique_ids.Ptr<IdType>(); | ||
EXPECT_EQ(id_set.size(), unique_num); | ||
|
||
for (size_t i = 0; i < unique_num; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it it possible to check in parallel which may benefit in large size map test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, use parallel_for for acceleration.
tests/cpp/test_cpu_id_hash_map.cc
Outdated
|
||
IdArray new_ids = NewIdArray(unique_num, CTX, sizeof(IdType) * 8); | ||
IdType default_val = -1; | ||
id_map.Map(unique_ids, default_val, new_ids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems not all the pubic member functions are called/verified explicitly in test, I think we'd better test all public APIs or make them private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual only Map
and Init
will be used outside, change FillInIds -> fillInIds
to avoid misunderstanding.
tests/cpp/test_cpu_id_hash_map.cc
Outdated
std::set<IdType> id_set(id_vec.begin(), id_vec.end()); | ||
IdArray ids = VecToIdArray(id_vec, sizeof(IdType) * 8, CTX); | ||
IdArray unique_ids = NewIdArray(size, CTX, sizeof(IdType) * 8); | ||
CpuIdHashMap<IdType> id_map(CTX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is named as Cpuxxx
, do we need additional DGLContext
argument? it can be GPU
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, It is only used in CPU. But there is another member in DGLContext
called device_id
, I'm not sure if it can always set to 0. If so, the argument can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think device_id
is used for cuda device choose only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, let me remove this parameter
src/array/cpu/id_hash_map.h
Outdated
|
||
#include <vector> | ||
|
||
#ifdef _MSC_VER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need this include in the .h file after the CompareAndSwap moved to .cc file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to cc for fix
src/array/cpu/id_hash_map.h
Outdated
/** | ||
* @brief A CPU targeted hashmap for mapping duplicate and non-consecutive ids | ||
* in the provided array to unique and consecutive ones. It utilizes | ||
*multi-threading to accelerate the insert and search speed. Currently it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after *.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/array/cpu/id_hash_map.h
Outdated
|
||
private: | ||
/** | ||
* @brief Array used to save all elelemnts in the hash table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/array/cpu/id_hash_map.h
Outdated
*/ | ||
Mapping* hmap_; | ||
/** | ||
* @brief Mask assisted to get the position for a key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to describe how the mask works mathematically, or link a doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/array/cpu/id_hash_map.h
Outdated
/** | ||
* @brief Array used to save all elelemnts in the hash table. | ||
*/ | ||
Mapping* hmap_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hash_map_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed for fix
src/array/cpu/id_hash_map.h
Outdated
|
||
private: | ||
/** | ||
* @brief Array used to save all elelemnts in the hash table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array used to save all elelemnts in the hash table. -->
Hash maps which is used to store all elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
src/array/cpu/id_hash_map.cc
Outdated
memset(hmap_, -1, sizeof(Mapping) * capcacity); | ||
|
||
IdArray unique_ids = NewIdArray(num, ctx, sizeof(IdType) * 8); | ||
return FillInIds(num, ids_data, unique_ids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter list of FillInIds is pretty weird, I'd suggest remove FillInIds method, and just copy-paste the code in the Init method, and add a comment say, this code block is to fill the ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved for fix
src/array/cpu/id_hash_map.cc
Outdated
CHECK_EQ(ids.defined(), true); | ||
const IdType* ids_data = ids.Ptr<IdType>(); | ||
const size_t num = static_cast<size_t>(ids->shape[0]); | ||
CHECK_GT(num, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is to make sure that the ids is 1 dimention? Add a comment to clarify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
src/array/cpu/id_hash_map.cc
Outdated
const size_t num = static_cast<size_t>(ids->shape[0]); | ||
CHECK_GT(num, 0); | ||
size_t capcacity = 1; | ||
capcacity = capcacity << static_cast<size_t>(1 + std::log2(num * 3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define a MARCO about this formula in the top. (after kGrainSize), and doc it clearly that this formula is experience based.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about preferring function/variable(probably constexpr) than MACRO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constexpr cannot be used here as num
is uncertain at compiler-time. Use an inline function for this.
src/array/cpu/id_hash_map.cc
Outdated
|
||
DGLContext ctx = DGLContext{kDGLCPU, 0}; | ||
auto device = DeviceAPI::Get(ctx); | ||
hmap_ = static_cast<Mapping*>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, i would prefer to have less memory ops, esp. having an exposed pointer to handle memory allocation and release. It might be error-prone and hard to debug in the future.
Consider using:
- NDArray
- Vector
- unique_ptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use unique_ptr
src/array/cpu/id_hash_map.cc
Outdated
@@ -0,0 +1,195 @@ | |||
/** | |||
* Copyright (c) latest by Contributors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest
or 2023
? I don't know about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2023 looks more compatible
src/array/cpu/id_hash_map.cc
Outdated
} | ||
|
||
template <typename IdType> | ||
IdArray IdHashMap<IdType>::Map(const IdArray ids) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Map -> MapIds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
CHECK_EQ(ids.defined(), true); | ||
const IdType* ids_data = ids.Ptr<IdType>(); | ||
const size_t num_ids = static_cast<size_t>(ids->shape[0]); | ||
// Make sure `ids` is not 0 dim. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if the ids has 2 dim?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should crash. But as the input has been identified as an IdArray, so in general we only check if it is empty(0-dim)
src/array/cpu/id_hash_map.cc
Outdated
const size_t num_ids = static_cast<size_t>(ids->shape[0]); | ||
// Make sure `ids` is not 0 dim. | ||
CHECK_GT(num_ids, 0); | ||
size_t capcacity = GetMapSize(num_ids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: capacity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/array/cpu/id_hash_map.cc
Outdated
IdArray IdHashMap<IdType>::MapIds(const IdArray ids) const { | ||
CHECK_EQ(ids.defined(), true); | ||
const IdType* ids_data = ids.Ptr<IdType>(); | ||
const size_t len = static_cast<size_t>(ids->shape[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since num_ids was used in L74, so let's also use num_ids here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
return new_ids; | ||
} | ||
|
||
template <typename IdType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure we need to explicitly code this, remove if it is unnecessary to be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
_TestIdMap<int64_t, 1, 10>(); | ||
_TestIdMap<int32_t, 1000, 500000>(); | ||
_TestIdMap<int64_t, 1000, 500000>(); | ||
_TestIdMap<int32_t, 50000, 1000000>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which makes it a good test case by increasing the size of the IDArray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When data size increase, multi thread functions are more vulnerable to mistakes. So add data column to test to test its robust.
Overall, the code is in a very good shape now, thanks for the great work! (Approved from @frozenbugs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestions.
} | ||
}; | ||
hash_map_ = {nullptr, deleter}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel I can understand the reason why we are reusing DGL's allocator (since the hashmap is usually large) but I don't know whether there is any benefit performance-wise. In general it's better to put your justification on why this is necessary in the code comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a discussion before, the benefit of using a uniform memory allocator are listed below:
- It is more friendly for memory reuse.
- One allocator may ask a large volume amount from OS, if another allocator(e.g STL) also ask a large amount. It could cause either memory waste or frequently system call. Or in some corner cases, the second request may fail because of OOM.
src/array/cpu/id_hash_map.cc
Outdated
} | ||
|
||
template <typename IdType> | ||
IdArray IdHashMap<IdType>::Init(const IdArray ids) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IdArray&
.
Same for other occurrences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIxed
// Use `int16_t` instead of `bool`. As vector<bool> is an exception | ||
// for whom updating different elements from different threads is unsafe. | ||
// see https://en.cppreference.com/w/cpp/container#Thread_safety. | ||
std::vector<int16_t> valid(num_ids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to the allocator reuse comment above. I feel this place can also use AllocWorkspace
since num_ids
can be large. Correct me if I'm wrong though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It make sense, use BoolArray
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems BoolArray
is unsafe in multi-thread environment, change back to vector.
// Get ExclusiveSum of each block. | ||
std::partial_sum( | ||
block_offset.begin() + 1, block_offset.end(), block_offset.begin() + 1); | ||
unique_ids->shape[0] = block_offset.back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is assigning shape values OK to do? I feel it's very dangerous. It's safer to compute the shape first and then allocate the unique_ids
array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks safe because I find some other usages like this and the memory will not leaked. We can not allocate ahead because the size is known after the data is filled in.
* | ||
* @return Old value pointed by the `ptr`. | ||
*/ | ||
static IdType CompareAndSwap(IdType* ptr, IdType old_val, IdType new_val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right place for this function since CAS can be applicable to other components as well. Maybe move it to an inline function in dgl::runtime
namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a public static method so it can be reused in other modulars.
IdArray MapIds(const IdArray ids) const; | ||
|
||
private: | ||
void Next(IdType* pos, IdType* delta) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need brief explanation on the private methods as well. (Same for other occurrences)
Here we also need to explain what pos
and delta
means as they are in/out parameters.
@param[in,out] pos ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
|
||
IdType MapId(const IdType id) const; | ||
|
||
void Insert(IdType id, std::vector<int16_t>* valid, size_t index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to explain what is valid
. Also why a pointer to vector rather than int16_t *
? I don't think the size of the vector will change or anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add note and use NdArray(The elements is bool, don't use naive pointer to avoid memory operation) instead.
src/array/cpu/id_hash_map.h
Outdated
* @brief A CPU targeted hashmap for mapping duplicate and non-consecutive ids | ||
* in the provided array to unique and consecutive ones. It utilizes | ||
* multi-threading to accelerate the insert and search speed. Currently it is | ||
* only designed to be used in `ToBlockCpu` for optimizing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... only designed to be used to optimize `ToBlockCpu`, so it does
not support key deletion.
You can add other limitations as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
src/array/cpu/id_hash_map.h
Outdated
* The hashmap should be used in two phases. With the first being creating the | ||
* hashmap, and then init it with an id array. After that, searching any old ids | ||
* to get the mappings according to your need. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand this explanation. Looks like it has three phases (creation, init, and search).
Do you want to say that querying before inserting all elements is not supported? If so you can just say that in the limitation above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed for remove misunderstanding.
/** | ||
* @brief Hash maps which is used to store all elements. | ||
*/ | ||
std::unique_ptr<Mapping[], std::function<void(Mapping*)>> hash_map_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why choose Mapping[]
instead of Mapping*
? what's the benefit of using smart pointer instead of raw pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a specialization of unique_ptr
, []
is only supported with T[]
.
Add Id hash map
Add Id hash map
Description
This PR add a new class
CpuIdHashMap
, which is a concurrent id hash map. Currently It is specified to be used inToBlockCPU
for optimization. It maps an array of Ids which is duplicate and non-contiguous to an array which is unique and contiguous. And also return the unique elements in original array. The public method used outside are:Checklist
Please feel free to remove inapplicable items for your PR.
Changes