Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance]Add concurrent cpu id hashmap #5241

Merged
merged 28 commits into from
Feb 9, 2023
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
195 changes: 195 additions & 0 deletions src/array/cpu/id_hash_map.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,195 @@
/**
* Copyright (c) latest by Contributors
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest or 2023? I don't know about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2023 looks more compatible

* @file array/cpu/id_hash_map.cc
* @brief Class about id hash map
*/

#include "id_hash_map.h"

#include <dgl/array.h>
#include <dgl/runtime/device_api.h>
#include <dgl/runtime/parallel_for.h>

#include <cmath>
#include <numeric>

using namespace dgl::runtime;

namespace {
static constexpr int64_t kEmptyKey = -1;
static constexpr int kGrainSize = 256;
} // namespace

namespace dgl {
namespace aten {

template <typename IdType>
IdType IdHashMap<IdType>::CompareAndSwap(
IdType* ptr, IdType old_val, IdType new_val) {
#ifdef _MSC_VER
if (sizeof(IdType) == 4) {
return _InterlockedCompareExchange(
reinterpret_cast<LONG*>(ptr), new_val, old_val);
} else if (sizeof(IdType) == 8) {
return _InterlockedCompareExchange64(
reinterpret_cast<LONGLONG*>(ptr), new_val, old_val);
} else {
LOG(FATAL) << "ID can only be int32 or int64";
}
#elif __GNUC__ // _MSC_VER
return __sync_val_compare_and_swap(ptr, old_val, new_val);
#else // _MSC_VER
BarclayII marked this conversation as resolved.
Show resolved Hide resolved
#error "CompareAndSwap is not supported on this platform."
#endif // _MSC_VER
}

template <typename IdType>
IdHashMap<IdType>::IdHashMap() : hmap_(nullptr), mask_(0) {}

template <typename IdType>
IdArray IdHashMap<IdType>::Init(const IdArray ids) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IdArray&.
Same for other occurrences.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIxed

CHECK_EQ(ids.defined(), true);
const IdType* ids_data = ids.Ptr<IdType>();
const size_t num = static_cast<size_t>(ids->shape[0]);
CHECK_GT(num, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is to make sure that the ids is 1 dimention? Add a comment to clarify it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

size_t capcacity = 1;
capcacity = capcacity << static_cast<size_t>(1 + std::log2(num * 3));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define a MARCO about this formula in the top. (after kGrainSize), and doc it clearly that this formula is experience based.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about preferring function/variable(probably constexpr) than MACRO?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constexpr cannot be used here as num is uncertain at compiler-time. Use an inline function for this.

mask_ = static_cast<IdType>(capcacity - 1);

DGLContext ctx = DGLContext{kDGLCPU, 0};
auto device = DeviceAPI::Get(ctx);
hmap_ = static_cast<Mapping*>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, i would prefer to have less memory ops, esp. having an exposed pointer to handle memory allocation and release. It might be error-prone and hard to debug in the future.

Consider using:

  • NDArray
  • Vector
  • unique_ptr

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use unique_ptr

device->AllocWorkspace(ctx, sizeof(Mapping) * capcacity));
memset(hmap_, -1, sizeof(Mapping) * capcacity);

IdArray unique_ids = NewIdArray(num, ctx, sizeof(IdType) * 8);
return FillInIds(num, ids_data, unique_ids);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter list of FillInIds is pretty weird, I'd suggest remove FillInIds method, and just copy-paste the code in the Init method, and add a comment say, this code block is to fill the ids.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved for fix

}

template <typename IdType>
IdArray IdHashMap<IdType>::Map(const IdArray ids) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map -> MapIds

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

CHECK_EQ(ids.defined(), true);
const IdType* ids_data = ids.Ptr<IdType>();
const size_t len = static_cast<size_t>(ids->shape[0]);
CHECK_GT(len, 0);

DGLContext ctx = DGLContext{kDGLCPU, 0};
IdArray new_ids = NewIdArray(len, ctx, sizeof(IdType) * 8);
IdType* values_data = new_ids.Ptr<IdType>();

parallel_for(0, len, kGrainSize, [&](int64_t s, int64_t e) {
for (int64_t i = s; i < e; i++) {
values_data[i] = MapId(ids_data[i]);
}
});
return new_ids;
}

template <typename IdType>
IdHashMap<IdType>::~IdHashMap() {
if (hmap_ != nullptr) {
DGLContext ctx = DGLContext{kDGLCPU, 0};
auto device = DeviceAPI::Get(ctx);
device->FreeWorkspace(ctx, hmap_);
}
}

template <typename IdType>
IdArray IdHashMap<IdType>::FillInIds(
size_t num_ids, const IdType* ids_data, IdArray unique_ids) {
// Use `int16_t` instead of `bool` here. As vector<bool> is an exception
// for whom updating different elements from different threads is unsafe.
// see https://en.cppreference.com/w/cpp/container#Thread_safety.
std::vector<int16_t> valid(num_ids);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the allocator reuse comment above. I feel this place can also use AllocWorkspace since num_ids can be large. Correct me if I'm wrong though.

Copy link
Collaborator Author

@peizhou001 peizhou001 Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It make sense, use BoolArray instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems BoolArray is unsafe in multi-thread environment, change back to vector.

auto thread_num = compute_num_threads(0, num_ids, kGrainSize);
std::vector<size_t> block_offset(thread_num + 1, 0);
IdType* unique_ids_data = unique_ids.Ptr<IdType>();

// Insert all elements in this loop.
parallel_for(0, num_ids, kGrainSize, [&](int64_t s, int64_t e) {
size_t count = 0;
for (int64_t i = s; i < e; i++) {
Insert(ids_data[i], &valid, i);
count += valid[i];
}
block_offset[omp_get_thread_num() + 1] = count;
});

// Get ExclusiveSum of each block.
std::partial_sum(
block_offset.begin() + 1, block_offset.end(), block_offset.begin() + 1);
unique_ids->shape[0] = block_offset.back();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is assigning shape values OK to do? I feel it's very dangerous. It's safer to compute the shape first and then allocate the unique_ids array.

Copy link
Collaborator Author

@peizhou001 peizhou001 Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks safe because I find some other usages like this and the memory will not leaked. We can not allocate ahead because the size is known after the data is filled in.


// Get unique array from ids and set value for hash map.
parallel_for(0, num_ids, kGrainSize, [&](int64_t s, int64_t e) {
auto tid = omp_get_thread_num();
auto pos = block_offset[tid];
for (int64_t i = s; i < e; i++) {
if (valid[i]) {
unique_ids_data[pos] = ids_data[i];
Set(ids_data[i], pos);
pos = pos + 1;
}
}
});
return unique_ids;
}

template <typename IdType>
inline void IdHashMap<IdType>::Next(IdType* pos, IdType* delta) const {
// Use Quadric probing.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Quadric" -> "quadratic"?

*pos = (*pos + (*delta) * (*delta)) & mask_;
*delta = *delta + 1;
}

template <typename IdType>
IdType IdHashMap<IdType>::MapId(IdType id) const {
IdType pos = (id & mask_);
IdType delta = 1;
IdType empty_key = static_cast<IdType>(kEmptyKey);
while (hmap_[pos].key != empty_key && hmap_[pos].key != id) {
Next(&pos, &delta);
}
return hmap_[pos].value;
}

template <typename IdType>
void IdHashMap<IdType>::Insert(
IdType id, std::vector<int16_t>* valid, size_t index) {
IdType pos = (id & mask_);
IdType delta = 1;
while (!AttemptInsertAt(pos, id, valid, index)) {
Next(&pos, &delta);
}
}

template <typename IdType>
void IdHashMap<IdType>::Set(IdType key, IdType value) {
IdType pos = (key & mask_);
IdType delta = 1;
while (hmap_[pos].key != key) {
Next(&pos, &delta);
}

hmap_[pos].value = value;
}

template <typename IdType>
bool IdHashMap<IdType>::AttemptInsertAt(
int64_t pos, IdType key, std::vector<int16_t>* valid, size_t index) {
IdType empty_key = static_cast<IdType>(kEmptyKey);
IdType old_val = CompareAndSwap(&(hmap_[pos].key), empty_key, key);

if (old_val != empty_key && old_val != key) {
return false;
} else {
if (old_val == empty_key) (*valid)[index] = true;
return true;
}
}

template class IdHashMap<int32_t>;
template class IdHashMap<int64_t>;

} // namespace aten
} // namespace dgl
145 changes: 145 additions & 0 deletions src/array/cpu/id_hash_map.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
/**
* Copyright (c) latest by Contributors
* @file array/cpu/id_hash_map.h
* @brief Class about id hash map
*/

#ifndef DGL_ARRAY_CPU_ID_HASH_MAP_H_
#define DGL_ARRAY_CPU_ID_HASH_MAP_H_

#include <dgl/aten/types.h>

#include <vector>

#ifdef _MSC_VER
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need this include in the .h file after the CompareAndSwap moved to .cc file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to cc for fix

#include <intrin.h>
#endif // _MSC_VER

namespace dgl {
namespace aten {

/**
* @brief A CPU targeted hashmap for mapping duplicate and non-consecutive ids
* in the provided array to unique and consecutive ones. It utilizes
*multi-threading to accelerate the insert and search speed. Currently it is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after *.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

*only designed to be used in `ToBlockCpu` for optimizing.
*
* The hashmap should be used in two phases. With the first being creating the
* hashmap, and then init it with an id array. After that, searching any old ids
* to get the mappings according to your need.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand this explanation. Looks like it has three phases (creation, init, and search).

Do you want to say that querying before inserting all elements is not supported? If so you can just say that in the limitation above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed for remove misunderstanding.

*
* For example, for an array A with following entries:
* [98, 98, 100, 99, 97, 99, 101, 100, 102]
* Create the hashmap H with:
* `H = CpuIdHashMap()` (1)
* And Init it with:
* `U = H.Init(A)` (2) (U is an id array used to store the unqiue
* ids in A).
* Then U should be (U is not exclusive as the element order is not
* guaranteed to be steady):
* [98, 100, 99, 97, 101, 102]
* And the hashmap should generate following mappings:
* * [
* {key: 98, value: 0},
* {key: 100, value: 1},
* {key: 99, value: 2},
* {key: 97, value: 3},
* {key: 101, value: 4},
* {key: 102, value: 5}
* ]
* Search the hashmap with array I=[98, 99, 102]:
* R = H.Map(I) (3)
* R should be:
* [0, 2, 5]
**/
template <typename IdType>
class IdHashMap {
public:
/**
* @brief An entry in the hashtable.
*/
struct Mapping {
/**
* @brief The ID of the item inserted.
*/
IdType key;
/**
* @brief The value of the item inserted.
*/
IdType value;
};

/**
* @brief Cross platform CAS operation.
* It is an atomic operation that compares the contents of a memory
* location with a given value and, only if they are the same, modifies
* the contents of that memory location to a new given value.
*
* @param ptr The pointer to the object to test and modify .
* @param old_val The value expected to be found in `ptr`.
* @param new_val The value to store in `ptr` if it is as expected.
*
* @return Old value pointed by the `ptr`.
*/
static IdType CompareAndSwap(IdType* ptr, IdType old_val, IdType new_val);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right place for this function since CAS can be applicable to other components as well. Maybe move it to an inline function in dgl::runtime namespace.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a public static method so it can be reused in other modulars.


IdHashMap();

IdHashMap(const IdHashMap& other) = delete;
IdHashMap& operator=(const IdHashMap& other) = delete;

/**
* @brief Init the hashmap with an array of ids.
* Firstly allocating the memeory and init the entire space with empty key.
* And then insert the items in `ids` concurrently to generate the
* mappings, in passing returning the unique ids in `ids`.
*
* @param ids The array of ids to be inserted as keys.
*
* @return Unique ids for the input `ids`.
*/
IdArray Init(const IdArray ids);

/**
* @brief Find the mappings of given keys.
*
* @param ids The keys to map for.
*
* @return Mapping results corresponding to `ids`.
*/
IdArray Map(const IdArray ids) const;

~IdHashMap();

private:
void Next(IdType* pos, IdType* delta) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need brief explanation on the private methods as well. (Same for other occurrences)
Here we also need to explain what pos and delta means as they are in/out parameters.

@param[in,out] pos ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added


IdType MapId(const IdType id) const;

IdArray FillInIds(size_t num_ids, const IdType* ids_data, IdArray unique_ids);

void Insert(IdType id, std::vector<int16_t>* valid, size_t index);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to explain what is valid. Also why a pointer to vector rather than int16_t *? I don't think the size of the vector will change or anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add note and use NdArray(The elements is bool, don't use naive pointer to avoid memory operation) instead.


/**
* @warning Key must exist.
*/
void Set(IdType key, IdType value);

bool AttemptInsertAt(
int64_t pos, IdType key, std::vector<int16_t>* valid, size_t index);

private:
/**
* @brief Array used to save all elelemnts in the hash table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: elements.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array used to save all elelemnts in the hash table. -->
Hash maps which is used to store all elements.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

*/
Mapping* hmap_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hash_map_

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed for fix

/**
* @brief Mask assisted to get the position for a key.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to describe how the mask works mathematically, or link a doc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

*/
IdType mask_;
};

} // namespace aten
} // namespace dgl

#endif // DGL_ARRAY_CPU_ID_HASH_MAP_H_
Loading