Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance]Add concurrent cpu id hashmap #5241

Merged
merged 28 commits into from
Feb 9, 2023

Conversation

peizhou001
Copy link
Collaborator

@peizhou001 peizhou001 commented Jan 31, 2023

Description

This PR add a new class CpuIdHashMap, which is a concurrent id hash map. Currently It is specified to be used in ToBlockCPU for optimization. It maps an array of Ids which is duplicate and non-contiguous to an array which is unique and contiguous. And also return the unique elements in original array. The public method used outside are:

  1. Init
  2. Map

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
  • I've leverage the tools to beautify the python and c++ code.
  • The PR is complete and small, read the Google eng practice (CL equals to PR) to understand more about small PR. In DGL, we consider PRs with less than 200 lines of core code change are small (example, test and documentation could be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jan 31, 2023

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jan 31, 2023

Commit ID: 4162c8ca3c925636dc259cc554dc79e0f8cad6e0

Build ID: 1

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jan 31, 2023

Commit ID: ef5a84a447b5e33ef1d6c6738571a092d4595074

Build ID: 2

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jan 31, 2023

Commit ID: 62ebfa3f4ee59e48e73264d71a3a56742fbd999c

Build ID: 3

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jan 31, 2023

Commit ID: 371c1ab

Build ID: 4

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

void ConstructRandomSet(size_t size, size_t range,
std::vector<IdType>& id_vec) {
id_vec.resize(size);
std::srand(42);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why choose 42 instead of std::time(nullptr) or std::random_device or dgl/random.h? std::rand() returns int and is it ok to cast to int64_t? will value that larger than int32_t be generated?

Copy link
Collaborator Author

@peizhou001 peizhou001 Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::time(nullptr) instead and static_cast will make sure the overflow wounldn't happen.

IdType* unique_id_data = unique_ids.Ptr<IdType>();
EXPECT_EQ(id_set.size(), unique_num);

for (size_t i = 0; i < unique_num; i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it it possible to check in parallel which may benefit in large size map test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, use parallel_for for acceleration.


IdArray new_ids = NewIdArray(unique_num, CTX, sizeof(IdType) * 8);
IdType default_val = -1;
id_map.Map(unique_ids, default_val, new_ids);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems not all the pubic member functions are called/verified explicitly in test, I think we'd better test all public APIs or make them private?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actual only Map and Init will be used outside, change FillInIds -> fillInIds to avoid misunderstanding.

std::set<IdType> id_set(id_vec.begin(), id_vec.end());
IdArray ids = VecToIdArray(id_vec, sizeof(IdType) * 8, CTX);
IdArray unique_ids = NewIdArray(size, CTX, sizeof(IdType) * 8);
CpuIdHashMap<IdType> id_map(CTX);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is named as Cpuxxx, do we need additional DGLContext argument? it can be GPU?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, It is only used in CPU. But there is another member in DGLContext called device_id, I'm not sure if it can always set to 0. If so, the argument can be removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think device_id is used for cuda device choose only

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, let me remove this parameter

tests/cpp/test_cpu_id_hash_map.cc Outdated Show resolved Hide resolved
@peizhou001 peizhou001 closed this Feb 1, 2023
@peizhou001 peizhou001 reopened this Feb 1, 2023
@peizhou001 peizhou001 marked this pull request as draft February 1, 2023 03:47
@dgl-bot
Copy link
Collaborator

dgl-bot commented Feb 6, 2023

Commit ID: a930e624af6ea748bcfe7366b20b9356422cedbf

Build ID: 21

Status: ❌ CI test failed in Stage [C++ CPU (Win64)].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Feb 6, 2023

Commit ID: 24b21994da8821b9771f30cc1db0a71c792e4e77

Build ID: 22

Status: ❌ CI test failed in Stage [C++ CPU (Win64)].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Feb 6, 2023

Commit ID: 1c38d8adae6952b5e603359d0cc34303df233616

Build ID: 23

Status: ❌ CI test failed in Stage [C++ CPU (Win64)].

Report path: link

Full logs path: link


#include <vector>

#ifdef _MSC_VER
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need this include in the .h file after the CompareAndSwap moved to .cc file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to cc for fix

/**
* @brief A CPU targeted hashmap for mapping duplicate and non-consecutive ids
* in the provided array to unique and consecutive ones. It utilizes
*multi-threading to accelerate the insert and search speed. Currently it is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after *.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


private:
/**
* @brief Array used to save all elelemnts in the hash table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: elements.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

*/
Mapping* hmap_;
/**
* @brief Mask assisted to get the position for a key.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to describe how the mask works mathematically, or link a doc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

/**
* @brief Array used to save all elelemnts in the hash table.
*/
Mapping* hmap_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hash_map_

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed for fix


private:
/**
* @brief Array used to save all elelemnts in the hash table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array used to save all elelemnts in the hash table. -->
Hash maps which is used to store all elements.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

memset(hmap_, -1, sizeof(Mapping) * capcacity);

IdArray unique_ids = NewIdArray(num, ctx, sizeof(IdType) * 8);
return FillInIds(num, ids_data, unique_ids);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter list of FillInIds is pretty weird, I'd suggest remove FillInIds method, and just copy-paste the code in the Init method, and add a comment say, this code block is to fill the ids.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved for fix

CHECK_EQ(ids.defined(), true);
const IdType* ids_data = ids.Ptr<IdType>();
const size_t num = static_cast<size_t>(ids->shape[0]);
CHECK_GT(num, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is to make sure that the ids is 1 dimention? Add a comment to clarify it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

const size_t num = static_cast<size_t>(ids->shape[0]);
CHECK_GT(num, 0);
size_t capcacity = 1;
capcacity = capcacity << static_cast<size_t>(1 + std::log2(num * 3));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define a MARCO about this formula in the top. (after kGrainSize), and doc it clearly that this formula is experience based.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about preferring function/variable(probably constexpr) than MACRO?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constexpr cannot be used here as num is uncertain at compiler-time. Use an inline function for this.


DGLContext ctx = DGLContext{kDGLCPU, 0};
auto device = DeviceAPI::Get(ctx);
hmap_ = static_cast<Mapping*>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, i would prefer to have less memory ops, esp. having an exposed pointer to handle memory allocation and release. It might be error-prone and hard to debug in the future.

Consider using:

  • NDArray
  • Vector
  • unique_ptr

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use unique_ptr

@@ -0,0 +1,195 @@
/**
* Copyright (c) latest by Contributors
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest or 2023? I don't know about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2023 looks more compatible

}

template <typename IdType>
IdArray IdHashMap<IdType>::Map(const IdArray ids) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map -> MapIds

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@dgl-bot
Copy link
Collaborator

dgl-bot commented Feb 6, 2023

Commit ID: e271e3a5129d8f85533b0f55278e446ee44c57fd

Build ID: 24

Status: ❌ CI test failed in Stage [C++ CPU (Win64)].

Report path: link

Full logs path: link

CHECK_EQ(ids.defined(), true);
const IdType* ids_data = ids.Ptr<IdType>();
const size_t num_ids = static_cast<size_t>(ids->shape[0]);
// Make sure `ids` is not 0 dim.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if the ids has 2 dim?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should crash. But as the input has been identified as an IdArray, so in general we only check if it is empty(0-dim)

const size_t num_ids = static_cast<size_t>(ids->shape[0]);
// Make sure `ids` is not 0 dim.
CHECK_GT(num_ids, 0);
size_t capcacity = GetMapSize(num_ids);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: capacity

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

IdArray IdHashMap<IdType>::MapIds(const IdArray ids) const {
CHECK_EQ(ids.defined(), true);
const IdType* ids_data = ids.Ptr<IdType>();
const size_t len = static_cast<size_t>(ids->shape[0]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since num_ids was used in L74, so let's also use num_ids here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

return new_ids;
}

template <typename IdType>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we need to explicitly code this, remove if it is unnecessary to be added.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

_TestIdMap<int64_t, 1, 10>();
_TestIdMap<int32_t, 1000, 500000>();
_TestIdMap<int64_t, 1000, 500000>();
_TestIdMap<int32_t, 50000, 1000000>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which makes it a good test case by increasing the size of the IDArray?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When data size increase, multi thread functions are more vulnerable to mistakes. So add data column to test to test its robust.

@frozenbugs
Copy link
Collaborator

Overall, the code is in a very good shape now, thanks for the great work!
Feel free to merge after approved by @BarclayII or @jermainewang who has move experience in dgl.

(Approved from @frozenbugs)

@dgl-bot
Copy link
Collaborator

dgl-bot commented Feb 7, 2023

Commit ID: 93c71fabcb796d20a247b75a099adfd454ff180b

Build ID: 25

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

Copy link
Collaborator

@BarclayII BarclayII left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions.

}
};
hash_map_ = {nullptr, deleter};
}
Copy link
Collaborator

@BarclayII BarclayII Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel I can understand the reason why we are reusing DGL's allocator (since the hashmap is usually large) but I don't know whether there is any benefit performance-wise. In general it's better to put your justification on why this is necessary in the code comments.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a discussion before, the benefit of using a uniform memory allocator are listed below:

  1. It is more friendly for memory reuse.
  2. One allocator may ask a large volume amount from OS, if another allocator(e.g STL) also ask a large amount. It could cause either memory waste or frequently system call. Or in some corner cases, the second request may fail because of OOM.

src/array/cpu/id_hash_map.cc Show resolved Hide resolved
}

template <typename IdType>
IdArray IdHashMap<IdType>::Init(const IdArray ids) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IdArray&.
Same for other occurrences.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIxed

// Use `int16_t` instead of `bool`. As vector<bool> is an exception
// for whom updating different elements from different threads is unsafe.
// see https://en.cppreference.com/w/cpp/container#Thread_safety.
std::vector<int16_t> valid(num_ids);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the allocator reuse comment above. I feel this place can also use AllocWorkspace since num_ids can be large. Correct me if I'm wrong though.

Copy link
Collaborator Author

@peizhou001 peizhou001 Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It make sense, use BoolArray instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems BoolArray is unsafe in multi-thread environment, change back to vector.

// Get ExclusiveSum of each block.
std::partial_sum(
block_offset.begin() + 1, block_offset.end(), block_offset.begin() + 1);
unique_ids->shape[0] = block_offset.back();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is assigning shape values OK to do? I feel it's very dangerous. It's safer to compute the shape first and then allocate the unique_ids array.

Copy link
Collaborator Author

@peizhou001 peizhou001 Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks safe because I find some other usages like this and the memory will not leaked. We can not allocate ahead because the size is known after the data is filled in.

*
* @return Old value pointed by the `ptr`.
*/
static IdType CompareAndSwap(IdType* ptr, IdType old_val, IdType new_val);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right place for this function since CAS can be applicable to other components as well. Maybe move it to an inline function in dgl::runtime namespace.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a public static method so it can be reused in other modulars.

IdArray MapIds(const IdArray ids) const;

private:
void Next(IdType* pos, IdType* delta) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need brief explanation on the private methods as well. (Same for other occurrences)
Here we also need to explain what pos and delta means as they are in/out parameters.

@param[in,out] pos ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added


IdType MapId(const IdType id) const;

void Insert(IdType id, std::vector<int16_t>* valid, size_t index);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to explain what is valid. Also why a pointer to vector rather than int16_t *? I don't think the size of the vector will change or anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add note and use NdArray(The elements is bool, don't use naive pointer to avoid memory operation) instead.

* @brief A CPU targeted hashmap for mapping duplicate and non-consecutive ids
* in the provided array to unique and consecutive ones. It utilizes
* multi-threading to accelerate the insert and search speed. Currently it is
* only designed to be used in `ToBlockCpu` for optimizing.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... only designed to be used to optimize `ToBlockCpu`, so it does
not support key deletion.

You can add other limitations as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Comment on lines 25 to 27
* The hashmap should be used in two phases. With the first being creating the
* hashmap, and then init it with an id array. After that, searching any old ids
* to get the mappings according to your need.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand this explanation. Looks like it has three phases (creation, init, and search).

Do you want to say that querying before inserting all elements is not supported? If so you can just say that in the limitation above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed for remove misunderstanding.

/**
* @brief Hash maps which is used to store all elements.
*/
std::unique_ptr<Mapping[], std::function<void(Mapping*)>> hash_map_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why choose Mapping[] instead of Mapping*? what's the benefit of using smart pointer instead of raw pointer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a specialization of unique_ptr, [] is only supported with T[].

@dgl-bot
Copy link
Collaborator

dgl-bot commented Feb 8, 2023

Commit ID: 436ec3bcbe2381517ad2e41bc7ac4be63b95fe45

Build ID: 26

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Feb 8, 2023

Commit ID: ae04d31f024c32ebdeb15b25824bc2cd16ebded3

Build ID: 27

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@peizhou001 peizhou001 merged commit f0b7cc9 into dmlc:master Feb 9, 2023
@peizhou001 peizhou001 deleted the peizhou/cpu_id_map branch February 9, 2023 00:51
paoxiaode pushed a commit to paoxiaode/dgl that referenced this pull request Mar 24, 2023
DominikaJedynak pushed a commit to DominikaJedynak/dgl that referenced this pull request Mar 12, 2024
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants