[Performance]Add concurrent cpu id hashmap #5241

Merged

peizhou001 merged 28 commits into dmlc:master from peizhou001:peizhou/cpu_id_map

Feb 9, 2023

Collaborator

peizhou001 commented Jan 31, 2023

Description

This PR adds a new class, CpuIdHashMap, which is a concurrent ID hash map. Currently it is intended to be used in ToBlockCPU as an optimization. It maps an array of IDs that may contain duplicates and be non-contiguous to an array of unique, contiguous IDs, and it also returns the unique elements of the original array. The public methods intended for outside use are:

Init
Map
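
The behaviour described above can be illustrated with a sequential reference version. This is only a sketch of the semantics, not the concurrent implementation in this PR: `ReferenceIdMap` is a hypothetical name, it uses `std::vector` rather than DGL's `IdArray`, and assigning new ids in first-occurrence order is an assumption that the concurrent version may not share.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sequential stand-in for the semantics of Init and Map.
template <typename IdType>
class ReferenceIdMap {
 public:
  // Records every distinct id and returns the unique ids. The new id of
  // unique_ids[i] is i (first-occurrence order is assumed in this sketch).
  std::vector<IdType> Init(const std::vector<IdType>& ids) {
    std::vector<IdType> unique_ids;
    for (IdType id : ids) {
      // emplace only inserts when the key is absent.
      auto result = map_.emplace(id, static_cast<IdType>(unique_ids.size()));
      if (result.second) unique_ids.push_back(id);
    }
    return unique_ids;
  }

  // Translates old ids to their contiguous new ids.
  std::vector<IdType> Map(const std::vector<IdType>& ids) const {
    std::vector<IdType> new_ids;
    new_ids.reserve(ids.size());
    for (IdType id : ids) {
      auto it = map_.find(id);
      assert(it != map_.end() && "id was not passed to Init");
      new_ids.push_back(it->second);
    }
    return new_ids;
  }

 private:
  std::unordered_map<IdType, IdType> map_;
};
```

For example, calling Init on {7, 3, 7, 100, 3} yields the unique ids {7, 3, 100}, and a later Map on {100, 7, 3} yields {2, 0, 1}.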

Checklist

Please feel free to remove inapplicable items for your PR.

The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
I've leveraged the tools to beautify the Python and C++ code.
The PR is complete and small, read the Google eng practice (CL equals to PR) to understand more about small PR. In DGL, we consider PRs with less than 200 lines of core code change are small (example, test and documentation could be exempted).
All changes have test coverage
Code is well-documented
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
Related issue is referred in this PR
If the PR is for a new model/paper, I've updated the example index here.

Changes

Ubuntu added 2 commits

January 30, 2023 13:54


          add cpu id hash map

ae076cb


          add test

ab66351

Collaborator

dgl-bot commented Jan 31, 2023

To trigger regression tests:

@dgl-bot run [instance-type] [which tests] [compare-with-branch];
For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

Collaborator

dgl-bot commented Jan 31, 2023

Commit ID: 4162c8ca3c925636dc259cc554dc79e0f8cad6e0

Build ID: 1

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

peizhou001 mentioned this pull request

[Performance]Optimize ToBlock in CPU #5192

Closed

8 tasks

peizhou001 assigned BarclayII, Rhett-Ying, frozenbugs and jermainewang


          change file header

8031a0a

Collaborator

dgl-bot commented Jan 31, 2023

Commit ID: ef5a84a447b5e33ef1d6c6738571a092d4595074

Build ID: 2

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link


          format

35f3a7b

Collaborator

dgl-bot commented Jan 31, 2023

Commit ID: 62ebfa3f4ee59e48e73264d71a3a56742fbd999c

Build ID: 3

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

peizhou001 assigned peizhou001 and unassigned BarclayII, jermainewang, frozenbugs and Rhett-Ying

peizhou001 requested review from BarclayII, jermainewang, frozenbugs and Rhett-Ying

January 31, 2023 05:28

peizhou001 added the Core Library Enhancement label


          Merge branch 'master' into peizhou/cpu_id_map

371c1ab

Collaborator

dgl-bot commented Jan 31, 2023

Commit ID: 371c1ab

Build ID: 4

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

Rhett-Ying reviewed

View reviewed changes

tests/cpp/test_cpu_id_hash_map.cc Outdated

+              void ConstructRandomSet(size_t size, size_t range,
+                std::vector<IdType>& id_vec) {
+                  id_vec.resize(size);
+                  std::srand(42);

Collaborator

Rhett-Ying Jan 31, 2023

Why choose 42 instead of std::time(nullptr), std::random_device, or dgl/random.h? std::rand() returns int; is it OK to cast it to int64_t? Will values larger than the int32_t range ever be generated?

Collaborator Author

peizhou001 Feb 1, 2023

Used std::time(nullptr) instead; static_cast will make sure the overflow won't happen.
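
On the range question raised above: std::rand() returns an int in [0, RAND_MAX], so casting the result to int64_t is safe but never produces values beyond the int range. A minimal sketch of one alternative using the standard `<random>` header follows. The function name is taken from the test under review, but this signature (typed `range`, explicit `seed`, output pointer) is an assumption and not the code that was merged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fills id_vec with `size` ids drawn uniformly from [0, range).
// The explicit `seed` parameter is an assumption of this sketch.
template <typename IdType>
void ConstructRandomSet(
    std::size_t size, IdType range, uint64_t seed,
    std::vector<IdType>* id_vec) {
  assert(range > 0);
  std::mt19937_64 engine(seed);
  // The distribution is parameterized on IdType, so int64_t ids can exceed
  // the int range that std::rand() is limited to.
  std::uniform_int_distribution<IdType> dist(0, range - 1);
  id_vec->resize(size);
  for (auto& id : *id_vec) id = dist(engine);
}
```

A fixed seed keeps a failing test reproducible. If a time-based or std::random_device seed is preferred, logging the seed preserves that property.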

tests/cpp/test_cpu_id_hash_map.cc Outdated

+                  IdType* unique_id_data = unique_ids.Ptr<IdType>();
+                  EXPECT_EQ(id_set.size(), unique_num);
+                  for (size_t i = 0; i < unique_num; i++) {

Collaborator

Rhett-Ying Jan 31, 2023

Is it possible to run this check in parallel, which may help in large map tests?

Collaborator Author

peizhou001 Feb 1, 2023

Sure, switched to parallel_for to speed it up.

tests/cpp/test_cpu_id_hash_map.cc Outdated

+                  IdArray new_ids = NewIdArray(unique_num, CTX, sizeof(IdType) * 8);
+                  IdType default_val = -1;
+                  id_map.Map(unique_ids, default_val, new_ids);

Collaborator

Rhett-Ying Jan 31, 2023

It seems not all the public member functions are called or verified explicitly in the test. I think we'd better test all public APIs or make them private.

Collaborator Author

peizhou001 Feb 1, 2023

Actually, only Map and Init will be used outside the class. Changed FillInIds -> fillInIds to avoid misunderstanding.

tests/cpp/test_cpu_id_hash_map.cc Outdated

+                  std::set<IdType> id_set(id_vec.begin(), id_vec.end());
+                  IdArray ids = VecToIdArray(id_vec, sizeof(IdType) * 8, CTX);
+                  IdArray unique_ids = NewIdArray(size, CTX, sizeof(IdType) * 8);
+                  CpuIdHashMap<IdType> id_map(CTX);

Collaborator

Rhett-Ying Jan 31, 2023

This class is named Cpuxxx; do we need the additional DGLContext argument? Can it be a GPU?

Collaborator Author

peizhou001 Feb 1, 2023

No, it is only used on CPU. But DGLContext has another member, device_id, and I'm not sure whether it can always be set to 0. If so, the argument can be removed.

Collaborator

Rhett-Ying Feb 1, 2023

I think device_id is only used to choose a CUDA device.

Collaborator Author

peizhou001 Feb 1, 2023

I see, let me remove this parameter

peizhou001 closed this

peizhou001 reopened this

peizhou001 marked this pull request as draft

February 1, 2023 03:47


          use CAS instead of atomic to avoid expensive init

c46efcd

Ubuntu added 3 commits

February 6, 2023 06:17


          change name

29f09ae


          add note

04af6db


          refactor

e9a8046

Collaborator

dgl-bot commented Feb 6, 2023

Commit ID: a930e624af6ea748bcfe7366b20b9356422cedbf

Build ID: 21

Status: ❌ CI test failed in Stage [C++ CPU (Win64)].

Report path: link

Full logs path: link

Collaborator

dgl-bot commented Feb 6, 2023

Commit ID: 24b21994da8821b9771f30cc1db0a71c792e4e77

Build ID: 22

Status: ❌ CI test failed in Stage [C++ CPU (Win64)].

Report path: link

Full logs path: link

Collaborator

dgl-bot commented Feb 6, 2023

Commit ID: 1c38d8adae6952b5e603359d0cc34303df233616

Build ID: 23

Status: ❌ CI test failed in Stage [C++ CPU (Win64)].

Report path: link

Full logs path: link

frozenbugs reviewed

View reviewed changes

src/array/cpu/id_hash_map.h Outdated


		#include <vector>

		#ifdef _MSC_VER

Collaborator

frozenbugs Feb 6, 2023

Do you still need this include in the .h file after the CompareAndSwap moved to .cc file?

Collaborator Author

peizhou001 Feb 6, 2023

Fixed: moved to the .cc file.

src/array/cpu/id_hash_map.h Outdated

+              /**
+               * @brief A CPU targeted hashmap for mapping duplicate and non-consecutive ids
+               * in the provided array to unique and consecutive ones. It utilizes
+               *multi-threading to accelerate the insert and search speed. Currently it is

Collaborator

frozenbugs Feb 6, 2023

space after *.

Collaborator Author

peizhou001 Feb 6, 2023

fixed

src/array/cpu/id_hash_map.h Outdated

+               private:
+                /**
+                 * @brief Array used to save all elelemnts in the hash table.

Collaborator

frozenbugs Feb 6, 2023

typo: elements.

Collaborator Author

peizhou001 Feb 6, 2023

fixed

src/array/cpu/id_hash_map.h Outdated

+                 */
+                Mapping* hmap_;
+                /**
+                 * @brief Mask assisted to get the position for a key.

Collaborator

frozenbugs Feb 6, 2023

It would be helpful to describe how the mask works mathematically, or link a doc.

Collaborator Author

peizhou001 Feb 6, 2023

fixed
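
One common way such a mask works, sketched here under the assumption that the table capacity is a power of two (the capacity expression quoted elsewhere in this review produces one): the mask is `capacity - 1`, and `id & mask` equals `id % capacity` for non-negative ids. `InitialSlot` is a hypothetical helper name, not a function from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Returns the starting slot for `id` in a table of `capacity` slots.
// Assumes capacity is a power of two and id is non-negative.
template <typename IdType>
IdType InitialSlot(IdType id, IdType capacity) {
  assert(id >= 0);
  assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
  const IdType mask = capacity - 1;  // e.g. capacity 8 -> mask 0b0111
  return id & mask;                  // keeps only the low log2(capacity) bits
}
```

With capacity 8, id 13 (0b1101) lands in slot 5 (0b101), the same result as 13 % 8 but without a division.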

src/array/cpu/id_hash_map.h Outdated

+                /**
+                 * @brief Array used to save all elelemnts in the hash table.
+                 */
+                Mapping* hmap_;

Collaborator

frozenbugs Feb 6, 2023

hash_map_

Collaborator Author

peizhou001 Feb 6, 2023

changed for fix

src/array/cpu/id_hash_map.h Outdated

+               private:
+                /**
+                 * @brief Array used to save all elelemnts in the hash table.

Collaborator

frozenbugs Feb 6, 2023

Array used to save all elelemnts in the hash table. -->
Hash maps which is used to store all elements.

Collaborator Author

peizhou001 Feb 6, 2023

changed

src/array/cpu/id_hash_map.cc Outdated

+                memset(hmap_, -1, sizeof(Mapping) * capcacity);
+                IdArray unique_ids = NewIdArray(num, ctx, sizeof(IdType) * 8);
+                return FillInIds(num, ids_data, unique_ids);

Collaborator

frozenbugs Feb 6, 2023

The parameter list of FillInIds is pretty weird. I'd suggest removing the FillInIds method, copying its code into the Init method, and adding a comment saying that this code block fills in the ids.

Collaborator Author

peizhou001 Feb 6, 2023

Fixed: moved into Init.

src/array/cpu/id_hash_map.cc Outdated

+                CHECK_EQ(ids.defined(), true);
+                const IdType* ids_data = ids.Ptr<IdType>();
+                const size_t num = static_cast<size_t>(ids->shape[0]);
+                CHECK_GT(num, 0);

Collaborator

frozenbugs Feb 6, 2023

Is the check there to make sure that ids is 1-dimensional? Add a comment to clarify it.

Collaborator Author

peizhou001 Feb 6, 2023

added

src/array/cpu/id_hash_map.cc Outdated

+                const size_t num = static_cast<size_t>(ids->shape[0]);
+                CHECK_GT(num, 0);
+                size_t capcacity = 1;
+                capcacity = capcacity << static_cast<size_t>(1 + std::log2(num * 3));

Collaborator

frozenbugs Feb 6, 2023

Define a MACRO for this formula at the top (after kGrainSize), and document clearly that the formula is experience-based.

Collaborator

Rhett-Ying Feb 6, 2023

How about preferring a function or variable (probably constexpr) over a macro?

Collaborator Author

peizhou001 Feb 6, 2023

constexpr cannot be used here because num is not known at compile time. Used an inline function instead.
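
A sketch of what such an inline helper could look like, reusing the expression from the quoted diff. The name `GetMapSize` appears later in this review; the body below is reconstructed from the quoted line and is not copied from the merged code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Returns a power-of-two capacity strictly greater than 3 * num, following
// the expression quoted above. The factor 3 is an empirical choice.
inline std::size_t GetMapSize(std::size_t num) {
  assert(num > 0);
  std::size_t capacity = 1;
  // The cast truncates, so the shift amount is floor(1 + log2(3 * num)).
  return capacity << static_cast<std::size_t>(1 + std::log2(num * 3));
}
```

For num = 10 this gives 1 << 5 = 32 slots. In general the result lies between 3 * num and 6 * num, so the table is at most one third full once every id is inserted.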

src/array/cpu/id_hash_map.cc Outdated

+                DGLContext ctx = DGLContext{kDGLCPU, 0};
+                auto device = DeviceAPI::Get(ctx);
+                hmap_ = static_cast<Mapping*>(

Collaborator

frozenbugs Feb 6, 2023

In general, I would prefer fewer memory operations, especially an exposed pointer that handles memory allocation and release. It might be error-prone and hard to debug in the future.

Consider using:

NDArray
Vector
unique_ptr

Collaborator Author

peizhou001 Feb 6, 2023

Use unique_ptr
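
A minimal sketch of owning the table through a unique_ptr with a custom deleter, so that release happens automatically. The `Mapping` layout, the `MappingTable` alias and `MakeTable` are hypothetical, and std::malloc/std::free stand in for DGL's device allocator, which is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <memory>

// Hypothetical slot layout; the PR's Mapping struct is not shown in full.
struct Mapping {
  int64_t key;
  int64_t value;
};

using MappingTable = std::unique_ptr<Mapping[], std::function<void(Mapping*)>>;

// Allocates `capacity` slots and marks them all empty (-1).
// std::malloc/std::free stand in for the DGL workspace allocator.
inline MappingTable MakeTable(std::size_t capacity) {
  assert(capacity > 0);
  std::function<void(Mapping*)> deleter = [](Mapping* ptr) { std::free(ptr); };
  MappingTable table(
      static_cast<Mapping*>(std::malloc(sizeof(Mapping) * capacity)), deleter);
  assert(table != nullptr);
  // Setting every byte to 0xFF makes each int64_t field read as -1.
  std::memset(table.get(), -1, sizeof(Mapping) * capacity);
  return table;
}
```

The array form `Mapping[]` is what makes `table[i]` available, and the deleter runs when the owning object is destroyed, so no explicit free call is needed.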

Rhett-Ying reviewed

View reviewed changes

src/array/cpu/id_hash_map.cc Outdated

		@@ -0,0 +1,195 @@
		/**
		* Copyright (c) latest by Contributors

Collaborator

Rhett-Ying Feb 6, 2023

latest or 2023? I don't know about it.

Collaborator Author

peizhou001 Feb 6, 2023

2023 looks more compatible

frozenbugs reviewed

View reviewed changes

src/array/cpu/id_hash_map.cc Outdated

+              }
+              template <typename IdType>
+              IdArray IdHashMap<IdType>::Map(const IdArray ids) const {

Collaborator

frozenbugs Feb 6, 2023

Map -> MapIds

Collaborator Author

peizhou001 Feb 6, 2023

changed


          use unique prt

785ac45

Collaborator

dgl-bot commented Feb 6, 2023

Commit ID: e271e3a5129d8f85533b0f55278e446ee44c57fd

Build ID: 24

Status: ❌ CI test failed in Stage [C++ CPU (Win64)].

Report path: link

Full logs path: link

frozenbugs reviewed

View reviewed changes

src/array/cpu/id_hash_map.cc

+                CHECK_EQ(ids.defined(), true);
+                const IdType* ids_data = ids.Ptr<IdType>();
+                const size_t num_ids = static_cast<size_t>(ids->shape[0]);
+                // Make sure `ids` is not 0 dim.

Collaborator

frozenbugs Feb 6, 2023

What will happen if the ids has 2 dim?

Collaborator Author

peizhou001 Feb 7, 2023

It should crash. But since the input is already typed as an IdArray, in general we only check whether it is empty (0-dim).

frozenbugs reviewed

View reviewed changes

src/array/cpu/id_hash_map.cc Outdated

+                const size_t num_ids = static_cast<size_t>(ids->shape[0]);
+                // Make sure `ids` is not 0 dim.
+                CHECK_GT(num_ids, 0);
+                size_t capcacity = GetMapSize(num_ids);

Collaborator

frozenbugs Feb 6, 2023

typo: capacity

Collaborator Author

peizhou001 Feb 7, 2023

fixed

frozenbugs reviewed

View reviewed changes

src/array/cpu/id_hash_map.cc Outdated

+              IdArray IdHashMap<IdType>::MapIds(const IdArray ids) const {
+                CHECK_EQ(ids.defined(), true);
+                const IdType* ids_data = ids.Ptr<IdType>();
+                const size_t len = static_cast<size_t>(ids->shape[0]);

Collaborator

frozenbugs Feb 6, 2023

Since num_ids is used in L74, let's also use num_ids here.

Collaborator Author

peizhou001 Feb 7, 2023

changed

frozenbugs reviewed

View reviewed changes

src/array/cpu/id_hash_map.cc

+                return new_ids;
+              }
+              template <typename IdType>

Collaborator

frozenbugs Feb 6, 2023

not sure we need to explicitly code this, remove if it is unnecessary to be added.

Collaborator Author

peizhou001 Feb 7, 2023

removed

BarclayII reviewed

View reviewed changes

frozenbugs reviewed

View reviewed changes

tests/cpp/test_id_hash_map.cc

+                _TestIdMap<int64_t, 1, 10>();
+                _TestIdMap<int32_t, 1000, 500000>();
+                _TestIdMap<int64_t, 1000, 500000>();
+                _TestIdMap<int32_t, 50000, 1000000>();

Collaborator

frozenbugs Feb 6, 2023

What makes increasing the size of the IdArray a good test case?

Collaborator Author

peizhou001 Feb 7, 2023

As the data size increases, multi-threaded functions become more prone to errors, so larger data volumes were added to test robustness.

Collaborator

frozenbugs commented Feb 6, 2023

Overall, the code is in a very good shape now, thanks for the great work!
Feel free to merge after approval by @BarclayII or @jermainewang, who have more experience in DGL.

(Approved from @frozenbugs)


          fix comments

d2fbe68

Collaborator

dgl-bot commented Feb 7, 2023

Commit ID: 93c71fabcb796d20a247b75a099adfd454ff180b

Build ID: 25

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

BarclayII approved these changes

View reviewed changes

Collaborator

BarclayII left a comment

Minor suggestions.

src/array/cpu/id_hash_map.cc

+                  }
+                };
+                hash_map_ = {nullptr, deleter};
+              }

Collaborator

BarclayII Feb 6, 2023

I feel I can understand the reason why we are reusing DGL's allocator (since the hashmap is usually large) but I don't know whether there is any benefit performance-wise. In general it's better to put your justification on why this is necessary in the code comments.

Collaborator Author

peizhou001 Feb 7, 2023

We discussed this before. The benefits of using a uniform memory allocator are:

It is friendlier to memory reuse.
If one allocator requests a large amount of memory from the OS and another allocator (e.g. the STL's) also requests a large amount, the result can be wasted memory or frequent system calls. In some corner cases, the second request may even fail with OOM.

src/array/cpu/id_hash_map.cc Outdated

+              }
+              template <typename IdType>
+              IdArray IdHashMap<IdType>::Init(const IdArray ids) {

Collaborator

BarclayII Feb 7, 2023

IdArray&.
Same for other occurrences.

Collaborator Author

peizhou001 Feb 7, 2023

Fixed

src/array/cpu/id_hash_map.cc

+                // Use `int16_t` instead of `bool`. As vector<bool> is an exception
+                // for whom updating different elements from different threads is unsafe.
+                // see https://en.cppreference.com/w/cpp/container#Thread_safety.
+                std::vector<int16_t> valid(num_ids);

Collaborator

BarclayII Feb 7, 2023

This is related to the allocator reuse comment above. I feel this place can also use AllocWorkspace since num_ids can be large. Correct me if I'm wrong though.

Collaborator Author

peizhou001 Feb 7, 2023

That makes sense; use BoolArray instead.

Collaborator Author

peizhou001 Feb 8, 2023

It seems BoolArray is unsafe in a multi-threaded environment; changed back to vector.

src/array/cpu/id_hash_map.cc

+                // Get ExclusiveSum of each block.
+                std::partial_sum(
+                    block_offset.begin() + 1, block_offset.end(), block_offset.begin() + 1);
+                unique_ids->shape[0] = block_offset.back();

Collaborator

BarclayII Feb 7, 2023

Is assigning shape values OK to do? I feel it's very dangerous. It's safer to compute the shape first and then allocate the unique_ids array.

Collaborator Author

peizhou001 Feb 8, 2023

It looks safe: I found other usages like this, and the memory is not leaked. We cannot allocate ahead of time because the size is only known after the data has been filled in.
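
The offset computation in the quoted diff follows a common count-then-scan pattern: each block counts its newly inserted ids, and a prefix sum over the counts gives both the position where each block writes and the total number of unique ids. The sketch below is sequential, and `CompactUnique`, the `valid` flags and `block_size` are illustrative names rather than the PR's code. It allocates the output after the scan, which is the ordering suggested in the review; whether that fits the PR's actual data flow is not something this sketch establishes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Copies ids[i] with valid[i] != 0 into a dense output, processing the input
// in fixed-size blocks the way a parallel version would (one block per task).
inline std::vector<int64_t> CompactUnique(
    const std::vector<int64_t>& ids, const std::vector<int16_t>& valid,
    std::size_t block_size) {
  assert(ids.size() == valid.size() && block_size > 0);
  const std::size_t num_blocks = (ids.size() + block_size - 1) / block_size;
  // block_offset[b + 1] first holds the count of valid ids in block b.
  std::vector<std::size_t> block_offset(num_blocks + 1, 0);
  for (std::size_t b = 0; b < num_blocks; ++b) {
    const std::size_t end = std::min(ids.size(), (b + 1) * block_size);
    for (std::size_t i = b * block_size; i < end; ++i) {
      if (valid[i]) ++block_offset[b + 1];
    }
  }
  // An in-place inclusive scan over [1, end) turns the counts into exclusive
  // offsets: block_offset[b] is where block b starts writing.
  std::partial_sum(
      block_offset.begin() + 1, block_offset.end(), block_offset.begin() + 1);
  std::vector<int64_t> unique_ids(block_offset.back());
  for (std::size_t b = 0; b < num_blocks; ++b) {
    std::size_t pos = block_offset[b];
    const std::size_t end = std::min(ids.size(), (b + 1) * block_size);
    for (std::size_t i = b * block_size; i < end; ++i) {
      if (valid[i]) unique_ids[pos++] = ids[i];
    }
  }
  return unique_ids;
}
```

Because every block writes to a disjoint range of the output, the second loop can run in parallel without synchronization.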

src/array/cpu/id_hash_map.h

+                 *
+                 * @return Old value pointed by the `ptr`.
+                 */
+                static IdType CompareAndSwap(IdType* ptr, IdType old_val, IdType new_val);

Collaborator

BarclayII Feb 7, 2023

I don't think this is the right place for this function since CAS can be applicable to other components as well. Maybe move it to an inline function in dgl::runtime namespace.

Collaborator Author

peizhou001 Feb 8, 2023

It is a public static method, so it can be reused in other modules.
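
For readers unfamiliar with the primitive, here is a sketch of a compare-and-swap wrapper over plain (non-std::atomic) memory, which allows the table to be initialized in bulk with memset. The choice of compiler intrinsics is an assumption about what sits behind the `_MSC_VER` guard seen earlier in the review, and `kEmptyKey` and `ClaimSlot` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _MSC_VER
#include <intrin.h>
#endif

// Atomically: if (*ptr == old_val) *ptr = new_val. Returns the value *ptr
// held before the call, so the swap succeeded iff the result equals old_val.
inline int64_t CompareAndSwap(int64_t* ptr, int64_t old_val, int64_t new_val) {
#ifdef _MSC_VER
  return _InterlockedCompareExchange64(
      reinterpret_cast<volatile __int64*>(ptr), new_val, old_val);
#else
  return __sync_val_compare_and_swap(ptr, old_val, new_val);
#endif
}

// Marker for an unused slot; matches a table initialized with memset(-1).
constexpr int64_t kEmptyKey = -1;

// Tries to claim `slot` for `id`. Returns true if the slot holds `id`
// afterwards (claimed now or earlier), false if another id owns it.
inline bool ClaimSlot(int64_t* slot, int64_t id) {
  assert(id != kEmptyKey);
  const int64_t previous = CompareAndSwap(slot, kEmptyKey, id);
  return previous == kEmptyKey || previous == id;
}
```

When ClaimSlot returns false, an open-addressing table moves on to the next slot in its probe sequence and tries again.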

src/array/cpu/id_hash_map.h

+                IdArray MapIds(const IdArray ids) const;
+               private:
+                void Next(IdType* pos, IdType* delta) const;

Collaborator

BarclayII Feb 7, 2023

Need brief explanation on the private methods as well. (Same for other occurrences)
Here we also need to explain what pos and delta means as they are in/out parameters.

@param[in,out] pos ...

Collaborator Author

peizhou001 Feb 8, 2023

Added
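
To illustrate what in/out parameters such as pos and delta might mean, here is a sketch that assumes quadratic probing. The probe sequence actually used in the PR is not quoted in this thread, so both the scheme and the explicit `mask` parameter (a class member in the PR's signature) are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Advances an open-addressing probe.
// @param[in,out] pos   Current slot index; updated to the next slot to try.
// @param[in,out] delta Current step count; incremented after each probe.
// @param[in]     mask  capacity - 1, where capacity is a power of two.
inline void Next(int64_t* pos, int64_t* delta, int64_t mask) {
  assert(mask >= 0 && ((mask + 1) & mask) == 0);
  *pos = (*pos + (*delta) * (*delta)) & mask;
  *delta += 1;
}
```

Starting from pos = 5 and delta = 1 with mask = 7, successive calls visit slots 6, 2 and 3.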

src/array/cpu/id_hash_map.h


		IdType MapId(const IdType id) const;

		void Insert(IdType id, std::vector<int16_t>* valid, size_t index);

Collaborator

BarclayII Feb 7, 2023

Need to explain what is valid. Also why a pointer to vector rather than int16_t *? I don't think the size of the vector will change or anything.

Collaborator Author

peizhou001 Feb 8, 2023

Added a note and switched to NDArray (the elements are bool; a raw pointer is avoided so there are no manual memory operations).

src/array/cpu/id_hash_map.h Outdated

+               * @brief A CPU targeted hashmap for mapping duplicate and non-consecutive ids
+               * in the provided array to unique and consecutive ones. It utilizes
+               * multi-threading to accelerate the insert and search speed. Currently it is
+               * only designed to be used in `ToBlockCpu` for optimizing.

Collaborator

BarclayII Feb 7, 2023

... only designed to be used to optimize `ToBlockCpu`, so it does
not support key deletion.

You can add other limitations as well.

Collaborator Author

peizhou001 Feb 8, 2023

added

src/array/cpu/id_hash_map.h Outdated

Comment on lines 25 to 27

+               * The hashmap should be used in two phases. With the first being creating the
+               * hashmap, and then init it with an id array. After that, searching any old ids
+               * to get the mappings according to your need.

Collaborator

BarclayII Feb 7, 2023

I don't quite understand this explanation. Looks like it has three phases (creation, init, and search).

Do you want to say that querying before inserting all elements is not supported? If so you can just say that in the limitation above.

Collaborator Author

peizhou001 Feb 8, 2023

Changed to avoid the misunderstanding.

Rhett-Ying reviewed

View reviewed changes

src/array/cpu/id_hash_map.h

+                /**
+                 * @brief Hash maps which is used to store all elements.
+                 */
+                std::unique_ptr<Mapping[], std::function<void(Mapping*)>> hash_map_;

Collaborator

Rhett-Ying Feb 8, 2023

Why choose Mapping[] instead of Mapping*? What's the benefit of using a smart pointer instead of a raw pointer?

Collaborator Author

peizhou001 Feb 8, 2023

It is the array specialization of unique_ptr; operator[] is only supported with the T[] form.

Ubuntu added 2 commits

February 8, 2023 03:28


          fix comments

8de2cdd


          fix header

0cf2704

Collaborator

dgl-bot commented Feb 8, 2023

Commit ID: 436ec3bcbe2381517ad2e41bc7ac4be63b95fe45

Build ID: 26

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

Collaborator

dgl-bot commented Feb 8, 2023

Commit ID: ae04d31f024c32ebdeb15b25824bc2cd16ebded3

Build ID: 27

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

peizhou001 merged commit f0b7cc9 into dmlc:master

peizhou001 deleted the peizhou/cpu_id_map branch

February 9, 2023 00:51

paoxiaode pushed a commit to paoxiaode/dgl that referenced this pull request


          [Performance]Add concurrent cpu id hashmap (dmlc#5241)

e665d86

Add Id hash map

DominikaJedynak pushed a commit to DominikaJedynak/dgl that referenced this pull request


          [Performance]Add concurrent cpu id hashmap (dmlc#5241)

b65b74e

Add Id hash map

This pull request was closed.
