-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Performance][Feature] Implement edge excluding in EdgeDataLoader on GPU #3226
Conversation
To trigger regression tests:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some inspection I found the PR largely orthogonal to #2971 since it is just moving the edge ID exclusion postprocessing to C++/CUDA. Although if the latter was merged then we probably want to skip the edge ID exclusion postprocessing.
There's a potential conflict with #2971 though since the latter is moving the edge ID exclusion into sample_frontier
.
python/dgl/utils/filter.py
Outdated
|
||
|
||
class Filter(object): | ||
"""Class used to either find either find the subset of ids that are in this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"either find either find". Also we usually capitalize "ids" as "IDs".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in
fe0bc66.
python/dgl/utils/filter.py
Outdated
|
||
class Filter(object): | ||
"""Class used to either find either find the subset of ids that are in this | ||
filter, or th subset of ids that are not in this filter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in
fe0bc66.
self._exclude_eids = None | ||
self._filter = None | ||
|
||
if device == F.cpu(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a TODO for CPU? If the filters are implemented for CPU then we won't use _locate_eids_to_exclude
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in
aa21891.
…GPU (#3226) * Update filter code * Add unit tests * Fixes * Switch to indices * Rename functions * Fix linting * Fix whitespace * Add doc * Fix heterograph * Change workspace allocation * Fix linting * Fix docs in filter.py * Add todo Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com> Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Description
When the output device for the dataloader is a GPU, this enable making use of the cuda hash table implementation to find the set of edges to remove.
In my testing using
/examples/pytorch/graphsage/train_sampling_unsupervised.py
, this reduced the time to find the edges from 148ms using numpy'sisin()
method, to 2.0ms to copy the edge ids to the GPU and check them against the excluded edges (74x speedup). The majority of this is performing the CPU to GPU copy. In terms of just checking the edges, it takes just 200us.Although I believe outside of the scope of this PR, the CPU side of things would benefit from a similar strategy.
Checklist
Please feel free to remove inapplicable items for your PR.
or have been fixed to be compatible with this change
Changes
Add Filter class to store the hashmap of edges to be excluded. This speeds up the process when there are multiple layers, as opposed to generating the same hashmap for each layer.
This also adds a unit test.