
[Performance] Use segment operators for graph readout. #2361

Merged (11 commits) on Nov 22, 2020

Conversation

yzh119 (Member) commented Nov 22, 2020

Description

Previously we created a pseudo graph and applied SpMM to obtain graph readout results, which introduced graph-creation overhead. This PR implements segment operators to eliminate that overhead (and the corresponding COO/CSR conversion overhead).
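For context, a graph readout with a sum reducer is just a segment reduction: each graph in a batch owns a contiguous block of node-feature rows, and readout sums each block. A minimal NumPy sketch of the idea (illustrative only, not DGL's actual operator or API):

```python
import numpy as np

def segment_sum(feat, offsets):
    """Sum rows of `feat` within segments delimited by `offsets`.

    feat:    (N, D) node features for a batch of graphs
    offsets: (num_graphs + 1,) row boundaries; graph i owns
             rows offsets[i]:offsets[i+1]
    """
    out = np.zeros((len(offsets) - 1, feat.shape[1]), dtype=feat.dtype)
    for i in range(len(offsets) - 1):
        out[i] = feat[offsets[i]:offsets[i + 1]].sum(axis=0)
    return out

feat = np.arange(8, dtype=np.float64).reshape(4, 2)
offsets = np.array([0, 1, 4])  # graph 0 has 1 node, graph 1 has 3
print(segment_sum(feat, offsets))  # [[0. 1.], [12. 15.]]
```

The point of the PR is that this reduction needs only the `offsets` array, whereas the SpMM-based path had to materialize a pseudo graph (and its COO/CSR forms) first.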

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referred to in this PR

@yzh119 yzh119 changed the title [WIP][Performance] Use segment operators for graph readout. [Performance] Use segment operators for graph readout. Nov 22, 2020
@yzh119 yzh119 merged commit 3adbfa1 into dmlc:master Nov 22, 2020
yzh119 added a commit that referenced this pull request Nov 22, 2020
python/dgl/backend/backend.py
python/dgl/backend/pytorch/sparse.py
python/dgl/sparse.py
src/array/cpu/segment_reduce.h
src/array/cpu/segment_reduce.h
} else {
cuda::AtomicAdd(out_buf, val);
}
}
We should refactor this part.

*/
template <typename IdType, typename DType,
typename ReduceOp>
__global__ void SegmentReduceKernel(
SegmentReduceKernel -> SegmentReduceCmpKernel

Also, the docstring is poor
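On the rename suggestion above: a comparison-based segment reduce (min/max) typically records, alongside each per-segment extremum, the row index that achieved it, so the backward pass can scatter gradients only to the winning rows. A hedged Python sketch of that idea (not DGL's actual kernel; assumes non-empty segments):

```python
import numpy as np

def segment_max_with_arg(feat, offsets):
    """Per-segment max over rows of `feat`, plus the arg (row index)
    per (segment, feature) that achieved it.

    The arg output is what makes the backward pass cheap: gradients
    flow only to the rows recorded in `arg`.
    """
    num_seg = len(offsets) - 1
    out = np.empty((num_seg, feat.shape[1]), dtype=feat.dtype)
    arg = np.empty((num_seg, feat.shape[1]), dtype=np.int64)
    for i in range(num_seg):
        seg = feat[offsets[i]:offsets[i + 1]]  # assumes segment is non-empty
        out[i] = seg.max(axis=0)
        # argmax is relative to the segment; shift by the segment start
        arg[i] = offsets[i] + seg.argmax(axis=0)
    return out, arg
```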

@jermainewang (Member)

Also missing C++ tests

BarclayII pushed a commit to BarclayII/dgl that referenced this pull request Nov 27, 2020
* upd

* upd

* update

* upd

* upd

* upd

* fix

* lint

* lint

* pylint

* doc
BarclayII pushed a commit to BarclayII/dgl that referenced this pull request Nov 27, 2020
yzh119 (Member, Author) commented Nov 27, 2020

Will fix the documentation issue soon. For the other concerns:

  1. The print has been removed in [hotfix] Remove redundant print information in #2361 #2362.
  2. We found that on CPU ([performance] Exchange the loop order of feature axis and neighbor axis in SpMMCsr on CPU. #2201), the loop order (graph dim, feature dim) is far better than (feature dim, graph dim); in the latter case we don't need an accumulator variable.
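To make the loop-order point in item 2 concrete, here is an illustrative sketch of the two orders for a segment sum (not DGL's CPU code). In the graph-major order, rows are accumulated straight into the output row; in the feature-major order, a scalar accumulator per (segment, feature) pair is the natural formulation:

```python
import numpy as np

def segment_sum_graph_major(feat, offsets):
    # Outer loop over segments (graph dim), inner over rows:
    # whole rows are added directly into out[i], a contiguous write.
    out = np.zeros((len(offsets) - 1, feat.shape[1]), dtype=feat.dtype)
    for i in range(len(offsets) - 1):
        for r in range(offsets[i], offsets[i + 1]):
            out[i] += feat[r]
    return out

def segment_sum_feature_major(feat, offsets):
    # Outer loop over features (feature dim): each column needs its
    # own pass over the rows, with a scalar accumulator per
    # (segment, feature) pair.
    out = np.zeros((len(offsets) - 1, feat.shape[1]), dtype=feat.dtype)
    for k in range(feat.shape[1]):
        for i in range(len(offsets) - 1):
            acc = 0.0
            for r in range(offsets[i], offsets[i + 1]):
                acc += feat[r, k]
            out[i, k] = acc
    return out
```

Both orders compute the same result; they differ only in memory-access pattern, which is what #2201 measured.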
