[Performance] Use segment operators for graph readout. #2361

yzh119 · 2020-11-22T09:17:07Z

Description

Previously we created a pseudo graph and applied SpMM to compute graph readout results, which introduced graph creation overhead. This PR implements segment operators to reduce that overhead, along with the corresponding COO/CSR conversion overhead.
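For readers unfamiliar with the idea, here is a minimal NumPy sketch of a segment sum used as a readout over a batch of graphs. It is an illustration only, not the code added in this PR: the function name `segment_sum` and the `seglen` argument (number of nodes per graph) are assumptions made for the example.

```python
import numpy as np

def segment_sum(feat, seglen):
    """Sum rows of `feat` within consecutive segments.

    `seglen[i]` is the number of rows belonging to segment i
    (hypothetical argument name, chosen for this sketch).
    """
    seglen = np.asarray(seglen)
    # offsets[i]:offsets[i + 1] is the row range of segment i
    offsets = np.concatenate([[0], np.cumsum(seglen)])
    out = np.zeros((len(seglen),) + feat.shape[1:], dtype=feat.dtype)
    for i in range(len(seglen)):
        # An empty segment sums to zeros
        out[i] = feat[offsets[i]:offsets[i + 1]].sum(axis=0)
    return out

# Two graphs batched together: the first has 2 nodes, the second has 3.
node_feat = np.arange(10, dtype=np.float32).reshape(5, 2)
readout = segment_sum(node_feat, [2, 3])

# The same result via a dense node-to-graph membership matrix, which is
# roughly what a pseudo graph plus SpMM computes.
membership = np.array([[1, 1, 0, 0, 0],
                       [0, 0, 1, 1, 1]], dtype=np.float32)
readout_spmm = membership @ node_feat
```

The segment version only needs the per-graph node counts, so no auxiliary graph structure has to be built before reducing.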

Checklist

Please feel free to remove inapplicable items for your PR.

The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented
To the best of my knowledge, examples are either not affected by this change,
or have been fixed to be compatible with this change
Related issue is referred in this PR

python/dgl/backend/backend.py

python/dgl/backend/pytorch/sparse.py

python/dgl/sparse.py

src/array/cpu/segment_reduce.h

jermainewang · 2020-11-27T03:26:44Z

src/array/cuda/functor.cuh

+    } else {
+      cuda::AtomicAdd(out_buf, val);
+    }
+  }


We should refactor this part.

jermainewang · 2020-11-27T03:28:12Z

src/array/cuda/segment_reduce.cuh

+ */
+template <typename IdType, typename DType,
+          typename ReduceOp>
+__global__ void SegmentReduceKernel(


SegmentReduceKernel -> SegmentReduceCmpKernel

Also, poor docstring
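As context for the suggested rename, the sketch below shows what a comparison-based segment reduction (max) could look like in NumPy. It assumes that the "Cmp" kernel handles max/min style reducers and also records which row won the comparison; that is an assumption of this example, not something stated in the review. The names `segment_max` and `offsets` are hypothetical.

```python
import numpy as np

def segment_max(feat, offsets):
    """Per-segment, per-feature max plus the row index that produced it.

    `offsets` has one more entry than there are segments; segment i covers
    rows offsets[i]:offsets[i + 1] (hypothetical layout for this sketch).
    """
    n_seg, dim = len(offsets) - 1, feat.shape[1]
    out = np.full((n_seg, dim), -np.inf, dtype=feat.dtype)
    arg = np.full((n_seg, dim), -1, dtype=np.int64)  # -1 marks an empty segment
    for i in range(n_seg):
        lo, hi = offsets[i], offsets[i + 1]
        if hi > lo:
            seg = feat[lo:hi]
            local = seg.argmax(axis=0)        # winning row within the segment
            arg[i] = lo + local               # convert to a global row index
            out[i] = seg[local, np.arange(dim)]
    return out, arg

feat = np.array([[1, 9], [5, 2], [3, 3], [0, 7], [4, 4]], dtype=np.float32)
max_val, max_arg = segment_max(feat, [0, 2, 5])
```

Keeping the winning indices is what would let a backward pass route gradients only to the selected rows, which is one reason a comparison reducer may deserve its own kernel name.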

jermainewang · 2020-11-27T03:29:28Z

Also missing C++ tests


yzh119 · 2020-11-27T03:57:42Z

Will fix the documentation issue soon, for other concerns:

The print has been removed in #2362 ([hotfix] Remove redundant print information in #2361).
We found that on CPU (see #2201, [performance] Exchange the loop order of feature axis and neighbor axis in SpMMCsr on CPU), the loop order (graph dim, feature dim) is far better than (feature dim, graph dim); in the later case we don't need an accumulative variable.
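To make the two loop orders concrete, here is a small pure-Python sketch over a CSR matrix. It only illustrates the structure of the loops; the function names are hypothetical, and Python timings say nothing about the C++ kernels discussed in #2201.

```python
import numpy as np

def spmm_neighbor_then_feature(indptr, indices, feat):
    """Outer loop over neighbors, inner loop over the feature axis."""
    n_rows, dim = len(indptr) - 1, feat.shape[1]
    out = np.zeros((n_rows, dim), dtype=feat.dtype)
    for row in range(n_rows):
        for j in range(indptr[row], indptr[row + 1]):
            for k in range(dim):
                # Writes go straight into the output row
                out[row, k] += feat[indices[j], k]
    return out

def spmm_feature_then_neighbor(indptr, indices, feat):
    """Outer loop over the feature axis, inner loop over neighbors."""
    n_rows, dim = len(indptr) - 1, feat.shape[1]
    out = np.zeros((n_rows, dim), dtype=feat.dtype)
    for row in range(n_rows):
        for k in range(dim):
            acc = feat.dtype.type(0)  # scalar accumulator for one output entry
            for j in range(indptr[row], indptr[row + 1]):
                acc += feat[indices[j], k]
            out[row, k] = acc
    return out

# Row 0 has neighbors {0, 1}; row 1 has neighbor {1}.
indptr = [0, 2, 3]
indices = [0, 1, 1]
feat = np.array([[1, 2], [3, 4]], dtype=np.float32)
a = spmm_neighbor_then_feature(indptr, indices, feat)
b = spmm_feature_then_neighbor(indptr, indices, feat)
```

Both variants compute the same result; they differ in memory access pattern, since the first walks each neighbor's feature row contiguously while the second strides across neighbor rows for every feature.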

yzh119 added 11 commits November 20, 2020 18:18

upd

6e7cec7

upd

d4cbe1b

update

7c44f8f

upd

75efcc0

upd

e90b78a

upd

7937ad8

fix

d67a106

lint

bc69986

lint

8de0595

pylint

bb77b2f

doc

69486bb

yzh119 changed the title from "[WIP][Performance] Use segment operators for graph readout." to "[Performance] Use segment operators for graph readout." Nov 22, 2020

yzh119 merged commit 3adbfa1 into dmlc:master Nov 22, 2020

yzh119 mentioned this pull request Nov 22, 2020

[hotfix] Remove redundant print information in #2361 #2362

Merged

6 tasks

yzh119 added a commit that referenced this pull request Nov 22, 2020

Remove redundant print information in #2361 (#2362)

58775ad

jermainewang reviewed Nov 27, 2020

BarclayII pushed a commit to BarclayII/dgl that referenced this pull request Nov 27, 2020

[Performance] Use segment operators for graph readout. (dmlc#2361)

faf8c71

* upd * upd * update * upd * upd * upd * fix * lint * lint * pylint * doc

BarclayII pushed a commit to BarclayII/dgl that referenced this pull request Nov 27, 2020

Remove redundant print information in dmlc#2361 (dmlc#2362)

21946df

yzh119 mentioned this pull request Nov 27, 2020

[doc] Add docstring for segment reduce. #2375

Merged

6 tasks
