
[Feature] x_dot_x builtin kernel support #831

Merged: 50 commits into dmlc:master from classicsong:masked-mm, Sep 14, 2019

Conversation

@classicsong (Contributor) commented Sep 4, 2019

Description

Adding support for u_dot_v, u_dot_e, v_dot_e, v_dot_u, e_dot_u, and e_dot_v as builtin kernels.
The implementation is based on the current binary_reduce structure.
#659
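
For context, a minimal sketch of how these builtins are meant to be used (the graph, feature names, and shapes here are made up for illustration; API as in recent DGL versions):

```python
import torch
import dgl
import dgl.function as fn

# Toy graph: 3 nodes, 3 edges (hypothetical example data).
g = dgl.graph(([0, 1, 2], [1, 2, 0]))
g.ndata['h'] = torch.randn(3, 16)

# u_dot_v computes, for each edge, the dot product of the source ('u')
# and destination ('v') node features and stores it on the edge.
g.apply_edges(fn.u_dot_v('h', 'h', 'score'))
print(g.edata['score'].shape)  # torch.Size([3, 1]); dot reduces the feature dim to 1
```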

Tasks

  • Forward of [x]_add_[x], [x]_sub_[x], [x]_mul_[x], [x]_div_[x] still works
  • Backward of [x]_add_[x], [x]_sub_[x], [x]_mul_[x], [x]_div_[x] still works
  • Forward of [x]_dot_[x]
  • Backward of [x]_dot_[x] (see the gradient sketch after this list)
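
For the dot backward (the last two tasks), the gradients follow the standard product rule: for z = u·v, ∂z/∂u = v and ∂z/∂v = u. A quick autograd sanity check of that rule (illustrative only, independent of the DGL kernels):

```python
import torch

u = torch.randn(16, requires_grad=True)
v = torch.randn(16, requires_grad=True)
z = torch.dot(u, v)
z.backward()
# For z = u . v, the gradient w.r.t. u is v, and vice versa.
assert torch.allclose(u.grad, v) and torch.allclose(v.grad, u)
```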

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change
    or have been fixed to be compatible with this change
  • The related issue is referenced in this PR

Changes

yzh119 and others added 30 commits August 6, 2019 16:13
Add a note that the PinSage model example under
example/pytorch/recommendation only works with Python 3.6+,
as its dataset loader depends on the stanfordnlp package,
which works only with Python 3.6+.
…ide.

1. make dgl.nn.xxx frame agnostic
2. make test.backend include dgl.nn modules
3. modify test_edge_softmax of test/mxnet/test_nn.py and
    test/pytorch/test_nn.py to work on both CPU and GPU
1. clear all agnostic-related code in dgl.nn
2. make test_graph_conv agnostic to cpu/gpu
Add base control flow code.
classicsong and others added 2 commits September 4, 2019 21:35
TODO:
1. make sure x_add_x, x_sub_x, x_mul_x, x_div_x work
2. let x_dot_x work
3. make sure backward of x_add_x, x_sub_x, x_mul_x, x_div_x work
4. let x_dot_x backward work
@VoVAllen (Collaborator) commented Sep 4, 2019

The MXNet CI test may have some problems due to adapting to the new version. I'll try to fix this tomorrow.

Resolved review threads:
  • python/dgl/function/message.py
  • src/kernel/binary_reduce_common.h
  • src/kernel/cpu/binary_reduce_impl.h
@yzh119 (Member) commented Sep 5, 2019

@jermainewang, I've verified the correctness with STT. The GPU memory footprint is about the same, but the builtin function is 2x slower than my custom kernels (node-parallel strategy).

@classicsong classicsong changed the title [WIP][Feature] x_dot_x builtin kernel support [Feature] x_dot_x builtin kernel support Sep 6, 2019
@yzh119 (Member) left a review

I'm ok with this PR.

Resolved review thread: python/dgl/kernel.py
static DGLDEVICE DGLINLINE DType Call(DType *lhs, DType *rhs, int64_t len) {
  DType out = 0;
  // simple vector dot vector
#pragma unroll
  for (int64_t i = 0; i < len; ++i)
    out += lhs[i] * rhs[i];
  return out;
}
Review comment (Member):
There is already a #pragma unroll at the graph level in minigun. A nested #pragma unroll usually gives no benefit. Consider removing this one.
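
For intuition, the Call functor above is the per-edge reduction that x_dot_x performs; in Python terms it amounts to a plain dot product over the feature dimension (illustrative only; lhs and rhs stand in for one edge's feature vectors):

```python
import torch

lhs = torch.randn(16)
rhs = torch.randn(16)
out = torch.dot(lhs, rhs)  # same as (lhs * rhs).sum()
```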

@jermainewang (Member) left a review

LGTM. Thanks!

@yzh119 yzh119 merged commit 0a56d65 into dmlc:master Sep 14, 2019
@classicsong classicsong deleted the masked-mm branch November 25, 2019 07:03