Add Khop Graph Sampler API #39146

Merged: 47 commits, Jan 27, 2022

Conversation

@DesmonDay (Contributor) commented Jan 22, 2022

PR types

New features

PR changes

APIs

Describe

  1. Add the graph_khop_sampler API, intended mainly for the GraphSAGE sampling method.
  2. Add the to_uva_tensor API, which creates a UVA (Unified Virtual Addressing) tensor from a NumPy array.
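
For readers unfamiliar with the sampling scheme that item 1 refers to, here is a minimal host-side sketch of k-hop neighbor sampling. It is an illustration under stated assumptions, not the operator added in this PR: the `CscGraph` struct, the `Edge` alias, and the `SampleKHop` function are hypothetical names, and the CSC layout is assumed for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical CSC graph: the in-neighbors of node v are
// row[colptr[v]] .. row[colptr[v + 1] - 1].
struct CscGraph {
  std::vector<int64_t> row;
  std::vector<int64_t> colptr;
};

// One sampled edge, stored as (src, dst).
using Edge = std::pair<int64_t, int64_t>;

// For each hop, keeps at most sample_sizes[hop] randomly chosen neighbors of
// every frontier node (without replacement), then uses the deduplicated
// sampled neighbors as the frontier of the next hop.
std::vector<Edge> SampleKHop(const CscGraph& g, std::vector<int64_t> frontier,
                             const std::vector<int>& sample_sizes,
                             std::mt19937* rng) {
  std::vector<Edge> edges;
  for (int k : sample_sizes) {
    assert(k >= 0);  // this sketch does not model a "take all" sentinel
    std::vector<int64_t> next;
    for (int64_t dst : frontier) {
      // Copy the neighbor list, shuffle it, and keep the first k entries.
      std::vector<int64_t> nbrs(g.row.begin() + g.colptr[dst],
                                g.row.begin() + g.colptr[dst + 1]);
      std::shuffle(nbrs.begin(), nbrs.end(), *rng);
      if (nbrs.size() > static_cast<std::size_t>(k)) nbrs.resize(k);
      for (int64_t src : nbrs) {
        edges.emplace_back(src, dst);
        next.push_back(src);
      }
    }
    // Deduplicate the next frontier: sort first, then drop adjacent repeats.
    std::sort(next.begin(), next.end());
    next.erase(std::unique(next.begin(), next.end()), next.end());
    frontier = std::move(next);
  }
  return edges;
}
```

Shuffling a copy of each neighbor list keeps the sketch short; a production kernel would more likely sample in place or on the GPU.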

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle/fluid/operators/graph_sample_neighbors_op.cc (2 outdated threads, resolved)
}

ctx->SetOutputDim("Out_Src", {-1, 1});
ctx->SetOutputDim("Out_Dst", {-1, 1});

Contributor:

Question: do our outputs out_src and out_dst have to be two-dimensional? Do they need to stay consistent with the current PGL version?

Contributor Author:

Leaving this as an open question for now.

paddle/fluid/operators/graph_sample_neighbors_op.cc (2 outdated threads, resolved)
paddle/fluid/operators/graph_sample_neighbors_op.h (3 outdated threads, resolved)
auto unique_inputs_end = std::unique(inputs.begin(), inputs.end());
inputs.resize(std::distance(inputs.begin(), unique_inputs_end));

// 2. Sample neighbors.

Contributor:

The comments in this part could be more detailed.
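
One detail a fuller comment could spell out: `std::unique` only collapses adjacent duplicates, so the `std::unique` plus `resize` idiom in the snippet above deduplicates the whole range only if `inputs` is already sorted or grouped. Whether the PR sorts `inputs` beforehand is not visible in this excerpt. A minimal standalone sketch, with hypothetical helper names `UniqueSorted` and `SortedUnique`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Removes duplicates in place from a vector that is already sorted.
void UniqueSorted(std::vector<int64_t>* ids) {
  // Precondition: equal values must be adjacent for std::unique to find them.
  assert(std::is_sorted(ids->begin(), ids->end()));
  ids->erase(std::unique(ids->begin(), ids->end()), ids->end());
}

// Sorts first so that the precondition above holds, then deduplicates.
std::vector<int64_t> SortedUnique(std::vector<int64_t> ids) {
  std::sort(ids.begin(), ids.end());
  UniqueSorted(&ids);
  return ids;
}
```

Note that sorting changes the order of the node IDs; if the original order matters, a hash-set based pass is the usual alternative.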

paddle/fluid/operators/graph_sample_neighbors_op.h (2 outdated threads, resolved)

template <class bidiiter>
void sample_unique(bidiiter begin, bidiiter end, int num_samples) {
size_t left = std::distance(begin, end);

Contributor:

The variable name here is a bit odd. Calling it left reads strangely; is it meant to be the number of remaining elements?

Contributor Author:

Yes, because this variable is decremented by one on every iteration of the for loop. I have renamed it to left_num.

template <class bidiiter>
void sample_unique(bidiiter begin, bidiiter end, int num_samples) {
size_t left = std::distance(begin, end);
unsigned int seed = left;

Contributor:

The logic here is fine, but converting a signed variable to an unsigned one carries some risk. This conversion may need attention later.

Contributor Author:

I have removed the conversion here and switched to a different way of generating it.
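
The two threads above concern a partial-shuffle sampler. The sketch below shows one way such a routine can be written with the renamed counter and without deriving a seed from the element count. It is an assumption-laden illustration, not the code the author committed: `SampleUnique` and `SampleIds` are hypothetical names, and passing the random engine in from the caller is a design choice made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <random>
#include <vector>

// Moves `num_samples` distinct elements, chosen uniformly at random, to the
// front of [begin, end) and returns the end of that sampled prefix.
template <class BidiIter, class Rng>
BidiIter SampleUnique(BidiIter begin, BidiIter end, int num_samples,
                      Rng* rng) {
  using Diff = typename std::iterator_traits<BidiIter>::difference_type;
  Diff left_num = std::distance(begin, end);  // elements not yet fixed
  assert(num_samples >= 0 && num_samples <= left_num);
  for (int i = 0; i < num_samples; ++i) {
    // Pick one of the remaining elements and swap it into the prefix.
    std::uniform_int_distribution<Diff> pick(0, left_num - 1);
    BidiIter chosen = begin;
    std::advance(chosen, pick(*rng));
    std::iter_swap(begin, chosen);
    ++begin;
    --left_num;
  }
  return begin;
}

// Convenience wrapper: returns `num_samples` distinct elements of `ids`.
std::vector<int64_t> SampleIds(std::vector<int64_t> ids, int num_samples,
                               std::mt19937* rng) {
  auto last = SampleUnique(ids.begin(), ids.end(), num_samples, rng);
  ids.erase(last, ids.end());
  return ids;
}
```

A caller could construct the engine once, for example with `std::mt19937 rng(std::random_device{}());`, which avoids any integer conversion when seeding.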

paddle/fluid/operators/graph_sample_neighbors_op.h (outdated, resolved)
auto unique_dst_merge_ptr = unique_dst_merge.begin();
auto src_merge_ptr = src_merge.begin();
auto dst_sample_counts_merge_ptr = dst_sample_counts_merge.begin();
for (size_t i = 0; i < num_layers; i++) {

Contributor:

The copy logic that follows needs another look. Please add a TODO and check whether std::move can eliminate some of the copies.

Contributor Author:

Added a TODO.
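
As a rough illustration of the kind of saving the reviewer has in mind: `std::move` helps when a whole buffer changes hands, not when individual integers are copied element by element. The sketch below uses the hypothetical function `AppendLayer`; it is not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Appends `layer` to `merged` by taking ownership of its buffer, so no
// element of `layer` is copied. The caller's vector is left in a valid but
// unspecified state afterwards.
void AppendLayer(std::vector<std::vector<int64_t>>* merged,
                 std::vector<int64_t>&& layer) {
  assert(merged != nullptr);
  merged->push_back(std::move(layer));
}
```

When per-layer results must end up in one flat array, as the merge pointers in the snippet above suggest, the elements still have to be copied once; the saving then comes from avoiding intermediate temporaries.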

auto* out_eids = ctx.Output<Tensor>("Out_Eids");
out_eids->Resize({static_cast<int>(eids_merge.size())});
T* p_out_eids = out_eids->mutable_data<T>(ctx.GetPlace());
memset(p_out_eids, 0, eids_merge.size() * sizeof(T));

Contributor:

Please check whether the memset here is redundant, since it is followed directly by a copy.

Contributor Author:

Tested; it can be removed. Done.
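
The reasoning behind this change can be shown in a few lines: when a copy overwrites every element of the destination, the earlier zero fill has no observable effect. A minimal sketch with the hypothetical helper `FillOutput`, using a plain pointer in place of the tensor buffer:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Copies every element of `src` into `dst`, which must have room for
// src.size() elements. All of those elements are overwritten, so
// zero-filling `dst` beforehand would not change the result.
template <typename T>
void FillOutput(const std::vector<T>& src, T* dst) {
  assert(dst != nullptr || src.empty());
  std::copy(src.begin(), src.end(), dst);
}
```

The zero fill would still matter if the copy covered only part of the buffer.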

paddle/fluid/operators/graph_sample_neighbors_op.h (outdated, resolved)
paddle/fluid/operators/graph_sample_neighbors_reindex.h (2 outdated threads, resolved)
paddle/fluid/pybind/tensor_py.h (resolved)
paddle/fluid/pybind/tensor_py.h (outdated, resolved)
paddle/fluid/pybind/imperative.cc (outdated, resolved)
dst_cumsum_counts, nodes,
"sample_sizes", sample_sizes,
"return_eids", False)
return edge_src, edge_dst, sample_index, reindex_nodes

Contributor:

One idea: make sorted_eids default to None. That would solve the problem you have below.

Contributor Author:

Done.

nodes,
sample_sizes,
return_eids=False,
name=None):

Contributor:

The parameter names here need to be checked and aligned as a whole.

paddle/fluid/operators/graph_sample_neighbors_reindex.h (outdated, resolved)
class GraphSampleNeighborsOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
// 1. Get inputs.

Contributor:

Please make the comments a bit more formal.

Contributor Author:

Done.

paddle/fluid/operators/graph_sample_neighbors_op.cu (outdated, resolved)
}

template <typename T>
void sample_neighbors(const framework::ExecutionContext& ctx, const T* src,

Contributor:

Function names need to follow the style guide. "2. Functions. Ordinary functions: start with a capital letter, capitalize the first letter of each word, no underscores, e.g. AddTabEntry(), DeleteUrl(). Accessors: must match the variable name (TODO)."

}

template <typename T>
void reindex_func(const framework::ExecutionContext& ctx,

Contributor:

Same comment as above about the function name.

thrust::copy(unique_items.begin(), unique_items.end(), subset->begin());

// Fill outputs with reindex result.
int block = 1024;

Contributor:

The block count here needs to be determined from the device function.

Contributor Author:

Done.
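
The arithmetic behind this request can be sketched on the host side without any CUDA calls. The example below is an assumption about what "determined from the device" means in practice: the hypothetical parameter `max_threads_per_block` stands in for a limit that real code would query from the device or its context, and `LaunchConfig` and `MakeLaunchConfig` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Threads per block and number of blocks for a one-dimensional launch.
struct LaunchConfig {
  int block;
  int grid;
};

// Caps the block size at the device limit and rounds the grid size up so
// that grid * block covers all `num_items` elements.
// Assumption: `max_threads_per_block` was obtained from the device.
LaunchConfig MakeLaunchConfig(std::size_t num_items,
                              int max_threads_per_block) {
  assert(max_threads_per_block > 0);
  const int block = std::min(max_threads_per_block, 1024);
  const int grid = static_cast<int>((num_items + block - 1) / block);
  return {block, grid};
}
```

For `num_items == 0` this yields a grid of 0, so a caller would need to skip the launch in that case.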

int block = 1024;
int grid = (outputs->size() + block - 1) / block;
reindex_src_output<
T><<<grid, block, 0, reinterpret_cast<const platform::CUDADeviceContext&>(

Contributor:

Shouldn't the reinterpret_cast here be to a pointer? Why is it a reference?

Contributor Author:

I wrote it this way following other people's code.
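
On the language question itself: casting to a reference type is legal C++, and `reinterpret_cast<const T&>(x)` behaves like `*reinterpret_cast<const T*>(&x)`. When the source and target types are related by inheritance, `static_cast` is the usual choice because the compiler checks the relationship. The sketch below uses hypothetical stand-in classes, `DeviceContext` and `GpuContext`, not Paddle's real types.

```cpp
#include <cassert>

// Hypothetical stand-ins for a device-context class hierarchy.
class DeviceContext {
 public:
  virtual ~DeviceContext() = default;
};

class GpuContext : public DeviceContext {
 public:
  explicit GpuContext(int stream_id) : stream_id_(stream_id) {}
  int stream() const { return stream_id_; }

 private:
  int stream_id_;
};

// Downcasts a base reference that is known to refer to a GpuContext.
// static_cast verifies at compile time that the types are related and
// adjusts the address if the class layout requires it.
int StreamOf(const DeviceContext& ctx) {
  // Debug-build check that the dynamic type really is GpuContext.
  assert(dynamic_cast<const GpuContext*>(&ctx) != nullptr);
  return static_cast<const GpuContext&>(ctx).stream();
}
```

The reference form avoids a null check at the call site, which may be why the pattern is common in existing code.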

}
return new_tensor;
},
py::return_value_policy::reference, R"DOC()DOC");

Contributor:

device_id defaults to 0.

@PaddlePaddle locked and limited conversation to collaborators Jan 26, 2022
@PaddlePaddle unlocked this conversation Jan 26, 2022
@DesmonDay changed the title from "Add graph_sample_neighbors API" to "Add graph_khop_sampler API" Jan 26, 2022
@wawltor self-requested a review January 27, 2022 04:02

@wawltor (Contributor) left a comment:

LGTM

@ZeyuChen (Member) left a comment:

LGTM

@ZeyuChen changed the title from "Add graph_khop_sampler API" to "Add Khop Graph Sampler API" Jan 27, 2022
@ZeyuChen merged commit 35f949b into PaddlePaddle:develop Jan 27, 2022
3 participants