
[Feature] Uniform layer-wise sampler #362

Closed
wants to merge 11 commits

Conversation

@GaiYu0 (Collaborator) commented Jan 21, 2019

Description

This PR implements a uniform layer-wise sampler, tested with SSE on the PubMed dataset.

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with it
  • Related issue is referred to in this PR

Changes

@GaiYu0 (Collaborator, Author) commented Jan 21, 2019

I also need to test my implementation with a multi-layer GCN.

@GaiYu0 closed this Jan 21, 2019
@GaiYu0 reopened this Jan 21, 2019
@jermainewang (Member) commented:

Could we refactor the sampler code into another source file? Also, I suggest breaking this PR into two: one with only the layer sampler code in C++, and another with the model-related code in Python.

subgraph.map_to_subgraph_nid(dst).asnumpy())
for src, dst in parent_uv_edges_per_hop]
#print(subg_uv_edges_per_hop)
return subgraph, subg_uv_edges_per_hop
Review comment (Collaborator):

You don't need to call LayerSampler like this. Previously, neighbor sampling couldn't handle resampled nodes correctly, which is why @ZiyueHuang implemented it this way. If your layer sampler can handle resampled nodes, you can use it directly.

BTW, this implementation generates only a single batch per epoch, which isn't what we want; we need to enable mini-batch training.


print(args)

main(args)
Review comment (Collaborator):

I think this file should eventually be merged with @ZiyueHuang's neighbor sampling; most of the code should be the same.

* \return a subgraph
*/
/* virtual SampledSubgraph LayerUniformSample(IdArray seeds, const std::string &neigh_type,
int n_layers, size_t layer_size) const = 0; */
Review comment (Collaborator):

Why is this commented out?

if not prefetch:
return loader
else:
return _PrefetchingLoader(loader, num_prefetch=num_workers*2)
Review comment (Collaborator):

What is the difference between the layer sampler and the neighbor sampler in terms of API? If they are the same, should we merge them?

@@ -976,6 +977,138 @@ SampledSubgraph ImmutableGraph::SampleSubgraph(IdArray seed_arr,
return subg;
}

SampledSubgraph ImmutableGraph::LayerSample(IdArray seed_array,
const float* probability,
Review comment (Collaborator):

Do we need to pass probability?

size_t n_seeds = seed_array->shape[0];
const dgl_id_t* seed_data = static_cast<dgl_id_t*>(seed_array->data);
candidate_set.insert(seed_data, seed_data + n_seeds);
std::copy(candidate_set.begin(), candidate_set.end(), nodes.begin());
Review comment (Collaborator):

Don't you need to allocate memory for nodes before copying into it?
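
For reference, a minimal sketch (stand-in types and names, not the PR's code) of the alternative this question implies: copying into std::back_inserter(nodes) grows the vector during the copy, so no prior resize is needed, whereas std::copy into nodes.begin() on an undersized vector is undefined behavior.

    #include <algorithm>
    #include <iterator>
    #include <unordered_set>
    #include <vector>

    int main() {
      std::unordered_set<unsigned long> candidate_set{1, 2, 3};
      std::vector<unsigned long> nodes;  // empty; no resize needed
      // back_inserter appends each element, growing `nodes` as it goes.
      std::copy(candidate_set.begin(), candidate_set.end(),
                std::back_inserter(nodes));
      // By contrast, std::copy(..., nodes.begin()) writes past the end of an
      // empty vector: undefined behavior unless `nodes` was resized first.
    }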

const dgl_id_t* seed_data = static_cast<dgl_id_t*>(seed_array->data);
candidate_set.insert(seed_data, seed_data + n_seeds);
std::copy(candidate_set.begin(), candidate_set.end(), nodes.begin());
layer_ids.insert(layer_ids.end(), nodes.size(), 0);
Review comment (Collaborator):

Could you use push_back here?
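
For reference, a minimal sketch (stand-in names) of the two equivalent forms: the diff's fill-insert appends nodes.size() copies of layer id 0 in one call, while the push_back loop the comment suggests appends them one at a time.

    #include <cstddef>
    #include <vector>

    int main() {
      const std::size_t n = 4;  // stand-in for nodes.size()

      // Form in the diff: bulk-insert n zeros at the end in one call.
      std::vector<int> a;
      a.insert(a.end(), n, 0);

      // Form the comment suggests: one push_back per node; identical result.
      std::vector<int> b;
      for (std::size_t j = 0; j < n; ++j) b.push_back(0);
    }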

std::vector<size_t> positions {0, nodes.size()};
for (int i = 0; i != n_layers; i++) {
candidate_set.clear();
for (auto j = positions.end()[-2]; j != positions.back(); ++j) {
Review comment (Collaborator):

positions.end()[-2] looks weird; it seems you want positions.front().
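
To unpack the idiom, a minimal sketch (the per-layer push_back into positions is assumed; the excerpt doesn't show it): positions.end()[-2] reads the second-to-last offset, i.e. the start of the most recently completed layer's node range.

    #include <cstddef>
    #include <vector>

    int main() {
      std::vector<std::size_t> positions{0, 5};  // {start, end} of layer 0
      positions.push_back(9);                    // assumed: end offset of layer 1

      std::size_t a = positions.end()[-2];              // second-to-last: 5
      std::size_t b = positions[positions.size() - 2];  // same value, plainer
      std::size_t c = *(positions.rbegin() + 1);        // reverse-iterator form
      // positions.front() is 0 here, so it matches end()[-2] only before any
      // extra offsets are appended.
      (void)a; (void)b; (void)c;
    }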

for (auto const &pair : n_times) {
nodes.push_back(pair.first);
layer_ids.push_back(i + 1);
probabilities.push_back(pair.second / n_nodes);
Review comment (Collaborator):

Why is this the sampling probability? Also, is it divided by the number of all nodes in the original graph?
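
For background on why a layer-wise sampler records per-node probabilities at all (an assumption about intent; the thread doesn't state it): FastGCN-style estimators reweight each sampled node by its sampling probability q(u) so the neighborhood aggregation stays unbiased:

    \[
      \sum_{u} A_{vu}\, h_u
      \;\approx\;
      \frac{1}{t} \sum_{i=1}^{t} \frac{A_{v u_i}\, h_{u_i}}{q(u_i)},
      \qquad u_i \sim q,
      \qquad
      \mathbb{E}\!\left[ \frac{A_{vu}\, h_u}{q(u)} \right]
      = \sum_{u} A_{vu}\, h_u .
    \]

Under that reading, the comment's concern is whether pair.second / n_nodes is really the probability with which each node was drawn.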

@jermainewang mentioned this pull request Jan 21, 2019
subg.layer_ids[i] = node_info[i].first;
subg.sample_prob[i] = node_info[i].second;
}
*/
Review comment (Member):

What are these?

fn.sum(msg='m', out='h'))
h = subg.ndata.pop('h')
# same as TODO above
h = h / math.sqrt(3.)
Review comment (Collaborator):

The normalization doesn't seem right. Could you try normalizing with vertex degree?

@ZiyueHuang (Member) commented Feb 6, 2019:

You can try the random-walk normalized Laplacian, as in https://github.com/ZiyueHuang/dgl/blob/4aa22e63d42fba3a27709a02c55aba680e30dd33/examples/mxnet/gcn/gcn_ns.py, for an easier implementation (for uniform sampling, the mean is OK; no need for the norm):

def gcn_reduce(node):
    accum = mx.nd.mean(node.mailbox['m'], 1)
    return {'h': accum}

instead of the symmetric normalized Laplacian.

@jermainewang mentioned this pull request Feb 18, 2019
@GaiYu0 closed this Feb 26, 2019