Merge branch 'master' into gb_cuda_gpu_graph_cache_py
mfbalin committed Jun 26, 2024
2 parents eaa965e + d3e4f4f commit c21735d
Showing 12 changed files with 913 additions and 70 deletions.
3 changes: 3 additions & 0 deletions examples/pytorch/labor/README.md
@@ -6,6 +6,9 @@ This is the official Labor sampling example to reproduce the results in the original
paper with the GraphSAGE GNN model. The model can be changed to any other model where
NeighborSampler can be used.

A more modern and performant version is provided in the
`examples/sampling/graphbolt/pyg/labor` folder.

Requirements
------------

94 changes: 94 additions & 0 deletions examples/sampling/graphbolt/pyg/labor/README.md
@@ -0,0 +1,94 @@
Layer-Neighbor Sampling -- Defusing Neighborhood Explosion in GNNs
============

- Paper link: [NeurIPS 2023](https://papers.nips.cc/paper_files/paper/2023/hash/51f9036d5e7ae822da8f6d4adda1fb39-Abstract-Conference.html)

This is an official Labor sampling example showcasing the use of
[dgl.graphbolt.LayerNeighborSampler](https://docs.dgl.ai/en/latest/generated/dgl.graphbolt.LayerNeighborSampler.html).

This sampler has two parameters: `layer_dependency=[False|True]` and
`batch_dependency=k`, where `k` is any nonnegative integer.
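
A minimal construction sketch (the `dataset` variable, fanouts, and batch size
here are illustrative, not the exact code of the example script):

```python
import dgl.graphbolt as gb

# Sketch only: assumes `dataset` was already loaded, e.g. via
# gb.BuiltinDataset("ogbn-products").load().
train_set = dataset.tasks[0].train_set
datapipe = gb.ItemSampler(train_set, batch_size=1024, shuffle=True)
datapipe = gb.LayerNeighborSampler(
    datapipe,
    dataset.graph,
    fanouts=[10, 10, 10],    # one fanout per GNN layer
    layer_dependency=True,   # share random variates across layers
    batch_dependency=32,     # kappa: correlate consecutive minibatches
)
```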

We use early stopping so that the final accuracy numbers are reported with a
fairly well-converged model. Contributions that improve the validation
accuracy, and hence hopefully the test accuracy as well, are welcome.

### layer_dependency

Enabling this parameter via the command-line option `--layer-dependency` makes
the random variates used for sampling identical across layers. This ensures
that the same vertex gets the same sampled neighborhood in each layer.
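
Schematically (a toy illustration, not DGL internals), layer dependency
amounts to drawing one uniform variate per vertex and reusing it in every
layer:

```python
import torch

# Toy illustration: one fixed uniform variate per vertex, drawn once.
num_vertices = 1000
r = torch.rand(num_vertices)

def keep_neighbors(neighbor_ids, inclusion_probs):
    # The same r[t] decides neighbor t's inclusion in every layer, so a
    # vertex sampled in one layer gets the same neighborhood in the others.
    return r[neighbor_ids] <= inclusion_probs
```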

### batch_dependency

This method is proposed in Section 3.2 of [Cooperative Minibatching in Graph
Neural Networks](https://arxiv.org/pdf/2310.12403), where it is denoted by
kappa. It makes the random variates used across minibatches dependent, thus
increasing temporal locality. When used with a cache, the increased temporal
locality can be observed as a drop in the cache miss rate at higher values of
the batch dependency parameter, which speeds up embedding transfers to the GPU.
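
One way to picture its effect (a toy illustration, not the paper's actual
update rule): if the per-vertex variates change only every `k` minibatches,
consecutive minibatches sample overlapping neighborhoods, so a GPU feature
cache sees more hits:

```python
import torch

# Toy illustration only: refresh the per-vertex variates every k minibatches.
num_vertices, k, num_steps = 1000, 32, 192
r = torch.rand(num_vertices)
for step in range(num_steps):
    if step % k == 0:
        r = torch.rand(num_vertices)  # variates evolve slowly over time
    # ... sample this minibatch's neighborhoods using the current r ...
```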

### Performance

Use the `--torch-compile` option for best performance. If your GPU has spare
memory, consider using `--mode=cuda-cuda-cuda` to move the whole dataset to the
GPU. If not, consider using `--mode=cuda-pinned-cuda --num-gpu-cached-features=N`
to keep the graph on the GPU and the features in system RAM, with `N` of the
node features cached on the GPU. If you cannot fit even the graph on the GPU,
then consider using `--mode=pinned-pinned-cuda --num-gpu-cached-features=N`.
Finally, you can use `--mode=cpu-pinned-cuda --num-gpu-cached-features=N` to
perform the sampling operation on the CPU.
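
A few illustrative invocations covering these modes (the cache size shown
matches the examples below):

```bash
# Whole dataset on the GPU (highest GPU memory use).
python node_classification.py --torch-compile --mode=cuda-cuda-cuda

# Graph on the GPU, features in pinned system RAM with a GPU feature cache.
python node_classification.py --torch-compile --mode=cuda-pinned-cuda --num-gpu-cached-features=500000

# Graph pinned in system RAM as well, sampling still on the GPU.
python node_classification.py --torch-compile --mode=pinned-pinned-cuda --num-gpu-cached-features=500000

# Sampling on the CPU.
python node_classification.py --torch-compile --mode=cpu-pinned-cuda --num-gpu-cached-features=500000
```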

### Examples

We use `--num-gpu-cached-features=500000` to cache 500k of the node
embeddings for the `ogbn-products` dataset (the default). Check the
command-line arguments to see which other datasets can be run. When running
with the `yelp` dataset, using `--dropout=0` gives better final validation and
test accuracy.

Example run with `batch_dependency=1`; the cache miss rate is 62%:

```bash
python node_classification.py --num-gpu-cached-features=500000 --batch-dependency=1
Training in pinned-pinned-cuda mode.
Loading data...
The dataset is already preprocessed.
Training: 192it [00:03, 50.95it/s, num_nodes=247243, cache_miss=0.619]
Evaluating: 39it [00:00, 76.01it/s, num_nodes=137466, cache_miss=0.621]
Epoch 00, Loss: 1.1161, Approx. Train: 0.7024, Approx. Val: 0.8612, Time: 3.7688188552856445s
```

Example run with `batch_dependency=32`; the cache miss rate is 22%:

```bash
python node_classification.py --num-gpu-cached-features=500000 --batch-dependency=32
Training in pinned-pinned-cuda mode.
Loading data...
The dataset is already preprocessed.
Training: 192it [00:03, 54.34it/s, num_nodes=250479, cache_miss=0.221]
Evaluating: 39it [00:00, 84.66it/s, num_nodes=135142, cache_miss=0.226]
Epoch 00, Loss: 1.1288, Approx. Train: 0.6993, Approx. Val: 0.8607, Time: 3.5339605808258057s
```

Example run with `layer_dependency=True`; the number of sampled nodes is 190k,
vs. 250k without this option:

```bash
python node_classification.py --num-gpu-cached-features=500000 --layer-dependency
Training in pinned-pinned-cuda mode.
Loading data...
The dataset is already preprocessed.
Training: 192it [00:03, 54.03it/s, num_nodes=191259, cache_miss=0.626]
Evaluating: 39it [00:00, 79.49it/s, num_nodes=108720, cache_miss=0.627]
Epoch 00, Loss: 1.1495, Approx. Train: 0.6932, Approx. Val: 0.8586, Time: 3.5540308952331543s
```

Example run with the original GraphSAGE sampler (`NeighborSampler`); the number
of sampled nodes is 520k, more than 2x that of the Labor sampler:

```bash
python node_classification.py --num-gpu-cached-features=500000 --sample-mode=sample_neighbor
Training in pinned-pinned-cuda mode.
Loading data...
The dataset is already preprocessed.
Training: 192it [00:04, 45.60it/s, num_nodes=517522, cache_miss=0.563]
Evaluating: 39it [00:00, 77.53it/s, num_nodes=255686, cache_miss=0.565]
Epoch 00, Loss: 1.1152, Approx. Train: 0.7015, Approx. Val: 0.8652, Time: 4.211000919342041s
```
54 changes: 54 additions & 0 deletions examples/sampling/graphbolt/pyg/labor/load_dataset.py
@@ -0,0 +1,54 @@
import dgl.graphbolt as gb


def load_dgl(name):
    """Load a legacy DGL dataset and wrap it as a GraphBolt dataset."""
    from dgl.data import (
        CiteseerGraphDataset,
        CoraGraphDataset,
        FlickrDataset,
        PubmedGraphDataset,
        RedditDataset,
        YelpDataset,
    )

    d = {
        "cora": CoraGraphDataset,
        "citeseer": CiteseerGraphDataset,
        "pubmed": PubmedGraphDataset,
        "reddit": RedditDataset,
        "yelp": YelpDataset,
        "flickr": FlickrDataset,
    }

    dataset = gb.LegacyDataset(d[name]())
    # Rewrap the features in a fresh TorchBasedFeatureStore that shares the
    # same underlying feature dict.
    new_feature = gb.TorchBasedFeatureStore([])
    new_feature._features = dataset.feature._features
    dataset._feature = new_feature
    # Yelp is the only multilabel dataset in this set.
    multilabel = name in ["yelp"]
    return dataset, multilabel


def load_dataset(dataset_name):
    """Return (dataset, multilabel) for the given DGL or OGB dataset name."""
    multilabel = False
    if dataset_name in [
        "reddit",
        "cora",
        "citeseer",
        "pubmed",
        "yelp",
        "flickr",
    ]:
        dataset, multilabel = load_dgl(dataset_name)
    elif dataset_name in [
        "ogbn-products",
        "ogbn-arxiv",
        "ogbn-papers100M",
        "ogbn-mag240M",
    ]:
        # GraphBolt registers MAG240M under a different builtin name.
        if "mag240M" in dataset_name:
            dataset_name = "ogb-lsc-mag240m"
        dataset = gb.BuiltinDataset(dataset_name).load()
    else:
        raise ValueError("unknown dataset")

    return dataset, multilabel
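
A minimal usage sketch for the helper above (the dataset name and attribute
access are illustrative of the GraphBolt API):

```python
# Hypothetical usage; assumes the requested dataset is available locally.
dataset, multilabel = load_dataset("ogbn-products")
print(multilabel)                       # False for ogbn-products
print(dataset.graph.total_num_nodes)    # the gb.FusedCSCSamplingGraph
train_set = dataset.tasks[0].train_set  # itemset of (node, label) pairs
```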