[GraphBolt][Dataset] Contribute IGBH dataset to hetero examples. #7708

Open
wants to merge 47 commits into
base: master
6441133
contribute three IGB dataset (small version)
Aug 15, 2024
abb9f39
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 15, 2024
85d92c8
contribute three IGB dataset (small version)
Aug 15, 2024
5d2fe56
contribute three IGB dataset (small version)
Aug 15, 2024
543f672
format the code with ufmt
Aug 15, 2024
6bf10cb
added documentation
Aug 15, 2024
5fc930b
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 15, 2024
42717af
Update examples/graphbolt/rgcn/download.py
BowenYao18 Aug 15, 2024
06f3f29
Update examples/graphbolt/rgcn/download.py
BowenYao18 Aug 15, 2024
19f18a5
Update examples/graphbolt/rgcn/download.py
BowenYao18 Aug 15, 2024
97c1735
Update examples/graphbolt/rgcn/download.py
BowenYao18 Aug 15, 2024
93cb70f
added 2983 class task
Aug 15, 2024
b170a90
fix lint
Aug 15, 2024
55079b8
Update examples/graphbolt/rgcn/download.py
BowenYao18 Aug 15, 2024
ce65746
remove labels from yaml
Aug 16, 2024
8660565
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 16, 2024
0e0fa09
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 17, 2024
3322987
add doenload script
Aug 17, 2024
0135b4b
corrected path in processing file
BowenYao18 Aug 17, 2024
7a35313
modify yaml file builder
BowenYao18 Aug 17, 2024
217b885
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 19, 2024
6e0365b
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 25, 2024
96c96ad
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 26, 2024
4b680c7
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 27, 2024
bbf4c97
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 29, 2024
c9789cd
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 29, 2024
39c9772
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Aug 30, 2024
6a9a9dc
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Sep 4, 2024
c645bb7
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Sep 4, 2024
2a6244e
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Sep 4, 2024
befc958
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Sep 5, 2024
422ccbe
add igb-het-[tiny|small]
Sep 5, 2024
eb62d9f
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Sep 6, 2024
8e51701
resolve merge conflict
Sep 6, 2024
337d416
Merge branch 'master' into add-igbh-to-rgcn
BowenYao18 Sep 6, 2024
aaf1da1
Update examples/graphbolt/pyg/hetero/node_classification.py
BowenYao18 Sep 6, 2024
ebdb01f
Update examples/graphbolt/pyg/hetero/node_classification.py
BowenYao18 Sep 6, 2024
d50977d
Update examples/graphbolt/pyg/hetero/node_classification.py
BowenYao18 Sep 6, 2024
1b54977
Merge branch 'master' into add-igbh-to-rgcn
BowenYao18 Sep 6, 2024
071d055
remove main args
Sep 6, 2024
f31d354
remove script
Sep 6, 2024
9e445c6
add all reverse edge type
Sep 6, 2024
e1607ab
Merge branch 'master' into add-igbh-to-rgcn
mfbalin Sep 7, 2024
b043999
add igb-het-large
BowenYao18 Sep 9, 2024
2657ee1
fix format
BowenYao18 Sep 9, 2024
d11f815
fix lint
BowenYao18 Sep 9, 2024
de7dc90
Merge branch 'dmlc:master' into add-igbh-to-rgcn
BowenYao18 Sep 9, 2024
3 changes: 2 additions & 1 deletion examples/graphbolt/pyg/hetero/node_classification.py
@@ -340,8 +340,9 @@ def parse_args():
"igb-het-tiny",
"igb-het-small",
"igb-het-medium",
+ "igb-het-large",
],
- help="Dataset name. Possible values: ogb-lsc-mag240m, igb-het-[tiny|small|medium].",
+ help="Dataset name. Possible values: ogb-lsc-mag240m, igb-het-[tiny|small|medium|large].",
)
parser.add_argument(
"--fanout",
85 changes: 85 additions & 0 deletions examples/graphbolt/rgcn/evaluator.py
@@ -0,0 +1,85 @@
import numpy as np

try:
    import torch
except ImportError:
    torch = None


### Evaluator for node property prediction
class IGB_Evaluator:
    def __init__(self, name, num_tasks, eval_metric):
        self.name = name
        self.num_tasks = num_tasks
        self.eval_metric = eval_metric

    def _parse_and_check_input(self, input_dict):
        if self.eval_metric == "acc":
            if "y_true" not in input_dict:
                raise RuntimeError("Missing key of y_true")
            if "y_pred" not in input_dict:
                raise RuntimeError("Missing key of y_pred")

            y_true, y_pred = input_dict["y_true"], input_dict["y_pred"]

            # y_true: numpy ndarray or torch tensor of shape
            #     (num_nodes, num_tasks)
            # y_pred: numpy ndarray or torch tensor of shape
            #     (num_nodes, num_tasks)

            # Convert torch.Tensor inputs to numpy arrays on the CPU.
            if torch is not None and isinstance(y_true, torch.Tensor):
                y_true = y_true.detach().cpu().numpy()

            if torch is not None and isinstance(y_pred, torch.Tensor):
                y_pred = y_pred.detach().cpu().numpy()

            # Check that both inputs are numpy arrays.
            if not (
                isinstance(y_true, np.ndarray)
                and isinstance(y_pred, np.ndarray)
            ):
                raise RuntimeError(
                    "Arguments to Evaluator need to be either numpy ndarray or torch tensor"
                )

            if not y_true.shape == y_pred.shape:
                raise RuntimeError(
                    "Shape of y_true and y_pred must be the same"
                )

            if not y_true.ndim == 2:
                raise RuntimeError(
                    "y_true and y_pred must be 2-dim arrays, {}-dim array given".format(
                        y_true.ndim
                    )
                )

            if not y_true.shape[1] == self.num_tasks:
                raise RuntimeError(
                    "Number of tasks for {} should be {} but {} given".format(
                        self.name, self.num_tasks, y_true.shape[1]
                    )
                )

            return y_true, y_pred

        else:
            raise ValueError("Undefined eval metric %s " % (self.eval_metric))

    def _eval_acc(self, y_true, y_pred):
        acc_list = []

        for i in range(y_true.shape[1]):
            # NaN != NaN, so this mask drops rows whose label is NaN.
            is_labeled = y_true[:, i] == y_true[:, i]
            correct = y_true[is_labeled, i] == y_pred[is_labeled, i]
            acc_list.append(float(np.sum(correct)) / len(correct))

        return {"acc": sum(acc_list) / len(acc_list)}

    def eval(self, input_dict):
        if self.eval_metric == "acc":
            y_true, y_pred = self._parse_and_check_input(input_dict)
            return self._eval_acc(y_true, y_pred)
        else:
            raise ValueError("Undefined eval metric %s " % (self.eval_metric))
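
The accuracy loop in `_eval_acc` uses the self-comparison `y_true[:, i] == y_true[:, i]`, which is false only where the label is NaN. Rows with NaN labels are therefore left out of the count. The following is a minimal standalone sketch of that computation, not part of the PR: the helper name `masked_accuracy` and the sample labels are made up for illustration, and it assumes unlabeled nodes are encoded as NaN.

```python
import numpy as np


def masked_accuracy(y_true, y_pred):
    """Mean per-task accuracy over labeled rows (hypothetical helper)."""
    acc_list = []
    for i in range(y_true.shape[1]):
        # NaN != NaN, so the mask is False exactly where the label is NaN.
        is_labeled = y_true[:, i] == y_true[:, i]
        correct = y_true[is_labeled, i] == y_pred[is_labeled, i]
        acc_list.append(float(np.sum(correct)) / len(correct))
    return sum(acc_list) / len(acc_list)


# Made-up data: 4 nodes, 1 task; the last node has no label.
y_true = np.array([[0.0], [1.0], [2.0], [np.nan]])
y_pred = np.array([[0.0], [1.0], [1.0], [3.0]])

# Two of the three labeled nodes are predicted correctly.
acc = masked_accuracy(y_true, y_pred)
print(acc)
```

The division assumes every task has at least one labeled row. A column of all-NaN labels would raise `ZeroDivisionError`.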
25 changes: 20 additions & 5 deletions examples/graphbolt/rgcn/hetero_rgcn.py
mfbalin marked this conversation as resolved.
@@ -58,6 +58,7 @@
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import HeteroEmbedding
+ from evaluator import IGB_Evaluator
from ogb.lsc import MAG240MEvaluator
from ogb.nodeproppred import Evaluator
from tqdm import tqdm
@@ -141,6 +142,10 @@ def create_dataloader(
if name == "ogb-lsc-mag240m":
node_feature_keys["author"] = ["feat"]
node_feature_keys["institution"] = ["feat"]
+ if "igb-het" in name:
+ node_feature_keys["author"] = ["feat"]
+ node_feature_keys["institute"] = ["feat"]
+ node_feature_keys["fos"] = ["feat"]
datapipe = datapipe.fetch_feature(features, node_feature_keys)

# Create a DataLoader from the datapipe.
@@ -158,7 +163,7 @@ def extract_embed(node_embed, input_nodes):

def extract_node_features(name, block, data, node_embed, device):
"""Extract the node features from embedding layer or raw features."""
- if name == "ogbn-mag":
+ if name == "ogbn-mag" or "igb-het" in name:
input_nodes = {
k: v.to(device) for k, v in block.srcdata[dgl.NID].items()
}
@@ -424,7 +429,9 @@ def evaluate(
model.eval()
category = "paper"
# An evaluator for the dataset.
- if name == "ogbn-mag":
+ if "igb-het" in name:
+ evaluator = IGB_Evaluator(name=name, num_tasks=1, eval_metric="acc")
+ elif name == "ogbn-mag":
evaluator = Evaluator(name=name)
else:
evaluator = MAG240MEvaluator()
@@ -588,7 +595,7 @@ def main(args):
# `institution` are generated in advance and stored in the feature store.
# For `ogbn-mag`, we generate the features on the fly.
embed_layer = None
- if args.dataset == "ogbn-mag":
+ if args.dataset == "ogbn-mag" or "igb-het" in args.dataset:
# Create the embedding layer and move it to the appropriate device.
embed_layer = rel_graph_embed(g, feat_size).to(device)
print(
@@ -663,8 +670,16 @@ def main(args):
"--dataset",
type=str,
default="ogbn-mag",
- choices=["ogbn-mag", "ogb-lsc-mag240m"],
- help="Dataset name. Possible values: ogbn-mag, ogb-lsc-mag240m",
+ choices=[
+ "ogbn-mag",
+ "ogb-lsc-mag240m",
+ "igb-het-tiny",
+ "igb-het-small",
+ "igb-het-medium",
+ "igb-het-large",
+ ],
+ help="Dataset name. Possible values: ogbn-mag, ogb-lsc-mag240m, "
+ "igb-het-[tiny|small|medium|large].",
)
parser.add_argument("--num_epochs", type=int, default=3)
parser.add_argument("--num_workers", type=int, default=0)
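
The `choices` list is what makes the new dataset names valid on the command line, and the training code later branches on the substring `"igb-het"`. The following standalone sketch illustrates that interaction. It is not taken from `hetero_rgcn.py`: the names `DATASETS` and `build_parser` are hypothetical, and only the `--dataset` argument is reproduced.

```python
import argparse

# Dataset names as listed in the diff above.
DATASETS = [
    "ogbn-mag",
    "ogb-lsc-mag240m",
    "igb-het-tiny",
    "igb-het-small",
    "igb-het-medium",
    "igb-het-large",
]


def build_parser():
    """Build a parser holding only the --dataset argument (illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--dataset", type=str, default="ogbn-mag", choices=DATASETS
    )
    return parser


# A listed name parses normally and takes the IGB branch.
args = build_parser().parse_args(["--dataset", "igb-het-large"])
uses_igb = "igb-het" in args.dataset

# A name outside `choices` makes argparse report an error and exit.
try:
    build_parser().parse_args(["--dataset", "igb-het-huge"])
    rejected = False
except SystemExit:
    rejected = True
```

Because the check is a substring test, any future name containing `igb-het` would take the same branch once it is added to `choices`.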
6 changes: 4 additions & 2 deletions python/dgl/graphbolt/impl/ondisk_dataset.py
@@ -990,10 +990,10 @@ class BuiltinDataset(OnDiskDataset):
Self edges are added to the original graph.
Node features are stored as float32.

- **igb-het-[tiny|small|medium]**
+ **igb-het-[tiny|small|medium|large]**
The igb-het-[tiny|small|medium|large] dataset is a heterogeneous citation network,
which is designed for developers to train and evaluate GNN models with
- high fidelity. See more details in `igb-het-[tiny|small|medium]
+ high fidelity. See more details in `igb-het-[tiny|small|medium|large]
<https://github.com/IllinoisGraphBenchmark/IGB-Datasets>`_.

.. note::
@@ -1047,6 +1047,8 @@ class BuiltinDataset(OnDiskDataset):
"igb-hom-seeds",
"igb-het-medium",
"igb-het-medium-seeds",
+ "igb-het-large",
+ "igb-het-large-seeds",
]
]
_all_datasets = _datasets + _large_datasets
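
The diff registers `igb-het-large` and `igb-het-large-seeds` in the large-dataset list, which is then merged into `_all_datasets`. How `BuiltinDataset` consumes these lists is not visible in this hunk. The sketch below only illustrates the membership idea, under the assumption that a name must appear in the merged list to be accepted. The list contents are a placeholder subset and `classify_dataset` is a hypothetical helper.

```python
# Placeholder subsets for illustration; the real lists are longer.
_datasets = ["igb-het-tiny", "igb-het-small"]
_large_datasets = [
    "igb-het-medium",
    "igb-het-medium-seeds",
    "igb-het-large",
    "igb-het-large-seeds",
]
_all_datasets = _datasets + _large_datasets


def classify_dataset(name):
    """Return which list a name belongs to, or raise if it is unknown."""
    if name not in _all_datasets:
        raise ValueError(f"Unknown dataset name: {name}")
    return "large" if name in _large_datasets else "regular"


kind = classify_dataset("igb-het-large")
print(kind)
```

Under this assumption, a name missing from both lists is rejected before any download is attempted.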
