[Dataset] SquirrelDataset #5507

Merged
merged 1 commit into from
Mar 28, 2023
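
The split statistics in the new `SquirrelDataset` docstring can be sanity-checked with a little arithmetic. The sketch below uses only the numbers listed in that docstring; the helper name `split_ratios` is hypothetical and not part of DGL. It checks that the three split sizes sum to the node count and shows that the proportions come out to roughly 48/32/20, which may be why this PR drops the "60/20/20" wording from the `ChameleonDataset` docstring.

```python
# Statistics copied from the SquirrelDataset docstring in this PR.
NUM_NODES = 5201
SPLIT_SIZES = {"train": 2496, "val": 1664, "test": 1041}


def split_ratios(sizes, num_nodes):
    """Return each split's share of the nodes as a percentage (one decimal).

    Hypothetical helper for illustration only; it is not a DGL function.
    """
    return {name: round(100 * count / num_nodes, 1) for name, count in sizes.items()}


# The split sizes listed in the docstring sum to the node count.
assert sum(SPLIT_SIZES.values()) == NUM_NODES

ratios = split_ratios(SPLIT_SIZES, NUM_NODES)
print(ratios)  # {'train': 48.0, 'val': 32.0, 'test': 20.0}
```

The check says nothing about how the ten splits are stored or whether they are disjoint; it only covers the per-split counts quoted in the docstring.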
1 change: 1 addition & 0 deletions docs/source/api/python/dgl.data.rst
@@ -57,6 +57,7 @@ Datasets for node classification/regression tasks
PATTERNDataset
CLUSTERDataset
ChameleonDataset
SquirrelDataset

Edge Prediction Datasets
---------------------------------------
2 changes: 1 addition & 1 deletion python/dgl/data/__init__.py
@@ -54,7 +54,7 @@
from .utils import *
from .cluster import CLUSTERDataset
from .pattern import PATTERNDataset
from .wiki_network import ChameleonDataset
from .wiki_network import ChameleonDataset, SquirrelDataset
from .wikics import WikiCSDataset
from .yelp import YelpDataset
from .zinc import ZINCDataset
86 changes: 80 additions & 6 deletions python/dgl/data/wiki_network.py
@@ -1,5 +1,5 @@
"""
Wikipedia page-page networks on the chameleon topic.
Wikipedia page-page networks on two topics: chameleons and squirrels.
"""
import os

@@ -23,8 +23,7 @@ class WikiNetworkDataset(DGLBuiltinDataset):
raw_dir : str
Raw file directory to store the processed data.
force_reload : bool
Whether to always generate the data from scratch rather than load a
cached version.
Whether to re-download the data source.
verbose : bool
Whether to print progress information.
transform : callable
@@ -123,7 +122,7 @@ class ChameleonDataset(WikiNetworkDataset):
- Nodes: 2277
- Edges: 36101
- Number of Classes: 5
- 10 splits with 60/20/20 train/val/test ratio
- 10 train/val/test splits

- Train: 1092
- Val: 729
@@ -134,8 +133,7 @@ class ChameleonDataset(WikiNetworkDataset):
raw_dir : str, optional
Raw file directory to store the processed data. Default: ~/.dgl/
force_reload : bool, optional
Whether to always generate the data from scratch rather than load a
cached version. Default: False
Whether to re-download the data source. Default: False
verbose : bool, optional
Whether to print progress information. Default: True
transform : callable, optional
@@ -182,3 +180,79 @@ def __init__(
verbose=verbose,
transform=transform,
)


class SquirrelDataset(WikiNetworkDataset):
r"""Wikipedia page-page network on squirrels from `Multi-scale Attributed
Node Embedding <https://arxiv.org/abs/1909.13021>`__ and later modified by
`Geom-GCN: Geometric Graph Convolutional Networks
<https://arxiv.org/abs/2002.05287>`__.

Nodes represent articles from the English Wikipedia, edges reflect mutual
links between them. Node features indicate the presence of particular nouns
in the articles. The nodes were classified into 5 classes in terms of their
average monthly traffic.

Statistics:

- Nodes: 5201
- Edges: 217073
- Number of Classes: 5
- 10 train/val/test splits

- Train: 2496
- Val: 1664
- Test: 1041

Parameters
----------
raw_dir : str, optional
Raw file directory to store the processed data. Default: ~/.dgl/
force_reload : bool, optional
Whether to re-download the data source. Default: False
verbose : bool, optional
Whether to print progress information. Default: True
transform : callable, optional
A transform that takes in a :class:`~dgl.DGLGraph` object and returns
a transformed version. The :class:`~dgl.DGLGraph` object will be
transformed before every access. Default: None

Attributes
----------
num_classes : int
Number of node classes

Notes
-----
The graph does not come with edges for both directions.

Examples
--------

>>> from dgl.data import SquirrelDataset
>>> dataset = SquirrelDataset()
>>> g = dataset[0]
>>> num_class