Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Model] Add BGNN example #2740

Merged
merged 10 commits into from
Apr 6, 2021
Merged

[Model] Add BGNN example #2740

merged 10 commits into from
Apr 6, 2021

Conversation

nd7141
Copy link
Contributor

@nd7141 nd7141 commented Mar 11, 2021

Description

A proposal of a new GNN model coming from ICLR 2021: https://openreview.net/forum?id=ebS5NUfoMKL
The model is implemented with DGL: https://github.com/nd7141/bgnn

Reviewers: @jermainewang @BarclayII

Discussion thread: #2603

This is initial proposal of the model. There are two files: BGNN.py which implements BGNN model and run.py which shows how to use it.

I also include datasets I used in the paper for the sake of reproducibility (although I think that datasets should be included in DGL as a separate request).

Current results match those of the papers (although I have not done thorough hyperparameter tuning).

Here are images showing performance of BGNN, that is similar to the paper (https://openreview.net/forum?id=ebS5NUfoMKL):
image
image
image
image
image
image

I think BGNN.py should be part of the DGL, in the sense that an end user can just do import dgl.BGNNPredictor. On the other hand, run.py can be kept in the example.

A few notes:

  • Currently BGNN.py includes as well plotting, which may not be needed for the final version.
  • I evaluate for train, valid, and test masks during training, which can slow down training. These parts could be excluded, but should be discussed as it impacts early stopping.
  • BGNN relies on CatBoost package (https://catboost.ai/), so it should be installed if an end user uses BGNN.
  • There are some packages which could be installed as well but are not necessary (e.g. tqdm, sklearn).

I am happy to answer your questions and discuss how to integrate BGNN in DGL in the optimal way. Please let me know your thoughts.

@BarclayII
Copy link
Collaborator

Thanks a lot! It's a brand new model contribution so I put an item in the model index README.

Could you also add a README with instructions on how to download the dataset (once you have uploaded them)?

@nd7141
Copy link
Contributor Author

nd7141 commented Mar 17, 2021

@BarclayII @jermainewang I updated the code in the example and provided instructions how you can download datasets. Please let me know if something else is needed from me?

examples/pytorch/bgnn/BGNN.py Outdated Show resolved Hide resolved
examples/pytorch/bgnn/BGNN.py Outdated Show resolved Hide resolved
examples/pytorch/bgnn/BGNN.py Outdated Show resolved Hide resolved
examples/pytorch/bgnn/BGNN.py Show resolved Hide resolved
'''

:param graph: dgl graph
:param X: numpy matrix of input features. Can be composed of categorical features with missing values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

describe how missing values are represented in the matrix

examples/pytorch/bgnn/BGNN.py Outdated Show resolved Hide resolved
examples/pytorch/bgnn/BGNN.py Outdated Show resolved Hide resolved
:param logging_epochs: log every n epoch
:param metric_name: metric to use for early stopping
:param normalize_features: normalize original input features X (column wise)
:param replace_na: replace NA in X
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if these options (normalize_features and replace_na) should be here. They look like data preprocessing steps that may go to the example code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved them to run.py for preprocessing.

@nd7141
Copy link
Contributor Author

nd7141 commented Apr 2, 2021

Hey @jermainewang

I updated the code documentation and also moved the preprocessing to run.py (that requires adding one additional parameter for fit function, as GBDT still want to operate on non-preprocessed features). That worked on my side. I hope it works for you too.

Copy link
Member

@jermainewang jermainewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the contribution!

@jermainewang
Copy link
Member

@BarclayII do you have other comments?

@BarclayII
Copy link
Collaborator

I'm good. Thanks!

@BarclayII BarclayII merged commit 672e322 into dmlc:master Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants