Download datasets

All the datasets work with DGL 0.5 or later. If using these datasets throws errors, please update the environment using the yml files in the root directory.
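Before downloading anything, it can help to confirm that the active environment meets the DGL 0.5 requirement. The following is a minimal sketch, not part of the repository: `version_ge` is a hypothetical helper name, and the snippet assumes that `python` on the `PATH` is the interpreter of the project environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 >= version $2 (dot-separated numeric fields).
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)
  local i x y
  for (( i = 0; i < ${#have[@]} || i < ${#want[@]}; i++ )); do
    x=${have[i]:-0}; y=${want[i]:-0}
    # Keep only the leading digits so suffixes such as "post1" do not break the arithmetic.
    x=${x%%[!0-9]*}; y=${y%%[!0-9]*}
    (( 10#${x:-0} > 10#${y:-0} )) && return 0
    (( 10#${x:-0} < 10#${y:-0} )) && return 1
  done
  return 0
}

# Ask the active Python environment for its DGL version; fall back to 0 if DGL cannot be imported.
installed=$(python -c 'import dgl; print(dgl.__version__)' 2>/dev/null || echo 0)
if version_ge "$installed" 0.5; then
  echo "DGL $installed is recent enough"
else
  echo "DGL $installed is older than 0.5 (or not installed): update the environment"
fi
```

The comparison is done field by field rather than as text, so that a version such as 0.10 is correctly treated as newer than 0.5.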


1. TU datasets

Nothing to do. The TU datasets are automatically downloaded.


2. MNIST/CIFAR10 super-pixel datasets

MNIST size is 1.39GB and CIFAR10 size is 2.51GB.

# At the root of the project
cd data/ 
bash script_download_superpixels.sh

The script script_download_superpixels.sh is located here. Code to reproduce the datasets is available for MNIST and for CIFAR10.
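Every download section in this document follows the same two-step pattern: change into `data/`, then run a script with `bash`. As a convenience, the pattern could be wrapped as sketched below. `run_download` is a hypothetical helper, not something the repository provides, and it assumes only the `data/` layout described here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one download script from data/ without changing the caller's directory.
# $1: script name inside data/    $2: project root (defaults to the current directory)
run_download() {
  local script=$1 root=${2:-.}
  if [ ! -f "$root/data/$script" ]; then
    echo "not found: $root/data/$script" >&2
    return 1
  fi
  # The subshell confines the cd; its exit status is the script's exit status.
  ( cd "$root/data" && bash "$script" )
}
```

From the project root, `run_download script_download_superpixels.sh` would then stand in for the `cd` and `bash` commands, and the shell stays in the root directory afterwards.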


3. ZINC molecular dataset

ZINC size is 58.9MB.

ZINC-full size is 1.17GB.

# At the root of the project
cd data/ 
bash script_download_molecules.sh

The script script_download_molecules.sh is located here. Code to reproduce the ZINC dataset is here, and code to reproduce the ZINC-full dataset is here.


4. PATTERN/CLUSTER SBM datasets

PATTERN size is 1.98GB and CLUSTER size is 1.26GB.

# At the root of the project
cd data/ 
bash script_download_SBMs.sh

The script script_download_SBMs.sh is located here. Code to reproduce the datasets is available for PATTERN and for CLUSTER.


5. TSP dataset

TSP size is 1.87GB.

# At the root of the project
cd data/ 
bash script_download_TSP.sh

The script script_download_TSP.sh is located here. Code to reproduce the TSP dataset is here.


6. CSL dataset

CSL size is 27KB.

# At the root of the project
cd data/ 
bash script_download_CSL.sh

Script script_download_CSL.sh is located here.


7. COLLAB dataset

COLLAB size is 360MB.

No script to run. The COLLAB dataset files are downloaded automatically from OGB when you run the experiment files for COLLAB.


8. GraphTheoryProp dataset

GraphTheoryProp size is 1.2GB.

# At the root of the project
cd data/ 
bash script_download_graphtheoryprop.sh

Script script_download_graphtheoryprop.sh is located here.


9. CYCLES dataset

CYCLES size is 81MB.

# At the root of the project
cd data/ 
bash script_download_cycles.sh

Script script_download_cycles.sh is located here.


10. All datasets

# At the root of the project
cd data/ 
bash script_download_all_datasets.sh

Script script_download_all_datasets.sh is located here.
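The sizes listed in the sections above add up to roughly 11.9 GB, so a quick free-space check before fetching everything may save a failed download. The sketch below makes two assumptions: 1 GB is taken as 1000 MB, and the total is an upper bound, since this document does not confirm which files script_download_all_datasets.sh fetches (COLLAB, for instance, is downloaded from OGB at experiment time). The variable names are illustrative.

```bash
#!/usr/bin/env bash
# Sizes listed in this document, converted to MB under the assumption that 1 GB = 1000 MB.
# Order: MNIST CIFAR10 ZINC ZINC-full PATTERN CLUSTER TSP CSL COLLAB GraphTheoryProp CYCLES
sizes_mb="1390 2510 58.9 1170 1980 1260 1870 0.027 360 1200 81"

# Sum the list with awk, since the shell only does integer arithmetic.
total_mb=$(printf '%s\n' $sizes_mb | awk '{ s += $1 } END { printf "%.0f", s }')

# POSIX df output: the fourth column of the second line is the available space in KB.
# Dividing by 1024 gives an approximate figure in MB, which is enough for a rough check.
avail_mb=$(df -Pk . | awk 'NR == 2 { printf "%.0f", $4 / 1024 }')

echo "listed total: ${total_mb} MB, available here: ${avail_mb:-0} MB"
if [ "${avail_mb:-0}" -lt "$total_mb" ]; then
  echo "warning: less free space than the listed total" >&2
fi
```

Run it from the directory where the data will be stored, since `df` reports on the filesystem that holds the current directory.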