
[Distributed] Specify the graph format for distributed training #2948

Merged (7 commits) on May 26, 2021

Conversation

zheng-da (Collaborator)

Description

Previously, the graph structure in each partition was stored only in the CSC format. When users call APIs such as find_edges or out_degrees, DGL generates the corresponding graph format for these APIs on the fly. In distributed training, the graph structure is shared among all trainers and servers on a machine, but when an API triggers the construction of another graph format, the new format is stored in the local memory of the calling process. If every trainer and server on a machine does this, the graph structure ends up being replicated many times on that machine.
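
The sketch below illustrates this on-demand format materialization with the ordinary single-process DGLGraph API (not the distributed code path); the exact lists reported by formats() may differ between DGL versions.

```python
import torch
import dgl

# Toy graph; by default DGL creates the COO format first.
g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])))
print(g.formats())   # e.g. {'created': ['coo'], 'not created': ['csr', 'csc']}

# out_degrees is served by CSR, so DGL builds CSR on demand,
# consuming additional memory in the calling process.
g.out_degrees()
print(g.formats())   # e.g. {'created': ['coo', 'csr'], 'not created': ['csc']}
```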

This PR allows users to specify the graph formats at launch time. Once those formats are constructed, no graph API can create a new format at runtime; if an API requires a format that was not created at launch time, an error is raised.
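
For intuition, the single-process formats() API already supports this kind of restriction; the sketch below only illustrates the behavior the PR enforces for the shared graph structure in distributed training, not the new launch-time option itself.

```python
import torch
import dgl

g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])))

# Restrict the graph to the CSC format only; the returned graph can no
# longer materialize COO or CSR on demand.
g_csc = g.formats(['csc'])
g_csc.in_degrees()                       # fine: in_degrees is served by CSC

try:
    g_csc.find_edges(torch.tensor([0]))  # needs COO, which is not allowed
except dgl.DGLError as err:
    print('format not available:', err)
```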

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@dgl-bot (Collaborator) commented on May 26, 2021

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@classicsong (Contributor) left a comment

Overall LGTM

Review thread on python/dgl/distributed/dist_context.py (resolved)
zheng-da merged commit 18dbaeb into dmlc:master on May 26, 2021