
[Distributed] Specify the graph format for distributed training #2948

Merged (7 commits) on May 26, 2021

Conversation

zheng-da (Collaborator)

Description

Previously, the graph structure in each partition was stored only in the CSC format. When users call APIs such as find_edges or out_degrees, DGL generates the corresponding graph format for these APIs on the fly. In distributed training, the graph structure is shared among all trainers and servers on a machine, but when an API triggers the construction of another graph format, the new format is stored in the local memory of the calling process. If every trainer and server on a machine does this, the graph structure ends up being replicated many times on that machine.
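
The sketch below illustrates this on-demand format materialization with the ordinary single-process DGLGraph API (not the distributed code path); the exact lists reported by formats() may differ between DGL versions.

```python
import torch
import dgl

# Toy graph; by default DGL creates the COO format first.
g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])))
print(g.formats())   # e.g. {'created': ['coo'], 'not created': ['csr', 'csc']}

# out_degrees is served by CSR, so DGL builds CSR on demand,
# consuming additional memory in the calling process.
g.out_degrees()
print(g.formats())   # e.g. {'created': ['coo', 'csr'], 'not created': ['csc']}
```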

This PR allows users to specify the graph formats at launch time. Once those formats are constructed, no graph API can create a new format at runtime; if an API requires a format that was not created at launch time, an error is raised.
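
For intuition, the single-process formats() API already supports this kind of restriction; the sketch below only illustrates the behavior the PR enforces for the shared graph structure in distributed training, not the new launch-time option itself.

```python
import torch
import dgl

g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])))

# Restrict the graph to the CSC format only; the returned graph can no
# longer materialize COO or CSR on demand.
g_csc = g.formats(['csc'])
g_csc.in_degrees()                       # fine: in_degrees is served by CSC

try:
    g_csc.find_edges(torch.tensor([0]))  # needs COO, which is not allowed
except dgl.DGLError as err:
    print('format not available:', err)
```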

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@dgl-bot (Collaborator) commented on May 26, 2021

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@classicsong (Contributor) left a comment

Overall LGTM

Review thread on python/dgl/distributed/dist_context.py (resolved)
zheng-da merged commit 18dbaeb into dmlc:master on May 26, 2021