[Feature] ARGO: an easy-to-use runtime to improve GNN training performance on multi-core processors #7003

Merged: 44 commits into dmlc:master on Feb 27, 2024

Conversation

jasonlin316
Contributor

@jasonlin316 jasonlin316 commented Jan 23, 2024

Description

GNN training on multi-core processors currently scales poorly with core count. We propose ARGO, a runtime system that improves the scalability of GNN training on multi-core processors. On a CPU platform where the original program scales only to 16 cores (that is, using more than 16 cores yields no further performance improvement), ARGO scales the design up to 64 cores and achieves up to a 4.3x speedup over the original design without ARGO.
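
For readers who want to check the scaling claim on their own machine, here is a minimal sketch of how the "scales to N cores" notion and the speedup figure could be computed from measured epoch times. It is not part of ARGO: the helper names `speedup` and `scaling_limit`, the 5% `min_gain` threshold, and the sample timings are all assumptions made for illustration.

```python
from typing import Mapping


def speedup(baseline_seconds: float, improved_seconds: float) -> float:
    """Return how many times faster the improved run is than the baseline."""
    if baseline_seconds <= 0 or improved_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / improved_seconds


def scaling_limit(epoch_seconds: Mapping[int, float], min_gain: float = 0.05) -> int:
    """Return the core count beyond which adding cores no longer helps.

    epoch_seconds maps a core count to the measured time of one epoch.
    A larger core count only counts as an improvement if it is at least
    `min_gain` (5% by default, an arbitrary choice) faster than the best
    time seen at a smaller core count.
    """
    if not epoch_seconds:
        raise ValueError("need at least one measurement")
    limit, best = None, float("inf")
    for cores in sorted(epoch_seconds):
        seconds = epoch_seconds[cores]
        if limit is None or seconds < best * (1.0 - min_gain):
            limit, best = cores, seconds
    return limit


if __name__ == "__main__":
    # Hypothetical timings for illustration only; NOT measurements from this PR.
    example = {4: 40.0, 8: 21.0, 16: 12.0, 32: 11.9, 64: 12.3}
    print("scaling limit (cores):", scaling_limit(example))
    print("speedup:", speedup(baseline_seconds=43.0, improved_seconds=10.0))
```

Feeding in the epoch times of the baseline script and of the ARGO-enabled script at each core count gives the two scaling limits and the end-to-end speedup in a comparable form.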

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small. Read the Google eng practice (a CL is equivalent to a PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests, and documentation can be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@dgl-bot
Collaborator

dgl-bot commented Jan 23, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Jan 23, 2024

Commit ID: 8bfa024

Build ID: 1

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

Collaborator

@anko-intel anko-intel left a comment

Please adapt the change to reflect that it is now an integral part of the DGL examples and not an external repository.

examples/README.md (resolved)
examples/pytorch/ARGO/argo.py (outdated, resolved)
examples/pytorch/ARGO/README.md (outdated, resolved)
examples/pytorch/ARGO/README.md (outdated, resolved)
examples/pytorch/ARGO/README.md (outdated, resolved)
examples/pytorch/ARGO/README.md (outdated, resolved)
examples/pytorch/ARGO/ogb_example.py (outdated, resolved)
examples/pytorch/ARGO/ogb_example_ARGO.py (outdated, resolved)
examples/pytorch/ARGO/ogb_example.py (outdated, resolved)
examples/pytorch/ARGO/ogb_example.py (outdated, resolved)
examples/README.md (outdated, resolved)
@dgl-bot
Collaborator

dgl-bot commented Feb 5, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 5, 2024

Commit ID: 24bce24f6d13bf4eaf31a1275bae337c491d6195

Build ID: 2

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Commit ID: aa3318628bf1e0993d2f88011fa3a3568e52b36b

Build ID: 3

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Commit ID: 02c3264751590fee73a105c4f8615cf01dc332b8

Build ID: 4

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Commit ID: 20282490bf28fa6a3e3e3f86ed3cf400ad844a6b

Build ID: 5

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Commit ID: c8e77a58093dc0d5777974473a0a7c9c60f21219

Build ID: 6

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Commit ID: 4306493eafb43b7565521d94dcdc596f6f9af2c7

Build ID: 7

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 6, 2024

Commit ID: f583ec9ea8872ec7ef22c2452b43962daf13610c

Build ID: 8

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 19, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
@dgl-bot
Collaborator

dgl-bot commented Feb 19, 2024

Commit ID: 2e53ac68ca21e27e7d9c112a26dc64fbbc09c726

Build ID: 35

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 19, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 19, 2024

Commit ID: d3dae4c991ecf9b982b863b3bcf6ddad3e06b0c3

Build ID: 36

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
@dgl-bot
Collaborator

dgl-bot commented Feb 19, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@jasonlin316 jasonlin316 marked this pull request as ready for review February 19, 2024 19:09
@dgl-bot
Collaborator

dgl-bot commented Feb 19, 2024

Commit ID: ab4331fcc8b18c3078c2c442be7a306eee1def8a

Build ID: 37

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@anko-intel
Collaborator

I ran ogb_example.py (base) and ogb_example_ARGO.py and observed the following performance improvements:

image

Well done!
(measured on Ubuntu 22.04.4, DGL 2.0.0, PyTorch 2.2.1)

@Rhett-Ying
Collaborator

@dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 26, 2024

Commit ID: a98604144e1065fff0c3b4c5237ad03e397e0418

Build ID: 38

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@frozenbugs frozenbugs self-requested a review February 26, 2024 09:03
@frozenbugs
Collaborator

Can you split the tutorial part out into a separate PR? For the example, we can merge directly as long as it is runnable and has a concrete README.

@dgl-bot
Collaborator

dgl-bot commented Feb 26, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 26, 2024

Commit ID: ee87efedd628c229e0aa0a79f27963dd408822b6

Build ID: 39

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@jasonlin316
Contributor Author

Can you split the tutorial part out into a separate PR? For the example, we can merge directly as long as it is runnable and has a concrete README.

No problem. I have removed the tutorial from this pull request, and created another one here: #7155

@frozenbugs
Collaborator

It would be more impactful if you could add an example with GraphBolt, our new data-loading package. GraphBolt example: https://github.com/dmlc/dgl/blob/master/examples/sampling/graphbolt/node_classification.py

@frozenbugs
Collaborator

@dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 27, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 27, 2024

Commit ID: 3ad4c2ea51b0820463ff5a1d9cdafb6eebed2e13

Build ID: 41

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@frozenbugs
Collaborator

@dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Feb 27, 2024

Commit ID: d73aa7fb571ef3ebd05e837dbfc881f40f3aac63

Build ID: 40

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@frozenbugs frozenbugs merged commit 2d2ad71 into dmlc:master Feb 27, 2024
1 of 2 checks passed
This pull request was closed.
6 participants