[GraphBolt][CUDA] Async sample neighbors and compaction. #7682

Merged
merged 14 commits into from
Aug 14, 2024

@mfbalin mfbalin commented Aug 11, 2024

Description

We want to hide the latency of GPU-CPU synchronization using pipelining, which should eliminate the white gaps (idle periods) in the profile below:

Before:
image

After:
image

This is achieved by pipelining, so that the next stage does not require the output of the previous one. We can then launch the kernels for all stages asynchronously at the same time, leaving no white gaps.
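To make the pipelining idea concrete, here is a minimal CPU-side sketch. It is an analogy only, not the GraphBolt implementation: it uses Python threads rather than CUDA streams, and `sample_neighbors`, `compact`, `pipelined`, and `depth` are hypothetical names chosen for illustration. It shows the general pattern of launching the next batch's first stage before waiting on the current batch's result, so the consumer never sits idle between launches.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def sample_neighbors(seeds):
    """Hypothetical stage 1: stand-in for an asynchronously launched sampling step."""
    return [s * 2 for s in seeds]


def compact(sampled):
    """Hypothetical stage 2: stand-in for compaction (deduplicate and sort)."""
    return sorted(set(sampled))


def pipelined(batches, depth=2):
    """Keep up to `depth` stage-1 launches in flight before waiting on any result."""
    results = []
    in_flight = deque()
    with ThreadPoolExecutor(max_workers=depth) as pool:
        for seeds in batches:
            # Launch stage 1 without blocking on its output.
            in_flight.append(pool.submit(sample_neighbors, seeds))
            if len(in_flight) >= depth:
                # Only now wait on the oldest launch and run stage 2 on it.
                results.append(compact(in_flight.popleft().result()))
        # Drain the launches that are still in flight.
        while in_flight:
            results.append(compact(in_flight.popleft().result()))
    return results


batches = [[3, 1, 3], [2, 2], [5, 4]]
out = pipelined(batches)
```

The results come back in submission order, so the pipelined version produces the same output as running both stages on each batch one after another; only the point at which the caller waits changes.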

Preliminary results:

Without torch.compile:

Synchronous: 1.51s
Asynchronous: 1.37s

With torch.compile:

Synchronous: 1.11s
Asynchronous: 0.98s

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small; read the Google eng practice (a CL is equivalent to a PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code changes to be small (examples, tests, and documentation can be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred to in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

dgl-bot commented Aug 11, 2024

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

dgl-bot commented Aug 11, 2024

Commit ID: 9043fc16b7b2444ef65c62d197348452f230d45e

Build ID: 1

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented Aug 12, 2024

Commit ID: 2eda203

Build ID: 2

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mfbalin mfbalin marked this pull request as draft August 12, 2024 14:18

dgl-bot commented Aug 12, 2024

Commit ID: b3b17bb

Build ID: 3

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented Aug 12, 2024

Commit ID: 9453834

Build ID: 4

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented Aug 12, 2024

Commit ID: 7f541e8

Build ID: 5

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented Aug 13, 2024

Commit ID: c9d789d902df587dc3937f3454652bf181abc896

Build ID: 6

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

dgl-bot commented Aug 13, 2024

Commit ID: d9376023f47221257fc1a97371d7d1d730e96f2c

Build ID: 7

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mfbalin mfbalin marked this pull request as ready for review August 13, 2024 02:56
@mfbalin mfbalin force-pushed the gb_cuda_async_sample_neighbors branch from 76b766d to 3c76e58 Compare August 13, 2024 02:56

dgl-bot commented Aug 13, 2024

Commit ID: 2fbbdedb4c4c19e54fa217f5c444ba9fb113e4d4

Build ID: 8

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented Aug 13, 2024

Commit ID: 1c429e55f75f7f35ef5d0958a72c65b0363b93c9

Build ID: 9

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented Aug 13, 2024

Commit ID: 5e802549614b9d0c09f7b24945f05f3658306c2e

Build ID: 10

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented Aug 13, 2024

Commit ID: c6a8414

Build ID: 11

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented Aug 14, 2024

Commit ID: 61c8fa9939e3456a64abcdf5d7fa9fe504bc04c9

Build ID: 12

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented Aug 14, 2024

Commit ID: 2375b47d860b59439158c013592fd347b635d2d5

Build ID: 13

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented Aug 14, 2024

Commit ID: a57474c4b2e4d1a76cadf083e92805913a462290

Build ID: 14

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mfbalin mfbalin added the expedited if it doesn't affect the main path approve first to unblock related projects, and review later label Aug 14, 2024
@mfbalin mfbalin merged commit 60d0b66 into dmlc:master Aug 14, 2024
2 checks passed
@mfbalin mfbalin deleted the gb_cuda_async_sample_neighbors branch August 14, 2024 21:51
@frozenbugs

LGTM
