Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GraphBolt] gb.DataLoader can simply be a datapipe. #7732

Merged
merged 6 commits into from
Aug 23, 2024

Conversation

mfbalin
Copy link
Collaborator

@mfbalin mfbalin commented Aug 21, 2024

Description

There is no need to derive from DataLoader as we are not using any of its functionalities.

I also remove some datapipe modification only meant for num_workers>0 to a if num_workers > 0: branch.

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
  • I've leverage the tools to beautify the python and c++ code.
  • The PR is complete and small, read the Google eng practice (CL equals to PR) to understand more about small PR. In DGL, we consider PRs with less than 200 lines of core code change are small (example, test and documentation could be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@dgl-bot
Copy link
Collaborator

dgl-bot commented Aug 21, 2024

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@dgl-bot
Copy link
Collaborator

dgl-bot commented Aug 21, 2024

Commit ID: dadb638

Build ID: 1

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Aug 21, 2024

Commit ID: f703e552a23097b70bc47cd2f7f6152da922f45d

Build ID: 2

Status: ❌ CI test failed in Stage [Torch CPU Unit test].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Aug 22, 2024

Commit ID: 6edc601

Build ID: 3

Status: ❌ CI test failed in Stage [Torch CPU (Win64) Unit test].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Aug 22, 2024

Commit ID: cabf8b2

Build ID: 4

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

python/dgl/graphbolt/dataloader.py Show resolved Hide resolved
[5, 0],
[5, 0],
[3, 3],
[5, 5],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the expected results are changed?

Copy link
Collaborator Author

@mfbalin mfbalin Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess torch.utils.data.DataLoader does things that we don't know about if simply inheriting from it has side effects.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is here, it is quite long so we need good justification to keep deriving from it. https://github.com/pytorch/pytorch/blob/main/torch/utils/data/dataloader.py

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird, it seems that the seed is respected in a different way.

@mfbalin mfbalin added the Release Candidate Candidate PRs for the upcoming release label Aug 23, 2024
@mfbalin
Copy link
Collaborator Author

mfbalin commented Aug 23, 2024

@frozenbugs this is a small PR that can potentially affect performance slightly (The less code we run, the potential to become faster.). Do you want to take a quick look as well?

@dgl-bot
Copy link
Collaborator

dgl-bot commented Aug 23, 2024

Commit ID: f140011969eece3cecf589ddb7437f4b0d9764fb

Build ID: 5

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Aug 23, 2024

Commit ID: 19e507c765b22aa42eced1774347d3859484ef57

Build ID: 6

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@frozenbugs
Copy link
Collaborator

@dgl-bot

@frozenbugs frozenbugs merged commit 37d1064 into dmlc:master Aug 23, 2024
1 of 2 checks passed
@mfbalin mfbalin deleted the gb_dataloader_datapipe branch August 23, 2024 05:03
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release Candidate Candidate PRs for the upcoming release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants