[GraphBolt] items are not shuffled across the whole set if num_workers>0 #6947

Open
Rhett-Ying opened this issue Jan 14, 2024 · 1 comment

Rhett-Ying commented Jan 14, 2024

🔨Work Item

IMPORTANT:

  • This template is only for dev team to track project progress. For feature request or bug report, please use the corresponding issue templates.
  • DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

Splitting items among workers before shuffling results in a significant accuracy drop. We should shuffle across the whole set first, then split the items among workers. It's worth checking whether torch's DataLoader (torch.DL) shuffles this way.

The buffer_size of ItemShufflerAndBatcher could always be len(item_set), which would simplify the code logic.
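
To make the difference concrete, here is a minimal sketch in plain Python. It does not use GraphBolt; the function names `split_then_shuffle` and `shuffle_then_split` are hypothetical and the contiguous chunking scheme is an assumption for illustration only. It contrasts the two orderings: when items are split first, each worker only ever sees its own contiguous chunk; when the whole set is shuffled first, each worker receives a slice of a global permutation.

```python
import random


def split_then_shuffle(items, num_workers, seed):
    """Problematic order: give each worker a contiguous chunk,
    then shuffle only inside that chunk."""
    rng = random.Random(seed)
    chunk = (len(items) + num_workers - 1) // num_workers
    parts = []
    for w in range(num_workers):
        part = list(items[w * chunk:(w + 1) * chunk])
        rng.shuffle(part)  # items never leave their original chunk
        parts.append(part)
    return parts


def shuffle_then_split(items, num_workers, seed):
    """Proposed order: shuffle the whole set once,
    then hand each worker a slice of the permutation."""
    rng = random.Random(seed)
    perm = list(items)
    rng.shuffle(perm)  # global permutation over all items
    chunk = (len(perm) + num_workers - 1) // num_workers
    return [perm[w * chunk:(w + 1) * chunk] for w in range(num_workers)]


items = list(range(12))
local_parts = split_then_shuffle(items, num_workers=3, seed=0)
global_parts = shuffle_then_split(items, num_workers=3, seed=0)

# With split-then-shuffle, worker 0 can only ever yield items 0..3.
print(sorted(local_parts[0]))
# With shuffle-then-split, each worker draws from the whole set.
print(global_parts)
```

In the first variant, a minibatch can never mix items from different chunks, which is one plausible reason for the accuracy drop when the item set is ordered (for example, by label or by node ID).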

Depending work items or issues

@Rhett-Ying Rhett-Ying added the Work Item Work items tracked in project tracker label Jan 14, 2024
@Rhett-Ying Rhett-Ying added this to the 2023 12.30 Graphbolt milestone Jan 14, 2024

Rhett-Ying commented Jan 23, 2024

For ItemSampler, this is fixed in #6982. One side effect of the fix is that all workers now use the same seed generator for DistributedItemSampler's shuffle.
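
As an illustration of why a shared seed comes up at all, here is a hedged sketch of the general technique, not the code in #6982. The helper `worker_items` and its strided slicing are assumptions. If every worker seeds an identical generator, each one can compute the same global permutation independently and then keep only its own share, with no communication between workers.

```python
import random


def worker_items(items, worker_id, num_workers, shared_seed):
    """Hypothetical helper: every worker builds the same permutation
    from the shared seed, then keeps a strided slice of it."""
    rng = random.Random(shared_seed)
    perm = list(items)
    rng.shuffle(perm)  # identical on every worker because the seed is shared
    return perm[worker_id::num_workers]


items = list(range(10))
num_workers = 3
shares = [worker_items(items, w, num_workers, shared_seed=42)
          for w in range(num_workers)]

# The shares are disjoint and together cover the whole set exactly once.
covered = sorted(x for share in shares for x in share)
print(covered == items)
```

The trade-off is the side effect described above: because the generator state is identical everywhere, any other shuffle that draws from the same generator will also be identical across workers.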
