Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Fixed sampler with limit on sampled nodes/edges in batch subgraph #6668

Merged
merged 19 commits into from
Jul 9, 2024

Conversation

ayushnoori
Copy link
Contributor

Description

Subgraph sampler that sets an upper bound on the number of nodes included in each layer of the sampled subgraph. At each layer, the frontier is randomly subsampled. Rare node types can also be upsampled by taking the scaled square root of the sampling probabilities (best strategy TBD). The new FixedSampler performs node-wise neighbor sampling and returns the subgraph induced by all the sampled nodes.

The relevant issue is: #6623, thanks to @frozenbugs and @jermainewang for their input and review.

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
  • Code is well-documented
  • Related issue is referred in this PR
  • All changes have test coverage

Please note that unit tests for FixedSampler are not yet written.

@dgl-bot
Copy link
Collaborator

dgl-bot commented Dec 4, 2023

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Dec 4, 2023

Commit ID: 928fe21

Build ID: 1

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Dec 28, 2023

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Dec 28, 2023

Commit ID: fc5ce86

Build ID: 2

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

python/dgl/dataloading/fixed.py Outdated Show resolved Hide resolved
python/dgl/dataloading/fixed.py Outdated Show resolved Hide resolved
python/dgl/dataloading/fixed.py Outdated Show resolved Hide resolved
python/dgl/dataloading/fixed.py Outdated Show resolved Hide resolved
python/dgl/dataloading/fixed.py Outdated Show resolved Hide resolved
python/dgl/dataloading/fixed.py Outdated Show resolved Hide resolved
python/dgl/dataloading/fixed.py Outdated Show resolved Hide resolved
python/dgl/dataloading/fixed.py Outdated Show resolved Hide resolved
python/dgl/dataloading/fixed.py Outdated Show resolved Hide resolved
python/dgl/dataloading/fixed.py Outdated Show resolved Hide resolved
@ayushnoori
Copy link
Contributor Author

Thanks for the edits, @frozenbugs. Happy to address all the formatting errors. Do you see any issues with the function implementation itself?

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jan 28, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jan 28, 2024

Commit ID: 4228344

Build ID: 3

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jan 28, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jan 28, 2024

Commit ID: 6f8acdb

Build ID: 4

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@frozenbugs
Copy link
Collaborator

@dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jan 29, 2024

Commit ID: 6f8acdb

Build ID: 5

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@frozenbugs
Copy link
Collaborator

frozenbugs commented Jan 29, 2024

the code LGTM now, can you run lintrunner -a to fix the lint error? and can you add a unit test for your code.

The test case can be put here: tests/python/pytorch/dataloading, and you can use this test case as example: https://github.com/dmlc/dgl/blob/master/tests/python/pytorch/graphbolt/impl/test_in_subgraph_sampler.py

@dgl-bot
Copy link
Collaborator

dgl-bot commented Feb 9, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Feb 9, 2024

Commit ID: d42b1ec2bdca14a28da11c3e5cbb5cab5f2aa768

Build ID: 6

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@frozenbugs
Copy link
Collaborator

@dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Feb 19, 2024

Commit ID: cdc82adc722c688ee0abd1125069d970bb157f05

Build ID: 7

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 6, 2024

Commit ID: 69ffb17fab9147de1a4d527b7f0b1fef3602f8c7

Build ID: 10

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 6, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 6, 2024

Commit ID: c11b871

Build ID: 11

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@ayushnoori
Copy link
Contributor Author

Hmm, not sure why CI test failed. I already ran lintrunner -a, this is what I see:

> lintrunner -a
Warning: Could not find a lintrunner config at: '.lintrunner.private.toml'. Continuing without using configuration file.
WARNING: No previous init data found. If this is the first time you're running lintrunner, you should run `lintrunner init`.
ok No lint issues.
Successfully applied all patches.

> git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

Have also merged changes so this PR is up-to-date.

@mfbalin
Copy link
Collaborator

mfbalin commented Jul 6, 2024

There are 2 lint checkers, you passed the initial one but not the one in the CI that runs after that.

https://dgl-ci-result.s3.us-west-2.amazonaws.com/dgl/PR-6668/10/10/logs/logs_dir/tmp1ptbor3n.log

************* Module dgl.dataloading.capped_neighbor_sampler
python/dgl/dataloading/capped_neighbor_sampler.py:65:4: W0221: Parameters differ from overridden 'sample' method (arguments-differ)
python/dgl/dataloading/capped_neighbor_sampler.py:156:27: C0103: Variable name "node_IDs" doesn't conform to snake_case naming style (invalid-name)
python/dgl/dataloading/capped_neighbor_sampler.py:107:27: W0612: Unused variable 'rel_type' (unused-variable)
python/dgl/dataloading/capped_neighbor_sampler.py:107:37: W0612: Unused variable 'dst_type' (unused-variable)

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 6, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 6, 2024

Commit ID: 161edd6

Build ID: 12

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@ayushnoori
Copy link
Contributor Author

Thanks, @mfbalin. Should be fixed now.

@mfbalin
Copy link
Collaborator

mfbalin commented Jul 6, 2024

@dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 6, 2024

Commit ID: 161edd6

Build ID: 13

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@ayushnoori
Copy link
Contributor Author

ayushnoori commented Jul 6, 2024

Looks like the error still is:

************* Module dgl.dataloading.capped_neighbor_sampler
python/dgl/dataloading/capped_neighbor_sampler.py:65:4: W0221: Parameters differ from overridden 'sample' method (arguments-differ)

The function signature in my code is:

def sample(self, g, indices, exclude_eids=None):

The sample() function signature of dgl.dataloading.base.Sampler is:

def sample(self, g, indices):

This is intentional, I added an optional exclude_eids argument. All other parameters should match. @mfbalin, may you please advise why this is a problem? Thanks!

@mfbalin
Copy link
Collaborator

mfbalin commented Jul 8, 2024

@ayushnoori You can add # pylint: disable=arguments-differ on the same line to disable the warning if you really need it to be that way.

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 8, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 8, 2024

Commit ID: 722cce1

Build ID: 14

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@ayushnoori
Copy link
Contributor Author

Thanks, @mfbalin – done.

@mfbalin
Copy link
Collaborator

mfbalin commented Jul 8, 2024

@dgl-bot

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 8, 2024

Commit ID: 722cce1

Build ID: 15

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

Copy link
Contributor Author

@ayushnoori ayushnoori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback from @frozenbugs should now be addressed.

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 9, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@frozenbugs frozenbugs merged commit 025767c into dmlc:master Jul 9, 2024
1 check passed
@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 9, 2024

Commit ID: c01e31b1337109632410539871c2ae927255db3f

Build ID: 16

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@frozenbugs
Copy link
Collaborator

Hi @ayushnoori , I have this PR merged. And it can be used in nightly DGL wheel, and next DGL release.
But please be aware that the lagacy dataloading will be fully deprecated and removed as I mentioned before, so soon in the future the code will be removed together with the legacy dataloading. If you want to retain it in DGL, please follow up with a Graphbolt implementation.

@ayushnoori
Copy link
Contributor Author

Understood, thanks so much, @frozenbugs!

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants