[GraphBolt][CUDA] Fetch indices for NS later. #7665

Merged
merged 7 commits into from
Aug 7, 2024

Conversation

@mfbalin mfbalin commented Aug 7, 2024

Description

This PR implements overlapped fetching of indices for neighbor sampling (NS): the indices are fetched after sampling is completed.

The PR ended up slightly on the large side.

The changes are:

  1. Add returning_indices_is_optional to C++ and Python API (pretty repetitive change).
  2. Add returning_indices_is_optional to SamplePerLayer, with False as default value.
  3. Modify gb.DataLoader so that returning_indices_is_optional is enabled for sample_neighbor code path (Important logic here.)
  4. Change the default max_uva_threads from 6144 to 10240, as the larger value is faster on a 4090 (we can watch the regression tests for sample_layer_neighbor).
  5. Implement the logic for the overlapped indices fetch in SamplePerLayer (Important logic here.)
  6. Modify the tests to fix them and add tests for the new feature.
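
The deferred fetch described in items 3 and 5 can be sketched roughly as follows. This is a minimal pure-Python illustration, not the PR's implementation: `sample_positions`, `fetch_indices`, and `sample_with_deferred_fetch` are hypothetical names, plain lists stand in for pinned tensors, and a worker thread stands in for whatever mechanism actually overlaps the fetch.

```python
from concurrent.futures import ThreadPoolExecutor
import random

# Toy CSC graph: indptr[v]..indptr[v + 1] delimits the in-neighbors of node v
# inside `indices`. Plain lists stand in for tensors (assumption).
indptr = [0, 3, 5, 9]
indices = [10, 11, 12, 20, 21, 30, 31, 32, 33]


def sample_positions(seeds, fanout, rng):
    """Hypothetical sampler: picks edge positions only, never reads `indices`."""
    picked = []
    for v in seeds:
        candidates = list(range(indptr[v], indptr[v + 1]))
        picked.extend(sorted(rng.sample(candidates, min(fanout, len(candidates)))))
    return picked


def fetch_indices(positions):
    """Deferred gather of only the sampled entries of `indices`."""
    return [indices[p] for p in positions]


def sample_with_deferred_fetch(seeds, fanout, seed=0):
    rng = random.Random(seed)
    positions = sample_positions(seeds, fanout, rng)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the gather in the background ...
        future = pool.submit(fetch_indices, positions)
        # ... while the caller keeps working (stand-in for the rest of the pipeline).
        other_work = sum(range(1000))
        neighbors = future.result()
    return positions, neighbors, other_work
```

The point of the sketch is the ordering: sampling finishes first and yields only positions, and the gather from `indices` is issued afterwards so that it can run alongside other work.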

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small; read the Google eng practice (CL equals PR) to learn more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change small (examples, tests, and documentation can be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@mfbalin mfbalin added the expedited label (if it doesn't affect the main path approve first to unblock related projects, and review later) Aug 7, 2024
@mfbalin mfbalin requested a review from frozenbugs August 7, 2024 04:38

dgl-bot commented Aug 7, 2024

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

dgl-bot commented Aug 7, 2024

Commit ID: 6b81550

Build ID: 1

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

dgl-bot commented Aug 7, 2024

Commit ID: 255f3dc

Build ID: 2

Status: ❌ CI test failed in Stage [Torch GPU Example test].

Report path: link

Full logs path: link

dgl-bot commented Aug 7, 2024

Commit ID: 28705b2

Build ID: 3

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented Aug 7, 2024

Commit ID: 614ed6e0f0d25acaada4ad83a529fd2adc226d6d

Build ID: 4

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented Aug 7, 2024

Commit ID: a47047b2073d23d13eabeffc3e810c5e39506d2f

Build ID: 5

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented Aug 7, 2024

Commit ID: 077a73b929a8ae9752e5945c6b9434139d7ed9ac

Build ID: 6

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented Aug 7, 2024

Commit ID: dd2fb2a0c90b075421943ce8d98b752f59ca5fb9

Build ID: 7

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mfbalin mfbalin merged commit b5ee45f into dmlc:master Aug 7, 2024
2 checks passed
@mfbalin mfbalin deleted the gb_ns_overlap_indices branch August 7, 2024 14:52
@@ -733,6 +734,10 @@ def sample_neighbors(
corresponding to each neighboring edge of a node. It must be a 1D
floating-point or boolean tensor, with the number of elements
equalling the total number of edges.
returning_indices_is_optional: bool
Collaborator

This is a very confusing option. We definitely need to rename it. Can you also help me understand why we need this option?

Collaborator Author

Why is it confusing? One of the sampling results is the sampled indices. This option, when enabled, makes it optional to return the sampled indices. I named it like this intentionally and thought it would be crystal clear.

Collaborator Author

When it is enabled, returning indices becomes optional. Then, when it is not returned, it is our job to fetch it.

Collaborator Author

For neighbor sampling, we only need to fetch the entries of indices that were actually sampled. Our current overlap_graph_fetch code path fetches the full insubgraph, and overlap_graph_fetch was not providing a speedup for NS in our regression results. This new code path provides a speedup for NS.
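
To make the difference in fetched data concrete, here is a rough back-of-the-envelope sketch. It assumes a CSC `indptr` array and fixed-fanout sampling without replacement; `fetch_sizes` is a hypothetical helper written for this illustration and is not part of DGL.

```python
def fetch_sizes(indptr, seeds, fanout):
    """Return (full_in_subgraph, sampled_only) counts of `indices` entries to fetch."""
    degrees = [indptr[v + 1] - indptr[v] for v in seeds]
    full = sum(degrees)  # every in-edge of every seed
    sampled = sum(min(d, fanout) for d in degrees)  # at most `fanout` per seed
    return full, sampled


# Node 0 has 100 in-edges, node 1 has 50, node 2 has 1000.
indptr = [0, 100, 150, 1150]
print(fetch_sizes(indptr, seeds=[0, 2], fanout=10))  # (1100, 20)
```

With high-degree seeds and a small fanout, the sampled-only fetch touches far fewer entries than the full in-subgraph fetch, which is consistent with the motivation given above.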

Collaborator

For an API, it is confusing to have a field that says it is optional for it to do something.
In the future, when someone tries to debug or refactor the code, they will be confused about how to respect the argument: should I return the indices or not?

Why not make it deterministic? If true, always return; otherwise, never return?

Collaborator Author

Because I don't want to pessimize the code somehow. The indices should be skipped only when they are pinned and we are sampling on the GPU. However, when overlap fetch is true, the indices have already been fetched and are on the GPU.

// TODO @mfbalin: remove true from here once fetching indices later is
// setup.
if (true || layer || utils::is_on_gpu(indices)) {
if (!returning_indices_is_optional || utils::is_on_gpu(indices)) {
Collaborator

Why don't we move these two checks to Python, so that we only need to pass a deterministic option down?

Collaborator Author

I think it is already deterministic. I coded it this way to ensure puregpu is not affected.
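
For readers following this thread, the condition quoted from the diff can be mirrored in Python as a small truth table. `returns_indices` is a hypothetical helper written only for illustration; the boolean `indices_on_gpu` stands in for `utils::is_on_gpu(indices)`.

```python
def returns_indices(returning_indices_is_optional: bool, indices_on_gpu: bool) -> bool:
    """Python mirror of `!returning_indices_is_optional || utils::is_on_gpu(indices)`."""
    return (not returning_indices_is_optional) or indices_on_gpu


# Print every combination of the two inputs and the resulting decision.
for optional in (False, True):
    for on_gpu in (False, True):
        print(optional, on_gpu, returns_indices(optional, on_gpu))
```

Under this reading, the indices are skipped in exactly one case: the flag is set and the indices are not on the GPU. When the indices are already on the GPU, the flag has no effect, which matches the remark about the pure-GPU path being unaffected.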

Collaborator

Yes, I know the code path is deterministic. The problem comes from the parameter name: it misleads the user. When we look back at the code after several months, we will get trapped.

It is OK to leave this as is and finish the primary work first.

Collaborator Author

I still don't understand why the user will be misled. If returning indices is optional, then it is optional. Maybe it is gonna be there, maybe not.

returning_indices_and_original_edge_ids_are_optional: bool
Boolean indicating whether it is okay for the call to this function
to leave the indices and the original edge ids tensors
uninitialized. In this case, it is the user's responsibility to
gather them using _edge_ids_in_fused_csc_sampling_graph if either is
missing.
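
The contract in this docstring can be illustrated with a small sketch. Everything here is hypothetical and simplified: `SampledResult` is not DGL's class, `edge_positions` plays the role the docstring assigns to `_edge_ids_in_fused_csc_sampling_graph`, and plain lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SampledResult:
    """Hypothetical stand-in for a sampling result; not DGL's actual class."""
    edge_positions: List[int]  # always filled in by the sampler
    indices: Optional[List[int]] = None  # may be left unset
    original_edge_ids: Optional[List[int]] = None  # may be left unset


def fill_missing(result, graph_indices, graph_edge_ids):
    """Gather whichever optional fields the sampler left unset."""
    if result.indices is None:
        result.indices = [graph_indices[p] for p in result.edge_positions]
    if result.original_edge_ids is None and graph_edge_ids is not None:
        result.original_edge_ids = [graph_edge_ids[p] for p in result.edge_positions]
    return result
```

A caller written this way does not need to know whether the sampler chose to return the fields: it checks each one and gathers only what is missing.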

Collaborator

@frozenbugs frozenbugs Aug 23, 2024

When I try to understand a deterministic statement with the word "maybe" in it, I get confused.
I understand the intention of the code now; let me try to find time to polish it.

datapipe_graph = replace_dp(
datapipe_graph,
sampler,
sampler.datapipe.sample_per_layer(
Collaborator

Take the point of view of the caller of sample_per_layer: I have no idea what returning_indices_is_optional means to me, since it does not affect the output of this method.

From the POV of the caller of sample_per_layer, I am setting this bit to turn on an optimization.

Collaborator

The simplest solution is renaming it to enable_{awesome_name}_optimization and documenting how the optimization works.

Collaborator Author

No user will use sample_per_layer though. It is not exposed anymore. It is only used inside neighbor_sampler.py

Collaborator Author

It has become internal.

This pull request was closed.