[GraphBolt][CUDA] Fetch indices for NS later. #7665
Conversation
To trigger regression tests:
@@ -733,6 +734,10 @@ def sample_neighbors(
        corresponding to each neighboring edge of a node. It must be a 1D
        floating-point or boolean tensor, with the number of elements
        equalling the total number of edges.
    returning_indices_is_optional: bool
This is a very confusing option. We definitely need to rename it. Also, can you help me understand why we need this option?
Why is it confusing? One of the sampling results is the sampled indices. This option, when enabled, makes it optional to return the sampled indices. I named it this way intentionally and thought it would be crystal clear.
When it is enabled, returning the indices becomes optional. Then, when they are not returned, it is our job to fetch them.
For neighbor sampling, we need to fetch indices only for the sampled edges. Our current overlap_graph_fetch code path fetches the full subgraph, and overlap_graph_fetch was not providing a speedup for NS in our regression results. This new code path provides a speedup for NS.
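To make the contrast concrete, here is a tiny pure-Python sketch (hypothetical function names, plain lists standing in for the graph's CSC tensors; the real GraphBolt code works on pinned torch tensors): gathering only the sampled edges' indices touches len(sampled_edge_ids) entries instead of the whole indices array.

```python
def fetch_full_indices(indices):
    # Sketch of the old overlap_graph_fetch behavior: copy every edge's
    # index, regardless of how few edges sampling actually kept.
    return list(indices)

def fetch_sampled_indices(indices, sampled_edge_ids):
    # Sketch of the new code path: gather only the entries for edges
    # that neighbor sampling actually selected.
    return [indices[e] for e in sampled_edge_ids]

# Toy graph with 8 edges; sampling kept 3 of them.
indices = [5, 2, 7, 1, 0, 3, 6, 4]
sampled_edge_ids = [1, 4, 6]
print(fetch_sampled_indices(indices, sampled_edge_ids))  # [2, 0, 6]
```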
For an API, it is confusing to have a field that says it is optional for it to do a thing.
In the future, when someone tries to debug or refactor the code, they will be confused about how to respect the argument: should I return the indices or not?
Why not make it deterministic? If true, always return them; otherwise never return them.
Because I don't want to pessimize the code. The indices should be skipped only when they are pinned and we are sampling on the GPU. However, when overlap fetch is true, the indices are fetched and are already on the GPU.
// TODO @mfbalin: remove true from here once fetching indices later is
// setup.
if (true || layer || utils::is_on_gpu(indices)) {
if (!returning_indices_is_optional || utils::is_on_gpu(indices)) {
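Restated as a small Python truth-table sketch (hypothetical names; the real check is the C++ condition above): the indices are returned immediately unless the caller opted in to skipping them and they are not already on the GPU.

```python
def should_return_indices(returning_indices_is_optional, indices_on_gpu):
    # Mirrors the C++ condition above as a sketch: skip returning the
    # indices only when the caller allowed it AND they still live off
    # the GPU (e.g. pinned host memory awaiting a later async fetch).
    return (not returning_indices_is_optional) or indices_on_gpu

print(should_return_indices(True, False))  # False: deferred, fetched later
```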
Why don't we move these two checks to Python, so that we only need to pass a deterministic option down?
I think it is already deterministic. I coded it this way to ensure the puregpu path is not affected.
Yes, I know the code path is deterministic. The problem comes from the parameter name; it misleads the user. When we look back at this code after several months, we will get trapped.
It is OK to leave this as is and finish the primary work first.
I still don't understand why the user will be misled. If returning the indices is optional, then it is optional: maybe they will be there, maybe not.
dgl/python/dgl/graphbolt/impl/fused_csc_sampling_graph.py
Lines 797 to 802 in c45d299
    returning_indices_and_original_edge_ids_are_optional: bool
        Boolean indicating whether it is okay for the call to this function
        to leave the indices and the original edge ids tensors
        uninitialized. In this case, it is the user's responsibility to
        gather them using _edge_ids_in_fused_csc_sampling_graph if either is
        missing.
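The caller-side contract that this docstring describes could look roughly like the following (a hedged sketch: resolve_indices, the dict layout, and plain lists are illustrative stand-ins, not the real GraphBolt API, which gathers via _edge_ids_in_fused_csc_sampling_graph):

```python
def resolve_indices(sampled, graph_indices):
    # If the sampler exercised its option and left `indices` unset
    # (None here), the caller gathers them itself from the recorded
    # edge ids, as the docstring requires.
    if sampled.get("indices") is None:
        sampled["indices"] = [graph_indices[e] for e in sampled["edge_ids"]]
    return sampled

result = {"edge_ids": [0, 3, 5], "indices": None}
print(resolve_indices(result, [9, 8, 7, 6, 5, 4])["indices"])  # [9, 6, 4]
```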
When I try to understand a deterministic statement with the word "maybe" in it, I get confused.
I understand the intention of the code now; let me find time to polish it.
datapipe_graph = replace_dp(
    datapipe_graph,
    sampler,
    sampler.datapipe.sample_per_layer(
If you take the point of view of the caller of this sample_per_layer, I have no idea what returning_indices_is_optional means to me, since it does not affect the output of this method.
From the POV of the caller of sample_per_layer, I am setting this bit to turn on an optimization.
The simplest solution is renaming it to enable_{awesome_name}_optimization and documenting how the optimization works.
No user will use sample_per_layer, though. It is not exposed anymore; it is only used inside neighbor_sampler.py.
It has become internal.
Description
This PR implements overlapping indices fetches for NS after sampling is completed.
The PR ended up on the slightly large side.
The changes are:
Increase max_uva_threads from 6144 to 10240, as the larger value is faster on a 4090 (we can watch the regression tests for sample_layer_neighbor).

Checklist
Please feel free to remove inapplicable items for your PR.
Changes