[GraphBolt][CUDA] Fetch indices for NS later. #7665
Conversation
To trigger regression tests:
@@ -733,6 +734,10 @@ def sample_neighbors(
        corresponding to each neighboring edge of a node. It must be a 1D
        floating-point or boolean tensor, with the number of elements
        equalling the total number of edges.
    returning_indices_is_optional: bool
This is a very confusing option. We definitely need to rename it. Also, can you help me understand why we need this option?
Why is it confusing? One of the sampling results is the sampled indices. This option, when enabled, makes it optional to return the sampled indices. I named it this way intentionally and thought it would be crystal clear.
When it is enabled, returning the indices becomes optional. Then, when they are not returned, it is our job to fetch them.
For neighbor sampling, we need to fetch indices only for the sampled edges. Our current overlap_graph_fetch code path fetches the full subgraph, and overlap_graph_fetch was not providing a speedup for NS in our regression results. This new code path provides a speedup for NS.
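To make the contrast concrete, here is a tiny pure-Python sketch (hypothetical function names, plain lists standing in for the graph's CSC tensors; the real GraphBolt code works on pinned torch tensors): gathering only the sampled edges' indices touches len(sampled_edge_ids) entries instead of the whole indices array.

```python
def fetch_full_indices(indices):
    # Sketch of the old overlap_graph_fetch behavior: copy every edge's
    # index, regardless of how few edges sampling actually kept.
    return list(indices)

def fetch_sampled_indices(indices, sampled_edge_ids):
    # Sketch of the new code path: gather only the entries for edges
    # that neighbor sampling actually selected.
    return [indices[e] for e in sampled_edge_ids]

# Toy graph with 8 edges; sampling kept 3 of them.
indices = [5, 2, 7, 1, 0, 3, 6, 4]
sampled_edge_ids = [1, 4, 6]
print(fetch_sampled_indices(indices, sampled_edge_ids))  # [2, 0, 6]
```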
For an API, it is confusing to have a field that says it is optional for it to do a thing.
In the future, when someone tries to debug or refactor the code, they will be confused about how to respect the argument: should I return the indices or not?
Why not make it deterministic? If true, always return them; otherwise never return them.
Because I don't want to pessimize the code. The indices should be skipped only when they are pinned and we are sampling on the GPU. However, when overlap fetch is true, the indices are fetched and are already on the GPU.
// TODO @mfbalin: remove true from here once fetching indices later is
// setup.
if (true || layer || utils::is_on_gpu(indices)) {
if (!returning_indices_is_optional || utils::is_on_gpu(indices)) {
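Restated as a small Python truth-table sketch (hypothetical names; the real check is the C++ condition above): the indices are returned immediately unless the caller opted in to skipping them and they are not already on the GPU.

```python
def should_return_indices(returning_indices_is_optional, indices_on_gpu):
    # Mirrors the C++ condition above as a sketch: skip returning the
    # indices only when the caller allowed it AND they still live off
    # the GPU (e.g. pinned host memory awaiting a later async fetch).
    return (not returning_indices_is_optional) or indices_on_gpu

print(should_return_indices(True, False))  # False: deferred, fetched later
```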
Why don't we move these two checks to Python, so that we only need to pass a deterministic option down?
I think it is already deterministic. I coded it this way to ensure the puregpu path is not affected.
Yes, I know the code path is deterministic. The problem comes from the parameter name; it misleads the user. When we look back at this code after several months, we will get trapped.
It is OK to leave this as is and finish the primary work first.
I still don't understand why the user will be misled. If returning the indices is optional, then it is optional: maybe they will be there, maybe not.
dgl/python/dgl/graphbolt/impl/fused_csc_sampling_graph.py
Lines 797 to 802 in c45d299
    returning_indices_and_original_edge_ids_are_optional: bool
        Boolean indicating whether it is okay for the call to this function
        to leave the indices and the original edge ids tensors
        uninitialized. In this case, it is the user's responsibility to
        gather them using _edge_ids_in_fused_csc_sampling_graph if either is
        missing.
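The caller-side contract that this docstring describes could look roughly like the following (a hedged sketch: resolve_indices, the dict layout, and plain lists are illustrative stand-ins, not the real GraphBolt API, which gathers via _edge_ids_in_fused_csc_sampling_graph):

```python
def resolve_indices(sampled, graph_indices):
    # If the sampler exercised its option and left `indices` unset
    # (None here), the caller gathers them itself from the recorded
    # edge ids, as the docstring requires.
    if sampled.get("indices") is None:
        sampled["indices"] = [graph_indices[e] for e in sampled["edge_ids"]]
    return sampled

result = {"edge_ids": [0, 3, 5], "indices": None}
print(resolve_indices(result, [9, 8, 7, 6, 5, 4])["indices"])  # [9, 6, 4]
```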
When I try to understand a deterministic statement with the word "maybe" in it, I get confused.
I understand the intention of the code now; let me find time to polish it.
datapipe_graph = replace_dp(
    datapipe_graph,
    sampler,
    sampler.datapipe.sample_per_layer(
If you take the point of view of the caller of this sample_per_layer, I have no idea what returning_indices_is_optional means to me, since it does not affect the output of this method.
From the POV of the caller of sample_per_layer, I am setting this bit to turn on an optimization.
The simplest solution is renaming it to enable_{awesome_name}_optimization and documenting how the optimization works.
No user will use sample_per_layer, though. It is not exposed anymore; it is only used inside neighbor_sampler.py.
It has become internal.
Description
This PR implements overlapping indices fetches for NS after sampling is completed.
The PR ended up on the slightly large side.
The changes are:
Increase max_uva_threads from 6144 to 10240, as the larger value is faster on a 4090 (we can watch the regression tests for sample_layer_neighbor).

Checklist
Please feel free to remove inapplicable items for your PR.
Changes