[Feature] Optimize dist_graph/_split_even_to_part memory usage #3132
Conversation
To trigger regression tests:
We support PyTorch as old as 1.5.0, so I would recommend you use numpy's
@zheng-da Could you please take a look at this PR?
@@ -290,6 +291,10 @@ def clamp(data, min_val, max_val):
def replace_inf_with_zero(x):
    return th.masked_fill(x, th.isinf(x), 0)

def count_nonzero(input):
    # TODO: fallback to numpy for backward compatibility
    return np.count_nonzero(input)
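As a standalone illustration of the fallback above (plain NumPy, with PyTorch left out), np.count_nonzero accepts a boolean array directly, so no explicit dtype conversion is needed before counting:

```python
import numpy as np

# A boolean mask like the ones used for train/val/test splits.
mask = np.array([True, False, True, True, False])

# np.count_nonzero counts True entries without any manual conversion.
print(np.count_nonzero(mask))  # 3
```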
Do we not need to convert it into a numpy array first?
It will be converted implicitly.
# Get the elements that belong to the partition.
partid = partition_book.partid
part_eles = eles[offsets[partid] : offsets[partid + 1]]
elements = F.tensor(elements)
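A minimal, self-contained sketch of the slicing above, with plain NumPy standing in for the framework tensor and hypothetical values for `eles`, `offsets`, and `partid` (in the real code these come from the partition book):

```python
import numpy as np

eles = np.arange(10)               # all selected element IDs, ordered by partition (hypothetical)
offsets = np.array([0, 4, 7, 10])  # per-partition start offsets (hypothetical)
partid = 1                         # this machine's partition ID

# Each worker keeps only its own partition's slice,
# so per-worker memory shrinks as the number of partitions grows.
part_eles = eles[offsets[partid] : offsets[partid + 1]]
print(part_eles)  # [4 5 6]
```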
Here elements is a vector stored on a local machine? If we can store the entire elements array on the local machine, why do we still have a memory issue?
elements is a boolean mask used to split the train, validation, and test sets (I learned how to use it from this example). Since it is boolean, it costs 1/8 of the memory of the entire nonzero_1d tensor (which is int64), so at least my machines have enough memory for it.
Do you have any plan to redesign the data split part to further reduce the memory cost?
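The 1/8 figure follows from the element sizes: a NumPy/PyTorch boolean element occupies 1 byte while an int64 element occupies 8. A quick check in plain NumPy:

```python
import numpy as np

n = 1_000_000
bool_mask = np.zeros(n, dtype=bool)      # 1 byte per element
int64_ids = np.zeros(n, dtype=np.int64)  # 8 bytes per element

print(bool_mask.nbytes)  # 1000000
print(int64_ids.nbytes)  # 8000000 -> the mask is 1/8 the size
```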
I see. Then my general comment is that we can unify the code for DistTensor and local tensors.
Yes, but we need to implement a distributed count_nonzero method first. I can do it next week; what do you think?
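One way a distributed count_nonzero could work (a sketch only, not the implementation discussed above; the function name, `offsets`, and the per-partition slicing are assumptions for illustration): each worker counts nonzeros in its own partition's slice, and the global count is the sum of the local counts.

```python
import numpy as np

def local_count_nonzero(mask, offsets, part_id):
    # Hypothetical helper: count nonzeros only in this partition's slice,
    # so no worker ever touches the full index tensor.
    return np.count_nonzero(mask[offsets[part_id] : offsets[part_id + 1]])

mask = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)
offsets = [0, 4, 8]  # two partitions (hypothetical layout)

# In a real distributed setting the per-worker counts would be
# combined with an allreduce; here we just sum them locally.
counts = [local_count_nonzero(mask, offsets, p) for p in range(2)]
print(sum(counts))  # 4, same as np.count_nonzero(mask)
```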
Description
According to this issue, this PR makes the memory consumption of edge_split and node_split decrease as the number of workers increases. In my case (a graph with 13 billion edges), the memory consumption of edge_split drops from ~108 GB to ~15 GB.
@zheng-da Could you please take a look at this PR?
Checklist
Please feel free to remove inapplicable items for your PR.
or have been fixed to be compatible with this change
Changes