-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Performance] Track sorted status of COO from creation #2645
Conversation
@@ -267,7 +267,8 @@ class CSR : public GraphInterface { | |||
class COO : public GraphInterface { | |||
public: | |||
// Create a coo graph that shares the given src and dst | |||
COO(int64_t num_vertices, IdArray src, IdArray dst); | |||
COO(int64_t num_vertices, IdArray src, IdArray dst, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jermainewang @zheng-da is immutable_graph
data structure still in use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, but we could leave the change here.
src/array/cpu/spmat_op_impl_coo.cc
Outdated
const int64_t nz_end = std::min(NNZ, nz_start+nz_chunk); | ||
|
||
// each thread searchs the row array for a change, and marks it's | ||
// location in Bp. Threads have overlapping ranges from nz_start-1 to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate more on the overlapping ranges?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expanded it and added an example in d824573
341 // Each thread searches the row array for a change, and marks it's
342 // location in Bp. Threads, other than the first, start at the last
343 // index covered by the previous, in order to detect changes in the row
344 // array between thread partitions. This means that each thread after
345 // the first, searches the range [nz_start-1, nz_end). That is,
346 // if we had 10 non-zeros, and 4 threads, the indexes searched by each
347 // thread would be:
348 // 0: [0, 1, 2]
349 // 1: [2, 3, 4, 5]
350 // 2: [5, 6, 7, 8]
351 // 3: [8, 9]
352 //
353 // That way, if the row array were [0, 0, 1, 2, 2, 2, 4, 5, 5, 6], each
354 // change in row would be captured by one thread:
355 //
356 // 0: [0, 0, 1] - row 0
357 // 1: [1, 2, 2, 2] - row 1
358 // 2: [2, 4, 5, 5] - rows 2, 3, and 4
359 // 3: [5, 6] - rows 5 and 6
360 //
@@ -312,26 +313,65 @@ CSRMatrix COOToCSR(COOMatrix coo) { | |||
NDArray ret_indices; | |||
NDArray ret_data; | |||
|
|||
bool row_sorted = coo.row_sorted; | |||
bool col_sorted = coo.col_sorted; | |||
const bool row_sorted = coo.row_sorted; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also change cuda/spmat_op_impl_coo.cu
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation in cuda/coo2csr.cu
only operates on the sorted COO (and sorts it if its not).
I updated this, as properly tracking the sorted status of a COO was causing a regression, where the conversion was being done serially. Now that both paths are parallel, properly tracking the COO to be sorted does not result in a slowdown.
src/graph/heterograph_capi.cc
Outdated
// setup sorted flags | ||
bool row_sorted, col_sorted; | ||
std::tie(row_sorted, col_sorted) = COOIsSorted( | ||
aten::COOMatrix(num_src, num_dst, row, col)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this check here could potentially slowdown graph construction. We should expose the sorted flags up to the user-facing API:
g = dgl.graph((src, dst), row_sorted=True, col_sorted=True)
Probably also need to expose a utility function for checking sorting status:
row_sorted, col_sorted = dgl.utils.is_sorted_srcdst(src, dst)
We could push the later utility function to next PR. When creating the livejournal graph, we manually set both flags to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jermainewang Given that previously COOIsSorted()
was being called at each format conversion, and that it's roughly the cost of copying the graph, would it be fine to make it the default to check?
We could have a flag check_sorted
that would default to True
, but would be ignored if row_sorted
was true. That way power users and internal calls could skip the check:
g = dgl.graph((src, dst)) # IsSorted check is invoked
g = dgl.graph((src, dst), check_sorted=False) # IsSorted is not invoked
g = dgl.graph((src, dst), row_sorted=True, col_sorted=True) # IsSorted is not invoked
g = dgl.graph((src, dst), row_sorted=True) # IsSorted is not invoked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the dgl.utils.is_sorted_srcdst
in #2685
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jermainewang Given that previously
COOIsSorted()
was being called at each format conversion, and that it's roughly the cost of copying the graph, would it be fine to make it the default to check?
It is possible that users only want to perform operations on COO thus not involving any format conversion. For example, in graph classification, once each sample is converted to a graph (likely from a COO format), users call dgl.batch
immediately before any computation. In this case, adding the sorted check will likely slowdown the graph construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 26e7364 I changed the check_sorted
parameter to default to false.
@nv-dlasalle I pushed some changes myself and approved the PR. I further removed the |
Description
This PR tries to make sure we check if a COO is sorted, when it is created via the CAPI. This avoids adding a synchronization during forward/backward to check if its sorted.
Because the unsorted path on the CPU was the only parallel path for COO->CSR, this also parallelizes the sorted path, to ensure we don't see a regression for accurately marking matrices as sorted.
Checklist
Please feel free to remove inapplicable items for your PR.
or have been fixed to be compatible with this change
Changes
Performance on COO->CSR conversion:
pre #2391:
post #2391:
This PR: