[Performance] Track sorted status of COO from creation #2645

nv-dlasalle · 2021-02-09T05:09:13Z

Description

This PR makes sure we check whether a COO is sorted when it is created via the C API. This avoids adding a synchronization during forward/backward to check whether it's sorted.

Because the unsorted path on the CPU was the only parallel path for COO->CSR, this also parallelizes the sorted path, so that accurately marking matrices as sorted does not cause a regression.

Checklist

Please feel free to remove inapplicable items for your PR.

The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented
To the best of my knowledge, examples are either not affected by this change,
or have been fixed to be compatible with this change

Changes

Performance on COO->CSR conversion:

pre #2391:

=== livejournal ===
CPU: 0.40786104202270507 s
GPU: 0.009723591804504394 s

post #2391:

=== livejournal ===
CPU: 0.229170560836792 s
GPU: 0.08134148120880128 s

This PR:

=== livejournal ===
CPU: 0.10662269592285156 s
GPU: 0.005210304260253906 s
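
To put the numbers above side by side, the speedup of this PR relative to each baseline can be computed directly from the reported timings. This is only arithmetic on the values listed above; the dictionary layout and the `speedup` helper are names made up for this sketch.

```python
# Timings in seconds, copied from the livejournal results above.
timings = {
    "pre #2391": {"CPU": 0.40786104202270507, "GPU": 0.009723591804504394},
    "post #2391": {"CPU": 0.229170560836792, "GPU": 0.08134148120880128},
    "this PR": {"CPU": 0.10662269592285156, "GPU": 0.005210304260253906},
}


def speedup(baseline, device):
    """Return how many times faster 'this PR' is than the given baseline."""
    return timings[baseline][device] / timings["this PR"][device]


for baseline in ("pre #2391", "post #2391"):
    for device in ("CPU", "GPU"):
        print(f"{device} vs {baseline}: {speedup(baseline, device):.2f}x")
```

Roughly, this works out to a little under 4x on CPU and a little under 2x on GPU relative to pre #2391, and about 2x on CPU and about 15x on GPU relative to post #2391.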

yzh119 · 2021-02-13T13:00:10Z

include/dgl/immutable_graph.h

@@ -267,7 +267,8 @@ class CSR : public GraphInterface {
 class COO : public GraphInterface {
 public:
  // Create a coo graph that shares the given src and dst
-  COO(int64_t num_vertices, IdArray src, IdArray dst);
+  COO(int64_t num_vertices, IdArray src, IdArray dst,


@jermainewang @zheng-da is immutable_graph data structure still in use?

I don't think so, but we could leave the change here.

yzh119 · 2021-02-14T04:03:00Z

src/array/cpu/spmat_op_impl_coo.cc

+        const int64_t nz_end = std::min(NNZ, nz_start+nz_chunk);
+
+        // each thread searchs the row array for a change, and marks it's
+        // location in Bp. Threads have overlapping ranges from nz_start-1 to


Could you please elaborate more on the overlapping ranges?

I expanded it and added an example in d824573

// Each thread searches the row array for a change, and marks it's
// location in Bp. Threads, other than the first, start at the last
// index covered by the previous, in order to detect changes in the row
// array between thread partitions. This means that each thread after
// the first, searches the range [nz_start-1, nz_end). That is,
// if we had 10 non-zeros, and 4 threads, the indexes searched by each
// thread would be:
// 0: [0, 1, 2]
// 1: [2, 3, 4, 5]
// 2: [5, 6, 7, 8]
// 3: [8, 9]
//
// That way, if the row array were [0, 0, 1, 2, 2, 2, 4, 5, 5, 6], each
// change in row would be captured by one thread:
//
// 0: [0, 0, 1] - row 0
// 1: [1, 2, 2, 2] - row 1
// 2: [2, 4, 5, 5] - rows 2, 3, and 4
// 3: [5, 6] - rows 5 and 6
//
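
The overlapping-range idea in the comment above can be sketched in Python. This is not the C++/OpenMP code from the PR: it runs the "threads" serially in a loop, and the function name `sorted_coo_to_indptr` and the ceiling-division chunking are assumptions made for illustration. It only builds the CSR row-pointer array (called Bp in the comment) from a row array that is already sorted.

```python
def sorted_coo_to_indptr(row, num_rows, num_threads):
    """Build a CSR row-pointer array from a sorted COO row array.

    Hypothetical sketch: each simulated thread t scans the index range
    [nz_start - 1, nz_end), so a row change that falls on a partition
    boundary is seen by exactly one thread.
    """
    nnz = len(row)
    indptr = [0] * (num_rows + 1)
    if nnz == 0:
        return indptr  # zero-edge case: every row is empty

    chunk = -(-nnz // num_threads)  # ceiling division (assumed partitioning)
    for t in range(num_threads):  # stands in for a parallel loop
        nz_start = min(nnz, t * chunk)
        nz_end = min(nnz, nz_start + chunk)
        # Compare each entry with its predecessor; for t > 0 the first
        # comparison reads index nz_start - 1, owned by the previous chunk.
        for i in range(max(nz_start, 1), nz_end):
            prev, cur = row[i - 1], row[i]
            # Every row in (prev, cur] starts at position i; rows strictly
            # between prev and cur are empty.
            for r in range(prev + 1, cur + 1):
                indptr[r] = i

    # Rows after the last occupied row (and the final sentinel) end at nnz.
    for r in range(row[-1] + 1, num_rows + 1):
        indptr[r] = nnz
    return indptr


row = [0, 0, 1, 2, 2, 2, 4, 5, 5, 6]
print(sorted_coo_to_indptr(row, num_rows=7, num_threads=4))
```

Because each row boundary is written by exactly one simulated thread, the writes never conflict, which is what makes the sorted path safe to parallelize.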

yzh119 · 2021-02-14T04:23:38Z

src/array/cpu/spmat_op_impl_coo.cc

@@ -312,26 +313,65 @@ CSRMatrix COOToCSR(COOMatrix coo) {
  NDArray ret_indices;
  NDArray ret_data;

-  bool row_sorted = coo.row_sorted;
-  bool col_sorted = coo.col_sorted;
+  const bool row_sorted = coo.row_sorted;


Should we also change cuda/spmat_op_impl_coo.cu?

The implementation in cuda/coo2csr.cu only operates on the sorted COO (and sorts it if it's not).

I updated this, as properly tracking the sorted status of a COO was causing a regression, where the conversion was being done serially. Now that both paths are parallel, properly tracking the COO to be sorted does not result in a slowdown.

jermainewang · 2021-02-18T11:06:45Z

src/graph/heterograph_capi.cc

+    // setup sorted flags
+    bool row_sorted, col_sorted;
+    std::tie(row_sorted, col_sorted) = COOIsSorted(
+        aten::COOMatrix(num_src, num_dst, row, col));


Adding this check here could slow down graph construction. We should expose the sorted flags up to the user-facing API:

g = dgl.graph((src, dst), row_sorted=True, col_sorted=True)

Probably also need to expose a utility function for checking sorting status:

row_sorted, col_sorted = dgl.utils.is_sorted_srcdst(src, dst)

We could push the latter utility function to the next PR. When creating the livejournal graph, we manually set both flags to true.
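
As a rough illustration of what such a sortedness check computes, here is a pure-Python sketch. It is not DGL's implementation, and the name `is_sorted_srcdst_sketch` is hypothetical. It assumes that `row_sorted` means the source IDs are non-decreasing, and that `col_sorted` means destination IDs are non-decreasing within each run of equal source IDs (and is reported as False whenever the rows are unsorted).

```python
def is_sorted_srcdst_sketch(src, dst):
    """Return (row_sorted, col_sorted) for parallel edge lists src and dst.

    Hypothetical sketch under the assumptions stated above; a single
    linear pass, so the cost is comparable to copying the edge lists.
    """
    col_sorted = True
    for i in range(1, len(src)):
        if src[i] < src[i - 1]:
            # Rows out of order: column order within rows is not meaningful.
            return False, False
        if src[i] == src[i - 1] and dst[i] < dst[i - 1]:
            col_sorted = False
    return True, col_sorted


print(is_sorted_srcdst_sketch([0, 0, 1], [1, 2, 0]))
print(is_sorted_srcdst_sketch([0, 0, 1], [2, 1, 0]))
print(is_sorted_srcdst_sketch([1, 0], [0, 0]))
```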

@jermainewang Given that previously COOIsSorted() was being called at each format conversion, and that it's roughly the cost of copying the graph, would it be fine to make it the default to check?

We could have a flag check_sorted that would default to True, but would be ignored if row_sorted was true. That way power users and internal calls could skip the check:

g = dgl.graph((src, dst)) # IsSorted check is invoked
g = dgl.graph((src, dst), check_sorted=False) # IsSorted is not invoked
g = dgl.graph((src, dst), row_sorted=True, col_sorted=True) # IsSorted is not invoked
g = dgl.graph((src, dst), row_sorted=True) # IsSorted is not invoked
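
The decision rule proposed here (check by default, but skip the check when the caller already asserts `row_sorted`) can be written as a small predicate. The function name `should_run_sorted_check` is hypothetical and the code only mirrors the four cases listed above; it is not how `dgl.graph` is implemented, and the `check_sorted` flag was later changed and then removed in this PR.

```python
def should_run_sorted_check(check_sorted=True, row_sorted=False, col_sorted=False):
    """Decide whether the IsSorted check would run under the proposal above.

    Hypothetical sketch: check_sorted defaults to True but is ignored
    when the caller already states that the rows are sorted. col_sorted
    is accepted only to mirror the proposed call signature.
    """
    return check_sorted and not row_sorted


print(should_run_sorted_check())
print(should_run_sorted_check(check_sorted=False))
print(should_run_sorted_check(row_sorted=True, col_sorted=True))
print(should_run_sorted_check(row_sorted=True))
```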

I added dgl.utils.is_sorted_srcdst in #2685.


It is possible that users only want to perform operations on COO, without any format conversion. For example, in graph classification, once each sample is converted to a graph (likely from a COO format), users call dgl.batch immediately before any computation. In this case, adding the sorted check will likely slow down graph construction.

In 26e7364 I changed the check_sorted parameter to default to false.

jermainewang · 2021-04-15T06:36:23Z

@nv-dlasalle I pushed some changes myself and approved the PR. I further removed the check_sorted flag in dgl.graph. I think users should explicitly call is_sorted_srcdst to get those flags if they wish to optimize format conversion. DGL just trusts any flags the user provides and by default assumes nothing (not sorted). For the next step, we should refactor all our graph datasets to return sorted graphs by default.


nv-dlasalle added 8 commits February 9, 2021 08:23

Add row/col sorted flags

10fce63

improve sorting paths

e1e1d21

Remove print statement

37c1144

Keep track of sorted matrices

40ee094

Remove sort check in to_block

abf2b0d

Improve CPU sorted COO->CSR

8e7a41b

Handle the zero edge case

30c27ab

Remove omp default clause to work with MSVC

d8406f5

yzh119 reviewed Feb 14, 2021


Update comments on sorted COO->CSR cpu implementatoin

d824573

jermainewang requested changes Feb 18, 2021


Expose sorted to python interface

9ab524b

nv-dlasalle mentioned this pull request Feb 19, 2021

[Feature] Add dgl.utils.is_sorted_srcdst() #2685

Merged


nv-dlasalle and others added 4 commits February 22, 2021 09:38

Make check_sorted default to false for dgl.graph()

26e7364

Merge branch 'master' into sorted_check

d1c4d56

remove check sorted; add utests

3b0c318

Merge branch 'master' into sorted_check

037cb3b

jermainewang approved these changes Apr 15, 2021


jermainewang added 3 commits April 15, 2021 13:12

remove check_sorted flag

6b1827c

Merge branch 'sorted_check' of https://github.com/nv-dlasalle/dgl into sorted_check

6fc48db

Merge branch 'master' into sorted_check

2881626

jermainewang merged commit bbebde4 into dmlc:master Apr 16, 2021

erickim555 mentioned this pull request Sep 1, 2021

[Performance] Pass row_sorted, col_sorted from dgl.heterograph() to create_from_edges() #3310

Closed


