[Determinism] Enable environment var to use cusparse spmm deterministic algorithm #7310

Merged
merged 8 commits into from
Apr 19, 2024


TristonC
Collaborator

Description

Add an environment variable, USE_DETERMINISTIC_ALG, that lets users select the deterministic cuSPARSE SpMM algorithm, to address issues such as #7241.

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small. Read the Google eng practice (a CL is equivalent to a PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code changes small (examples, tests, and documentation can be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referenced in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

  • Add a new environment variable, USE_DETERMINISTIC_ALG
  • The C++ functions SpMMCsr and SpMMCsrHetero read this environment variable and pass its value to the CusparseCsrmm2 function
  • If the variable is set, cusparseSpMM uses the deterministic CUSPARSE_SPMM_CSR_ALG3 algorithm
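
The flow in the list above could look roughly like the following sketch. It is not the code from this PR: the SpMMAlg enum is a hypothetical stand-in for cusparseSpMMAlg_t so that the snippet compiles without the CUDA toolkit, and the helper names UseDeterministicAlg and SelectSpMMAlg are made up for illustration. It also assumes that "set" means the variable is present with any value, and that the fallback is the cuSPARSE default algorithm, neither of which is stated in this description.

```cpp
#include <cstdlib>

// Hypothetical stand-in for cusparseSpMMAlg_t. Real code would include
// <cusparse.h> and use the CUSPARSE_SPMM_* constants directly.
enum class SpMMAlg {
  kDefault,  // stands for CUSPARSE_SPMM_ALG_DEFAULT (assumed fallback)
  kCsrAlg3   // stands for CUSPARSE_SPMM_CSR_ALG3 (deterministic)
};

// Hypothetical helper: returns true when USE_DETERMINISTIC_ALG is present in
// the environment. Assumption: any value, even an empty one, counts as "set".
bool UseDeterministicAlg() {
  return std::getenv("USE_DETERMINISTIC_ALG") != nullptr;
}

// Hypothetical helper: maps the flag to the algorithm that a function such as
// CusparseCsrmm2 would forward to cusparseSpMM.
SpMMAlg SelectSpMMAlg(bool use_deterministic) {
  return use_deterministic ? SpMMAlg::kCsrAlg3 : SpMMAlg::kDefault;
}
```

Under these assumptions, a caller such as SpMMCsr would evaluate UseDeterministicAlg() once and pass the result down, so launching a process with USE_DETERMINISTIC_ALG=1 in its environment would be enough to switch algorithms without rebuilding.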

@dgl-bot
Collaborator

dgl-bot commented Apr 16, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 16, 2024

Commit ID: fdc5108

Build ID: 1

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@TristonC TristonC changed the title from "[Feature] Enable environment var to use cusparse spmm deterministic algorithm" to "[Determinism] Enable environment var to use cusparse spmm deterministic algorithm" Apr 17, 2024
@dgl-bot
Collaborator

dgl-bot commented Apr 17, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 17, 2024

Commit ID: a8ae77c

Build ID: 2

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@mfbalin
Collaborator

mfbalin commented Apr 17, 2024

Hi Triston, you can use the lintrunner -a command to apply linting. I believe simply running pip install lintrunner should install it.

@TristonC
Collaborator Author

TristonC commented Apr 17, 2024

@frozenbugs Should we update the .clang-format file? The original .clang-format failed on my local machine, so I generated a local .clang-format and ran lintrunner with it. But it seems there are some differences here.

[update] Never mind, the clang-format on my local machine is an old version (clang-format version 10.0.0-4ubuntu1). I reran the linter.

@frozenbugs
Collaborator

@dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Commit ID: 9223ad84611f30a2582614c8b5bf18fcbbba4465

Build ID: 3

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Commit ID: 371b90b

Build ID: 4

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Commit ID: 1e5c18d

Build ID: 5

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Commit ID: 871fb50

Build ID: 6

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Commit ID: 1fedbb0

Build ID: 7

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@mfbalin
Collaborator

mfbalin commented Apr 18, 2024

@dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Commit ID: 037cd4e

Build ID: 9

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@TristonC
Collaborator Author

@dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Not authorized to trigger CI via issuing comment.

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Commit ID: 037cd4e

Build ID: 10

Status: ❌ CI test failed in Stage [AuthenticationComment].

Report path: link

Full logs path: link

@mfbalin
Collaborator

mfbalin commented Apr 18, 2024

@dgl-bot

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Commit ID: 502b7b3a3183c915a1bf79f8883cd5f430ec023f

Build ID: 8

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Commit ID: 037cd4e

Build ID: 11

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

@mfbalin
Collaborator

mfbalin commented Apr 18, 2024

@dgl-bot

@mfbalin
Collaborator

mfbalin commented Apr 18, 2024

Occasional CI failures are expected, though they still need to be investigated.

@dgl-bot
Collaborator

dgl-bot commented Apr 18, 2024

Commit ID: 037cd4e

Build ID: 12

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@frozenbugs frozenbugs merged commit a4e1969 into dmlc:master Apr 19, 2024
2 checks passed
This pull request was closed.