[Feature] Import PyTorch's pin_memory() method for DGL graph structure #5366

Merged
merged 23 commits into from
Apr 11, 2023

@chang-l chang-l commented Feb 22, 2023

Description

This PR imports PyTorch's CachingHostAllocator [ref] and its associated functions (such as CachingHostAllocator_recordEvent and CachingHostAllocator_emptyCache) into DGL's tensoradapter. It complements DGL's native memory pinning strategy (cudaHostRegister) with the backend's pinned memory pool (via cudaHostAlloc), which avoids the overhead of cudaHostRegister and of frequent memory allocations/deallocations.

cudaHostRegister (DGLGraph.pin_memory_) is better suited to pinning large data (such as an entire feature tensor or graph) ONCE for frequent GPU access afterwards: no memory allocation is involved, at some cost in speed. PyTorch's pinned memory pool (DGLGraph(Index).pin_memory) is better for small objects (small memory blocks) that are created and destroyed often (such as the DGLGraph or features of each batch).
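
To illustrate why a pool helps for small, short-lived blocks, here is a minimal pure-Python sketch of a caching allocator. It is a toy model only, not PyTorch's CachingHostAllocator: the class name `ToyCachingHostPool`, the power-of-two bucketing, and the `raw_allocations` counter (standing in for calls to cudaHostAlloc) are all assumptions made for this example.

```python
from collections import defaultdict


class ToyCachingHostPool:
    """Toy model of a caching host allocator (hypothetical, for illustration)."""

    def __init__(self):
        self._free = defaultdict(list)  # bucket size -> idle blocks
        self.raw_allocations = 0        # stands in for calls to cudaHostAlloc

    @staticmethod
    def _round_up(nbytes):
        # Round up to a power of two so nearby sizes share a bucket
        # (a choice made for this toy, not a claim about PyTorch).
        size = 1
        while size < nbytes:
            size *= 2
        return size

    def allocate(self, nbytes):
        size = self._round_up(nbytes)
        if self._free[size]:
            return self._free[size].pop()  # cheap path: reuse a cached block
        self.raw_allocations += 1          # expensive path: fresh allocation
        return bytearray(size)

    def free(self, block):
        # Keep the block in the pool instead of releasing it.
        self._free[len(block)].append(block)

    def empty_cache(self):
        # Drop every idle block, giving the memory back.
        self._free.clear()


pool = ToyCachingHostPool()
for _ in range(100):            # e.g. one small batch per iteration
    buf = pool.allocate(3000)
    pool.free(buf)
```

In this toy run, the 100 allocate/free cycles trigger a single underlying allocation; every later request is served from the cache. `empty_cache` plays the role that emptying the pool plays before a large one-off pin.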

With this PR in place, it becomes easier to implement a pin_memory() method for the DGLGraph object and let PyTorch's dataloader pick it up when pin_memory=True, enabling asynchronous H2D copying of the (sub)graph structure (or features if needed). Complete dataloader support should go in a separate PR.
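
The "pick it up" step relies on duck typing: a batch object that defines pin_memory() gets that method called on it. The sketch below is a simplified pure-Python model of that idea, not PyTorch's actual code; `ToyGraphBatch` and `pin_batch` are hypothetical names introduced for this example, and no real memory is pinned.

```python
class ToyGraphBatch:
    """Hypothetical stand-in for a graph object with an out-of-place pin_memory()."""

    def __init__(self, edges, pinned=False):
        self.edges = edges
        self.pinned = pinned

    def pin_memory(self):
        # Out-of-place: return a new object and leave the original untouched.
        return ToyGraphBatch(list(self.edges), pinned=True)


def pin_batch(batch):
    """Recursively pin anything that offers pin_memory(); pass the rest through."""
    if hasattr(batch, "pin_memory"):
        return batch.pin_memory()
    if isinstance(batch, dict):
        return {key: pin_batch(value) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(pin_batch(value) for value in batch)
    return batch


batch = (ToyGraphBatch([(0, 1), (1, 2)]), {"labels": [1, 2]})
pinned = pin_batch(batch)
```

Here `pinned[0]` is a new, pinned copy of the graph stand-in, while the original in `batch[0]` is unchanged and the plain `labels` list passes through as-is.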

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small; read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests and documentation can be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

C++ (DGL core):

  • Imported getCachingHostAllocator, CachingHostAllocator_recordEvent and CachingHostAllocator_emptyCache to fully support PyTorch's CachingHostAllocator.
  • Added/propagated pin_memory* methods to all necessary classes (ndarray/csr/coo/unitgraph/heterograph).
  • Added RecordedCopyDataFromTo to fuse the data copy (H2D) with the subsequent CachingHostAllocator_recordEvent, so the pinned resource is managed consistently with the PyTorch backend.
  • Empty the caching pool when calling cudaHostRegister, to release resources and leave enough page-locked memory for cudaHostRegister.
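
As a rough illustration of why the copy and the event recording belong together: a pinned block that is the source of an asynchronous H2D copy should not be handed out again until that copy has finished. The pure-Python sketch below models this under stated assumptions; `ToyEvent` and `ToyRecordingPool` are hypothetical names, and a boolean flag stands in for querying a CUDA event.

```python
class ToyEvent:
    """Stands in for a CUDA event; `done` flips when the copy has finished."""

    def __init__(self):
        self.done = False


class ToyRecordingPool:
    """Toy pool that defers reuse of a block until its recorded events finish."""

    def __init__(self):
        self.idle = []       # blocks that are safe to hand out again
        self.deferred = []   # freed blocks still read by an in-flight copy
        self._events = {}    # id(block) -> unfinished events

    def record_event(self, block):
        # Called right after an async copy from `block` has been enqueued.
        event = ToyEvent()
        self._events.setdefault(id(block), []).append(event)
        return event

    def free(self, block):
        # The host is done with the block; reuse may still have to wait.
        self.deferred.append(block)
        self.process_events()

    def process_events(self):
        # Move freed blocks whose copies have all completed to the idle list.
        waiting = []
        for block in self.deferred:
            unfinished = [e for e in self._events.get(id(block), []) if not e.done]
            if unfinished:
                self._events[id(block)] = unfinished
                waiting.append(block)
            else:
                self._events.pop(id(block), None)
                self.idle.append(block)
        self.deferred = waiting


pool = ToyRecordingPool()
block = bytearray(16)
event = pool.record_event(block)   # async copy enqueued
pool.free(block)                   # freed while the copy is still in flight
reusable_before = list(pool.idle)  # empty: the block is deferred, not idle
event.done = True                  # the copy finishes
pool.process_events()
```

After the event completes, the block moves from `deferred` to `idle`. Forgetting to record the event would let the toy pool recycle the block immediately, which is the hazard a fused copy-and-record call guards against.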

Python:

  • Added out-of-place pin_memory() for heterograph_index
  • Added unit tests

To-dos

(may need separate PRs)

Python:

  • Add out-of-place pin_memory() for heterograph or other equivalents to integrate with the dataloader.

dgl-bot commented Feb 22, 2023

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

chang-l commented Feb 22, 2023

@jermainewang Pls help add other reviewers. Thanks.

Resolved review threads on:

  • src/graph/heterograph.h
  • src/graph/unit_graph.cc
  • src/runtime/cuda/cuda_device_api.cc
  • tests/python/common/test_heterograph_index.py

@frozenbugs

Hi Chang, sorry for being picky about code quality. Overall the PR looks very good to me; it just needs a few small tweaks to make it shine.

chang-l commented Feb 24, 2023

No worries @frozenbugs I am always happy to iterate and improve in every aspect.

Thank you for taking the time to review this PR. I will address all the comments tomorrow.

@yaox12 yaox12 left a comment

LGTM except for some comments issues.

Resolved review threads on:

  • include/dgl/runtime/ndarray.h
  • src/graph/unit_graph.cc
  • src/graph/unit_graph.h
  • include/dgl/aten/coo.h
  • include/dgl/runtime/tensordispatch.h
  • python/dgl/heterograph_index.py
  • src/runtime/ndarray.cc
  • include/dgl/runtime/device_api.h
  • src/runtime/cpu_device_api.cc
  • src/runtime/cuda/cuda_device_api.cc
  • tensoradapter/include/tensoradapter.h
  • include/dgl/aten/csr.h

chang-l force-pushed the port-caching-host-allocator branch from 2be8547 to bd4ff7a on March 22, 2023 06:16

dgl-bot commented Mar 22, 2023

Commit ID: 2c6f7b087423f22420c7544446d7d8bfe8b37fb0

Build ID: 19

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented Mar 22, 2023

Commit ID: 4616369cecd36909b298b466cd31e1a38be2cc6e

Build ID: 20

Status: ❌ CI test failed in Stage [Tensorflow CPU Unit test].

Report path: link

Full logs path: link

dgl-bot commented Mar 29, 2023

Commit ID: cf08a1c

Build ID: 21

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented Mar 29, 2023

Commit ID: 32dcb35

Build ID: 22

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

chang-l commented Mar 29, 2023

Sorry for the delay. Ready for the next round @frozenbugs @BarclayII @yaox12

dgl-bot commented Mar 29, 2023

Commit ID: ebcfa02

Build ID: 23

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

chang-l force-pushed the port-caching-host-allocator branch from ebcfa02 to 0353012 on March 29, 2023 05:51
chang-l force-pushed the port-caching-host-allocator branch from 0353012 to 5249390 on March 29, 2023 05:58

dgl-bot commented Mar 29, 2023

Commit ID: 9908b4b55d09ef87b36781b11112b70f026e2605

Build ID: 24

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented Mar 29, 2023

Commit ID: 1407e1912b7b06fb15dabdfe143b3bc2a6d0bb42

Build ID: 25

Status: ❌ CI test failed in Stage [Torch CPU (Win64) Unit test].

Report path: link

Full logs path: link

Resolved review threads on:

  • include/dgl/runtime/ndarray.h
  • src/runtime/ndarray.cc
  • tensoradapter/pytorch/torch.cpp
  • tests/python/common/test_heterograph_index.py

dgl-bot commented Mar 31, 2023

Commit ID: ced69b9

Build ID: 26

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@frozenbugs frozenbugs left a comment

LGTM, overall this is a giant PR which we should try to avoid if we can.

@chang-l chang-l requested a review from BarclayII April 5, 2023 03:56

dgl-bot commented Apr 5, 2023

Commit ID: b6090eb

Build ID: 27

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@BarclayII BarclayII left a comment

I'm good. Thanks a lot!

6 participants