Add buffer pool/arena to enable re-use of temporary buffers during graph execution #108

Merged
merged 5 commits into from
Apr 23, 2024

@robertknight (Owner) commented Apr 22, 2024

Graph execution often spends a significant amount of time allocating and freeing large buffers via the system allocator. So far this has been mitigated for some operators by running them in place on their first input. However, many important operations cannot run in place. In addition, operators that can run in place often do not, because the input they would overwrite is still needed by a subsequent operation.

This PR introduces a tensor buffer pool (`TensorPool`). The pool is created at the start of a graph run, and operators use it as an allocator for their outputs. Once a value is no longer needed by later steps of graph execution, its buffer is added to the pool, and subsequent steps can re-use it. The timing report enabled by `RTEN_TIMING` now also shows the total number of allocation requests made to the pool and the hit rate (the fraction of requests fulfilled from the pool).
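
A minimal sketch of the pooling idea and the hit-rate statistic, assuming buffers are plain `Vec<T>` and a request is a hit when any pooled buffer has enough capacity. `BufferPool` and its methods are hypothetical names for illustration, not rten's actual `TensorPool` API.

```rust
use std::cell::{Cell, RefCell};

/// Sketch of a buffer pool that tracks its hit rate. `BufferPool` and its
/// methods are hypothetical names, not rten's `TensorPool` API.
pub struct BufferPool<T> {
    /// Buffers that have been returned and can be handed out again.
    free: RefCell<Vec<Vec<T>>>,
    /// Total number of allocation requests.
    alloc_count: Cell<usize>,
    /// Requests that were satisfied by re-using a pooled buffer.
    hit_count: Cell<usize>,
}

impl<T> BufferPool<T> {
    pub fn new() -> Self {
        BufferPool {
            free: RefCell::new(Vec::new()),
            alloc_count: Cell::new(0),
            hit_count: Cell::new(0),
        }
    }

    /// Return an empty buffer with capacity of at least `capacity`. Re-uses
    /// the smallest pooled buffer that is large enough, otherwise falls back
    /// to the system allocator.
    pub fn alloc(&self, capacity: usize) -> Vec<T> {
        self.alloc_count.set(self.alloc_count.get() + 1);
        let mut free = self.free.borrow_mut();
        let best = free
            .iter()
            .enumerate()
            .filter(|(_, buf)| buf.capacity() >= capacity)
            .min_by_key(|(_, buf)| buf.capacity())
            .map(|(idx, _)| idx);
        match best {
            Some(idx) => {
                self.hit_count.set(self.hit_count.get() + 1);
                let mut buf = free.swap_remove(idx);
                buf.clear(); // Drop stale contents but keep the allocation.
                buf
            }
            None => Vec::with_capacity(capacity),
        }
    }

    /// Return a buffer once the value it holds is no longer needed.
    pub fn add(&self, buf: Vec<T>) {
        self.free.borrow_mut().push(buf);
    }

    pub fn alloc_count(&self) -> usize {
        self.alloc_count.get()
    }

    pub fn hit_count(&self) -> usize {
        self.hit_count.get()
    }

    /// Fraction of allocation requests fulfilled from the pool.
    pub fn hit_rate(&self) -> f64 {
        match self.alloc_count.get() {
            0 => 0.0,
            n => self.hit_count.get() as f64 / n as f64,
        }
    }
}
```

Interior mutability (`Cell`/`RefCell`) lets every operator share the pool through a plain `&BufferPool`. Best-fit selection is one possible policy. First-fit is simpler, but it may hand a very large buffer to a small request.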

The pool is disabled by default and can be enabled by setting the `RTEN_USE_POOL` env var. It will be enabled by default once the majority of operators have been converted to allocate from the pool.

To verify that this works, a subset of operators has been converted to allocate from the pool. The subset was chosen based on the ops used by the YOLOv8 example. For this example, execution time on my laptop drops from 210-220ms to 180-190ms. It may improve further as more operators are converted to use pool allocation.

Most operators do not yet allocate from the pool, and they will be converted in subsequent commits.

TODO:

  • Tests for new Tensor* methods
  • Tests for TensorPool
  • Add feature flag to make the pool opt-in until more operators are converted to use it

This extracts the data buffer from a tensor without making it contiguous.
This will be useful for a tensor pool/arena in the rten crate.
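
To show why a pool benefits from taking a tensor's buffer without first making it contiguous, here is a sketch using a hypothetical 2D `Matrix` type (not rten's `Tensor`). A transposed view is non-contiguous, so producing its data in logical order requires copying into a new allocation. A pool only needs the allocation itself, so the buffer can be taken as-is. `into_non_contiguous_data` is an illustrative name.

```rust
/// Hypothetical 2D tensor with an explicit layout (row and column strides).
pub struct Matrix<T> {
    data: Vec<T>,
    shape: [usize; 2],
    strides: [usize; 2],
}

impl<T: Clone> Matrix<T> {
    /// Create a row-major (contiguous) matrix.
    pub fn from_vec(data: Vec<T>, rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols);
        Matrix { data, shape: [rows, cols], strides: [cols, 1] }
    }

    /// Transpose by swapping shape and strides. No elements move, so the
    /// result is generally non-contiguous.
    pub fn transposed(mut self) -> Self {
        self.shape.swap(0, 1);
        self.strides.swap(0, 1);
        self
    }

    /// Conservative check for a row-major layout.
    pub fn is_contiguous(&self) -> bool {
        self.strides == [self.shape[1], 1]
    }

    /// Data in logical (row-major) order. For a non-contiguous matrix this
    /// copies every element into a freshly allocated buffer.
    pub fn into_contiguous_data(self) -> Vec<T> {
        if self.is_contiguous() {
            return self.data;
        }
        let [rows, cols] = self.shape;
        let mut out = Vec::with_capacity(rows * cols);
        for r in 0..rows {
            for c in 0..cols {
                out.push(self.data[r * self.strides[0] + c * self.strides[1]].clone());
            }
        }
        out
    }

    /// The underlying buffer as-is, with no copy. Elements are in storage
    /// order, which does not matter if the buffer is only being recycled.
    pub fn into_non_contiguous_data(self) -> Vec<T> {
        self.data
    }
}
```
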

This initializes a `Tensor<MaybeUninit<T>>` by copying data from an existing
view/tensor.
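
A minimal sketch of the same idea using a plain `Vec` rather than rten's `Tensor<MaybeUninit<T>>`. The spare capacity of a buffer (for example one taken from a pool) is exposed as `&mut [MaybeUninit<T>]` via `Vec::spare_capacity_mut`. The sketch fills it by copying from a source slice and only then marks the elements as initialized. `fill_from_slice` is a hypothetical helper, not an rten API.

```rust
use std::mem::MaybeUninit;

/// Copy `src` into `buf`, re-using its allocation. The copy goes through
/// the uninitialized spare capacity, so the buffer is never zero-filled
/// first. `fill_from_slice` is a hypothetical helper.
pub fn fill_from_slice<T: Copy>(mut buf: Vec<T>, src: &[T]) -> Vec<T> {
    buf.clear();
    // Ensure capacity >= src.len(); a no-op if the pooled buffer is big enough.
    buf.reserve(src.len());
    let spare: &mut [MaybeUninit<T>] = buf.spare_capacity_mut();
    for (dst, &x) in spare.iter_mut().zip(src) {
        dst.write(x);
    }
    // SAFETY: `reserve` guaranteed enough capacity, and the loop above
    // initialized the first `src.len()` elements of the spare capacity.
    unsafe { buf.set_len(src.len()) };
    buf
}
```
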

@robertknight force-pushed the pool-alloc branch 2 times, most recently from 312c665 to 9fb4793, April 23, 2024 06:41

Improve buffer re-use during graph execution by adding a pool from which
operators can allocate output buffers, and to which buffers are returned
when their ref count drops to zero (i.e., when they are no longer needed
by subsequent graph execution steps). This significantly reduces how
often execution needs to allocate "fresh" buffers from the system
allocator and free them again.
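
The ref-count rule can be sketched with use counts computed up front from a hypothetical linear plan of `Step`s that refer to values by integer ID. After each step runs, the count of each input is decremented, and values whose count reaches zero become eligible for the pool. Treating graph inputs, weights and final outputs as never recyclable (the `keep` list) is an assumption, and rten's actual bookkeeping may differ.

```rust
use std::collections::HashMap;

/// Hypothetical plan step: reads the values in `inputs`, produces `output`.
pub struct Step {
    pub inputs: Vec<usize>,
    pub output: usize,
}

/// For each step, list the value IDs whose remaining-use count reaches zero
/// after that step runs, i.e. whose buffers can be returned to the pool.
/// Values in `keep` (assumed: graph inputs, weights, final outputs) are
/// never released.
pub fn release_schedule(plan: &[Step], keep: &[usize]) -> Vec<Vec<usize>> {
    // Count how many times each value is consumed across the whole plan.
    let mut uses: HashMap<usize, usize> = HashMap::new();
    for step in plan {
        for &id in &step.inputs {
            *uses.entry(id).or_insert(0) += 1;
        }
    }

    plan.iter()
        .map(|step| {
            let mut dead = Vec::new();
            for &id in &step.inputs {
                let count = uses.get_mut(&id).expect("counted above");
                *count -= 1;
                // A value used twice by one step (e.g. x * x) is only
                // released once, when its last use is consumed.
                if *count == 0 && !keep.contains(&id) {
                    dead.push(id);
                }
            }
            dead
        })
        .collect()
}
```

This sketch does not handle a step output that no later step consumes and that is not in `keep`. A real executor would release such a value right after the step that produces it.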

In this initial implementation, a reference to the pool is passed to all
operators via `Operator::run`, but only a subset actually use the pool.
This subset was chosen to benefit the YOLOv8 example.

 - Add `pool` argument to `Operator::run`, specifying a pool from which
   operators should allocate their outputs

 - Create a pool at the start of graph execution and release it at the end.
   Intermediate values that are no longer needed are added to the pool after
   each operator runs.

 - Report the number of allocations from the pool and the hit rate (how often
   the pool was able to satisfy allocations) as part of timing info.

 - Modify an initial subset of operators to allocate from the pool, based on
   what helps the YOLOv8 example.
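
A sketch of what passing the pool through `Operator::run` might look like. `Pool`, the `Operator` trait, `Relu` and `Neg` are simplified hypothetical stand-ins (f32 slices instead of tensors, no error handling), not rten's actual signatures. A converted operator allocates its output from the pool. An unconverted operator ignores the argument and allocates from the system allocator as before.

```rust
use std::cell::RefCell;

/// Hypothetical minimal pool of `f32` buffers.
pub struct Pool {
    free: RefCell<Vec<Vec<f32>>>,
}

impl Pool {
    pub fn new() -> Self {
        Pool { free: RefCell::new(Vec::new()) }
    }

    /// Re-use the first pooled buffer that is large enough, else allocate.
    pub fn alloc(&self, len: usize) -> Vec<f32> {
        let mut free = self.free.borrow_mut();
        match free.iter().position(|b| b.capacity() >= len) {
            Some(i) => {
                let mut buf = free.swap_remove(i);
                buf.clear();
                buf
            }
            None => Vec::with_capacity(len),
        }
    }

    pub fn add(&self, buf: Vec<f32>) {
        self.free.borrow_mut().push(buf);
    }
}

/// Every operator receives the pool, whether or not it uses it.
pub trait Operator {
    fn run(&self, pool: &Pool, inputs: &[&[f32]]) -> Vec<f32>;
}

/// Converted operator: writes its output into a pooled buffer.
pub struct Relu;

impl Operator for Relu {
    fn run(&self, pool: &Pool, inputs: &[&[f32]]) -> Vec<f32> {
        let x = inputs[0];
        let mut out = pool.alloc(x.len());
        out.extend(x.iter().map(|&v| v.max(0.0)));
        out
    }
}

/// Unconverted operator: ignores the pool and uses the system allocator.
pub struct Neg;

impl Operator for Neg {
    fn run(&self, _pool: &Pool, inputs: &[&[f32]]) -> Vec<f32> {
        inputs[0].iter().map(|&v| -v).collect()
    }
}
```

Passing the pool to every operator, even those that ignore it, keeps the signature uniform, so operators can be converted one at a time.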

If the `RTEN_USE_POOL` env var is set, the pool will be used. Otherwise the pool
is still created, but buffers are never added to it, so all allocations go
through the system allocator as before.
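
A sketch of this opt-in behavior. It assumes the pool counts as enabled whenever `RTEN_USE_POOL` is present in the environment, regardless of its value, which is an assumption about the exact check. `GatedPool` and `pool_enabled_from_env` are hypothetical names. The pool is always created. When it is disabled, `add` drops buffers instead of keeping them, so every `alloc` falls through to the system allocator.

```rust
use std::env;

/// Assumption: the pool counts as enabled whenever `RTEN_USE_POOL` is
/// present in the environment, whatever its value.
pub fn pool_enabled_from_env() -> bool {
    env::var_os("RTEN_USE_POOL").is_some()
}

/// Hypothetical pool that is always created but only keeps buffers when
/// enabled.
pub struct GatedPool {
    enabled: bool,
    free: Vec<Vec<f32>>,
}

impl GatedPool {
    pub fn new(enabled: bool) -> Self {
        GatedPool { enabled, free: Vec::new() }
    }

    pub fn from_env() -> Self {
        Self::new(pool_enabled_from_env())
    }

    /// When disabled, `buf` is dropped here and freed by the system allocator.
    pub fn add(&mut self, buf: Vec<f32>) {
        if self.enabled {
            self.free.push(buf);
        }
    }

    /// Re-use a large-enough pooled buffer if there is one. When the pool is
    /// disabled, `free` is always empty, so this always allocates fresh.
    pub fn alloc(&mut self, len: usize) -> Vec<f32> {
        match self.free.iter().position(|b| b.capacity() >= len) {
            Some(i) => {
                let mut buf = self.free.swap_remove(i);
                buf.clear();
                buf
            }
            None => Vec::with_capacity(len),
        }
    }

    pub fn pooled_count(&self) -> usize {
        self.free.len()
    }
}
```
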
@robertknight marked this pull request as ready for review April 23, 2024 08:10
@robertknight merged commit dfa490e into main Apr 23, 2024
2 checks passed
@robertknight deleted the pool-alloc branch April 23, 2024 08:14