Reduce allocations during graph execution #243

robertknight · 2024-06-20T21:59:57Z

This is a collection of micro-optimizations to reduce overhead when executing a graph. Mostly this involves using SmallVec instead of Vec in cases where we expect the number of items will almost always be small.

Also I found that the overhead of creating tensor views could be reduced by implementing Clone manually for DynLayout, in order to use a SmallVec fast path for Copy-able items.

In the test scenario described in #239 this decreases time spent doing non-compute work on the "main" thread (the one where Graph::run_plan runs) by ~3%.

Given that:

- Node IDs are incrementally assigned (because they are indices into an array)
- A significant proportion of the valid node IDs are value nodes and will be assigned a refcount at some point during the run
- The maximum refcount for each value is typically small

Then it works out to be more efficient to use an array of refcounts rather than a hash map.
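As a rough illustration of the reasoning above, here is a minimal sketch of a dense refcount table indexed by node ID. The names `NodeId` and `RefCounts` and the `u8` counter width are assumptions made for this example, not the types used in this PR.

```rust
/// Hypothetical node identifier: an index into the graph's node array.
type NodeId = usize;

/// Reference counts stored densely, one slot per node ID.
///
/// Because IDs are small consecutive integers, a lookup is a bounds-checked
/// array index rather than a hash computation plus probe.
struct RefCounts {
    // `u8` is an assumption based on "the maximum refcount is typically small".
    counts: Vec<u8>,
}

impl RefCounts {
    /// Allocate one zeroed slot for every node ID in the graph.
    fn new(node_count: usize) -> Self {
        RefCounts { counts: vec![0; node_count] }
    }

    /// Increment the count for `id`, panicking on overflow.
    fn inc(&mut self, id: NodeId) {
        let count = &mut self.counts[id];
        *count = count.checked_add(1).expect("refcount overflow");
    }

    /// Decrement the count for `id` and return the new value.
    fn dec(&mut self, id: NodeId) -> u8 {
        let count = &mut self.counts[id];
        *count = count.checked_sub(1).expect("refcount underflow");
        *count
    }

    /// Current count for `id`. Nodes that were never referenced report 0.
    fn count(&self, id: NodeId) -> u8 {
        self.counts[id]
    }
}
```

The trade-off is one slot per node ID whether or not that node ever gets a refcount, which is cheap when most IDs are value nodes.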

Refactor internals to reduce the number of allocations required in the likely case that the number of inputs is small.

- Change `InputList` to store inputs as a `Cow`
- Change `Graph::run_plan` to use a `SmallVec` to store the input list for operators

In the process I ran into a borrow-checking issue in the `Slice` op that I didn't get to the bottom of, but found a workaround for.
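To show why a `Cow` can save an allocation here, the following is a minimal sketch using only the standard library. The `Input` alias, the field layout and the method names are assumptions for illustration; the real `InputList` holds tensor inputs and may differ in its details.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an operator input (the real type would be a
/// tensor view or similar).
type Input<'a> = &'a [f32];

/// Input list that borrows the caller's slice when possible and only
/// allocates if it has to be modified.
struct InputList<'a> {
    inputs: Cow<'a, [Input<'a>]>,
}

impl<'a> InputList<'a> {
    /// Wrap an existing slice of inputs without copying it.
    fn from_slice(inputs: &'a [Input<'a>]) -> Self {
        InputList { inputs: Cow::Borrowed(inputs) }
    }

    /// Append an input. The first mutation converts the borrowed slice
    /// into an owned `Vec`; later pushes reuse that `Vec`.
    fn push(&mut self, input: Input<'a>) {
        self.inputs.to_mut().push(input);
    }

    /// Number of inputs in the list.
    fn len(&self) -> usize {
        self.inputs.len()
    }

    /// True while the list still refers to the caller's slice.
    fn is_borrowed(&self) -> bool {
        matches!(self.inputs, Cow::Borrowed(_))
    }
}
```

In the common case, where the caller already holds the inputs in a slice and nothing is appended, the list is constructed without touching the allocator.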

`<SmallVec as Clone>::clone` internally uses `SmallVec::from`. We want to use the more efficient `SmallVec::from_slice` instead since we know we're dealing with `Copy` items.

robertknight added 5 commits June 20, 2024 21:38

Reduce allocations in Concat operator

509c407

Refactor internals to reduce the number of allocations required in the likely case that the number of inputs is small.

Use SmallVec to save a few allocations in Gather, Slice, layout ops

44d0e56

Implement Clone manually for DynLayout

f9062e6

`<SmallVec as Clone>::clone` internally uses `SmallVec::from`. We want to use the more efficient `SmallVec::from_slice` instead since we know we're dealing with `Copy` items.

robertknight merged commit cd07259 into main Jun 20, 2024
2 checks passed

robertknight deleted the gpt2-micro-opt branch June 20, 2024 22:07
