Reduce allocations during graph execution #243

Merged: 5 commits merged into main from gpt2-micro-opt on Jun 20, 2024
@robertknight robertknight commented Jun 20, 2024

This is a collection of micro-optimizations to reduce overhead when executing a graph. Mostly this involves using SmallVec instead of Vec in cases where we expect the number of items to almost always be small.

Also I found that the overhead of creating tensor views could be reduced by implementing Clone manually for DynLayout, in order to use a SmallVec fast path for Copy-able items.

In the test scenario described in #239 this decreases time spent doing non-compute work on the "main" thread (the one where Graph::run_plan runs) by ~3%.

Given that:

 - Node IDs are incrementally assigned (because they are indices into an
   array)
 - A significant proportion of the valid node IDs are value nodes and
   will be assigned a refcount at some point during the run
 - The maximum refcount for each value is typically small

Then it works out to be more efficient to use an array of refcounts rather than
a hash map.
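
A minimal sketch of the array-of-refcounts idea described above, using only the standard library. The names `NodeId` and `RefCounts`, and the choice of `u8` as the counter type, are assumptions made for illustration and are not taken from the actual change.

```rust
/// Hypothetical node ID type: an index into the graph's node array.
type NodeId = usize;

/// Dense table of reference counts, indexed directly by node ID.
/// A count of zero means the node currently has no outstanding references.
struct RefCounts {
    // Assumption: `u8` is wide enough because the maximum refcount per value
    // is typically small. Overflow is checked rather than silently wrapped.
    counts: Vec<u8>,
}

impl RefCounts {
    /// Allocate one slot per node ID up front (a single allocation).
    fn with_node_count(node_count: usize) -> Self {
        RefCounts { counts: vec![0; node_count] }
    }

    /// Increment the count for `id`. This is a plain array index, no hashing.
    fn inc(&mut self, id: NodeId) {
        let count = &mut self.counts[id];
        *count = count.checked_add(1).expect("refcount overflow");
    }

    /// Decrement the count for `id` and return the new value, so the caller
    /// can release the value when it reaches zero.
    fn dec(&mut self, id: NodeId) -> u8 {
        let count = &mut self.counts[id];
        *count = count.checked_sub(1).expect("refcount underflow");
        *count
    }

    /// Current count for `id`.
    fn count(&self, id: NodeId) -> u8 {
        self.counts[id]
    }
}
```

Compared with a hash map keyed by node ID, this trades some unused slots (for IDs that never get a refcount) for lookups that need no hashing and no per-entry allocation.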

Refactor internals to reduce the number of allocations required in the likely
case that the number of inputs is small.
 - Change `InputList` to store inputs as a `Cow`
 - Change `Graph::run_plan` to use a `SmallVec` to store the input list
   for operators
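
As a rough illustration of why storing inputs as a `Cow` can avoid allocations, the sketch below borrows a caller-provided slice and only allocates when the list has to be modified. `InputListSketch` and `Input` are hypothetical stand-ins, not the library's real types.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an operator input (the real type would be a
/// tensor view of some kind).
#[derive(Clone, Debug, PartialEq)]
struct Input(i32);

/// Input list that borrows the caller's slice when possible and only
/// allocates if it needs to own or modify the inputs.
struct InputListSketch<'a> {
    inputs: Cow<'a, [Input]>,
}

impl<'a> InputListSketch<'a> {
    /// Wrap an existing slice without copying it.
    fn from_slice(inputs: &'a [Input]) -> Self {
        InputListSketch { inputs: Cow::Borrowed(inputs) }
    }

    /// Append an input. `Cow::to_mut` clones the borrowed slice into an
    /// owned `Vec` the first time this is called.
    fn push(&mut self, input: Input) {
        self.inputs.to_mut().push(input);
    }

    fn len(&self) -> usize {
        self.inputs.len()
    }

    /// True while the list still refers to the caller's storage.
    fn is_borrowed(&self) -> bool {
        matches!(self.inputs, Cow::Borrowed(_))
    }
}
```

If the caller keeps its inputs in a stack-allocated buffer (for example a small vector with inline storage) and passes a slice of it, the common read-only path performs no heap allocation at all.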

In the process I ran into a borrow-checking issue in the `Slice` op that
I didn't get to the bottom of, but found a workaround for.

`<SmallVec as Clone>::clone` internally uses `SmallVec::from`. We want to use
the more efficient `SmallVec::from_slice` instead since we know we're dealing
with `Copy` items.
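
The sketch below illustrates the shape of this change without depending on the `smallvec` crate. `TinyVec` is a hypothetical, std-only stand-in for a small vector of `usize`, and `LayoutSketch` is a hypothetical layout type. The point is only that a hand-written `Clone` can route through a bulk-copy constructor instead of a generic element-by-element path.

```rust
/// Hypothetical stand-in for a small-vector type: up to 4 items are stored
/// inline, longer lists fall back to a heap-allocated `Vec`.
#[derive(Debug)]
enum TinyVec {
    Inline { len: usize, buf: [usize; 4] },
    Heap(Vec<usize>),
}

impl TinyVec {
    const INLINE_CAP: usize = 4;

    /// Fast path for `Copy` items: one bulk copy of the whole slice.
    fn from_slice(items: &[usize]) -> Self {
        if items.len() <= Self::INLINE_CAP {
            let mut buf = [0usize; 4];
            buf[..items.len()].copy_from_slice(items);
            TinyVec::Inline { len: items.len(), buf }
        } else {
            TinyVec::Heap(items.to_vec())
        }
    }

    /// View the stored items as a slice, whichever variant is in use.
    fn as_slice(&self) -> &[usize] {
        match self {
            TinyVec::Inline { len, buf } => &buf[..*len],
            TinyVec::Heap(v) => v,
        }
    }
}

/// Hypothetical layout type holding a tensor's shape.
struct LayoutSketch {
    shape: TinyVec,
}

// Implemented by hand rather than derived, so that cloning always goes
// through the bulk-copy constructor.
impl Clone for LayoutSketch {
    fn clone(&self) -> Self {
        LayoutSketch { shape: TinyVec::from_slice(self.shape.as_slice()) }
    }
}
```

Because a layout is cloned every time a tensor view is created, even a small per-clone saving can show up in the graph execution overhead.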
@robertknight robertknight merged commit cd07259 into main Jun 20, 2024
2 checks passed
@robertknight robertknight deleted the gpt2-micro-opt branch June 20, 2024 22:07