Avoid prepacking weights in RNN operators for short input sequences #74

Merged 1 commit on Apr 1, 2024

@robertknight robertknight commented Apr 1, 2024

The RNN operators (LSTM, GRU) previously always prepacked weights before looping over the sequence, in order to amortize packing costs across time steps. If the sequence is short, however, the operator may run much faster without prepacking: there are too few steps to recoup the packing cost, and prepacking requires extra memory.

In the TTS system at https://github.com/robertknight/xd-tts/tree/rten-inference, the decoder loop executes the LSTM with a sequence length of 1 each time. In that case the packed weights are never reused, so prepacking is completely wasted. By disabling prepacking for short sequences, audio generation time improves from ~1.6x realtime to ~0.5x realtime (lower is faster). This gets much closer to ONNX Runtime's generation time (~0.25x realtime).

The current sequence length threshold for prepacking is a small value that seems reasonable. It could be refined by benchmarking a variety of sequence lengths.
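
As a rough illustration of the idea (not the actual rten code), the sketch below prepacks weights only when the sequence is long enough to reuse them. Everything in it is hypothetical: the `PREPACK_MIN_SEQ_LEN` value, the `PackedWeights` layout (a simple transpose standing in for a real packed format), and the helper names. The real operators use a GEMM kernel and have more inputs.

```rust
/// Hypothetical threshold. The PR does not state the actual value.
const PREPACK_MIN_SEQ_LEN: usize = 5;

/// Stand-in for a packed weight buffer: the weight matrix transposed so
/// that each output column is read from contiguous memory.
struct PackedWeights {
    data: Vec<f32>,
    rows: usize,
    cols: usize,
}

/// Copy a row-major `[rows, cols]` matrix into the packed layout.
/// This costs one pass over the weights plus an extra allocation.
fn pack(weights: &[f32], rows: usize, cols: usize) -> PackedWeights {
    let mut data = vec![0.0; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            data[c * rows + r] = weights[r * cols + c];
        }
    }
    PackedWeights { data, rows, cols }
}

/// Prepacking only pays off if the packed buffer is reused enough times.
fn should_prepack(seq_len: usize) -> bool {
    seq_len >= PREPACK_MIN_SEQ_LEN
}

/// Compute `x * W` directly from the row-major weights.
fn matvec_unpacked(x: &[f32], w: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    (0..cols)
        .map(|c| (0..rows).map(|r| x[r] * w[r * cols + c]).sum())
        .collect()
}

/// Compute `x * W` from the packed weights.
fn matvec_packed(x: &[f32], w: &PackedWeights) -> Vec<f32> {
    (0..w.cols)
        .map(|c| (0..w.rows).map(|r| x[r] * w.data[c * w.rows + r]).sum())
        .collect()
}

/// Apply the weights to every step of the sequence. Returns the per-step
/// outputs and whether the weights were prepacked.
fn run_sequence(
    inputs: &[Vec<f32>],
    weights: &[f32],
    rows: usize,
    cols: usize,
) -> (Vec<Vec<f32>>, bool) {
    if should_prepack(inputs.len()) {
        // Long sequence: pack once, reuse the buffer on every step.
        let packed = pack(weights, rows, cols);
        let out = inputs.iter().map(|x| matvec_packed(x, &packed)).collect();
        (out, true)
    } else {
        // Short sequence: skip the packing pass and the extra allocation.
        let out = inputs
            .iter()
            .map(|x| matvec_unpacked(x, weights, rows, cols))
            .collect();
        (out, false)
    }
}
```

With a sequence length of 1, as in the TTS decoder loop, `run_sequence` takes the unpacked path on every call, so no packing work is done and then thrown away.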

@robertknight robertknight merged commit 479ceed into main Apr 1, 2024
2 checks passed
@robertknight robertknight deleted the conditional-rnn-prepack branch April 1, 2024 08:46