Avoid prepacking weights in RNN operators for short input sequences #74
The RNN operators (LSTM, GRU) prepacked weights before looping over the sequence, in order to amortize packing costs across timesteps. If the sequence is short, however, the operator may run much faster without prepacking, since prepacking requires extra memory allocation and an extra copy of the weights.
In the TTS system at https://github.com/robertknight/xd-tts/tree/rten-inference, the decoder loop executes the LSTM with a sequence length of 1 on each iteration. In that case the prepacking work is completely wasted. By disabling prepacking for short sequences, audio generation time improves from ~1.6x realtime to ~0.5x realtime (lower is faster). This gets much closer to ONNX Runtime's generation speed (~0.25x realtime).
The current threshold for enabling prepacking is a short sequence length that seems reasonable as a heuristic. It could be refined by benchmarking across a variety of sequence lengths.
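The decision described above can be sketched as a simple length check before the sequence loop. This is an illustrative sketch only: the names (`should_prepack`, `PREPACK_MIN_SEQ_LEN`) and the threshold value are assumptions, not rten's actual API or chosen constant.

```rust
/// Illustrative threshold below which prepacking is skipped.
/// The actual value used by rten may differ and could be tuned
/// by benchmarking different sequence lengths.
const PREPACK_MIN_SEQ_LEN: usize = 4;

/// Decide whether prepacking the RNN weights pays off.
///
/// Prepacking has a fixed up-front cost (extra memory plus a copy of
/// the weights into the packed layout), which is only amortized when
/// the sequence loop runs for enough timesteps.
fn should_prepack(seq_len: usize) -> bool {
    seq_len >= PREPACK_MIN_SEQ_LEN
}

fn main() {
    // A TTS decoder loop runs the LSTM one timestep at a time, so
    // prepacking would be pure overhead there.
    assert!(!should_prepack(1));

    // Long sequences amortize the packing cost, so prepacking wins.
    assert!(should_prepack(128));

    println!("ok");
}
```

The benefit of gating on sequence length is that long-sequence workloads keep the amortized speedup from packed weights, while single-step decoder loops avoid paying the packing cost on every operator invocation.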