Tune parallelism in Softmax op #258

Merged — merged 1 commit into main on Jun 29, 2024
Conversation

robertknight
Owner

  • Avoid parallelism entirely if the total number of output elements is small
  • If the output is large enough to justify parallelism but the lanes are small, process multiple lanes in each parallel chunk

The smaller GPT-2 models, for example, have many small softmaxes, for which avoiding parallelism entirely is more efficient.
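
Below is a minimal sketch of this heuristic, assuming a Rust implementation parallelized with Rayon. The thresholds (`MIN_ELEMENTS_FOR_PARALLELISM`, `MIN_ELEMENTS_PER_CHUNK`) and the helper `softmax_lane` are illustrative assumptions, not values or names taken from this PR.

```rust
use rayon::prelude::*;

/// Apply softmax to each lane (contiguous row of length `lane_len`) of `data`.
fn softmax_lanes(data: &mut [f32], lane_len: usize) {
    // Hypothetical threshold: below this many output elements, dispatching
    // work to the thread pool costs more than it saves.
    const MIN_ELEMENTS_FOR_PARALLELISM: usize = 16 * 1024;
    // Hypothetical minimum amount of work per parallel chunk.
    const MIN_ELEMENTS_PER_CHUNK: usize = 4 * 1024;

    let total = data.len();
    if total < MIN_ELEMENTS_FOR_PARALLELISM {
        // Small output: run serially and skip the thread pool entirely.
        for lane in data.chunks_mut(lane_len) {
            softmax_lane(lane);
        }
        return;
    }

    // Large output but possibly small lanes: group several lanes into each
    // parallel chunk so every chunk has enough work to amortize scheduling.
    let lanes_per_chunk = (MIN_ELEMENTS_PER_CHUNK / lane_len).max(1);
    data.par_chunks_mut(lanes_per_chunk * lane_len).for_each(|chunk| {
        for lane in chunk.chunks_mut(lane_len) {
            softmax_lane(lane);
        }
    });
}

/// Serial softmax over a single lane.
fn softmax_lane(lane: &mut [f32]) {
    let max = lane.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for x in lane.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in lane.iter_mut() {
        *x /= sum;
    }
}
```

Grouping several lanes per chunk keeps the per-task work above the scheduling overhead even when each individual lane is short.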

robertknight merged commit 9c51bd7 into main on Jun 29, 2024
2 checks passed
robertknight deleted the no-par-small-softmax branch on June 29, 2024 at 11:37