Tune parallelism in Softmax op #258

Merged — merged 1 commit into main on Jun 29, 2024
Conversation

robertknight
Owner

  • Avoid parallelism entirely if the total number of output elements is small
  • If the output is large enough to justify parallelism but the lanes are small, process multiple lanes in each parallel chunk

The smaller GPT-2 models, for example, have many small softmaxes, for which avoiding parallelism entirely is more efficient.
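
Below is a minimal sketch of this heuristic, assuming a Rust implementation parallelized with Rayon. The thresholds (`MIN_ELEMENTS_FOR_PARALLELISM`, `MIN_ELEMENTS_PER_CHUNK`) and the helper `softmax_lane` are illustrative assumptions, not values or names taken from this PR.

```rust
use rayon::prelude::*;

/// Apply softmax to each lane (contiguous row of length `lane_len`) of `data`.
fn softmax_lanes(data: &mut [f32], lane_len: usize) {
    // Hypothetical threshold: below this many output elements, dispatching
    // work to the thread pool costs more than it saves.
    const MIN_ELEMENTS_FOR_PARALLELISM: usize = 16 * 1024;
    // Hypothetical minimum amount of work per parallel chunk.
    const MIN_ELEMENTS_PER_CHUNK: usize = 4 * 1024;

    let total = data.len();
    if total < MIN_ELEMENTS_FOR_PARALLELISM {
        // Small output: run serially and skip the thread pool entirely.
        for lane in data.chunks_mut(lane_len) {
            softmax_lane(lane);
        }
        return;
    }

    // Large output but possibly small lanes: group several lanes into each
    // parallel chunk so every chunk has enough work to amortize scheduling.
    let lanes_per_chunk = (MIN_ELEMENTS_PER_CHUNK / lane_len).max(1);
    data.par_chunks_mut(lanes_per_chunk * lane_len).for_each(|chunk| {
        for lane in chunk.chunks_mut(lane_len) {
            softmax_lane(lane);
        }
    });
}

/// Serial softmax over a single lane.
fn softmax_lane(lane: &mut [f32]) {
    let max = lane.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for x in lane.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in lane.iter_mut() {
        *x /= sum;
    }
}
```

Grouping several lanes per chunk keeps the per-task work above the scheduling overhead even when each individual lane is short.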

robertknight merged commit 9c51bd7 into main on Jun 29, 2024
2 checks passed
robertknight deleted the no-par-small-softmax branch on June 29, 2024 at 11:37