
Add Flash Attention #19418

Open
innat opened this issue Apr 1, 2024 · 1 comment

innat commented Apr 1, 2024

Describe

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.

Paper https://arxiv.org/abs/2205.14135
Cited by: 671

Implementation

Huggingface https://huggingface.co/docs/text-generation-inference/en/conceptual/flash_attention

Others

There is also a second version of the paper:

https://arxiv.org/abs/2307.08691

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
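
For reference (not part of the original report), flash attention kernels are already reachable from PyTorch through scaled_dot_product_attention. A minimal sketch, assuming a CUDA device, fp16 inputs, and a PyTorch 2.x build; shapes are illustrative:

```python
import torch
import torch.nn.functional as F

# (batch, num_heads, seq_len, head_dim); flash kernels require fp16/bf16 on CUDA.
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# Restrict kernel selection so the flash-attention backend is actually used.
with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                    enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([2, 8, 128, 64])
```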

fchollet commented Apr 2, 2024

For JAX, we may want to rely on Pallas. For TF, since we can't rely on custom ops, we may have to skip support.

Presumably we should add it in the form of a new backend op, ops.nn.flash_attention.
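
A rough sketch of what such a backend-dispatched op could look like, following the comment above (illustrative only; the function name and dispatch logic are hypothetical, not an existing Keras API):

```python
# Hypothetical sketch of a backend-dispatched flash attention op.
# Only the PyTorch path is concrete here; the JAX path would call into a
# Pallas kernel, and TF support may be skipped, per the discussion above.
from keras import backend

def flash_attention(query, key, value, is_causal=False):
    name = backend.backend()  # "torch", "jax", or "tensorflow"
    if name == "torch":
        import torch.nn.functional as F
        # PyTorch routes to a fused flash kernel when inputs are eligible
        # (fp16/bf16 on CUDA, supported head dims).
        return F.scaled_dot_product_attention(query, key, value, is_causal=is_causal)
    if name == "jax":
        # A Pallas flash-attention kernel would be wired in here.
        raise NotImplementedError("JAX path: Pallas kernel integration not sketched.")
    # TensorFlow: no custom-op mechanism to rely on, so support may be skipped.
    raise NotImplementedError(f"flash_attention is not implemented for the {name} backend.")
```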

@SuryanarayanaY added the type:feature and stat:awaiting keras-eng labels on Apr 2, 2024