
Add Flash Attention #19418

Open
innat opened this issue Apr 1, 2024 · 1 comment

innat commented Apr 1, 2024

Describe

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.

Paper https://arxiv.org/abs/2205.14135
Cited by: 671

Implementation

Huggingface https://huggingface.co/docs/text-generation-inference/en/conceptual/flash_attention

Others

There is also a second version of the paper:

https://arxiv.org/abs/2307.08691

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
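
For reference (not part of the original report), flash attention kernels are already reachable from PyTorch through scaled_dot_product_attention. A minimal sketch, assuming a CUDA device, fp16 inputs, and a PyTorch 2.x build; shapes are illustrative:

```python
import torch
import torch.nn.functional as F

# (batch, num_heads, seq_len, head_dim); flash kernels require fp16/bf16 on CUDA.
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# Restrict kernel selection so the flash-attention backend is actually used.
with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                    enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([2, 8, 128, 64])
```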

fchollet commented Apr 2, 2024

For JAX, we may want to rely on Pallas. For TF, since we can't rely on custom ops, we may have to skip support.

Presumably we should add it in the form of a new backend op, ops.nn.flash_attention.
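
A rough sketch of what such a backend-dispatched op could look like, following the comment above (illustrative only; the function name and dispatch logic are hypothetical, not an existing Keras API):

```python
# Hypothetical sketch of a backend-dispatched flash attention op.
# Only the PyTorch path is concrete here; the JAX path would call into a
# Pallas kernel, and TF support may be skipped, per the discussion above.
from keras import backend

def flash_attention(query, key, value, is_causal=False):
    name = backend.backend()  # "torch", "jax", or "tensorflow"
    if name == "torch":
        import torch.nn.functional as F
        # PyTorch routes to a fused flash kernel when inputs are eligible
        # (fp16/bf16 on CUDA, supported head dims).
        return F.scaled_dot_product_attention(query, key, value, is_causal=is_causal)
    if name == "jax":
        # A Pallas flash-attention kernel would be wired in here.
        raise NotImplementedError("JAX path: Pallas kernel integration not sketched.")
    # TensorFlow: no custom-op mechanism to rely on, so support may be skipped.
    raise NotImplementedError(f"flash_attention is not implemented for the {name} backend.")
```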

@SuryanarayanaY added the type:feature and stat:awaiting keras-eng labels on Apr 2, 2024