Add Flash Attention #19418
Labels: stat:awaiting keras-eng (awaiting response from a Keras engineer), type:feature (the user is asking for a new feature), type:others
Description
Please add support for FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.
Paper: https://arxiv.org/abs/2205.14135 (cited by 671 at the time of writing)
Implementation
Hugging Face has a conceptual overview of an existing implementation: https://huggingface.co/docs/text-generation-inference/en/conceptual/flash_attention
Others
A second version also exists: FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning, https://arxiv.org/abs/2307.08691
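For context on what the feature would compute: FlashAttention produces exactly the same output as standard attention, but processes K/V in blocks with an online softmax so the full attention matrix is never materialized. Below is a minimal NumPy sketch of that tiling idea (function names, shapes, and block size are my own illustrative choices, not Keras or FlashAttention API):

```python
import numpy as np

def naive_attention(q, k, v):
    # Standard attention: softmax(Q K^T / sqrt(d)) V, materializes the full n x n matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def tiled_attention(q, k, v, block=4):
    # FlashAttention-style online softmax: walk over K/V in blocks, keeping a
    # running row-max (m), running normalizer (l), and unnormalized output (o).
    n, d = q.shape
    o = np.zeros((n, d))
    m = np.full((n, 1), -np.inf)
    l = np.zeros((n, 1))
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)                             # scores for this block only
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))  # updated running max
        p = np.exp(s - m_new)                                 # block weights (unnormalized)
        scale = np.exp(m - m_new)                             # rescale earlier accumulators
        l = l * scale + p.sum(axis=-1, keepdims=True)
        o = o * scale + p @ vb
        m = m_new
    return o / l

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(tiled_attention(q, k, v), naive_attention(q, k, v))
```

The real kernels do this per-tile in GPU SRAM to cut HBM traffic; the sketch only shows why the blockwise result is mathematically identical to the naive softmax.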