
Add FA2 & SDPA support for RoBERTa & XLM-RoBERTa #30450

Closed

Conversation

tomaarsen (Member)

Hello!

Pull Request overview

  • Add FA2 & SDPA support for RoBERTa
  • Add FA2 & SDPA support for XLM-RoBERTa

Details

The world of embedding models still relies heavily on bert, roberta, xlm_roberta, mpnet, etc., but these architectures have not yet received the benefits of FA2/SDPA. I'd like to make a start on that today.
I recognize that these models are tricky to change, as BERT in particular is tangled in a large web of "Copied from" connections. That said, I suspect that I've implemented FA2/SDPA in a way that can be extended to a lot of architectures; I'd like to get reviews on the current implementation before I potentially expand to new ones.
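For readers unfamiliar with the mechanism, here is a minimal illustrative sketch of the "Copied from" convention (not code from this PR): transformers marks classes that are literal copies of another model's code, and `make fix-copies` re-syncs them, which is why a change to BERT's attention ripples outward.

```python
import torch.nn as nn

# Illustrative sketch only. The comment below is machine-checked:
# `make fix-copies` regenerates this class from BertSelfAttention,
# applying the rename Bert -> Roberta.

# Copied from transformers.models.bert.modeling_bert.BertSelfAttention with Bert->Roberta
class RobertaSelfAttention(nn.Module):
    ...
```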

Most of the code is based on the Llama2 FA2/SDPA implementation, so it should be fairly familiar. Some limitations to note:

  • output_attentions does not work for FA2/SDPA - this is fairly standard.
  • head_mask does not work for FA2/SDPA.
  • position_embedding_type with anything other than "absolute" (i.e., the default) does not work for FA2/SDPA.
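For reference, a minimal sketch of how one of these backends would be selected at load time once this change is in, using the standard `attn_implementation` argument from transformers (the model ID is just an example):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "sentence-transformers/all-distilroberta-v1"  # example RoBERTa-based model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# "sdpa" needs PyTorch >= 2.0; "flash_attention_2" additionally needs the
# flash-attn package, a supported GPU, and fp16/bf16 weights.
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # or "flash_attention_2"
).to("cuda")

inputs = tokenizer("Attention backends in one line.", return_tensors="pt").to("cuda")
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state
```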

Additionally, I have yet to write tests, and I haven't exercised all the ways these models can be used; so far I've only experimented with Sentence Transformers.
For a small RoBERTa-based model (https://huggingface.co/sentence-transformers/all-distilroberta-v1, 82M params), I get about a 10% speedup at batch size 1 and a ~25% speedup at large batch sizes with FA2 or SDPA. For a large XLM-RoBERTa-based model (https://huggingface.co/BAAI/bge-m3, 8192 sequence length), the speedup reaches 3x with FA2. Because newer embedding models use increasingly large sequence lengths, FA2/SDPA will only become more important for them.
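A rough timing harness for this kind of comparison (a hypothetical sketch, not the exact benchmark behind the numbers above; `model_kwargs` is forwarded to `from_pretrained` in recent Sentence Transformers versions):

```python
import time
from sentence_transformers import SentenceTransformer

sentences = ["An example sentence to embed."] * 1024

for impl in ("eager", "sdpa"):  # add "flash_attention_2" if flash-attn is installed
    model = SentenceTransformer(
        "sentence-transformers/all-distilroberta-v1",
        device="cuda",
        model_kwargs={"attn_implementation": impl, "torch_dtype": "float16"},
    )
    model.encode(sentences[:32])  # warmup
    start = time.perf_counter()
    model.encode(sentences, batch_size=256)
    print(f"{impl}: {time.perf_counter() - start:.2f}s")
```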


Who can review?

@ArthurZucker @younesbelkada

If I get a bit of a go-ahead, I can move forward with other architectures. Let me know if you'd like me to work on tests first, though. I'm also aware that the "copies" tests will currently fail because of these changes.

  • Tom Aarsen

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tomaarsen tomaarsen marked this pull request as ready for review April 25, 2024 08:55
@huseinzol05 (Contributor)

Thanks for this PR! Using SDPA saved a ton of memory for GPU-poor folks like me. Super grateful.

@ArthurZucker (Collaborator)

@tomaarsen do you need a review on this one?

@tomaarsen (Member, Author)

@ArthurZucker Would be nice, though there are some conflicts now. I'll be off next week, so I'll be able to take care of the conflicts & any comments starting again on the 17th.
Ideally, in the long term I'd like to get FA2/SDPA support into all common encoder architectures (notably BERT, RoBERTa, and their multilingual variants), as this is important for the efficiency of embedding models.

  • Tom Aarsen

@younesbelkada (Contributor) left a comment

Looks pretty clean already, thanks a lot @tomaarsen! Can you make sure to propagate the changes into the encoders that copy from RoBERTa by running make fix-copies? You would also need to update https://github.com/huggingface/transformers/blob/main/docs/source/en/perf_infer_gpu_one.md to mention RoBERTa and all the other models that now support FA2 & SDPA.
You also need to fix the merge conflicts, which should be easy! 🙏

Review thread on src/transformers/models/roberta/modeling_roberta.py (outdated, resolved)
tomaarsen and others added 3 commits June 7, 2024 14:19
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
@nbroad1881 (Contributor)

There is another similar PR by the way: #30510

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Jul 25, 2024
This pull request was closed.