
Add GGUF support for Bloom #33473

Open
wants to merge 3 commits into main

Conversation

VladOS95-cyber
Contributor

What does this PR do?

Add Bloom GGUF loading support

Fixes # (issue)

Before submitting

Who can review?

Regarding this task: @SunMarc @LysandreJik @ArthurZucker.

Resolved review threads:
src/transformers/convert_slow_tokenizer.py (3 threads, outdated)
src/transformers/integrations/ggml.py (1 thread)
@VladOS95-cyber
Contributor Author

VladOS95-cyber commented Sep 17, 2024

Hi @SunMarc @LysandreJik @ArthurZucker! This PR is ready for review. There is one thing that looks odd to me. After dequantizing and loading the model, it generates a wrong sequence, not the one expected from the normal pretrained model. Instead of tensor([[59414, 15, 473, 3370, 4026, 427, 5894, 861, 473, 912, 5636]]), it generates something like [[59414, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15]]. I cannot find the root cause of this problem; I've already checked the tensor mapping and so on several times, and it should be correct. It looks like the weights are not correct, but I am not sure...
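
For reference, a minimal sketch of how the GGUF-loaded Bloom model might be exercised to reproduce this; the repo id, file name, and prompt below are placeholders, not the ones actually used in the report:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bloom-560m-gguf"          # hypothetical repo containing the GGUF file
gguf_file = "bloom-560m.Q8_0.gguf"   # hypothetical quantized GGUF file name

# Passing gguf_file to from_pretrained dequantizes the GGUF weights
# into a regular torch model and converts the GGUF tokenizer.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)

inputs = tokenizer("Hello, I just want to say that I am", return_tensors="pt")  # placeholder prompt
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(output_ids)  # compare against the token ids from the original pretrained model
```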

@SunMarc
Member

SunMarc commented Sep 17, 2024

> This PR is ready for review. There is one thing that looks odd to me. After dequantizing and loading the model, it generates a wrong sequence, not the one expected from the normal pretrained model. Instead of tensor([[59414, 15, 473, 3370, 4026, 427, 5894, 861, 473, 912, 5636]]), it generates something like [[59414, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15]]. I cannot find the root cause of this problem; I've already checked the tensor mapping and so on several times, and it should be correct. It looks like the weights are not correct, but I am not sure...

Since the model was quantized, it is normal that it does not behave exactly like the normal pretrained model: dequantization doesn't recover the precision of the original model. Could you check that it behaves similarly to the original model converted to GGUF in fp16 or full precision? That way we have a proper baseline to compare the model loaded from the GGUF file against.
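
A minimal sketch of that comparison, assuming the original checkpoint has also been converted to GGUF at fp16 (the fp16 GGUF file name here is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bigscience/bloom-560m"      # original checkpoint
gguf_fp16 = "bloom-560m.fp16.gguf"     # hypothetical fp16 GGUF conversion of the same checkpoint

tokenizer = AutoTokenizer.from_pretrained(repo_id)
reference = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float16)
from_gguf = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_fp16)

inputs = tokenizer("Hello, I just want to say that I am", return_tensors="pt")  # placeholder prompt
ref_ids = reference.generate(**inputs, max_new_tokens=10, do_sample=False)
gguf_ids = from_gguf.generate(**inputs, max_new_tokens=10, do_sample=False)

# With an fp16 GGUF file there is no quantization loss, so the two greedy
# generations should match; a large divergence points at a tensor-mapping
# or dequantization bug rather than at quantization error.
print(ref_ids)
print(gguf_ids)
```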
