
Add GGUF support for Bloom #33473

Open
wants to merge 3 commits into main

Conversation

VladOS95-cyber
Contributor

What does this PR do?

Add Bloom GGUF loading support

Fixes # (issue)

Before submitting

Who can review?

Regarding this task: @SunMarc @LysandreJik @ArthurZucker.

Resolved review threads:
src/transformers/convert_slow_tokenizer.py (3 threads, outdated)
src/transformers/integrations/ggml.py (1 thread)
@VladOS95-cyber
Contributor Author

VladOS95-cyber commented Sep 17, 2024

Hi @SunMarc @LysandreJik @ArthurZucker! This PR is ready for review. There is one thing that looks odd to me. After dequantizing and loading the model, it generates a wrong sequence, not the one expected from the normal pretrained model. Instead of tensor([[59414, 15, 473, 3370, 4026, 427, 5894, 861, 473, 912, 5636]]), it generates something like [[59414, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15]]. I cannot find the root cause of this problem; I've already checked the tensor mapping and so on several times, and it should be correct. It looks like the weights are not correct, but I am not sure...
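
For reference, a minimal sketch of how the GGUF-loaded Bloom model might be exercised to reproduce this; the repo id, file name, and prompt below are placeholders, not the ones actually used in the report:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bloom-560m-gguf"          # hypothetical repo containing the GGUF file
gguf_file = "bloom-560m.Q8_0.gguf"   # hypothetical quantized GGUF file name

# Passing gguf_file to from_pretrained dequantizes the GGUF weights
# into a regular torch model and converts the GGUF tokenizer.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)

inputs = tokenizer("Hello, I just want to say that I am", return_tensors="pt")  # placeholder prompt
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(output_ids)  # compare against the token ids from the original pretrained model
```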

@SunMarc
Member

SunMarc commented Sep 17, 2024

> This PR is ready for review. There is one thing that looks odd to me. After dequantizing and loading the model, it generates a wrong sequence, not the one expected from the normal pretrained model. Instead of tensor([[59414, 15, 473, 3370, 4026, 427, 5894, 861, 473, 912, 5636]]), it generates something like [[59414, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15]]. I cannot find the root cause of this problem; I've already checked the tensor mapping and so on several times, and it should be correct. It looks like the weights are not correct, but I am not sure...

Since the model was quantized, it is normal that it does not behave exactly like the normal pretrained model: dequantization doesn't recover the precision of the original model. Could you check that it behaves similarly to the original model converted to GGUF in fp16 or full precision? That way we have a proper baseline to compare the model loaded from the GGUF file against.
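
A minimal sketch of that comparison, assuming the original checkpoint has also been converted to GGUF at fp16 (the fp16 GGUF file name here is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bigscience/bloom-560m"      # original checkpoint
gguf_fp16 = "bloom-560m.fp16.gguf"     # hypothetical fp16 GGUF conversion of the same checkpoint

tokenizer = AutoTokenizer.from_pretrained(repo_id)
reference = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float16)
from_gguf = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_fp16)

inputs = tokenizer("Hello, I just want to say that I am", return_tensors="pt")  # placeholder prompt
ref_ids = reference.generate(**inputs, max_new_tokens=10, do_sample=False)
gguf_ids = from_gguf.generate(**inputs, max_new_tokens=10, do_sample=False)

# With an fp16 GGUF file there is no quantization loss, so the two greedy
# generations should match; a large divergence points at a tensor-mapping
# or dequantization bug rather than at quantization error.
print(ref_ids)
print(gguf_ids)
```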
