
qwen2-vl float16 inference error #77

Open · GeLee-Q opened this issue Sep 2, 2024 · 3 comments

GeLee-Q commented Sep 2, 2024

Error traceback:

Traceback (most recent call last):
  File "/workspace/qwenvl-dev/test_infer.py", line 47, in <module>
    output_ids = model.generate(**inputs, max_new_tokens=128)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2015, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2998, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)

Reproduction code:

from PIL import Image
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor


model_dir = "/workspace/qwenvl-dev/Qwen2-VL-2B-Instruct"
# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,  # Explicitly set to float16 for half-precision
    device_map="auto",
)


processor = AutoProcessor.from_pretrained(model_dir)

# Image
image_path = "/workspace/qwenvl-dev/demo.jpeg"
image = Image.open(image_path)

conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]


# Preprocess the inputs
text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
# Expected output: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n<|im_start|>assistant\n'

inputs = processor(
    text=[text_prompt], images=[image], padding=True, return_tensors="pt"
)
inputs = inputs.to("cuda")

# Inference: Generation of the output
output_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids = [
    output[len(input_ids):]  # keep only the newly generated tokens
    for input_ids, output in zip(inputs.input_ids, output_ids)
]
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)
print(output_text)
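
A quick way to confirm the overflow theory (the traceback above is truncated, but `torch.multinomial` failing at that point typically means the sampling probabilities contain inf/nan): inspect the logits of a single forward pass. A minimal diagnostic sketch, reusing the `model` and `inputs` from the script above:

with torch.no_grad():
    logits = model(**inputs).logits
# float16 overflow in the forward pass shows up as inf/nan in the logits,
# which softmax then turns into probabilities torch.multinomial rejects.
print("nan in logits:", torch.isnan(logits).any().item())
print("inf in logits:", torch.isinf(logits).any().item())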

GeLee-Q changed the title from "qwen2-vl float16 inference" to "qwen2-vl float16 inference error" on Sep 2, 2024

deepbodo commented Sep 3, 2024

Have you found any solution?

GeLee-Q (Author) commented Sep 3, 2024

> Have you found any solution?

No, bf16 and float32 work fine.
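
A minimal workaround sketch based on that observation: keep the script above unchanged but load the weights in torch.bfloat16, which has the same exponent range as float32, so values that overflow in float16 stay finite:

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,  # instead of torch.float16
    device_map="auto",
)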

ShuaiBai623 (Collaborator) commented

There is a solution: huggingface/transformers#33312
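
Assuming that fix has landed in your installed transformers release (e.g. after `pip install -U transformers`), the original float16 script above should then run as written.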
