Hosting ChatUniVi in Sanic Application: ZeroDivisionError #52

Open
ervinweelzwork opened this issue Jul 24, 2024 · 0 comments
Problem Description

I'm integrating ChatUniVi into a Sanic web application and hit an unexpected error at startup. The error occurs when loading a pretrained model via the load_pretrained_model function from the ChatUniVi package; the call fails inside accelerate's get_balanced_memory with a ZeroDivisionError. The full traceback is below.

Error Traceback

Traceback (most recent call last):
  File "C:\Users\Ervin\Documents\Chat-UniVi\main.py", line 22, in <module>
    tokenizer, _, _, _ = load_pretrained_model(model_path, None, model_name)
  File "C:\Users\Ervin\Documents\Chat-UniVi\ChatUniVi\model\builder.py", line 75, in load_pretrained_model
    model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
  File "C:\Users\Ervin\anaconda3\envs\chatunivi\lib\site-packages\transformers\models\auto\auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\Ervin\anaconda3\envs\chatunivi\lib\site-packages\transformers\modeling_utils.py", line 3685, in from_pretrained
    max_memory = get_balanced_memory(
  File "C:\Users\Ervin\anaconda3\envs\chatunivi\lib\site-packages\accelerate\utils\modeling.py", line 753, in get_balanced_memory
    per_gpu = module_sizes[""] // (num_devices - 1 if low_zero else num_devices)
ZeroDivisionError: integer division or modulo by zero
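The failing line divides by the number of visible GPU devices, so my working theory is that accelerate sees zero CUDA devices in this environment. A quick diagnostic of my own (not part of ChatUniVi) to confirm:

import torch

# If this prints 0, accelerate's get_balanced_memory ends up computing
# per_gpu = module_sizes[""] // num_devices with num_devices == 0,
# which would match the ZeroDivisionError above.
print("CUDA available:", torch.cuda.is_available())
print("CUDA device count:", torch.cuda.device_count())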

Steps to Reproduce

  1. Ensure all dependencies are installed and up-to-date.
  2. Set the model_path and model_name variables in main.py.
  3. Run the application. The error appears immediately on startup, before the server begins listening.

Expected Behavior
The application should successfully load the pretrained model and serve it through the defined endpoints (/img and /vid) without throwing errors.

Environment
Python version: 3.10.4
Sanic version: 24.6.0
Operating System: Windows 10
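Since the failure is inside transformers/accelerate rather than Sanic, the versions of those packages are probably relevant too; this snippet (my own) reports them from the same conda environment:

import torch, transformers, accelerate

# Versions of the three packages implicated in the traceback.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)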

Requested Action
Could you please provide guidance on how to resolve this issue? Thank you for your assistance.
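In the meantime, here is the workaround I am experimenting with. It assumes builder.py forwards a device_map keyword to from_pretrained (the traceback suggests device_map="auto" is what routes the load through get_balanced_memory); if load_pretrained_model does not accept that keyword, the same override could presumably be applied inside builder.py directly:

import torch
from ChatUniVi.model.builder import load_pretrained_model

model_path = "C:/Users/Ervin/Documents/Chat-UniVi/model"
model_name = "ChatUniVi"

# Assumption on my part, not a confirmed fix: device_map="auto" with zero
# GPUs triggers the balanced-memory split; forcing everything onto the CPU
# should avoid it.
device_map = "auto" if torch.cuda.is_available() else {"": "cpu"}
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, None, model_name, device_map=device_map)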

main.py

import torch
import os
from ChatUniVi.constants import *
from ChatUniVi.conversation import conv_templates, SeparatorStyle
from ChatUniVi.model.builder import load_pretrained_model
from ChatUniVi.utils import disable_torch_init
from ChatUniVi.mm_utils import tokenizer_image_token, get_model_name_from_path, KeywordsStoppingCriteria
from PIL import Image
from decord import VideoReader, cpu
import numpy as np

from sanic import Sanic, json, response
from sanic.log import logger

app = Sanic("ChatUnivi_SANIC")


# Load the model once at import time; both handlers below reuse these objects.
model_path = "C:/Users/Ervin/Documents/Chat-UniVi/model"  # Adjust the path as necessary
model_name = "ChatUniVi"
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name, low_cpu_mem_usage=False)
model.eval()  # Set the model to evaluation mode




def _get_rawvideo_dec(video_path, image_processor, max_frames=MAX_IMAGE_LENGTH, image_resolution=224, video_framerate=1, s=None, e=None):
    # speed up video decode via decord.

    if s is None:
        start_time, end_time = None, None
    else:
        start_time = int(s)
        end_time = int(e)
        start_time = start_time if start_time >= 0. else 0.
        end_time = end_time if end_time >= 0. else 0.
        if start_time > end_time:
            start_time, end_time = end_time, start_time
        elif start_time == end_time:
            end_time = start_time + 1

    if os.path.exists(video_path):
        vreader = VideoReader(video_path, ctx=cpu(0))
    else:
        raise FileNotFoundError(video_path)

    fps = vreader.get_avg_fps()
    f_start = 0 if start_time is None else int(start_time * fps)
    f_end = int(min(1000000000 if end_time is None else end_time * fps, len(vreader) - 1))
    num_frames = f_end - f_start + 1
    if num_frames > 0:
        # T x 3 x H x W
        sample_fps = int(video_framerate)
        t_stride = int(round(float(fps) / sample_fps))

        all_pos = list(range(f_start, f_end + 1, t_stride))
        if len(all_pos) > max_frames:
            sample_pos = [all_pos[_] for _ in np.linspace(0, len(all_pos) - 1, num=max_frames, dtype=int)]
        else:
            sample_pos = all_pos

        patch_images = [Image.fromarray(f) for f in vreader.get_batch(sample_pos).asnumpy()]

        patch_images = torch.stack([image_processor.preprocess(img, return_tensors='pt')['pixel_values'][0] for img in patch_images])
        slice_len = patch_images.shape[0]

        return patch_images, slice_len
    else:
        raise ValueError("video path: {} error.".format(video_path))


@app.post('/img')
async def img(request):
    # Model Parameter
    # model_path = "C:/Users/Ervin/Documents/Chat-UniVi/model"  # or "Chat-UniVi/Chat-UniVi-13B"
    # model_path = "Chat-UniVi/Chat-UniVi"  # or "Chat-UniVi/Chat-UniVi-13B"
    image_path = "sam.jpg"

    # Input Text
    qs = "Describe the image."

    # Sampling Parameter
    conv_mode = "simple"
    temperature = 0.2
    top_p = None
    num_beams = 1

    disable_torch_init()
    # model_path = os.path.expanduser(model_path)
    # model_name = "ChatUniVi"
    # tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name)

    mm_use_im_start_end = getattr(model.config, "mm_use_im_start_end", False)
    mm_use_im_patch_token = getattr(model.config, "mm_use_im_patch_token", True)
    if mm_use_im_patch_token:
        tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)
    if mm_use_im_start_end:
        tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True)
    model.resize_token_embeddings(len(tokenizer))

    vision_tower = model.get_vision_tower()
    if not vision_tower.is_loaded:
        vision_tower.load_model()
    image_processor = vision_tower.image_processor

    # Check if the image exists
    if image_path is not None:
        cur_prompt = qs
        if model.config.mm_use_im_start_end:
            qs = DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN + '\n' + qs
        else:
            qs = DEFAULT_IMAGE_TOKEN + '\n' + qs

        conv = conv_templates[conv_mode].copy()
        conv.append_message(conv.roles[0], qs)
        conv.append_message(conv.roles[1], None)
        prompt = conv.get_prompt()

        input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).cuda()

        image = Image.open(image_path)
        image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'][0]

        stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
        keywords = [stop_str]
        stopping_criteria = KeywordsStoppingCriteria(keywords, tokenizer, input_ids)

        with torch.inference_mode():
            output_ids = model.generate(
                input_ids,
                images=image_tensor.unsqueeze(0).half().cuda(),
                do_sample=True,
                temperature=temperature,
                top_p=top_p,
                num_beams=num_beams,
                max_new_tokens=1024,
                use_cache=True,
                stopping_criteria=[stopping_criteria])

        input_token_len = input_ids.shape[1]
        n_diff_input_output = (input_ids != output_ids[:, :input_token_len]).sum().item()
        if n_diff_input_output > 0:
            print(f'[Warning] {n_diff_input_output} output_ids are not the same as the input_ids')
        outputs = tokenizer.batch_decode(output_ids[:, input_token_len:], skip_special_tokens=True)[0]
        outputs = outputs.strip()
        if outputs.endswith(stop_str):
            outputs = outputs[:-len(stop_str)]
        outputs = outputs.strip()
        print(outputs)
    return json(outputs)

@app.post("/vid")
async def vid(request):
    # model_path = "C:/Users/Ervin/Documents/Chat-UniVi/model"  # or "Chat-UniVi/Chat-UniVi-13B"
    video_path = "Indo.mp4"

    # The number of visual tokens varies with the length of the video. "max_frames" is the maximum number of frames.
    # When the video is long, we will uniformly downsample the video to meet the frames when equal to the "max_frames".
    max_frames = 100

    # The number of frames retained per second in the video.
    video_framerate = 1

    # Input Text
    qs = "Describe the video."

    # Sampling Parameter
    conv_mode = "simple"
    temperature = 0.2
    top_p = None
    num_beams = 1

    disable_torch_init()
    # model_path = os.path.expanduser(model_path)
    # model_name = "ChatUniVi"
    # tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name)

    mm_use_im_start_end = getattr(model.config, "mm_use_im_start_end", False)
    mm_use_im_patch_token = getattr(model.config, "mm_use_im_patch_token", True)
    if mm_use_im_patch_token:
        tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)
    if mm_use_im_start_end:
        tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True)
    model.resize_token_embeddings(len(tokenizer))

    vision_tower = model.get_vision_tower()
    if not vision_tower.is_loaded:
        vision_tower.load_model()
    image_processor = vision_tower.image_processor

    # When token-cluster merging is enabled, cast the modules to bfloat16 (as in the reference demo).
    if model.config.config["use_cluster"]:
        for n, m in model.named_modules():
            m = m.to(dtype=torch.bfloat16)

    # Check if the video exists
    if video_path is not None:
        video_frames, slice_len = _get_rawvideo_dec(video_path, image_processor, max_frames=max_frames, video_framerate=video_framerate)

        cur_prompt = qs
        if model.config.mm_use_im_start_end:
            qs = DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN * slice_len + DEFAULT_IM_END_TOKEN + '\n' + qs
        else:
            qs = DEFAULT_IMAGE_TOKEN * slice_len + '\n' + qs

        conv = conv_templates[conv_mode].copy()
        conv.append_message(conv.roles[0], qs)
        conv.append_message(conv.roles[1], None)
        prompt = conv.get_prompt()

        input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).cuda()

        stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
        keywords = [stop_str]
        stopping_criteria = KeywordsStoppingCriteria(keywords, tokenizer, input_ids)

        with torch.inference_mode():
            output_ids = model.generate(
                input_ids,
                images=video_frames.half().cuda(),
                do_sample=True,
                temperature=temperature,
                top_p=top_p,
                num_beams=num_beams,
                output_scores=True,
                return_dict_in_generate=True,
                max_new_tokens=1024,
                use_cache=True,
                stopping_criteria=[stopping_criteria])

        output_ids = output_ids.sequences
        input_token_len = input_ids.shape[1]
        n_diff_input_output = (input_ids != output_ids[:, :input_token_len]).sum().item()
        if n_diff_input_output > 0:
            print(f'[Warning] {n_diff_input_output} output_ids are not the same as the input_ids')
        outputs = tokenizer.batch_decode(output_ids[:, input_token_len:], skip_special_tokens=True)[0]
        outputs = outputs.strip()
        if outputs.endswith(stop_str):
            outputs = outputs[:-len(stop_str)]
        outputs = outputs.strip()
        print(outputs)
        return json(outputs)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8888, debug=True, access_log=True)
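
Once the model loads, I plan to verify the endpoints with the smoke test below (a hypothetical helper script of my own, not part of the repro; the handlers currently hard-code sam.jpg and Indo.mp4, so an empty POST body is enough):

import requests

# Both handlers ignore the request body for now, so empty POSTs suffice.
for endpoint in ("/img", "/vid"):
    resp = requests.post(f"http://localhost:8888{endpoint}")
    print(endpoint, resp.status_code, resp.text[:200])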


