ModuleNotFoundError: No module named 'ChatUniVi.model.language_model.phi' #49

Open · guzixian opened this issue Jul 9, 2024 · 3 comments

guzixian commented Jul 9, 2024

Hi, I would like to use Phi because of my GPU memory limit, but I got the error described in the title. Can you share the phi.py file?

jpthu17 (Member) commented Jul 9, 2024

I added the Phi-2 code; see https://github.com/PKU-YuanGroup/Chat-UniVi/tree/main/ChatUniVi/model/language_model

I hope this helps. However, the code still seems to have bugs: the Phi-2 model often hangs when training on a mix of images and videos. I guess this error comes from a DeepSpeed bug (microsoft/DeepSpeed#2223).
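
To quickly confirm that the updated code is being picked up, a minimal import check like the sketch below can help. It assumes the Chat-UniVi repository root is on PYTHONPATH; the module path is taken from the error message in the title.

```python
# Minimal sketch: verify the phi module from the updated main branch is importable.
# Assumes the Chat-UniVi repository root is on PYTHONPATH (e.g. run from that directory).
try:
    from ChatUniVi.model.language_model import phi  # module named in the error above
except ModuleNotFoundError as err:
    print(f"Still missing: {err} -- make sure you pulled the latest main branch")
else:
    print(f"Found phi module at {phi.__file__}")
```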

guzixian (Author) commented Jul 10, 2024

Thank you very much, I'll try it first. By the way, is the following training configuration correct? The current run uses video training data we collected ourselves, with zero3_offload.

```bash
deepspeed \
    --include localhost:0,1 \
    --master_port=29602 \
    ChatUniVi/train/train_mem.py \
    --deepspeed scripts/zero3_offload.json \
    --model_name_or_path /zxgu/Video-LLaVA/lmsys/vicuna-7b-v1.5 \
    --version v1 \
    --model_use FINETUNE \
    --dataset_use VIDEO \
    --vision_tower /zxgu/Chat-UniVi/openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter /zxgu/Chat-UniVi/Chat-UniVi-7B-v1.5-Stage-1/mm_projector.bin \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --fp16 False \
    --output_dir /zxgu/Chat-UniVi/hf_save_checkpoint/stage1/20240708_humanfactor_20Wdata \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 10 \
    --learning_rate 2e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 0 \
    --lazy_preprocess True \
```
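
For reference, scripts/zero3_offload.json is a DeepSpeed ZeRO-3 config with CPU offloading. A minimal sketch of what such a config usually contains is below, written as a Python dict since HuggingFace TrainingArguments accepts either a dict or a JSON path for its deepspeed argument; the repo's actual file may differ.

```python
# Sketch of a typical ZeRO-3 CPU-offload DeepSpeed config (uses HuggingFace
# "auto" placeholders). Not necessarily identical to scripts/zero3_offload.json.
zero3_offload = {
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```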

[Attached image: loss_picture]

I found that if I follow finetune1 (with tune_mm_mlp_adapter set to True), the loss hardly changes, like the pink curve; but with finetune2, the loss oscillates violently at the beginning of training. Is this normal?

jpthu17 (Member) commented Jul 18, 2024

It's not normal. Typically, the LLM loss is expected to range between 0 and 5. However, I've noticed that your reported loss has peaked at 10, which suggests that there may be an issue affecting the training process.
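
For a rough sense of why a loss near 10 is a red flag: assuming a Vicuna-7B tokenizer with a vocabulary of roughly 32k tokens (not stated in this thread), a cross-entropy loss of about ln(32000) ≈ 10.4 corresponds to essentially uniform predictions over the vocabulary, i.e. the model is behaving as if untrained or the pretrained weights/projector are not being applied.

```python
import math

# Back-of-the-envelope check (assumed vocabulary size): cross-entropy of a
# uniform prediction over a vocabulary of size V is ln(V). For a ~32k vocab,
# that is about 10.4, so a loss hovering near 10 looks like an untrained model.
vocab_size = 32_000
print(round(math.log(vocab_size), 2))  # 10.37
```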
