CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

This repository contains the official code for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding.

[Project Page] [Paper]

News and ToDo List

Installation

conda create -n covlm python=3.9
conda activate covlm
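# Install PyTorch 1.12.1: run only ONE of the three conda install commands below, matching your CUDA version.
# CUDA 10.2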
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
# CUDA 11.3
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
# CUDA 11.6
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -e transformers/
pip install -e YOLOX/
pip install -r requirements.txt
pip install -e .
python -m spacy download en_core_web_md

Checkpoint

Model	Vision encoder	LLM	Checkpoint
CoVLM-1.4B	ViT-L-14	pythia-1.4b	Hugging Face
CoVLM-2.8B	ViT-L-14	pythia-2.8b	Hugging Face

Evaluation

Prepare evaluation datasets

RefCOCO/RefCOCOg/RefCOCOplus

bash eval_refcocog.sh CHECKPOINT

Cola

bash eval_cola.sh CHECKPOINT

ARO

bash eval_aro.sh CHECKPOINT

VQAv2

bash eval_vqav2.sh CHECKPOINT
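
The four evaluation commands above share the same shape, so they can be chained for a single checkpoint. The following is a minimal sketch, not part of the repository: it assumes the scripts are invoked from the directory that contains them (as in the commands above), and both the `DRY_RUN` variable and the `run_all_evals` function are hypothetical helpers introduced here for illustration. `path/to/checkpoint` is a placeholder to be replaced with your downloaded checkpoint.

```shell
#!/usr/bin/env bash
# Sketch: run every evaluation script listed in this README on one checkpoint.
# DRY_RUN and run_all_evals are illustrative names, not part of the CoVLM repo.
set -euo pipefail

CHECKPOINT="${CHECKPOINT:-path/to/checkpoint}"  # placeholder path
DRY_RUN="${DRY_RUN:-1}"                         # 1 = only print the commands

run_all_evals() {
  local ckpt="$1"
  local script
  for script in eval_refcocog.sh eval_cola.sh eval_aro.sh eval_vqav2.sh; do
    if [ "$DRY_RUN" = "1" ]; then
      # Show what would be executed without launching the evaluation.
      echo "bash $script $ckpt"
    else
      bash "$script" "$ckpt"
    fi
  done
}

run_all_evals "$CHECKPOINT"
```

By default the sketch only prints the commands. Setting `DRY_RUN=0 CHECKPOINT=/your/checkpoint` before running it would launch the evaluations in sequence and stop at the first script that fails.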

More tasks will be available soon

Citation

If you find our work useful or relevant to your research, please cite our paper:

@misc{li2023covlm,
      title={CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding}, 
      author={Junyan Li and Delin Chen and Yining Hong and Zhenfang Chen and Peihao Chen and Yikang Shen and Chuang Gan},
      year={2023},
      eprint={2311.03354},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
