# Text2Performer: Text-Driven Human Video Generation

¹S-Lab, Nanyang Technological University  ²Shanghai AI Laboratory

Paper | Project Page | Dataset | Video

Text2Performer synthesizes human videos using text descriptions as the only input.

📖 For more visual results, check out our project page.

## Installation

Clone this repo:

```bash
git clone https://github.com/yumingj/Text2Performer.git
cd Text2Performer
```

Dependencies:

```bash
conda env create -f env.yaml
conda activate text2performer
```
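After activating the environment, a quick sanity check can confirm the install. This is a minimal sketch; it assumes `env.yaml` provides PyTorch, which the snippet above does not spell out:

```python
# Sanity check for the conda environment (assumes env.yaml installs PyTorch).
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
```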

## (1) Dataset Preparation

In this work, we contribute a human video dataset with rich label and text annotations, named the Fashion-Text2Video Dataset.

You can download our processed dataset from this Google Drive. After downloading the dataset, unzip the file and put the contents under the `./datasets` folder with the following structure:

```
./datasets
├── FashionDataset_frames_crop
│   ├── xxxxxx
│   │   ├── 000.png
│   │   ├── 001.png
│   │   └── ...
│   ├── xxxxxx
│   └── xxxxxx
├── train_frame_num.txt
├── val_frame_num.txt
├── test_frame_num.txt
├── moving_frames.npy
├── captions
```
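To catch unpacking mistakes early, a small script can cross-check the extracted frame folders against the split files. This is a hypothetical helper, not part of the repo: it assumes each line of `train_frame_num.txt` pairs a sequence directory name with its frame count, which may not match the actual file format.

```python
import os

# Hypothetical sanity check for the extracted dataset layout.
# Assumption: each line of train_frame_num.txt is "<sequence_name> <frame_count>";
# adjust the parsing if the actual file format differs.
DATA_ROOT = "./datasets"
FRAMES_DIR = os.path.join(DATA_ROOT, "FashionDataset_frames_crop")

with open(os.path.join(DATA_ROOT, "train_frame_num.txt")) as f:
    for line in f:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed format
        seq_name, expected = parts[0], int(parts[1])
        seq_dir = os.path.join(FRAMES_DIR, seq_name)
        if not os.path.isdir(seq_dir):
            print(f"missing sequence directory: {seq_name}")
            continue
        found = len([p for p in os.listdir(seq_dir) if p.endswith(".png")])
        if found != expected:
            print(f"{seq_name}: expected {expected} frames, found {found}")
```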