¹S-Lab, Nanyang Technological University &nbsp; ²Shanghai AI Laboratory
Paper | Project Page | Dataset | Video
Text2Performer synthesizes human videos, taking text descriptions as the only input.
📖 For more visual results, check out our project page.
Clone this repo:

```bash
git clone https://github.com/yumingj/Text2Performer.git
cd Text2Performer
```
Dependencies:

```bash
conda env create -f env.yaml
conda activate text2performer
```
In this work, we contribute a human video dataset, named the Fashion-Text2Video Dataset, with rich label and text annotations.
You can download our processed dataset from this Google Drive. After downloading the dataset, unzip the file and put the contents under the `datasets` folder with the following structure:
```
./datasets
├── FashionDataset_frames_crop
│   ├── xxxxxx
│   │   ├── 000.png
│   │   ├── 001.png
│   │   └── ...
│   ├── xxxxxx
│   └── xxxxxx
├── train_frame_num.txt
├── val_frame_num.txt
├── test_frame_num.txt
├── moving_frames.npy
├── captions
```
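A minimal sketch of iterating over the frame folders in the layout above, assuming each video is a subdirectory of `FashionDataset_frames_crop` containing zero-padded `.png` frames (the helper name `list_videos` and the root path are illustrative, not part of the released code):

```python
from pathlib import Path

def list_videos(root="datasets/FashionDataset_frames_crop"):
    """Map each video-id directory to its sorted list of frame paths.

    Assumes the dataset layout shown above: one subdirectory per video,
    with frames named 000.png, 001.png, ... so lexicographic sort
    matches temporal order.
    """
    videos = {}
    for video_dir in sorted(Path(root).iterdir()):
        if video_dir.is_dir():
            videos[video_dir.name] = sorted(video_dir.glob("*.png"))
    return videos
```

This can be paired with the frame-count files (`train_frame_num.txt`, etc.) to verify that every video directory contains the expected number of frames after unzipping.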