From 9023c7315d7bc7e532697dd71d0d56ac9b9a518d Mon Sep 17 00:00:00 2001
From: Shadow Cun <vinthony@gmail.com>
Date: Thu, 20 Apr 2023 22:58:35 +0800
Subject: [PATCH 1/9] Update README.md

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index af83e996..58e98743 100644
--- a/README.md
+++ b/README.md
@@ -121,9 +121,10 @@ Tutorials from communities: [中文windows教程](https://www.bilibili.com/video
 ### Windows ([中文windows教程](https://www.bilibili.com/video/BV1Dc411W7V6/)):
 
 1. Install [Python 3.10.6](https://www.python.org/downloads/windows/), checking "Add Python to PATH".
-2. Install [git](https://git-scm.com/download/win).
-3. Install `ffmpeg`, following [this instruction](https://www.wikihow.com/Install-FFmpeg-on-Windows).
+2. Install [git](https://git-scm.com/download/win) manually, or run `scoop install git` via [scoop](https://scoop.sh/).
+3. Install `ffmpeg` by following [these instructions](https://www.wikihow.com/Install-FFmpeg-on-Windows), or run `scoop install ffmpeg` via [scoop](https://scoop.sh/).
 4. Download our SadTalker repository, for example by running `git clone https://github.com/Winfredy/SadTalker.git`.
+5. Download the `checkpoints` and `gfpgan` folders as described [below↓](https://github.com/Winfredy/SadTalker#-2-download-trained-models).
 5. Run `start.bat` from Windows Explorer as normal, non-administrator, user, a gradio WebUI demo will be started.
 
 ### Macbook:

From a930df3c2309305b702f5680cb62493f4e00e9a5 Mon Sep 17 00:00:00 2001
From: Shadow Cun <vinthony@gmail.com>
Date: Tue, 25 Apr 2023 01:12:40 +0800
Subject: [PATCH 2/9] Update FAQ.md

---
 docs/FAQ.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/docs/FAQ.md b/docs/FAQ.md
index fe758809..41e2dab3 100644
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -26,3 +26,15 @@ Make sure you have downloaded the checkpoints and gfpgan as [here](https://githu
 **Q: RuntimeError: unexpected EOF, expected 237192 more bytes. The file might be corrupted.**
 
 The files are not automatically downloaded. Please update the code and download the gfpgan folders as [here](https://github.com/Winfredy/SadTalker#-2-download-trained-models).
+
+**Q: CUDA out of memory error**
+
+Please refer to https://stackoverflow.com/questions/73747731/runtimeerror-cuda-out-of-memory-how-setting-max-split-size-mb
+
+``` 
+# windows
+set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py ...
+
+# linux
+export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py ...
+```

From 643fc4c9d20bb23633428916db903e90d55729ba Mon Sep 17 00:00:00 2001
From: Shadow Cun <vinthony@gmail.com>
Date: Tue, 25 Apr 2023 01:13:00 +0800
Subject: [PATCH 3/9] Update FAQ.md

---
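Note on the two-line form: the variable has to be set in the shell first and is then inherited by the later `python` call. When inference is started from a script instead of a shell, the same setting can be passed through the child environment. The sketch below is only an illustration under that assumption; `build_env` is a hypothetical helper that is not part of SadTalker, and the inference arguments stay elided as in the FAQ.

```python
import os


def build_env(max_split_size_mb=128):
    """Return a copy of the current environment with the allocator option set.

    Hypothetical helper (not part of SadTalker). The value mirrors the FAQ
    entry: PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:<N>.
    """
    env = dict(os.environ)  # copy, so the parent process is left untouched
    env["PYTORCH_CUDA_ALLOC_CONF"] = f"max_split_size_mb:{max_split_size_mb}"
    return env
```

The result can be handed to `subprocess.run(cmd, env=build_env())`, where `cmd` is whatever `python inference.py ...` invocation is normally used. The setting should be in place before PyTorch initialises CUDA in the child process.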
 docs/FAQ.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/FAQ.md b/docs/FAQ.md
index 41e2dab3..763e24a4 100644
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -33,8 +33,10 @@ please refer to https://stackoverflow.com/questions/73747731/runtimeerror-cuda-o
 
 ``` 
 # windows
-set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py ...
+set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 
+python inference.py ...
 
 # linux
-export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py ...
+export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 
+python inference.py ...
 ```

From f8ad3222b259cc0c7486fe509c7226091fa5bd23 Mon Sep 17 00:00:00 2001
From: Shadow Cun <vinthony@gmail.com>
Date: Tue, 25 Apr 2023 01:15:48 +0800
Subject: [PATCH 4/9] Update FAQ.md

---
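Note: a "Header missing" error of this kind can show up when a file carries an audio extension but holds different data. The sketch below is a rough pre-check that guesses the container from the first bytes. It is a heuristic only, and `sniff_audio_format` is a hypothetical helper that is not part of SadTalker.

```python
def sniff_audio_format(path):
    """Guess 'wav', 'mp3', or None from the first bytes of a file.

    Hypothetical helper (not part of SadTalker). Heuristic only: it looks for
    a RIFF/WAVE header, an ID3v2 tag, or an MPEG audio frame sync.
    """
    with open(path, "rb") as f:
        head = f.read(12)
    # WAV: bytes 0-3 are "RIFF" and bytes 8-11 are "WAVE".
    if len(head) >= 12 and head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    # MP3 files often start with an ID3v2 tag.
    if head[:3] == b"ID3":
        return "mp3"
    # Otherwise an MPEG audio frame starts with 11 set bits (0xFF, 0b111xxxxx).
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"
    return None
```

A `None` result suggests converting the file to wav or mp3 before passing it as the driven audio. The frame-sync test can also match other MPEG-style streams, so a positive result is a hint rather than a guarantee.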
 docs/FAQ.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/FAQ.md b/docs/FAQ.md
index 763e24a4..6451a226 100644
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -40,3 +40,7 @@ python inference.py ...
 export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 
 python inference.py ...
 ```
+
+**Q: Error while decoding stream #0:0: Invalid data found when processing input [mp3float @ 0000015037628c00] Header missing**
+
+Our method only supports wav or mp3 files as input; please make sure the input audio is in one of these formats.

From 0fc2f9c0e96f51ccf120dc3ee6ba55e9ad13e90a Mon Sep 17 00:00:00 2001
From: Chenxi <chenxi.whitehouse@gmail.com>
Date: Tue, 2 May 2023 06:55:12 +0000
Subject: [PATCH 5/9] replicate

---
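Note: `predict.py` picks its output by taking the first `enhanced.mp4` match in `results`, which raises a bare `IndexError` when no such file was written. The sketch below shows one possible way to make that step fail with a clearer message. `find_enhanced_video` is a hypothetical helper that is not part of this patch, and it matches on the file-name suffix rather than on a substring.

```python
import os


def find_enhanced_video(results_dir):
    """Return the path of the first '*enhanced.mp4' file in results_dir.

    Hypothetical helper (not part of this patch). Raises FileNotFoundError
    with a readable message instead of an IndexError on an empty match.
    """
    candidates = sorted(
        name for name in os.listdir(results_dir) if name.endswith("enhanced.mp4")
    )
    if not candidates:
        raise FileNotFoundError(f"no enhanced .mp4 found in {results_dir!r}")
    return os.path.join(results_dir, candidates[0])
```

Sorting the names makes the choice deterministic if more than one enhanced video is present.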
 README.md                 |   3 +-
 cog.yaml                  |  35 +++++++
 predict.py                | 214 ++++++++++++++++++++++++++++++++++++++
 src/facerender/animate.py |   3 +-
 4 files changed, 252 insertions(+), 3 deletions(-)
 create mode 100644 cog.yaml
 create mode 100644 predict.py

diff --git a/README.md b/README.md
index 58e98743..14c7ad44 100644
--- a/README.md
+++ b/README.md
@@ -5,8 +5,7 @@
 
 <!--<h2> 😭 SadTalker： <span style="font-size:12px">Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation </span> </h2> -->
 
-  <a href='https://arxiv.org/abs/2211.12194'><img src='https://img.shields.io/badge/ArXiv-PDF-red'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href='https://sadtalker.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb)
-
+  <a href='https://arxiv.org/abs/2211.12194'><img src='https://img.shields.io/badge/ArXiv-PDF-red'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href='https://sadtalker.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![Replicate](https://replicate.com/cjwbw/sadtalker/badge)](https://replicate.com/cjwbw/sadtalker) 
 
 <div>
     <a target='_blank'>Wenxuan Zhang <sup>*,1,2</sup> </a>&emsp;
diff --git a/cog.yaml b/cog.yaml
new file mode 100644
index 00000000..05bcbd58
--- /dev/null
+++ b/cog.yaml
@@ -0,0 +1,35 @@
+build:
+  gpu: true
+  cuda: "11.3"
+  python_version: "3.8"
+  system_packages:
+    - "ffmpeg"
+    - "libgl1-mesa-glx"
+    - "libglib2.0-0"
+  python_packages:
+    - "torch==1.12.1"
+    - "torchvision==0.13.1"
+    - "torchaudio==0.12.1"
+    - "joblib==1.1.0"
+    - "scikit-image==0.19.3"
+    - "basicsr==1.4.2"
+    - "facexlib==0.3.0"
+    - "resampy==0.3.1"
+    - "pydub==0.25.1"
+    - "scipy==1.10.1"
+    - "kornia==0.6.8"
+    - "face_alignment==1.3.5"
+    - "imageio==2.19.3"
+    - "imageio-ffmpeg==0.4.7"
+    - "librosa==0.9.2"
+    - "tqdm==4.65.0"
+    - "yacs==0.1.8"
+    - "gfpgan==1.3.8"
+    - "dlib-bin==19.24.1"
+    - "av==10.0.0"
+    - "trimesh==3.9.20"
+  run:
+    - mkdir -p /root/.cache/torch/hub/checkpoints/ && wget --output-document "/root/.cache/torch/hub/checkpoints/s3fd-619a316812.pth" "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth"
+    - mkdir -p /root/.cache/torch/hub/checkpoints/ && wget --output-document "/root/.cache/torch/hub/checkpoints/2DFAN4-cd938726ad.zip" "https://www.adrianbulat.com/downloads/python-fan/2DFAN4-cd938726ad.zip"
+
+predict: "predict.py:Predictor"
diff --git a/predict.py b/predict.py
new file mode 100644
index 00000000..1a44a663
--- /dev/null
+++ b/predict.py
@@ -0,0 +1,214 @@
+"""Cog predictor for SadTalker. Run `bash scripts/download_models.sh` first to prepare the weight files."""
+import os
+import shutil
+from argparse import Namespace
+from src.utils.preprocess import CropAndExtract
+from src.test_audio2coeff import Audio2Coeff
+from src.facerender.animate import AnimateFromCoeff
+from src.generate_batch import get_data
+from src.generate_facerender_batch import get_facerender_data
+from cog import BasePredictor, Input, Path
+
+checkpoints = "checkpoints"
+
+
+class Predictor(BasePredictor):
+    def setup(self):
+        """Load the model into memory to make running multiple predictions efficient"""
+        device = "cuda"
+
+        path_of_lm_croper = os.path.join(
+            checkpoints, "shape_predictor_68_face_landmarks.dat"
+        )
+        path_of_net_recon_model = os.path.join(checkpoints, "epoch_20.pth")
+        dir_of_BFM_fitting = os.path.join(checkpoints, "BFM_Fitting")
+        wav2lip_checkpoint = os.path.join(checkpoints, "wav2lip.pth")
+
+        audio2pose_checkpoint = os.path.join(checkpoints, "auido2pose_00140-model.pth")
+        audio2pose_yaml_path = os.path.join("src", "config", "auido2pose.yaml")
+
+        audio2exp_checkpoint = os.path.join(checkpoints, "auido2exp_00300-model.pth")
+        audio2exp_yaml_path = os.path.join("src", "config", "auido2exp.yaml")
+
+        free_view_checkpoint = os.path.join(
+            checkpoints, "facevid2vid_00189-model.pth.tar"
+        )
+
+        # init model
+        self.preprocess_model = CropAndExtract(
+            path_of_lm_croper, path_of_net_recon_model, dir_of_BFM_fitting, device
+        )
+
+        self.audio_to_coeff = Audio2Coeff(
+            audio2pose_checkpoint,
+            audio2pose_yaml_path,
+            audio2exp_checkpoint,
+            audio2exp_yaml_path,
+            wav2lip_checkpoint,
+            device,
+        )
+
+        self.animate_from_coeff = {
+            "full": AnimateFromCoeff(
+                free_view_checkpoint,
+                os.path.join(checkpoints, "mapping_00109-model.pth.tar"),
+                os.path.join("src", "config", "facerender_still.yaml"),
+                device,
+            ),
+            "others": AnimateFromCoeff(
+                free_view_checkpoint,
+                os.path.join(checkpoints, "mapping_00229-model.pth.tar"),
+                os.path.join("src", "config", "facerender.yaml"),
+                device,
+            ),
+        }
+
+    def predict(
+        self,
+        source_image: Path = Input(
+            description="Upload the source image; it can be a video (.mp4) or a picture (.png)",
+        ),
+        driven_audio: Path = Input(
+            description="Upload the driven audio; accepts .wav and .mp4 files",
+        ),
+        enhancer: str = Input(
+            description="Choose a face enhancer",
+            choices=["gfpgan", "RestoreFormer"],
+            default="gfpgan",
+        ),
+        preprocess: str = Input(
+            description="how to preprocess the images",
+            choices=["crop", "resize", "full"],
+            default="full",
+        ),
+        ref_eyeblink: Path = Input(
+            description="path to reference video providing eye blinking",
+            default=None,
+        ),
+        ref_pose: Path = Input(
+            description="path to reference video providing pose",
+            default=None,
+        ),
+        still: bool = Input(
+            description="Crop back to the original video for full-body animation when preprocess is 'full'",
+            default=True,
+        ),
+    ) -> Path:
+        """Run a single prediction on the model"""
+
+        animate_from_coeff = (
+            self.animate_from_coeff["full"]
+            if preprocess == "full"
+            else self.animate_from_coeff["others"]
+        )
+
+        args = load_default()
+        args.pic_path = str(source_image)
+        args.audio_path = str(driven_audio)
+        device = "cuda"
+        args.still = still
+        args.ref_eyeblink = None if ref_eyeblink is None else str(ref_eyeblink)
+        args.ref_pose = None if ref_pose is None else str(ref_pose)
+
+        # crop image and extract 3dmm from image
+        results_dir = "results"
+        if os.path.exists(results_dir):
+            shutil.rmtree(results_dir)
+        os.makedirs(results_dir)
+        first_frame_dir = os.path.join(results_dir, "first_frame_dir")
+        os.makedirs(first_frame_dir)
+
+        print("3DMM Extraction for source image")
+        first_coeff_path, crop_pic_path, crop_info = self.preprocess_model.generate(
+            args.pic_path, first_frame_dir, preprocess, source_image_flag=True
+        )
+        if first_coeff_path is None:
+            # Fail loudly: a bare return would yield None despite the declared Path output
+            raise ValueError("Can't get the coeffs of the input")
+
+        if ref_eyeblink is not None:
+            ref_eyeblink_videoname = os.path.splitext(os.path.split(ref_eyeblink)[-1])[
+                0
+            ]
+            ref_eyeblink_frame_dir = os.path.join(results_dir, ref_eyeblink_videoname)
+            os.makedirs(ref_eyeblink_frame_dir, exist_ok=True)
+            print("3DMM Extraction for the reference video providing eye blinking")
+            ref_eyeblink_coeff_path, _, _ = self.preprocess_model.generate(
+                args.ref_eyeblink, ref_eyeblink_frame_dir
+            )
+        else:
+            ref_eyeblink_coeff_path = None
+
+        if ref_pose is not None:
+            if ref_pose == ref_eyeblink:
+                ref_pose_coeff_path = ref_eyeblink_coeff_path
+            else:
+                ref_pose_videoname = os.path.splitext(os.path.split(ref_pose)[-1])[0]
+                ref_pose_frame_dir = os.path.join(results_dir, ref_pose_videoname)
+                os.makedirs(ref_pose_frame_dir, exist_ok=True)
+                print("3DMM Extraction for the reference video providing pose")
+                ref_pose_coeff_path, _, _ = self.preprocess_model.generate(
+                    args.ref_pose, ref_pose_frame_dir
+                )
+        else:
+            ref_pose_coeff_path = None
+
+        # audio2coeff
+        batch = get_data(
+            first_coeff_path,
+            args.audio_path,
+            device,
+            ref_eyeblink_coeff_path,
+            still=still,
+        )
+        coeff_path = self.audio_to_coeff.generate(
+            batch, results_dir, args.pose_style, ref_pose_coeff_path
+        )
+        # coeff2video
+        print("coeff2video")
+        data = get_facerender_data(
+            coeff_path,
+            crop_pic_path,
+            first_coeff_path,
+            args.audio_path,
+            args.batch_size,
+            args.input_yaw,
+            args.input_pitch,
+            args.input_roll,
+            expression_scale=args.expression_scale,
+            still_mode=still,
+            preprocess=preprocess,
+        )
+        animate_from_coeff.generate(
+            data, results_dir, args.pic_path, crop_info,
+            enhancer=enhancer, background_enhancer=args.background_enhancer,
+            preprocess=preprocess)
+
+        output = "/tmp/out.mp4"
+        mp4_path = os.path.join(results_dir, [f for f in os.listdir(results_dir) if "enhanced.mp4" in f][0])
+        shutil.copy(mp4_path, output)
+
+        return Path(output)
+
+
+def load_default():
+    return Namespace(
+        pose_style=0,
+        batch_size=2,
+        expression_scale=1.0,
+        input_yaw=None,
+        input_pitch=None,
+        input_roll=None,
+        background_enhancer=None,
+        face3dvis=False,
+        net_recon="resnet50",
+        init_path=None,
+        use_last_fc=False,
+        bfm_folder="./checkpoints/BFM_Fitting/",
+        bfm_model="BFM_model_front.mat",
+        focal=1015.0,
+        center=112.0,
+        camera_d=10.0,
+        z_near=5.0,
+        z_far=15.0,
+    )
diff --git a/src/facerender/animate.py b/src/facerender/animate.py
index 3adea961..2ee28e73 100644
--- a/src/facerender/animate.py
+++ b/src/facerender/animate.py
@@ -177,7 +177,8 @@ def generate(self, x, video_save_dir, pic_path, crop_info, enhancer=None, backgr
         audio_name = os.path.splitext(os.path.split(audio_path)[-1])[0]
         new_audio_path = os.path.join(video_save_dir, audio_name+'.wav')
         start_time = 0
-        sound = AudioSegment.from_mp3(audio_path)
+        # cog will not keep the .mp3 filename, so don't assume the input is mp3
+        sound = AudioSegment.from_file(audio_path)
         frames = frame_num 
         end_time = start_time + frames*1/25*1000
         word1=sound.set_frame_rate(16000)

From a60d62b13cc70294972425ff756c60668ebf2dbc Mon Sep 17 00:00:00 2001
From: Shadow Cun <vinthony@gmail.com>
Date: Thu, 4 May 2023 11:40:11 +0800
Subject: [PATCH 6/9] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 14c7ad44..7724c7cf 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
 
 <!--<h2> 😭 SadTalker： <span style="font-size:12px">Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation </span> </h2> -->
 
-  <a href='https://arxiv.org/abs/2211.12194'><img src='https://img.shields.io/badge/ArXiv-PDF-red'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href='https://sadtalker.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![Replicate](https://replicate.com/cjwbw/sadtalker/badge)](https://replicate.com/cjwbw/sadtalker) 
+  <a href='https://arxiv.org/abs/2211.12194'><img src='https://img.shields.io/badge/ArXiv-PDF-red'></a> &nbsp; <a href='https://sadtalker.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp; [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) &nbsp; [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker) &nbsp; [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb) &nbsp; [![Replicate](https://replicate.com/cjwbw/sadtalker/badge)](https://replicate.com/cjwbw/sadtalker) 
 
 <div>
     <a target='_blank'>Wenxuan Zhang <sup>*,1,2</sup> </a>&emsp;

From a9034df0b3a1f6e52ffb5a906efdcabd53476638 Mon Sep 17 00:00:00 2001
From: kainstan <hua-zai@qq.com>
Date: Wed, 10 May 2023 21:21:05 +0800
Subject: [PATCH 7/9] .idea is a PyCharm configuration directory that should
 be ignored

---
 .gitignore | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 65365db2..851588a9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -157,7 +157,7 @@ cython_debug/
 #  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 #  and can be added to the global gitignore or merged into this file.  For a more nuclear
 #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
+.idea/
 
 examples/results/*
 gfpgan/*

From 44889c3a8bd1fe8a2014d915e909673bef72afa6 Mon Sep 17 00:00:00 2001
From: kainstan <hua-zai@qq.com>
Date: Thu, 11 May 2023 16:52:26 +0800
Subject: [PATCH 8/9] .DS_Store is a hidden macOS metadata file and should be
 ignored

---
 .gitignore | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 851588a9..0ecb8ed9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -165,4 +165,7 @@ checkpoints/
 results/*
 Dockerfile
 start_docker.sh
-start.sh
\ No newline at end of file
+start.sh
+
+# Mac
+.DS_Store

From bbe54e928d71bd5c0c0650972450fa4907f3e34b Mon Sep 17 00:00:00 2001
From: ribasoka <ribasoka@gmail.com>
Date: Mon, 15 May 2023 23:09:05 +0300
Subject: [PATCH 9/9] Update app.py

- Fixed timeout bug
---
 app.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/app.py b/app.py
index edde0cf4..4bb206b0 100644
--- a/app.py
+++ b/app.py
@@ -144,6 +144,6 @@ def sadtalker_demo():
 if __name__ == "__main__":
 
     demo = sadtalker_demo()
-    demo.launch(share=True)
+    demo.queue().launch(share=True)