SOFA_AI: Singing-Oriented Forced Aligner for Automatic Inference


English | 简体中文


Introduction

SOFA_AI (Singing-Oriented Forced Aligner for Automatic Inference) combines FunASR and SOFA to produce phoneme-level labels directly from dry (unaccompanied) vocals, without requiring lyric annotations or speech transcriptions: FunASR infers the transcription, and SOFA aligns it to the audio. The tool can partly streamline phoneme labeling for DiffSinger and reduce the manual labeling workload.

Note:

The code was written with assistance and corrections from ChatGPT-4 and may contain bugs and recognition errors. If you find a problem, please open an issue.

We plan to integrate the openai/whisper project and to explore combining ASR with SOFA for confidence assessment. Stay tuned.


How to Use

Environment Setup

  • Create and activate a Python 3.10 environment:

    conda create -n SOFA_AI python=3.10 -y
    conda activate SOFA_AI
  • Install PyTorch for your device by following the instructions on the official PyTorch website.

  • (Optional, to keep pip from downloading multiple versions of the same library) Install PyTorch Lightning separately:

    pip install lightning
  • Clone the repository and enter the code directory:

    git clone https://github.com/colstone/SOFA_AI.git
    cd SOFA_AI
  • Install the remaining libraries:

    pip install -r requirements.txt
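After installation, a quick sanity check can confirm that the interpreter is Python 3.10 and that the main dependencies can be found. The sketch below uses only the standard library. The module list is an assumption: torch and lightning come from the steps above, while funasr and modelscope are guessed from the FunASR/ModelScope usage, so adjust the list to match requirements.txt.

```python
import importlib.util
import sys

# Assumed top-level module names; verify against requirements.txt.
REQUIRED_MODULES = ["torch", "lightning", "funasr", "modelscope"]


def check_environment(modules=REQUIRED_MODULES, expected=(3, 10)):
    """Return (python_ok, missing_modules) for the current interpreter."""
    python_ok = sys.version_info[:2] == tuple(expected)
    # find_spec returns None when a top-level module cannot be located;
    # it does not import the module itself.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    ok, missing = check_environment()
    print(f"Python {sys.version.split()[0]} (3.10 expected): {'OK' if ok else 'mismatch'}")
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it inside the activated SOFA_AI environment; any module reported as missing still needs to be installed.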

Inference

  • Run the code:

    python SOFA_AI.py

    After starting, the script downloads the FunASR model from ModelScope. Once the download finishes, it prompts for:

    • WAV file or folder path: Drag and drop the WAV file or folder into the command line window.

    • SOFA model path: Drag and drop the SOFA model into the command line window.

    • Dictionary path: Drag and drop the dictionary file into the command line window.

    • Phoneme label format (TextGrid or HTK lab): Enter textgrid or htk.

Then, simply wait for the code to finish running.
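If you chose htk, the resulting label files can be loaded for inspection or post-processing. The sketch below assumes the standard HTK label layout (one `start end label` entry per line, with times in 100 ns units). The class and function names are hypothetical, and you should check the exact layout against a file SOFA_AI actually produces before relying on it.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

HTK_UNITS_PER_SECOND = 10_000_000  # assumed HTK convention: 1 unit = 100 ns


@dataclass
class Segment:
    start: float  # seconds
    end: float  # seconds
    phoneme: str


def parse_htk_lab(text: str) -> list[Segment]:
    """Parse HTK-style 'start end label' lines into segments in seconds."""
    segments = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected 'start end label', got {line!r}")
        start, end = float(fields[0]), float(fields[1])
        segments.append(
            Segment(start / HTK_UNITS_PER_SECOND, end / HTK_UNITS_PER_SECOND, fields[2])
        )
    return segments


def read_htk_lab(path: str | Path) -> list[Segment]:
    """Read and parse one label file (UTF-8 assumed)."""
    return parse_htk_lab(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Inline example instead of a real output file.
    for seg in parse_htk_lab("0 1500000 SP\n1500000 3000000 a\n"):
        print(f"{seg.phoneme}\t{seg.start:.3f}-{seg.end:.3f} s")
```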

The character labs and pinyin labs inferred by FunASR are saved in the character and pinyin folders, respectively. Use them if you want to correct the transcriptions or run alignment yourself with MFA or SOFA.
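Before re-running alignment on corrected pinyin labs, it can help to confirm that every syllable is covered by the dictionary. This sketch assumes that each pinyin lab holds space-separated syllables and that the dictionary has one `syllable<TAB>phoneme phoneme ...` entry per line; both formats are assumptions to verify against your own files, and the helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def load_dictionary(text: str) -> dict[str, list[str]]:
    """Parse assumed 'syllable<TAB>ph1 ph2 ...' lines into a mapping."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        syllable, _, phonemes = line.partition("\t")
        mapping[syllable.strip()] = phonemes.split()
    return mapping


def unknown_syllables(lab_text: str, dictionary: dict[str, list[str]]) -> list[str]:
    """Return the syllables in a pinyin lab that the dictionary cannot expand, in order."""
    return [s for s in lab_text.split() if s not in dictionary]


def check_folder(lab_dir: str | Path, dict_path: str | Path) -> dict[str, list[str]]:
    """Map each *.lab file name in lab_dir to its unknown syllables, if any."""
    dictionary = load_dictionary(Path(dict_path).read_text(encoding="utf-8"))
    report = {}
    for lab in sorted(Path(lab_dir).glob("*.lab")):
        missing = unknown_syllables(lab.read_text(encoding="utf-8"), dictionary)
        if missing:
            report[lab.name] = missing
    return report
```

An empty report from `check_folder` means every syllable in the folder can be expanded to phonemes with that dictionary.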


Open Source Projects Used in This Project

qiuqiao/SOFA: SOFA: Singing-Oriented Forced Aligner

alibaba-damo-academy/FunASR: A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models.

We sincerely thank the developers/development teams of the above projects.