Speech2Text

Implementation of "An open-source end-to-end ASR system for Brazilian Portuguese using DNNs built from newly assembled corpora" by Igor Quintanilha, Luiz Wagner Pereira Biscainho, and Sergio Lima Netto. (submitted).

Requirements

  • pytorch >= 1.0.1
  • cudatoolkit >= 9.0
  • torchvision
  • torchaudio
  • ignite
  • pyyaml
  • wget
  • num2words
  • unidecode
  • editdistance
  • ctcdecode
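The requirements above can be checked with a short script. This is a minimal sketch that is not part of the repository. It assumes the following pip distribution names for the listed packages: `torch` for pytorch, `pytorch-ignite` for ignite, plus `PyYAML` and `Unidecode`. It also skips `cudatoolkit`, which is usually a conda or system package rather than a pip distribution.

```python
from importlib import metadata

# Assumed mapping from the names in the requirements list to pip distribution
# names; None means "any version", a string is the minimum version required.
REQUIRED = {
    "torch": "1.0.1",        # "pytorch >= 1.0.1"
    "torchvision": None,
    "torchaudio": None,
    "pytorch-ignite": None,  # "ignite"
    "PyYAML": None,          # "pyyaml"
    "wget": None,
    "num2words": None,
    "Unidecode": None,       # "unidecode"
    "editdistance": None,
    "ctcdecode": None,
}


def parse_version(text):
    """Turn '1.0.1' or '1.13.1+cu117' into a tuple of ints such as (1, 13, 1)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:  # keep only the leading digits, e.g. '1rc1' -> '1'
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def version_ok(installed, minimum):
    """True if `installed` >= `minimum` (or if no minimum is required)."""
    if minimum is None:
        return True
    a, b = parse_version(installed), parse_version(minimum)
    n = max(len(a), len(b))  # pad so that '1.0' compares as '1.0.0'
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


def check_requirements(required=REQUIRED):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for dist, minimum in required.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist}: not installed")
            continue
        if not version_ok(installed, minimum):
            problems.append(f"{dist}: {installed} < {minimum}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for line in issues:
        print(line)
    if not issues:
        print("all listed requirements found")
```

The version comparison is deliberately simple and only looks at numeric release components. A real install would more likely rely on `pip` or `conda` resolving the constraints directly.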

Datasets

All datasets can be found here.

Acoustic models

| AM | Trained on | Method | WER | Download |
|---|---|---|---|---|
| DeepSpeech 2 | BRSD v2 | Scratch | 52.55% (2.42%) | Link |
| DeepSpeech 2 | BRSD v2 | Fine-tuned | 47.41% (1.73%) | Link |
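The WER column reports word error rate. This is the number of word substitutions, deletions and insertions separating a hypothesis from its reference, divided by the number of reference words. The sketch below assumes a corpus-level WER (total edits over total reference words). This README does not say whether the repository aggregates WER that way, or what the parenthesized figures mean. The edit distance is written out in plain Python. `editdistance.eval(ref_words, hyp_words)` from the listed `editdistance` package computes the same distance over word lists.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (sub/del/ins all cost 1)."""
    prev = list(range(len(hyp_words) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref_words, start=1):
        cur = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, h in enumerate(hyp_words, start=1):
            cur.append(min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (or match at cost 0)
            ))
        prev = cur
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Corpus-level WER in percent: total word edits / total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be paired")
    edits = 0
    ref_words = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        edits += word_edit_distance(r, h)
        ref_words += len(r)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * edits / ref_words


if __name__ == "__main__":
    # Toy transcripts, not taken from BRSD v2.
    refs = ["o gato subiu no telhado", "bom dia"]
    hyps = ["o gato subiu telhado", "bom dia"]
    print(f"WER: {corpus_wer(refs, hyps):.2f}%")  # prints "WER: 14.29%" (1 deletion / 7 words)
```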

Language models

| Language model* | RP | Size | LapsBM | BRTD |
|---|---|---|---|---|
| word 3-gram | 25 | 1.9G | 173.79 | 161.29 |
| word 5-gram | 42 | 7.8G | 136.50 | 135.12 |
| char 5-gram | 5 | 41M | <=2,334.48 | <=2,694.51 |
| char 10-gram | 10 | 4.7G | <=271.86 | <=323.71 |
| char 15-gram* | 15 | 5.4G | <=239.59 | <=198.49 |
| char 20-gram* | 20 | 8.8G | <=227.84 | <=189.53 |

*All models were trained using KenLM. More detailed information is available in the paper.
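Assume the LapsBM and BRTD columns are perplexities on those corpora; the header does not say so. A useful reference point is that KenLM scores sentences as log10 probabilities, for example `kenlm.Model(path).score(sentence, bos=True, eos=True)` in its Python bindings. Perplexity is the inverse geometric mean of the token probabilities. The sketch below does not import `kenlm`. It shows that arithmetic, plus one common way to express a character-level score per word so it can sit beside word-level models. It is an illustration under those assumptions. It does not reproduce the paper's procedure, and it does not explain why the character-level entries are reported as upper bounds (<=).

```python
import math


def perplexity_from_log10(log10_probs):
    """Perplexity from per-token log10 probabilities.

    PPL = 10 ** (-(1/N) * sum(log10 p_i)). KenLM's own Model.perplexity()
    applies the same formula to a sentence score, counting </s> as a token.
    """
    n = len(log10_probs)
    if n == 0:
        raise ValueError("need at least one token")
    return 10.0 ** (-sum(log10_probs) / n)


def word_level_perplexity(total_log10_prob, num_words):
    """Normalize a whole-sentence log10 probability per word.

    For a character-level model, total_log10_prob is summed over characters;
    dividing by the word count instead of the character count puts the result
    on a per-word scale. Conventions for counting </s> vary; this sketch
    simply divides by num_words.
    """
    if num_words <= 0:
        raise ValueError("need at least one word")
    return 10.0 ** (-total_log10_prob / num_words)


if __name__ == "__main__":
    # Toy numbers (not from the paper): a character model that gives p = 0.5
    # to each of 10 characters in a 2-word sentence.
    char_scores = [math.log10(0.5)] * 10
    print(f"per-character PPL: {perplexity_from_log10(char_scores):.1f}")  # 2.0
    print(f"per-word PPL: {word_level_perplexity(sum(char_scores), 2):.1f}")  # 32.0
```

With 5 characters per word on average, a per-character perplexity of 2 corresponds to 2 ** 5 = 32 per word. This is why character-level and word-level perplexities are not directly comparable without such a conversion.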