Speaker Adaptation with Limited Data #212

Closed
ThThoma opened this issue Sep 11, 2020 · 2 comments

@ThThoma
Contributor

ThThoma commented Sep 11, 2020

Hello, I've worked with CMU pocketsphinx for a while and now I'm transitioning to Vosk.
I'm really happy with the models and the recognition speed, thank you and kudos!
(especially vosk-model-en-us-daanzu-20200905-lgraph has >90% accuracy for my voice)

However, I would like to give my users the option to adapt the model to their environment/voice with limited audio data.
With pocketsphinx models you could create an MLLR matrix, and that improved accuracy by 5-10% for the speaker.

From what I understand, Vosk models are based on (or similar to) Kaldi models.
So I peeked at the Kaldi transform documentation,
and I am wondering if there is a way to create and apply MLLR matrices to Vosk models?

Or are MLLR matrices considered outdated now, and is fine-tuning (with ~1 hour of data) our only choice?
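
For readers unfamiliar with the term, MLLR adapts a GMM acoustic model by applying an affine transform to its Gaussian means, roughly `mu' = A * mu + b`, where `A` and `b` are estimated from the speaker's adaptation audio. The sketch below only illustrates applying such a transform. It is not pocketsphinx or Kaldi code, and `apply_mllr` and the toy values are hypothetical names and numbers chosen for the example.

```python
def apply_mllr(means, A, b):
    """Apply the affine transform mu' = A * mu + b to every mean vector.

    means: list of mean vectors (each a list of floats)
    A:     square matrix as a list of rows (hypothetical adaptation matrix)
    b:     bias vector (hypothetical adaptation offset)
    """
    adapted = []
    for mu in means:
        adapted.append([
            sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ])
    return adapted


# Toy values for illustration only; a real transform is estimated from audio.
means = [[1.0, 2.0], [0.0, -1.0]]
A = [[1.0, 0.0], [0.0, 2.0]]
b = [0.5, -0.5]

adapted = apply_mllr(means, A, b)
print(adapted)
```

The same transform is shared by many Gaussians, which is why a small amount of speaker data can be enough to estimate it.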

@nshmyrev
Collaborator

Hi! Thanks for your feedback!

MLLR is no longer compatible, so you cannot use it.

Our models use i-vectors internally, which more or less supersede MLLR and work automatically, so you should not need to worry about that.

Fine-tuning is the way to go, but unfortunately it is not very straightforward. Daanzu does it sometimes with helpful results; see daanzu/kaldi-active-grammar#33

See also here: https://www.quora.com/Does-adaptation-help-with-speech-recognition-accuracy

@ThThoma
Contributor Author

ThThoma commented Sep 11, 2020

Thank you for the quick and clear response!
I will eventually look into fine-tuning.

Your post on adaptation with modern speech recognition toolkits is enough :)

ThThoma closed this as completed Sep 11, 2020