Adding New Languages

Alexander Veysov edited this page Sep 24, 2020 · 2 revisions

Adding a CE-only Model

As a service to the community, we can easily add a CE model, pro bono, for any language that has a Unicode alphabet.

A few general rules of thumb:

  • Using Common Voice alone generally does not make sense: the resulting model will have problems with generalization;
  • It usually takes some effort to collect enough data to build a decent model;
  • Ideally, source as much training data as possible, while the test/val datasets can cover many more domains than the training data in order to test generalization;
  • The more diverse your data, the better the model will be.
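
As an illustration of the train versus test/val point above, here is a minimal sketch of a domain-aware split. It is not our training code. The record layout `(audio_path, text, domain)`, the function name `split_by_domain`, and the domain labels are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
import random


def split_by_domain(samples, held_out_domains, val_fraction=0.1, seed=0):
    """Split (audio_path, text, domain) records into train and val lists.

    Every sample from a held-out domain goes to validation only, and a small
    random fraction of each remaining domain is added to validation as well.
    """
    rng = random.Random(seed)

    # Group the samples by their (hypothetical) domain label.
    by_domain = defaultdict(list)
    for sample in samples:
        by_domain[sample[2]].append(sample)

    train, val = [], []
    for domain, items in sorted(by_domain.items()):
        if domain in held_out_domains:
            # This domain is never seen in training, so it tests generalization.
            val.extend(items)
            continue
        items = items[:]  # copy so the caller's data is not shuffled
        rng.shuffle(items)
        n_val = max(1, int(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

With a split like this, the validation set covers every training domain plus the held-out ones, so a drop in quality on the held-out domains points to a generalization problem rather than a lack of training data in general.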

Please do not hesitate to contact us directly for advice. CE models will always stay public for all languages. From time to time, when we achieve fundamental breakthroughs in our research, we will re-train all of our models.

Model Training Code

At this time, for a number of reasons, we have decided not to share the code for training models.

Adding an EE Model

Please contact us directly for a quote.

Current Backlog

We currently plan, without any hard deadlines, to support the following major languages in both CE and EE versions, with the same attention to quality:

  • French
  • Italian
  • Polish
  • Czech