FLORES-101

FLORES-101 is a many-to-many multilingual translation benchmark dataset covering 101 languages.
- Paper: The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
- Download: FLORES-101 dataset
- Evaluation server: Dynabench; instructions to submit a model
Looking for FLORESv1, which included Nepali, Sinhala, Pashto, and Khmer? Click here
One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted domains, or are low quality because they are constructed using semi-automatic procedures. In this work, we introduce the FLORES evaluation benchmark, consisting of 3001 sentences extracted from English Wikipedia and covering a variety of different topics and domains. These sentences have been translated into 101 languages by professional translators through a carefully controlled process. The resulting dataset enables better assessment of model quality on the long tail of low-resource languages, including the evaluation of many-to-many multilingual translation systems, as all translations are multilingually aligned. By publicly releasing such a high-quality and high-coverage dataset, we hope to foster progress in the machine translation community and beyond.
The data can be downloaded from https://dl.fbaipublicfiles.com/flores101/dataset/flores101_dataset.tar.gz (see the commands below).
For evaluation, we use SentencePiece BLEU (spBLEU): the text is first tokenized with a SentencePiece (SPM) model with a 256K-token vocabulary, and BLEU is then computed on the sentence-piece-tokenized text. This requires installing sacrebleu from a specific branch:
git clone --single-branch --branch adding_spm_tokenized_bleu https://github.com/ngoyal2707/sacrebleu.git
cd sacrebleu
python setup.py install
cd ~/
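spBLEU is ordinary BLEU applied to SentencePiece pieces rather than whitespace words, which makes scores comparable across languages with very different orthographies. As a rough illustration only (not the sacrebleu implementation, which adds corpus-level aggregation and smoothing), a minimal single-reference BLEU over already-tokenized pieces looks like this:

```python
import math
from collections import Counter

def bleu(hyp_tokens, ref_tokens, max_n=4):
    """Sentence-level BLEU over pre-tokenized text.

    spBLEU is the same computation, fed SentencePiece pieces
    (e.g. "▁The", "▁cat") instead of words.
    """
    if not hyp_tokens:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        # Clipped n-gram precision: overlap of hypothesis and reference n-grams.
        hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n])
                             for i in range(len(hyp_tokens) - n + 1))
        ref_ngrams = Counter(tuple(ref_tokens[i:i + n])
                             for i in range(len(ref_tokens) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty discourages overly short hypotheses.
    bp = math.exp(min(0.0, 1 - len(ref_tokens) / len(hyp_tokens)))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 100; in practice you would obtain the pieces from the 256K-vocabulary SPM model shipped with the benchmark and use sacrebleu for the official numbers.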
wget https://dl.fbaipublicfiles.com/flores101/dataset/flores101_dataset.tar.gz
tar -xvzf flores101_dataset.tar.gz
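After extraction, the tarball unpacks into per-split folders with one sentence per line per language (we assume the layout `flores101_dataset/{split}/{lang}.{split}`, e.g. `dev/eng.dev`; adjust the path pattern if your copy differs). Since all translations are multilingually aligned, line i of every language file translates the same source sentence, so zipping two files yields a parallel corpus. A minimal loader sketch:

```python
from pathlib import Path

def load_split(root, lang, split="dev"):
    """Read one FLORES split for one language, one sentence per line.

    Assumes files are laid out as {root}/{split}/{lang}.{split};
    this layout is an assumption about the released tarball.
    """
    path = Path(root) / split / f"{lang}.{split}"
    return path.read_text(encoding="utf-8").splitlines()

# Example (paths assume the extracted tarball directory):
# eng = load_split("flores101_dataset", "eng")
# npi = load_split("flores101_dataset", "npi")
# parallel = list(zip(eng, npi))  # aligned English-Nepali sentence pairs
```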