[3.x] i18n: Only include editor translations above a threshold #54020

Merged
merged 1 commit into godotengine:3.x from 3.x-editor-i18n-thresholds
Oct 20, 2021

Member

@akien-mga akien-mga commented Oct 20, 2021

This reduces the size of the editor binaries significantly, as we otherwise
embed all WIP translations, including ones with very low completion ratios,
and end up paying for the size of all msgids for each locale.

Cf. godotengine/godot-proposals#3421 for details.

The thresholds used are:

  • 30% for the editor interface (should already include most common strings
    while more obscure ones like UndoRedo action names might be untranslated).
  • 10% for the class reference: this is a HUGE resource and 10% is already
    a lot of useful content, especially if focused on the most used APIs.

This currently reduces the size of the editor binary by 16% on Linux.

The list will be synced manually every now and then.


Some size comparisons:

  • No translations whatsoever: 69.4 MiB
  • All editor interface translations, no classref translations (i.e. same as before 3.4 RC 1): 74.0 MiB
  • All editor interface translations + all classref translations (i.e. 3.4 RC 1): 90.1 MiB
  • Editor interface translations with 30% threshold, classref translations with 10% threshold (this PR): 75.4 MiB

So the translation-related size differential between 3.3 / 3.4 betas and 3.4 RC 1 / 4.0 is now less than 2 MiB. It will of course grow as more languages reach the threshold and are included, but it will stay significantly smaller than the 26 MiB bump we had after #53511.
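
As a quick check of the arithmetic, the figures in the list above can be compared directly. This is only a sketch: the dictionary keys are made-up labels, and the sizes are copied from the comparison list as reported.

```python
# Editor binary sizes in MiB, copied from the comparison list above.
# The key names are illustrative labels, not identifiers from the PR.
sizes = {
    "no_translations": 69.4,
    "all_interface": 74.0,               # same as before 3.4 RC 1
    "all_interface_and_classref": 90.1,  # 3.4 RC 1
    "thresholded": 75.4,                 # this PR
}


def delta(a, b):
    """Size difference in MiB between two builds, rounded to one decimal."""
    return round(sizes[a] - sizes[b], 1)


# Extra size of this PR compared to builds without classref translations.
overhead_vs_pre_rc = delta("thresholded", "all_interface")

# Size saved by this PR compared to 3.4 RC 1, absolute and relative.
saved_vs_rc1 = delta("all_interface_and_classref", "thresholded")
saved_percent = round(100 * saved_vs_rc1 / sizes["all_interface_and_classref"], 1)

print(f"Overhead vs. pre-RC builds: {overhead_vs_pre_rc} MiB")
print(f"Saved vs. 3.4 RC 1: {saved_vs_rc1} MiB ({saved_percent}%)")
```

The 1.4 MiB overhead matches the "less than 2 MiB" statement, and the relative saving is in line with the roughly 16% reduction quoted for Linux.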

This will be cherry-picked to master. (Edit: #54024)

Comment on lines 25 to 46
# Generate completion ratio from statistics string such as:
# 2775 translated messages, 272 fuzzy translations, 151 untranslated messages.
# First number can be 0, second and third numbers are only present if non-zero.
include-list:
@list=""; \
threshold=0.10; \
for po in $(POFILES); do \
res=`msgfmt --statistics $$po 2>&1 | sed 's/[^0-9,]*//g'`; \
complete=`cut -d',' -f1 <<< $$res`; \
fuzzy_or_untranslated=`cut -d',' -f2 <<< $$res`; \
untranslated_maybe=`cut -d',' -f3 <<< $$res`; \
if [ -z "$$fuzzy_or_untranslated" ]; then \
fuzzy_or_untranslated=0; \
fi; \
if [ -z "$$untranslated_maybe" ]; then \
untranslated_maybe=0; \
fi; \
incomplete=`expr $$fuzzy_or_untranslated + $$untranslated_maybe`; \
if `awk "BEGIN {exit !($$complete / ($$complete + $$incomplete) > $$threshold)}"`; then \
lang=`basename $$po .po`; \
list+="$$lang,"; \
fi; \
done; \
echo $$list;
Member Author

This is madness... if anyone finds a better tool that can give the completion ratio without manual string wrangling, or wants to spend time writing a better bash or python routine for this, feel free to.

It's not included in the buildsystem though so it's not critical.
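
For reference, a Python version of the same routine could look roughly like the sketch below. It is not part of this PR. It assumes the statistics string has the format shown in the Makefile comment above, that `msgfmt` is run with `LC_ALL=C` so the output is in English, and that the statistics are written to stderr. The function names are made up for illustration.

```python
import os
import re
import subprocess

# One pattern per counter. The second and third counters are only printed
# by msgfmt when they are non-zero, so a missing match is treated as 0.
_PATTERNS = (
    r"(\d+) translated messages?",
    r"(\d+) fuzzy translations?",
    r"(\d+) untranslated messages?",
)


def parse_statistics(stats):
    """Return (translated, fuzzy, untranslated) from a msgfmt statistics line."""
    counts = []
    for pattern in _PATTERNS:
        match = re.search(pattern, stats)
        counts.append(int(match.group(1)) if match else 0)
    return tuple(counts)


def completion_ratio(stats):
    """Share of translated messages; 0.0 for a catalog without messages."""
    translated, fuzzy, untranslated = parse_statistics(stats)
    total = translated + fuzzy + untranslated
    return translated / total if total else 0.0


def msgfmt_statistics(po_path):
    """Run msgfmt on a PO file and return its statistics line (from stderr)."""
    env = dict(os.environ, LC_ALL="C")  # force untranslated English output
    result = subprocess.run(
        ["msgfmt", "--statistics", "-o", os.devnull, po_path],
        capture_output=True, text=True, env=env, check=True,
    )
    return result.stderr


def languages_above(stats_by_lang, threshold):
    """Sorted language codes whose completion ratio exceeds the threshold."""
    return sorted(
        lang for lang, stats in stats_by_lang.items()
        if completion_ratio(stats) > threshold
    )
```

Matching each counter by its label avoids the positional guessing in the shell version, where the second number can be either the fuzzy or the untranslated count. The comparison stays strict (`>`), as in the Makefile rule.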

Contributor

Could this be precomputed when you export from Weblate? They seem to have language stats in their API, but I don't know how exactly you do the syncing.

Member Author

Weblate stores the translations in its own copy of our Git repo - I just copy the files over and commit.

I'm pretty sure there must exist tools that can compute this ratio, it should be trivial. But I couldn't find any official gettext command for it, and the gettext documentation is really bad (it's near impossible to land on useful information when googling stuff about gettext PO files...).

editor/SCsub Outdated
Comment on lines 76 to 81
to_include = (
"ar,bg,bn,ca,cs,de,el,eo,es_AR,es,fi,fr,gl,he,hu,id,it,ja,"
"ko,ms,nb,nl,pl,pt_BR,pt,ro,ru,sk,sv,th,tr,uk,vi,zh_CN,zh_TW"
).split(",")
tlist = [env.Dir("#editor/translations").abspath + "/" + f + ".po" for f in to_include]
Member

What's the point of including the Arabic (ar), Bengali (bn), Hebrew (he) and Hindi (hi) translations? None of these languages can be fully displayed without complex text layout support.

Member Author

No point in 3.x indeed, but then I'll have to add them back when cherry-picking to master. That can work.

Member Author

@akien-mga akien-mga Oct 20, 2021

Done. Saves 200 KiB more :)
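
To illustrate the exclusion discussed in this thread, here is a minimal sketch applied to the list from the outdated snippet above. It is not the code that was merged: `CTL_LANGUAGES` is a hypothetical name, and the set only contains the four languages named in the review comment.

```python
# Language list copied from the outdated editor/SCsub snippet quoted above.
to_include = (
    "ar,bg,bn,ca,cs,de,el,eo,es_AR,es,fi,fr,gl,he,hu,id,it,ja,"
    "ko,ms,nb,nl,pl,pt_BR,pt,ro,ru,sk,sv,th,tr,uk,vi,zh_CN,zh_TW"
).split(",")

# Hypothetical name: the languages called out in the review as needing
# complex text layout support.
CTL_LANGUAGES = {"ar", "bn", "he", "hi"}

# Keep the original order, dropping the excluded languages.
filtered = [lang for lang in to_include if lang not in CTL_LANGUAGES]

print(len(to_include), "->", len(filtered))
print(",".join(filtered))
```

Note that `hi` does not appear in the quoted list, so excluding it changes nothing here; only `ar`, `bn` and `he` are dropped.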

This reduces the size of the editor binaries significantly, as we otherwise
embed all WIP translations, including ones with very low completion ratios,
and end up paying for the size of all `msgid`s for each locale.

Cf. godotengine/godot-proposals#3421 for details.

The thresholds used are:
- 30% for the editor interface (should already include most common strings
  while more obscure ones like UndoRedo action names might be untranslated).
- 10% for the class reference: this is a HUGE resource and 10% is already
  a lot of useful content, especially if focused on the most used APIs.

For 3.x, we also exclude languages that require complex text layout support
to be displayed properly.

This currently reduces the size of the editor binary by 17% on Linux.

The list will be synced manually every now and then.
@akien-mga akien-mga merged commit 2a970c3 into godotengine:3.x Oct 20, 2021
@akien-mga akien-mga deleted the 3.x-editor-i18n-thresholds branch October 20, 2021 14:05