i10n #1292

Closed
wants to merge 6 commits into from
GuillaumeGomez
Member


```Shell
rustc --install-lang fr # downloads an official language pack from the server
rustc --install-lang fr=pack.zip # a custom pack can be installed this way
```

Member

I'm not sure I like the idea of adding a subcommand that downloads and extracts files to rustc. I'd rather see a separate utility to install these language packs (could also be part of multirust).

Member

Or a part of install.sh/rustup.sh (or whatever it's called these days).

Member Author

I don't see the problem with adding such a thing to Rust. Since localization will be handled by the compiler directly, why not the language packs too?

Member

The issue is that rustc will be talking over the network, we don't want that.

So the compiler will have support for language packs, just not for downloading/installing them.

Member Author

Why not talk over the network? I don't really see the issue here.

Member Author

Oh, my bad. I didn't think of that. Thanks for the info!

Member

Rustc also won’t be able to save these packs most of the time anyways since you need administrative privileges for that in current versions of all major systems.

Member Author

Just like rust_install.sh. I don't think this is a real issue here. They can just launch the command with sudo if needed.

Member

You don’t simply launch compilers as root. If a compiler needs administrative privileges, then something went wrong somewhere.

Member Author

It's a side thing, so I don't find it abnormal.

@Kimundi
Member

Kimundi commented Sep 24, 2015

Theoretically you could use macros/syntax extension to directly allow i18n!("expected one of {} or an operator, found {}", expected, found) macros that use the string as a lookup key, and have a way to list all uses of the i18n! macro in the compiler so that it is comprehensive.
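A minimal sketch of the idea above, using a plain `macro_rules!` macro rather than a real syntax extension. The `Catalog` type, the `fill` helper, and the French text are hypothetical and only illustrate how the English string could double as the lookup key; listing every use of the macro across the compiler is not covered here.

```rust
use std::collections::HashMap;

/// Hypothetical message catalog: maps the English format string (the lookup
/// key) to its translation.
pub struct Catalog {
    translations: HashMap<&'static str, &'static str>,
}

impl Catalog {
    pub fn new(pairs: &[(&'static str, &'static str)]) -> Self {
        Catalog { translations: pairs.iter().cloned().collect() }
    }

    /// Returns the translated template, or the English key itself when the
    /// catalog has no entry for it.
    pub fn lookup<'a>(&'a self, key: &'a str) -> &'a str {
        self.translations.get(key).copied().unwrap_or(key)
    }
}

/// Replaces each `{}` in `template` with the next argument, left to right.
/// Placeholders without a matching argument are left untouched.
pub fn fill(template: &str, args: &[String]) -> String {
    let mut out = String::new();
    let mut rest = template;
    let mut next = args.iter();
    while let Some(pos) = rest.find("{}") {
        out.push_str(&rest[..pos]);
        match next.next() {
            Some(arg) => out.push_str(arg),
            None => out.push_str("{}"),
        }
        rest = &rest[pos + 2..];
    }
    out.push_str(rest);
    out
}

/// Looks the literal up in the catalog at runtime, then fills in the arguments.
macro_rules! i18n {
    ($catalog:expr, $key:literal $(, $arg:expr)*) => {
        fill($catalog.lookup($key), &[$($arg.to_string()),*])
    };
}
```

With an empty catalog, `i18n!(catalog, "found {}", 3)` falls back to the English key and yields `found 3`, so untranslated messages degrade to English instead of failing.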

@Manishearth
Member

@Kimundi I like the spirit of that idea, but it would make language packs much less portable

@GuillaumeGomez
Member Author

@Kimundi: And if you want to change the key-string, you'll have to change it in all other language files. I don't think that it would be very convenient...

@steveklabnik
Member

I would really like to see a comparison to other i18n libraries, efforts, and such. This is an incredibly complex topic, and this is a very, very brief RFC.

i18n is really important, and we should gain support for it. But it's really easy to do poorly.

@jonas-schievink
Contributor

I feel like it's still too early to implement these "luxury" features. I think translating now would just lead to a worse experience (= many untranslated errors in the output) as diagnostics are improved and new ones get added.

I also don't get how we're supposed to change the structure of existing messages (i.e. add a {} or swap their positions). Should we just create a new one and delete all old translations?

That said, even if I'll never use this (I'm much more used to English jargon than German), this does seem like a good feature to have in general.

@Manishearth
Member

I also don't get how we're supposed to change the structure of existing messages

we should use named arguments as much as possible.
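To illustrate why named arguments help here: a translation can reorder placeholders freely when each one is referenced by name. The sketch below is a hand-rolled runtime substitution; `fill_named` is a hypothetical helper, not an existing rustc or standard-library API, and it does not handle `{{` escapes.

```rust
use std::collections::HashMap;

/// Substitutes `{name}` placeholders in `template` with values from `args`.
/// Unknown names are left in place so that a faulty translation stays visible.
pub fn fill_named(template: &str, args: &HashMap<&str, String>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                let name = &after[..close];
                match args.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        out.push('{');
                        out.push_str(name);
                        out.push('}');
                    }
                }
                rest = &after[close + 1..];
            }
            None => {
                // Unterminated brace: copy the remainder verbatim.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Given `expected` and `found` arguments, both `"expected {expected}, found {found}"` and a reordered template such as `"found {found} where {expected} was expected"` resolve correctly, which positional `{}` placeholders cannot guarantee.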

@Manishearth
Member

It occurs to me that we forgot about pluralization. That's nontrivial to handle. I think Firefox handles it by asking for two versions of the string.

@nagisa
Member

nagisa commented Sep 24, 2015

Do not re-invent a wheel. There’s gettext and lots of infrastructure around it. IMHO this proposal is strongly inferior to implementing gettext library (that works equally well on all supported platforms, as opposed to python’s gettext implementation) in rust and pulling that into rustc.

If gettext is not satisfactory in some way, then at least port something that is known to already work; the rust project really doesn’t need to solve the already-mostly-solved l10n problem all over again.

P.S. this RFC is proposing infrastructure for l10n (localisation), not i18n (internationalisation). i18n is much more involved and I don’t see how rust needs it at all.

@apasel422
Contributor

@Manishearth Pluralization is more complicated than that, as some languages have more intricate rules than just 1 -> singular, !1 -> plural.

I agree with @steveklabnik that this needs far more investigation. l20n should certainly be brought up here.
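As a concrete illustration of the pluralization point, the sketch below contrasts the English rule with the Russian one for non-negative integers, following the plural categories used by Unicode CLDR. The function and type names are hypothetical; a real implementation would take its rules from a maintained data source rather than hard-coding them.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum PluralCategory {
    One,
    Few,
    Many,
    Other,
}

/// English integers: exactly 1 is singular, everything else is plural.
pub fn english_plural(n: u64) -> PluralCategory {
    if n == 1 {
        PluralCategory::One
    } else {
        PluralCategory::Other
    }
}

/// Russian integers: the category depends on the last one and two digits.
pub fn russian_plural(n: u64) -> PluralCategory {
    let (last_digit, last_two) = (n % 10, n % 100);
    if last_digit == 1 && last_two != 11 {
        PluralCategory::One
    } else if (2..=4).contains(&last_digit) && !(12..=14).contains(&last_two) {
        PluralCategory::Few
    } else {
        PluralCategory::Many
    }
}

/// Picks the matching form of the Russian noun for "error".
pub fn russian_errors(n: u64) -> String {
    let noun = match russian_plural(n) {
        PluralCategory::One => "ошибка",
        PluralCategory::Few => "ошибки",
        PluralCategory::Many | PluralCategory::Other => "ошибок",
    };
    format!("{} {}", n, noun)
}
```

Note that 21 takes the singular form in Russian while 11 does not, so asking translators for just two versions of a string is not enough.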

@GuillaumeGomez
Member Author

@nagisa: You're absolutely right. I got confused but it's i10n.

@GuillaumeGomez GuillaumeGomez changed the title i18n i10n Sep 24, 2015
@Manishearth
Member

P.S. this RFC is proposing infrastructure for l10n (localisation), not i18n (internationalisation).

infrastructure for l10n is i18n. i18n is making a piece of software localizable, l10n is creating the translations.

@Manishearth
Member

Pluralization is more complicated than that,

Agreed. I think they have some handling for that, but I haven't looked into it.

@olivren

olivren commented Sep 25, 2015

I agree with the intent of this RFC, but not on the proposed solution.

In my experience, translations based on simple key/value formats are a real pain to work with. Finding consistent or meaningful key names is impossible. Developers now have to follow an indirection to know what the content of the string is. The most important part of translating software lies in the tools that make it easy for the translators to keep the translations up to date, and to do so at their own pace. gettext is very good for that, and it is not something to overlook.

So I think this RFC should just be "internationalize rustc using gettext". Unfortunately, I could not find any attempt to support gettext for Rust, and this is obviously a prerequisite (note that this needs both support on the gettext side and the creation of a binding in Rust).

@seanmonstar
Contributor

I highly recommend looking at l20n.

I don't recommend looking too deeply at this implementation, as it was my first Rust code ever, but I feel some absurd urge to include it with my comment about l20n.

@withoutboats
Contributor

I don't know a lot about localization (is it just me or are these acronyms ironic given that this is an accessibility issue?), but wouldn't it make sense for this to be built on top of semantic error values that could also have a machine readable (e.g. JSON) output form, as well as a localized human readable String, similar to this RFC?
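One way to picture this suggestion: the compiler produces a structured error value first, and both the JSON form and the localized text are derived from it. Everything in the sketch below (the `Diagnostic` variants, the `kind` field names, the `Lang` enum, the French wording) is hypothetical and hand-written without a JSON library to keep it self-contained.

```rust
/// Hypothetical semantic error value: data only, no presentation.
pub enum Diagnostic {
    UnexpectedToken { expected: String, found: String },
    UnusedVariable { name: String },
}

#[derive(Clone, Copy)]
pub enum Lang {
    En,
    Fr,
}

/// Escapes the characters that must not appear raw inside a JSON string.
pub fn json_escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl Diagnostic {
    /// Machine-readable form, independent of any language pack.
    pub fn to_json(&self) -> String {
        match self {
            Diagnostic::UnexpectedToken { expected, found } => format!(
                "{{\"kind\":\"unexpected_token\",\"expected\":\"{}\",\"found\":\"{}\"}}",
                json_escape(expected),
                json_escape(found)
            ),
            Diagnostic::UnusedVariable { name } => format!(
                "{{\"kind\":\"unused_variable\",\"name\":\"{}\"}}",
                json_escape(name)
            ),
        }
    }

    /// Human-readable form; each language decides its own word order.
    pub fn render(&self, lang: Lang) -> String {
        match (self, lang) {
            (Diagnostic::UnexpectedToken { expected, found }, Lang::En) => {
                format!("expected {}, found {}", expected, found)
            }
            (Diagnostic::UnexpectedToken { expected, found }, Lang::Fr) => {
                format!("attendu : {}, trouvé : {}", expected, found)
            }
            (Diagnostic::UnusedVariable { name }, Lang::En) => {
                format!("unused variable: {}", name)
            }
            (Diagnostic::UnusedVariable { name }, Lang::Fr) => {
                format!("variable inutilisée : {}", name)
            }
        }
    }
}
```

In this shape, tools consume `to_json()` and never depend on message wording, while translations only touch `render`.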

@nrc nrc added the T-compiler Relevant to the compiler team, which will review and decide on the RFC. label Sep 26, 2015
@fbstj
Contributor

fbstj commented Sep 29, 2015

I agree with @withoutboats that machine readable debugging is probably much more easily translatable than embedding the i10n/etc inside everywhere.

@Nashenas88

Let's not forget that pluralization is not the only thing that varies in translated strings. A huge class of languages does noun declension, where the spelling and pronunciation of a noun changes depending on its usage in a sentence, which can also be mixed with genders.

It's not just the messages we have now that would need to change, but the code around how some of those messages are generated too. There are some cases where we programmatically build up parts of the string (I'm not talking about the user's own code, but the messages themselves). This can lead to cases where figuring out how many strings need to be translated will be very difficult to do.

I'd also propose that when we do these translation files, the original English text include an additional column for the context. This is usually a piece of text that gives more information about the text and the words used, to help a translator understand the context in which the translation is used. Not everyone coming up with translations is going to completely understand the code it's used in. The context can also describe the sections of the string that are replaced with user content, for example whether the value that appears in "{}" is an identifier or a constant value, etc. This can change the surrounding words depending on the language. I can't think of an exact example, but a similar one is translating the verb "to go" into Russian. Russian doesn't have a general word for "to go", only expressions like "to drive", "to fly", "to walk", etc. Having a context description makes coming up with a readable, sensible translation much easier.

@apasel422
Contributor

@Nashenas88 That's a good point. l20n makes solving these kinds of grammatical issues relatively simple.

@nikomatsakis nikomatsakis self-assigned this Oct 1, 2015
@Nashenas88

@apasel422, I just looked up l20n, and I'm impressed by what it offers. I haven't had a chance to look at the code yet though; I hope it's something we could take advantage of easily.

@durka
Contributor

durka commented Oct 2, 2015

Maybe I missed something in the RFC, but how would this actually work? If the translations are regular format strings (with {}), then they have to be passed to format! at compile-time using some kind of include! shenanigans. But then rustc either has to be compiled to change locales, or it contains all the format strings in the binary and has big switch statements everywhere when printing. Alternatively, format strings could be interpreted at runtime with less safety. Which is proposed?

@Manishearth
Member

There is machinery to iterate through format strings at runtime, it's easy to use that.
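If format strings are interpreted at runtime, some of the lost safety could be recovered by checking each translated template against the English one when a language pack is loaded. The sketch below is hand-rolled and does not use the machinery mentioned above; `placeholders`, `check_translation`, and `PlaceholderMismatch` are hypothetical names, and `{{` escapes are not handled.

```rust
/// Collects placeholder names in order of appearance; a bare `{}` yields "".
pub fn placeholders(template: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                names.push(after[..close].to_string());
                rest = &after[close + 1..];
            }
            // Unterminated brace: nothing more to collect.
            None => break,
        }
    }
    names
}

/// Reported when a translation uses a different set of placeholders.
#[derive(Debug, PartialEq)]
pub struct PlaceholderMismatch {
    pub expected: Vec<String>,
    pub found: Vec<String>,
}

/// Accepts a translation only if it uses exactly the same placeholders as the
/// original, in any order.
pub fn check_translation(original: &str, translated: &str) -> Result<(), PlaceholderMismatch> {
    let mut expected = placeholders(original);
    let mut found = placeholders(translated);
    expected.sort();
    found.sort();
    if expected == found {
        Ok(())
    } else {
        Err(PlaceholderMismatch { expected, found })
    }
}
```

A pack that drops or renames a placeholder is then rejected once at load time, instead of producing a malformed message in the middle of a compilation.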

@nikomatsakis
Contributor

So there is a change I have been contemplating doing that is related to this RFC. It frequently happens in Rust that we have "multipart" error messages, like an error with several explanatory notes, and maybe a help suggestion as well. Currently, each of these is a distinct message, and each has its own span, and it's kind of a big mess.

Furthermore, many of the messages -- such as those produced by the borrow checker -- involve "program flow". We currently display this as a multipart message highlighting each point in the code, but this must be "cross-referenced" against the original source somehow.

On a related note, our messages often include a lot of terminology that users may not know. Research shows that even simple terms like "function body" can be confusing to new users and so on, to say nothing of things like "object type" or "lvalue". It'd be great if we could find a way to make these terms clearer to people.

I was hoping to address all of these points by allowing us to construct richer errors. The idea would be to have:

  1. Multipart errors, first of all.
  2. Markup and not just plain strings, where you can associate arbitrary terms with spans, and not just the entire message. The formatter would try to coalesce these spans into a single snippet, with the relevant text color-coded, so that e.g. if we say "the lvalue", it might be colored red, and the lvalue we are referring to in the source also colored red. Or something. There would probably have to be some kind of priorities too, so we can degrade gracefully when color is not available, or if we can't get all the spans into a single snippet.
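The two points above could be modelled roughly as follows. This is only a sketch under assumed names (`Diagnostic`, `Part`, `Label`, `Span` as defined here are hypothetical and not rustc's actual types): one diagnostic owns all of its parts as a unit, each part can tie individual terms to source spans, and a plain-text rendering serves as the fallback when color is unavailable.

```rust
/// Byte range in the source file.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

pub enum Level {
    Error,
    Note,
    Help,
}

impl Level {
    fn prefix(&self) -> &'static str {
        match self {
            Level::Error => "error",
            Level::Note => "note",
            Level::Help => "help",
        }
    }
}

/// Ties a term used in the message text to the source range it refers to.
pub struct Label {
    pub term: String,
    pub span: Span,
}

pub struct Part {
    pub level: Level,
    pub text: String,
    pub labels: Vec<Label>,
}

/// A multipart message kept together as one unit.
pub struct Diagnostic {
    pub parts: Vec<Part>,
}

impl Diagnostic {
    /// Smallest span covering every label, i.e. the single snippet a
    /// formatter would try to show. `None` when there are no labels.
    pub fn covering_span(&self) -> Option<Span> {
        let mut spans = self
            .parts
            .iter()
            .flat_map(|part| part.labels.iter())
            .map(|label| label.span);
        let first = spans.next()?;
        Some(spans.fold(first, |acc, s| Span {
            start: acc.start.min(s.start),
            end: acc.end.max(s.end),
        }))
    }

    /// Degraded rendering without color: one prefixed line per part.
    pub fn render_plain(&self) -> String {
        self.parts
            .iter()
            .map(|part| format!("{}: {}", part.level.prefix(), part.text))
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

Keeping all parts in one value also gives a translator the whole message at once, which matters when the split between the error and its notes differs across languages.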

Anyway, I bring this up here because while these goals are not directly I10N goals, they are obviously served by some of the measures in this RFC. Furthermore, for I10N purposes, I imagine these "multipart" messages ought to be a unit, since the right breakdown will probably be translated differently. I know that sometimes I have to really torture the phrasing to make it make grammatical sense in English, and I assume it would be near impossible to port that across languages.

@GuillaumeGomez
Member Author

Markup and not just plain strings, where you can associate arbitrary terms with spans, and not just the entire message. The formatter would try to coalesce these spans into a single snippet, with the relevant text color-coded, so that e.g. if we say "the lvalue", it might be colored red, and the lvalue we are referring to in the source also colored red. Or something. There would probably have to be some kind of priorities too, so we can degrade gracefully when color is not available, or if we can't get all the spans into a single snippet.

I was working on something similar, but only for rustc --explain E0000 (and I didn't finish it because rustdoc must be modified first).

Giving newcomers (and even more experienced Rust users!) better access to information deserves a little more consideration.

@seanmonstar
Contributor

I've updated l20n so that it now compiles on stable rust. The parsing and resolving works, but locale negotiation is non-existent at the moment. (repo: https://github.com/seanmonstar/l20n.rs)

With some more work, it could be possible to use syntax extensions (or codegen like serde) to compile the l20n templates into rust code at compile time, instead of runtime. (Runtime should stay though, since it's also a possible strategy for an application to download updated language resources and need to compile them at runtime.)

@graydon

graydon commented Nov 12, 2015

I will play community memory here and point out that the current format string syntax in rust was chosen explicitly to be compatible with ICU MessageFormat syntax (itself derived from Java's). This is the standard (and has been worked over very thoroughly to accommodate variations in plural, gender and similar dimensions).

http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html

@graydon

graydon commented Nov 12, 2015

(The most thorough conversation we had about this was in 2013, starting with https://mail.mozilla.org/pipermail/rust-dev/2013-May/003999.html ... There are lots of arguments and informative links in that thread.)

@nikomatsakis
Contributor

We discussed this RFC in the @rust-lang/compiler meeting yesterday. The rough consensus was that it is too early to "internationalize" the compiler, even though we would like to do so eventually. Even when just considering English, it is difficult to maintain the quality of error messages, and adding other languages into the mix would be a significant burden. It's also hard for us to judge the quality of those error messages. That said, we are doing some work on overhauling the error reporting infrastructure for IDE integration and better usability, and I expect that this should make internationalization easier longer term (though at the moment we have not been focusing on extracting the text of the messages themselves outside of the compiler). Therefore, I'm inclined to close this RFC for the time being (and open a corresponding issue), but I'd like to hear feedback on that first.

@GuillaumeGomez
Member Author

What @Manishearth and I had in mind was rather to provide the structure that lets users add localization (so the Rust team would not have to internationalize anything itself). However, I approve of this way of doing it; for now, other steps need to be completed before going further into this.

@nikomatsakis
Contributor

@GuillaumeGomez OK. I will close then for now, but thanks for the interesting discussion.

@Manishearth
Member

Given that we recently got localization for the website working, I wrote a post starting exploration of doing the same for the compiler.

https://internals.rust-lang.org/t/translating-the-compiler/10376
