Add robust unicode support (probably via ICU bindings) #797

Open
steveklabnik opened this issue Feb 2, 2015 · 3 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@steveklabnik
Member

Issue by brson
Wednesday Jun 04, 2014 at 22:09 GMT

For earlier discussion, see rust-lang/rust#14656

This issue was labelled with: A-libs, I-enhancement in the Rust repository


There's been lots of talk about Unicode over the years. We have little support in the core libs, but we need to provide something better for serious use. The best idea right now is to wrestle libicu into a Rust crate, starting out-of-tree.
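To make "little support in the core libs" concrete, here is a minimal std-only sketch. It is an illustration rather than anything from the original discussion, and the helper name `scalar_count` is hypothetical. It shows that plain string comparison works on bytes and does not account for canonical equivalence, which is one of the gaps a fuller Unicode library would cover.

```rust
// Std-only sketch: what core string handling does and does not cover.

/// Number of Unicode scalar values (`char`s) in `s` (hypothetical helper).
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "é" as one precomposed scalar value (U+00E9).
    let composed = "\u{00E9}";
    // "é" as 'e' followed by a combining acute accent (U+0301).
    let decomposed = "e\u{0301}";

    // Both render the same, but std compares bytes, not canonical equivalence.
    assert_ne!(composed, decomposed);

    // The UTF-8 byte lengths differ as well.
    assert_eq!(composed.len(), 2);
    assert_eq!(decomposed.len(), 3);

    // Counting `char`s yields scalar values, not user-perceived characters.
    assert_eq!(scalar_count(composed), 1);
    assert_eq!(scalar_count(decomposed), 2);

    println!("composed == decomposed: {}", composed == decomposed);
}
```

Normalization and grapheme segmentation, which would make these two strings compare and count the same, are not part of the standard library.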

@skade
Contributor

skade commented Feb 5, 2015

I honestly think this shouldn't be in a language's core. A good external library sounds better.

The issue is manifold: First of all, Unicode isn't the solution to all problems. For example, Big5 is still a de facto standard in some parts of Asia. I wouldn't default to Unicode too much.

Another thing is that full Unicode support comes with a lot of baggage: it is locale dependent, and it comes with a lot of specialized lingo, memory and disk space for tables, etc. Even systems specialized in text management (e.g. Elasticsearch) ship things like ICU as an additional plugin just because of the sheer (disk and memory) weight. Being able to fix bugs only in sync with the main language will be a problem.
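As a small illustration of the locale dependence mentioned above, the following sketch uses only the standard library. The wrapper `upper` is a hypothetical helper, and the Turkish expectation in the comment is stated as background, not something std implements. `str::to_uppercase` applies the locale-independent Unicode mappings, so it cannot produce locale-tailored results.

```rust
/// Uppercase `s` with the standard library's locale-independent mapping
/// (hypothetical wrapper, for illustration only).
fn upper(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    // Locale-independent result. Turkish text would expect a dotted
    // capital "İ" (U+0130) here, which std has no way to select.
    assert_eq!(upper("i"), "I");

    // Some mappings change the length of the string: "ß" becomes "SS".
    assert_eq!(upper("straße"), "STRASSE");

    println!("{} {}", upper("i"), upper("straße"));
}
```

Locale-tailored casing of this kind needs locale data, which is part of the table weight the comment refers to.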

A good out-of-tree Unicode text library will probably be the best solution.

@petrochenkov petrochenkov added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jan 28, 2018
@ghost

ghost commented Jul 31, 2019

Just for anyone who stumbles on this post via Google: I found unic to be a pretty cool project doing what @skade proposed. It is ICU, but written entirely in Rust. I'm hoping it gets some more attention because ICU is ... well, ICU.

@filmil

filmil commented May 8, 2020

Hello folks! I hope you don't mind a bit of code advertising, especially since this particular issue seems very topical.

For some time now, in the context of Fuchsia OS, we've been working on Rust bindings for the ICU library. This was originally started as an in-tree contribution to Fuchsia, but sometime in January 2020 we moved it to a standalone repository, which you're cordially invited to check out.

Now, anyone can make a one-off binding library, and it can be a fun exercise in learning the ropes of a language and how it interfaces with other code via FFI. People have done so before; a casual search will turn up a few such projects.

But making it a robustly and continuously tested library is somewhat tedious, and I am not aware that anyone else has done it. We invested some time to reduce the developer toil for those who may care, so that you don't need to spend days figuring out how to install exactly the correct ICU version in exactly the right way before you can begin to work on the project.

For example, if you have Docker installed, you can clone the project and type `make docker-test` and you're off to the races.

Some features of interest:

This means it should be fairly easy for anyone to dive in and use or contribute. It's available on crates.io, and docs can be seen on docs.rs, as you'd expect.

As for the downside, the feature coverage isn't that impressive. What's there is what we needed for Fuchsia. In general, the extent of our contribution is motivated by Fuchsia's needs. That said, I think we've lowered the barrier to entry enough that it becomes practical for someone motivated to contribute the missing functionality.

And if you think you can contribute a feature you needed but didn't know where to place, that's possible too. More details about the project are available in README.md.

Of course, this doesn't diminish the significance of other projects that deal with Unicode support in Rust, but it may fill a niche need by providing ICU functionality until a fully rustful implementation is ready for use.

Best regards,
F
