Escaping char in libcore adds 2k of static data for no_std cases #39492

Open
aturon opened this issue Feb 3, 2017 · 12 comments
Labels
A-unicode Area: Unicode C-feature-accepted Category: A feature request that has been accepted pending implementation. E-help-wanted Call for participation: Help is requested to fix this issue. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. WG-embedded Working group: Embedded systems

Comments

@aturon
Member

aturon commented Feb 3, 2017

The PR to improve char escaping unfortunately added 2k of static data to libcore, which impacts the no_std use case on small devices. Even if in some cases this data could be eliminated automatically, if you ever format a character, you'll definitely bring these tables in.
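To make the cost concrete, here is a minimal sketch (not from the original thread) of the behavior in question, based on the current stable API: Debug formatting of a char leaves a printable non-ASCII character such as 'é' unescaped, but escapes control characters and combining marks. Telling these apart requires per-code-point Unicode property data, which is where the static tables come from. The helper name debug_repr is hypothetical.

```rust
/// Hypothetical helper: returns the Debug representation of a char.
fn debug_repr(c: char) -> String {
    format!("{:?}", c)
}

fn main() {
    // Printable non-ASCII (U+00E9): kept as-is inside the quotes.
    println!("{}", debug_repr('é'));
    // ASCII control character DEL: escaped as \u{7f}.
    println!("{}", debug_repr('\u{7f}'));
    // Combining acute accent (a grapheme extender): escaped as \u{301}.
    println!("{}", debug_repr('\u{301}'));
}
```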

We should see whether there's a way to get this functionality while moving the bloat to libstd, perhaps using specialization.

@aturon aturon added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Feb 3, 2017
@aturon
Member Author

aturon commented Feb 3, 2017

cc @rust-lang/libs @SimonSapin

This was raised in @SimonSapin's recent comment. I'd like to propose a general policy that we take care in libcore to optimize for space, given that it's the basis for working on small embedded devices.

Nominating for discussion in next triage meeting.

@hanna-kruppe
Contributor

Any solution that removes the functionality from core will regress string formatting in no_std crates. This should be obvious but I want to spell out that this affects not only custom code for small embedded devices, but also many other crates — up to and including regular libraries (that just happen to be no_std) linked into desktop applications. Furthermore, core and std would have different behavior for the same code, which would be quite confusing and AFAIK is unprecedented.

@aturon
Copy link
Member Author

aturon commented Feb 3, 2017

@rkruppe All fair points. We need to figure out the right way to balance between these several legit concerns.

@hanna-kruppe
Contributor

Re: a more general policy to optimize libcore for size, one area of libcore that eats a lot of binary space is float formatting and parsing. Together, they account for 60+ kilobytes on top of the previous, naive implementations (assuming the measurements reported in the respective PRs #24612 #27307 are still accurate; there have been few changes to the code since then). While this could be optimized somewhat, these tasks fundamentally require a lot of tricky algorithmic work and pre-computed data to be both efficient and correct. (But again, it could probably be shrunk quite a bit, so if someone cares enough to try and improve this, be my guest! I have some ideas.)

I bring this up both out of fairness (if anything should be cut, these are prime candidates) and to bring those two kilobytes into perspective.

@eddyb
Member

eddyb commented Feb 3, 2017

Btw, the implementation also appears to use only RLE, with little hierarchical structure and no bitsets for that matter, so it could perhaps take up less space.
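For readers unfamiliar with the encoding, a minimal sketch of a run-length-encoded membership test follows. It is illustrative only: the layout (alternating "outside"/"inside" runs stored as u32 lengths) and the toy table are assumptions, not libcore's actual format. The linear scan is the simplest possible lookup; putting a hierarchical index over the runs would be one way to speed it up.

```rust
/// Hypothetical run-length-encoded set of code points.
/// runs[0] code points starting at 0 are outside the set, the next runs[1]
/// are inside, and so on, alternating. Anything past the last run is outside.
struct RleSet {
    runs: &'static [u32],
}

impl RleSet {
    fn contains(&self, cp: u32) -> bool {
        let mut start = 0u32;
        let mut inside = false; // the first run is always "outside"
        for &len in self.runs {
            let end = start + len;
            if cp < end {
                return inside;
            }
            start = end;
            inside = !inside; // runs alternate between outside and inside
        }
        false
    }
}

// Toy data, not the real tables: 0x00..0x20 outside, 0x20..0x7F inside,
// 0x7F..0xA0 outside, 0xA0..0x300 inside.
const TOY_PRINTABLE: RleSet = RleSet { runs: &[0x20, 0x5F, 0x21, 0x260] };

fn main() {
    for &cp in &[0x0Au32, 0x41, 0x7F, 0xE9, 0x301] {
        println!("U+{:04X}: {}", cp, TOY_PRINTABLE.contains(cp));
    }
}
```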

@SimonSapin
Contributor

@eddyb I tried to use std_unicode’s BoolTrie instead, but that ~doubled the size… (There are a few very large runs where RLE helps a lot.)
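Why long runs favor RLE can be seen with a rough size estimate. The sketch below assumes run lengths are stored as 7-bit varints and compares them against a flat bitset with one bit per code point up to the end of the last range (a simpler cousin of a trie, not BoolTrie itself). Both encodings and the toy ranges are assumptions for illustration, not measurements of the real tables.

```rust
/// Bytes needed for n as a LEB128-style varint (7 payload bits per byte).
/// An assumed encoding, chosen for illustration.
fn varint_len(mut n: u32) -> usize {
    let mut bytes = 1;
    while n >= 0x80 {
        n >>= 7;
        bytes += 1;
    }
    bytes
}

/// Encodes sorted, disjoint, half-open ranges as alternating run lengths
/// (gap before the range, then the range itself) and returns the total size.
fn rle_bytes(ranges: &[(u32, u32)]) -> usize {
    let mut pos = 0;
    let mut total = 0;
    for &(lo, hi) in ranges {
        total += varint_len(lo - pos); // run of code points outside the set
        total += varint_len(hi - lo); // run of code points inside the set
        pos = hi;
    }
    total
}

/// Size of a flat bitset covering 0..limit, one bit per code point.
fn bitset_bytes(limit: u32) -> usize {
    ((limit + 7) / 8) as usize
}

fn main() {
    // Two long toy runs: RLE wins by a wide margin.
    let long_runs = [(0x1000, 0x5000), (0x6000, 0xA000)];
    println!("long runs: rle={} bitset={}", rle_bytes(&long_runs), bitset_bytes(0xA000));

    // Every other code point in 0..256: the bitset wins.
    let fragmented: Vec<(u32, u32)> = (0..128u32).map(|i| (2 * i + 1, 2 * i + 2)).collect();
    println!("fragmented: rle={} bitset={}", rle_bytes(&fragmented), bitset_bytes(256));
}
```

With the two long toy ranges the runs take 10 bytes versus 5,120 for the bitset; with 128 single-code-point ranges alternating over 0..256, the runs take 256 bytes versus 32. Which encoding is smaller depends on how fragmented the data is, which is consistent with the observation above.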

To deal with competing concerns, I think that Unicode-aware escaping should probably be the default but there should be some way to opt out of it for the niche cases where code size is constrained. Now that the build system is based on Cargo, could libcore have a default Cargo feature for this? Disabling it would require building your own libcore, but that’s already required at the moment for (at least some) embedded targets.
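A minimal sketch of how such an opt-out might look inside the crate, assuming a hypothetical default feature named unicode_escape (libcore has no such feature as of this thread). The feature would be declared under [features] in the crate's Cargo.toml and listed in default. Putting #[cfg] on two separate definitions means only one body is compiled, so the opt-out build never references the Unicode tables. The fallback uses the ASCII-only char::escape_default, which, as far as we know, consults no Unicode property tables. The sketch uses std::fmt so it runs standalone; in a no_std crate core::fmt::Write is the same trait.

```rust
use std::fmt::{self, Write};

// "unicode_escape" is a hypothetical feature name used for illustration.

/// Default build: Unicode-aware escaping, same rules as char's Debug impl.
#[cfg(feature = "unicode_escape")]
fn escape_for_debug<W: Write>(c: char, out: &mut W) -> fmt::Result {
    for e in c.escape_debug() {
        out.write_char(e)?;
    }
    Ok(())
}

/// Opt-out build: ASCII-only escaping. Everything outside printable ASCII
/// becomes a \u{...} escape, so no Unicode property data is needed.
#[cfg(not(feature = "unicode_escape"))]
fn escape_for_debug<W: Write>(c: char, out: &mut W) -> fmt::Result {
    for e in c.escape_default() {
        out.write_char(e)?;
    }
    Ok(())
}

fn main() {
    let mut s = String::new();
    escape_for_debug('é', &mut s).unwrap();
    // Without the feature enabled this prints \u{e9}; with it, é.
    println!("{}", s);
}
```

The tradeoff is the one described above: the opt-out build is smaller but escapes every non-ASCII character, so its output differs from the default build for the same input.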

@aturon
Copy link
Member Author

aturon commented Feb 3, 2017

@SimonSapin Cargo features for core are a really good idea for dealing with this tradeoff. 👍

@alexcrichton
Member

The libs team discussed this issue during triage today and the conclusion was that @SimonSapin's idea of a feature on libcore is great! We're totally on board with a Cargo feature that trims down the size of many libcore routines (sometimes at the cost of correctness in the case of float parsing).

We're not likely to proactively add such a feature, but PRs doing so would be most welcome!

@Mark-Simulacrum Mark-Simulacrum added C-feature-request Category: A feature request, i.e: not implemented / a PR. E-help-wanted Call for participation: Help is requested to fix this issue. labels Jul 27, 2017
@dtolnay dtolnay added C-feature-accepted Category: A feature request that has been accepted pending implementation. and removed C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Nov 18, 2017
@tbu-
Contributor

tbu- commented Sep 4, 2018

I don't know how I'd select features from core. Is there infrastructure in place that allows crates to do that?

If that's not the case, a PR adding default features to core won't have much impact.

@oli-obk
Contributor

oli-obk commented Sep 4, 2018

You can use xargo to build your own libcore with any configuration settings you desire (e.g. no landing pads, panic = abort, and similar).

@steveklabnik
Member

Triage: I'm not aware of any movement on this issue since Alex posted the path forward.

@SimonSapin
Contributor

SimonSapin commented Sep 16, 2019

We’ve since added more Unicode-related functionality to libcore (and do not have a libstd_unicode crate anymore), on the basis that the linker should eliminate from the final binary any table that is unused.

I suppose this issue is relevant for cases where <str as fmt::Debug>::fmt is used, and space is very constrained. However there is probably not much to do here until std-aware Cargo (and specifically rust-lang/wg-cargo-std-aware#4) gets further along.

@jonas-schievink jonas-schievink added I-heavy Issue: Problems and improvements with respect to binary size of generated code. WG-embedded Working group: Embedded systems labels Nov 9, 2019
@workingjubilee workingjubilee added the A-unicode Area: Unicode label Jul 22, 2023