Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ffi::CString::new(string-literal) is suboptimal #35838

Closed
glandium opened this issue Aug 19, 2016 · 14 comments
Closed

ffi::CString::new(string-literal) is suboptimal #35838

glandium opened this issue Aug 19, 2016 · 14 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@glandium
Copy link
Contributor

I noticed this while doing some ltrace using alloc_system on my panicking program. A small test case is

#![feature(alloc_system)]
extern crate alloc_system;
fn main() {
    panic!();
}

What I noticed is the following pattern:

  • ptr = malloc(14)
  • memcpy(ptr, "RUST_BACKTRACE", 14)
  • memchr("RUST_BACKTRACE", '\0', 14)
  • ptr = realloc(ptr, 28)
  • ptr = realloc(ptr, 15)
    (not in ltrace, but likely: ptr[15] = '\0')

If realloc is not in-place, that's 3 copies of the string, and 3 allocations.

This all stems from the use of env::var_os("RUST_BACKTRACE") in librustc_driver/lib.rs, which ends up doing ffi::CString::new("RUST_BACKTRACE") (and indeed replacing panic!() in the test case above with that ffi::CString::new call yields the same copy/alloc pattern.

Ironically, the "RUST_BACKTRACE" static string which is copied the first time, is already null terminated in the executable binary's rodata section.

@glandium
Copy link
Contributor Author

Note, there might be a second bug hidden in here, where the compiler should probably be able to find out that the realloc(ptr, 28) is useless and that it could realloc(ptr, 15) directly.

@Aatch
Copy link
Contributor

Aatch commented Aug 19, 2016

I don't think we can make it optimal. We can't rely on &'static str being a string literal (even if we could special case for the 'static lifetime, which we can't) due to unsafe code, so at best we could optimise to reduce the number of copies and reallocations we're doing.

@durka
Copy link
Contributor

durka commented Aug 20, 2016

A macro or syntax extension could generate unsafe code to conjure a CString into existence, without doing the checks, when it's known to come from a literal.

@bluss
Copy link
Member

bluss commented Aug 20, 2016

CString::new uses Into<Vec<u8>>, so it will create a perfectly-sized Vec (sized for the input string) but then push another byte (0) to it, causing reallocation, and then into_boxed_slice() will have it shrink to fit again..

@bluss
Copy link
Member

bluss commented Aug 20, 2016

We can fix the second reallocation by using v.reserve_exact(1); just before v.push(0), then it will not have to shrink to fit.

eddyb added a commit to eddyb/rust that referenced this issue Aug 22, 2016
cstring: avoid excessive growth just to 0-terminate

Based on following what happens in CString::new("string literal"):

1. Using `Into<Vec<u8>>`, a Vec is allocated with capacity exactly equal
   to the string's input length.
2. By `v.push(0)`, the Vec is grown to twice capacity, since it was full.
3. By `v.into_boxed_slice()`, the Vec capacity is shrunk to fit the length again.

If we use `.reserve_exact(1)` just before the push, then we avoid the
capacity doubling that we're going to have to shrink anyway.

Growing by just 1 byte means that the step (2) is less likely to have to
move the memory to a larger allocation chunk, and that the step (3) does
not have to reallocate.

Addresses part of rust-lang#35838
bors added a commit that referenced this issue Aug 22, 2016
cstring: avoid excessive growth just to 0-terminate

Based on following what happens in CString::new("string literal"):

1. Using `Into<Vec<u8>>`, a Vec is allocated with capacity exactly equal
   to the string's input length.
2. By `v.push(0)`, the Vec is grown to twice capacity, since it was full.
3. By `v.into_boxed_slice()`, the Vec capacity is shrunk to fit the length again.

If we use `.reserve_exact(1)` just before the push, then we avoid the
capacity doubling that we're going to have to shrink anyway.

Growing by just 1 byte means that the step (2) is less likely to have to
move the memory to a larger allocation chunk, and that the step (3) does
not have to reallocate.

Addresses part of #35838
@Mark-Simulacrum
Copy link
Member

I'm seeing the below output from ltrace. It looks like this is mostly fixed, since I don't think there's really anything we can do to not realloc the CString into 15 bytes. Please reopen if that's the wrong interpretation, though.

malloc(14)                                                                                                                                         = 0x560144bb11a0
memcpy(0x560144bb11a0, "RUST_BACKTRACE", 14)                                                                                                       = 0x560144bb11a0
memchr("RUST_BACKTRACE", '\0', 14)                                                                                                                 = 0
realloc(0x560144bb11a0, 15)                                                                                                                        = 0x560144bb11a0
pthread_mutex_lock(0x5601442862f0, 0x7ffd065c16d8, 15, 0x7ffd065c1608)                                                                             = 0
getenv("RUST_BACKTRACE")                                                                                                                           = nil
pthread_mutex_unlock(0x5601442862f0, 15, 0xc000, 416)                                                                                              = 0
free(0x560144bb11a0)                                                                                                                               = <void>

@glandium
Copy link
Contributor Author

The Vec could be initialized to its final size, which is known in advance (length of the &'static str + 1)

@Mark-Simulacrum
Copy link
Member

I'm fairly certain that there's little to nothing we can do about it, just an LLVM optimization. CString::new can't really do much to improve the situation here; the only potential thing would be to inline the method it calls (CString::_new), but that's presumably intentionally not inlined.

@ollie27
Copy link
Member

ollie27 commented May 12, 2017

We should be able to use specialization, as mentioned #35871 (comment) to remove the realloc for &str and &[u8].

@Mark-Simulacrum Mark-Simulacrum added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 22, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017
@YaLTeR
Copy link
Contributor

YaLTeR commented Apr 6, 2018

Actually, why isn't it optimized? I have a Rust crate wrapping a C library API and there's a lot of places where I need to pass over a string. Naturally, on the Rust side I accept a &str, and the API is such that in most of the cases this parameter is a string literal. It has to go through a CString::new() so inlining and optimization would greatly benefit this use case (since string literals can potentially be optimized to C string literals in the read-only section without any runtime overhead whatsoever).

Currently, though, even this simplest example compiles into a pile of allocations and runtime checks, even with LTO and codegen-units 1:

use std::ffi::CString;

#[inline(never)]
fn f(c_str: *const i8) {
    println!("{:?}", c_str);
}

fn main() {
    let c_string = CString::new("Hello there").unwrap();
    f(c_string.as_ptr());
}

Godbolt

Actually, building this as a simple bin crate locally and checking the disassembly, CString::_new() got inlined all the way into main(), but all the allocations and a memchr() call are still there. Could it be that the problem is that memchr() can't be optimized away since it's a library call? Perhaps if this is the case it would be beneficial to use a Rust implementation?

@Mark-Simulacrum
Copy link
Member

I believe we now specialize down to the best we can (given what we know) after #65551 so closing.

@YaLTeR
Copy link
Contributor

YaLTeR commented Dec 28, 2019

Looking at the disassembly on 1.40.0, creating a CString from a string literal still has an allocation, a memchr and a CString::from_vec_unchecked::something along with a bunch of runtime checks. Is there really nothing that can be done for this case?

@Mark-Simulacrum
Copy link
Member

I think most of the checks could likely be eliminated with some inlining, although I would personally not be sure we should be inlining these functions -- there's not too much benefit.

The allocation though couldn't be avoided, though (CString must be owned).

@AngelicosPhosphoros
Copy link
Contributor

@Mark-Simulacrum
I achieved some progress with eliminating some runtime check in linked PR but this wouldn't happen in libc-based targets because they use libcs memchr.
This also requires to use LTO for build.

I think, nothing more can be achieved here so probably we should just close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants