Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Little improves in CString new when creating from slice #92124

Merged
merged 1 commit into from
Jan 19, 2022
Merged

Little improves in CString new when creating from slice #92124

merged 1 commit into from
Jan 19, 2022

Conversation

AngelicosPhosphoros
Copy link
Contributor

@AngelicosPhosphoros AngelicosPhosphoros commented Dec 20, 2021

Old code already contain optimization for cases with &str and &[u8] args. This commit adds a specialization for &mut[u8] too.

Also, I added usage of old slice in search for zero bytes instead of new buffer because it produce better code for constant inputs on Windows LTO builds. For other platforms, this wouldn't cause any difference because it calls libc anyway.

Inlined _new method into spec trait to reduce amount of code generated to CString::new callers.

@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 20, 2021
@AngelicosPhosphoros
Copy link
Contributor Author

AngelicosPhosphoros commented Dec 20, 2021

I checked generated asm for CString::new call using such dylib:

use std::ffi::{CString, NulError};

#[no_mangle]
pub extern "Rust" fn make_c_string()->Result<CString, NulError>{
    CString::new("Hello world!")
}

Asm generated using command: rustc +stage1 --crate-type=cdylib cstring_new.rs --emit=asm -Copt-level=3 -Ccodegen-units=1

Old generated code
    .text
    .def     @feat.00;
    .scl    3;
    .type   0;
    .endef
    .globl  @feat.00
.set @feat.00, 0
    .file   "cstring_new.a85eb7bc-cgu.0"
    .def     make_c_string;
    .scl    2;
    .type   32;
    .endef
    .section    .text,"xr",one_only,make_c_string
    .globl  make_c_string
    .p2align    4, 0x90
make_c_string:
.seh_proc make_c_string
    pushq   %rsi
    .seh_pushreg %rsi
    pushq   %rdi
    .seh_pushreg %rdi
    subq    $56, %rsp
    .seh_stackalloc 56
    .seh_endprologue
    movq    %rcx, %rsi
    leaq    __unnamed_1(%rip), %rdx
    leaq    32(%rsp), %rdi
    movl    $12, %r8d
    movq    %rdi, %rcx
    callq   _ZN70_$LT$$RF$str$u20$as$u20$std..ffi..c_str..CString..new..SpecIntoVec$GT$8into_vec17h42e14556a4e783e0E
    movq    %rsi, %rcx
    movq    %rdi, %rdx
    callq   _ZN3std3ffi5c_str7CString4_new17hd06ee8640c50ad96E
    movq    %rsi, %rax
    addq    $56, %rsp
    popq    %rdi
    popq    %rsi
    retq
    .seh_endproc

    .section    .rdata,"dr",one_only,__unnamed_1
__unnamed_1:
    .ascii  "Hello world!"
New generated code
    .text
    .def     @feat.00;
    .scl    3;
    .type   0;
    .endef
    .globl  @feat.00
.set @feat.00, 0
    .file   "cstring_new.558cd7f7-cgu.0"
    .def     make_c_string;
    .scl    2;
    .type   32;
    .endef
    .section    .text,"xr",one_only,make_c_string
    .globl  make_c_string
    .p2align    4, 0x90
make_c_string:
.seh_proc make_c_string
    pushq   %rsi
    .seh_pushreg %rsi
    subq    $32, %rsp
    .seh_stackalloc 32
    .seh_endprologue
    movq    %rcx, %rsi
    leaq    __unnamed_1(%rip), %rdx
    movl    $12, %r8d
    callq   _ZN66_$LT$$RF$str$u20$as$u20$std..ffi..c_str..CString..new..NewImpl$GT$8new_impl17ha5748a2c8ce696e6E
    movq    %rsi, %rax
    addq    $32, %rsp
    popq    %rsi
    retq
    .seh_endproc

    .section    .rdata,"dr",one_only,__unnamed_1
__unnamed_1:
    .ascii  "Hello world!"

It can be seen that new caller code is shorter.

As for lto performance, I compiled same code used rustc +stage1 --crate-type=cdylib cstring_new.rs -Copt-level=3 -Ccodegen-units=1 -Clto=fat then looked it using IdaPRO (on Windows).
When I used memchr on new buffer, there is unrolled loop which checks each byte of new buffer if it is zero and branch on it. When I use self, compiler eliminates check completely.

On Unix, there is probably no difference because Rust uses libc implementation anyway in that case.

@AngelicosPhosphoros
Copy link
Contributor Author

AngelicosPhosphoros commented Dec 20, 2021

I have 2 questions:

  1. Should I prefer uninit allocation instead of vec![0u8; ...] to avoid zeroing memory? It is easy to add, I just not sure if I should.
  2. Maybe it is good idea to make some methods #[inline], specifically CString::from_vec_unchecked and implementation of NewImpl::<&[u8]>::new_impl because currently first one is not inlined into new_impl() and second one not inlined for impls for &mut [u8] and &str despite them being inside std. Should I do that?

Example:

; _$LT$$RF$mut$u20$$u5b$u8$u5d$$u20$as$u20$std..ffi..c_str..CString..new..NewImpl$GT$::new_impl::hcd1310bd9104df0d
_ZN83_$LT$$RF$mut$u20$$u5b$u8$u5d$$u20$as$u20$std__ffi__c_str__CString__new__NewImpl$GT$8new_impl17hcd1310bd9104df0dE proc near
push    rsi
sub     rsp, 20h
mov     rsi, rcx
call    _ZN75_$LT$$RF$$u5b$u8$u5d$$u20$as$u20$std__ffi__c_str__CString__new__NewImpl$GT$8new_impl17h8473efdc7dbe7405E ; _$LT$$RF$$u5b$u8$u5d$$u20$as$u20$std..ffi..c_str..CString..new..NewImpl$GT$::new_impl::h8473efdc7dbe7405
mov     rax, rsi
add     rsp, 20h
pop     rsi
retn
_ZN83_$LT$$RF$mut$u20$$u5b$u8$u5d$$u20$as$u20$std__ffi__c_str__CString__new__NewImpl$GT$8new_impl17hcd1310bd9104df0dE endp

@AngelicosPhosphoros
Copy link
Contributor Author

Surprisingly, adding #[inline] doesn't change generated code in std.

@AngelicosPhosphoros
Copy link
Contributor Author

It turns out that with_capacity already returns vector with exact capacity needed.
Would rewrite.

@AngelicosPhosphoros AngelicosPhosphoros changed the title Remove reallocs in CString new when creating from slice Little improves in CString new when creating from slice Dec 20, 2021
@AngelicosPhosphoros
Copy link
Contributor Author

Removed all code related to new allocation mechanism.
I think, other changes (especially &mut [u8] specialization) still useful so I would keep PR open.

@rust-log-analyzer

This comment has been minimized.

@AngelicosPhosphoros
Copy link
Contributor Author

Code with LTO on Windows with current nightly.
изображение
Code with LTO on Windows with my fixes.
изображение

@AngelicosPhosphoros
Copy link
Contributor Author

JFYI: I wouldn't be able to participate until 15th January. I would return to this PR after that if needed.

Old code already contain optimization for cases with `&str` and `&[u8]` args. This commit adds a specialization for `&mut[u8]` too.

Also, I added usage of old slice in search for zero bytes instead of new buffer because it produce better code for Windows on LTO builds. For other platforms, this wouldn't cause any difference because it calls `libc` anyway.

Inlined `_new` method into spec trait to reduce amount of code generated to `CString::new` callers.
@AngelicosPhosphoros
Copy link
Contributor Author

@kennytm May you review my PR please? :)

@Mark-Simulacrum
Copy link
Member

This seems okay to me, thanks.

@bors r+

@bors
Copy link
Contributor

bors commented Jan 16, 2022

📌 Commit 4b62a77 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 16, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 19, 2022
…askrgr

Rollup of 10 pull requests

Successful merges:

 - rust-lang#88642 (Formally implement let chains)
 - rust-lang#89621 (doc: guarantee call order for sort_by_cached_key)
 - rust-lang#91278 (Use iterator instead of recursion in `codegen_place`)
 - rust-lang#92124 (Little improves in CString `new` when creating from slice)
 - rust-lang#92783 (Annotate dead code lint with notes about ignored derived impls)
 - rust-lang#92797 (Remove horizontal lines at top of page)
 - rust-lang#92920 (Move expr- and item-related pretty printing functions to modules)
 - rust-lang#93041 (Remove some unused ordering derivations based on `DefId`)
 - rust-lang#93051 (Add Option::is_some_with and Result::is_{ok,err}_with)
 - rust-lang#93062 (Update books)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 3148a32 into rust-lang:master Jan 19, 2022
@rustbot rustbot added this to the 1.60.0 milestone Jan 19, 2022
@AngelicosPhosphoros AngelicosPhosphoros deleted the remove_extra_alloc_in_cstring_new_35838 branch January 19, 2022 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants