Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Box::new(uninitialized) creates a big alloca #58201

Closed
RalfJung opened this issue Feb 5, 2019 · 6 comments · Fixed by #106292
Closed

Box::new(uninitialized) creates a big alloca #58201

RalfJung opened this issue Feb 5, 2019 · 6 comments · Fixed by #106292
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@RalfJung
Copy link
Member

RalfJung commented Feb 5, 2019

The following code

pub fn box_uninitialized2() -> Box<MaybeUninit<[usize; 1024*1024]>> { unsafe {
    // CHECK-LABEL: @box_uninitialized2
    // CHECK-NOT: store
    // CHECK-NOT: alloca
    // CHECK-NOT: memcpy
    // CHECK-NOT: memset
    Box::new(std::mem::uninitialized())
} }

generates some really awful LLVM IR:

start:
  %_1.sroa.0 = alloca [1048576 x i64], align 8
  %_1.sroa.0.0.sroa_cast3 = bitcast [1048576 x i64]* %_1.sroa.0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 8388608, i8* nonnull %_1.sroa.0.0.sroa_cast3)
  %0 = tail call i8* @__rust_alloc(i64 8388608, i64 8) #3, !noalias !1
  %1 = icmp eq i8* %0, null
  br i1 %1, label %bb7.i.i, label %"_ZN35_$LT$alloc..boxed..Box$LT$T$GT$$GT$3new17h9460275a68e769e0E.exit"

bb7.i.i:                                          ; preds = %start
; call alloc::alloc::handle_alloc_error
  tail call void @_ZN5alloc5alloc18handle_alloc_error17h3b023122fefb532cE(i64 8388608, i64 8) #3, !noalias !1
  unreachable

"_ZN35_$LT$alloc..boxed..Box$LT$T$GT$$GT$3new17h9460275a68e769e0E.exit": ; preds = %start
  %2 = bitcast i8* %0 to %"core::mem::MaybeUninit<[usize; 1048576]>"*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %0, i8* nonnull align 8 %_1.sroa.0.0.sroa_cast3, i64 8388608, i1 false)
  call void @llvm.lifetime.end.p0i8(i64 8388608, i8* nonnull %_1.sroa.0.0.sroa_cast3)
  ret %"core::mem::MaybeUninit<[usize; 1048576]>"* %2

The same happens with std::mem::MaybeUninit::uninitialized().

Cc @arielb1, who made me write these tests, and @rust-lang/wg-codegen

@arielb1 arielb1 added I-slow Issue: Problems and improvements with respect to performance of generated code. A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Feb 6, 2019
@arielb1
Copy link
Contributor

arielb1 commented Feb 6, 2019

Feels like an LLVM limitation to me. It should be replacing the memcpy with a memset(%0, 8388608, undef) (or a memset of poison when that becomes available).

@nikic
Copy link
Contributor

nikic commented Feb 7, 2019

For reference: https://rust.godbolt.org/z/6ea0aL

LLVM is capable of optimizing a memcpy of undef alloca away, this just runs into the usual limitation that all of memcpyopt doesn't work across BBs.

@arielb1
Copy link
Contributor

arielb1 commented Feb 7, 2019

LLVM is capable of optimizing a memcpy of undef alloca away, this just runs into the usual limitation that all of memcpyopt doesn't work across BBs.

This is such a long-standing issue. Seems like it might be wise to have someone working on fixing this.

@nikic
Copy link
Contributor

nikic commented Feb 7, 2019

@arielb1 Someone has already worked on this, see https://reviews.llvm.org/D38374. It's a bit hard to track, but unfortunately this got reverted again (multiple times). I suspect that this is only going to get really resolved once memcpyopt moves to memssa -- which someone has also worked on in the past (https://reviews.llvm.org/D26739), but which never really went anywhere.

@nikic
Copy link
Contributor

nikic commented Mar 12, 2021

The alloca and memcpy are no longer present on nightly, as a result of #82806.

@nikic nikic closed this as completed Mar 12, 2021
@Noratrieb
Copy link
Member

The FIXME in the test is still there, needs a test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants