Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to make Box::default() construct in-place + Box::new_in_place() #64062

Closed
wants to merge 2 commits into from

Conversation

petertodd
Copy link
Contributor

To start with, I'll say straight up this is more of a sketch of a possible API then something I'm sure we should merge.

The first part here is the Box::new_in_place() API. Basically the idea is that we could provide a way to reliably create large boxed values in-place via return-value-optimization. This is something that often is never possible to optimize, even in theory, because allocation is fallible and if the creation of the value is itself fallible, the optimizer can't move the allocation to happen first because that would be an observable change.

Unfortunately this doesn't quite work if the closure can't be inlined sufficiently due to optimizer limitations. But hopefully this at least shows a way forward that might be interesting to others.

Secondly, I use this Box::new_in_place() API to optimize Box::default(). I'm not totally sure this is a good idea to actually merge, as currently this is kind of an observable change in behavior for things like large arrays that would otherwise blow up the stack; people might rely on it, and currently the optimization is unreliable.

Finally, this of course this could be extended to Rc and Arc as well in much the same fashion.

@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 1, 2019
/// FIXME: This is intended to work via return-value-optimization, but due to compiler
/// limitations can't reliably do that yet.
#[inline(always)]
fn new_in_place(f: impl FnOnce() -> T) -> Box<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is semantically more like new_please_try_to_do_it_in_place_but_I_am_not_expecting_any_guarantee?
(Such a guarantee is not something we can reliably or would make on a language (not LLVM) level.)

Copy link
Contributor Author

@petertodd petertodd Sep 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point is, if we had guaranteed copy elision, I think this is as close as you could get with an API that looked something like this given Rust's semantics. Even with out-references it's not clear to me you could get semantics that nicely guarantee no copies in an easy way, as you could still mess up and wind up with a copy prior to writing it to the out-reference.

Like I said above, it's not clear to me this is actually a good idea. But it does seem novel. :)

edit: fixed thinko: we do have return-value-optimization; it's the guaranteed copy elision that we're missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we do not guarantee copy elision we'll need to at minimum call this something other than new_in_place which suggests that there actually is a guarantee. For example: new_hint_in_place.

It might be a good idea to just use this internally for now... let's see how this measures up in terms of perf for fn default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to just use this internally for now

Yup, it is private right now, as I didn't want to give people false hope even in nightly. I'd suggest that we leave it that way until RVO is more reliable (specifically named-return-value-optimization if I understand the terminology right?).

@Centril
Copy link
Contributor

Centril commented Sep 1, 2019

@bors try

@bors
Copy link
Contributor

bors commented Sep 1, 2019

⌛ Trying commit b0674d3 with merge 9d1bd60...

bors added a commit that referenced this pull request Sep 1, 2019
Attempt to make Box::default() construct in-place + Box::new_in_place()

To start with, I'll say straight up this is more of a sketch of a possible API then something I'm sure we should merge.

The first part here is the `Box::new_in_place()` API. Basically the idea is that we could provide a way to reliably create large boxed values in-place via return-value-optimization. This is something that often is *never* possible to optimize, even in theory, because allocation is fallible and if the creation of the value is itself fallible, the optimizer can't move the allocation to happen first because that would be an observable change.

Unfortunately this doesn't quite work if the closure can't be inlined sufficiently due to optimizer limitations. But hopefully this at least shows a way forward that might be interesting to others.

Secondly, I use this `Box::new_in_place()` API to optimize `Box::default()`. I'm not totally sure this is a good idea to actually merge, as currently this is kind of an observable change in behavior for things like large arrays that would otherwise blow up the stack; people might rely on it, and currently the optimization is unreliable.

Finally, this of course this could be extended to `Rc` and `Arc` as well in much the same fashion.
@bors
Copy link
Contributor

bors commented Sep 1, 2019

☀️ Try build successful - checks-azure
Build commit: 9d1bd60

@Centril
Copy link
Contributor

Centril commented Sep 1, 2019

@rust-timer build 9d1bd60

@rust-timer
Copy link
Collaborator

Success: Queued 9d1bd60 with parent e29faf0, comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 9d1bd60, comparison URL.

@petertodd
Copy link
Contributor Author

Looks like this is on average very slightly slower than before.

I'll bet you this is a function of caching: by allocating first, you have to access allocation-related memory flushing both registers and cache, followed by doing the work to create the object. As most objects are fairly small anyway, having to reload registers and cache to do the work of creating it may be slightly more costly on average than the extra copy.

@bjorn3
Copy link
Member

bjorn3 commented Sep 2, 2019

box expr already allocate a box first and then evaluate expr. That means that Box::default would have already been subject to rvo if possible.

#![feature(box_syntax)]

fn main() {
    let _: Box<String> = box String::new();
}
// [...]
        _2 = Box(std::string::String);
        (*_2) = const std::string::String::new() -> [return: bb2, unwind: bb4];
// [...]

@petertodd
Copy link
Contributor Author

box expr already allocate a box first and then evaluate expr. That means that Box::default would have already been subject to rvo if possible.

Ha, that's embarassing! Yeah, I wrote the initial version of this for Rc/Arc, and I was sure I'd double-checked the assembly for the Box version... But I guess not.

@Centril Just pushed another commit which adds Rc/Arc if you want to try that benchmarking thing again.

I suppose since the performance isn't significantly changed Box::new_in_place() still might be worth merging if only to provide a way to remove the old box syntax.

@Centril
Copy link
Contributor

Centril commented Sep 5, 2019

@Centril Just pushed another commit which adds Rc/Arc if you want to try that benchmarking thing again.

Sure; can't hurt.

@bors try

@bors
Copy link
Contributor

bors commented Sep 5, 2019

⌛ Trying commit 9b57963 with merge a683cc0...

bors added a commit that referenced this pull request Sep 5, 2019
Attempt to make Box::default() construct in-place + Box::new_in_place()

To start with, I'll say straight up this is more of a sketch of a possible API then something I'm sure we should merge.

The first part here is the `Box::new_in_place()` API. Basically the idea is that we could provide a way to reliably create large boxed values in-place via return-value-optimization. This is something that often is *never* possible to optimize, even in theory, because allocation is fallible and if the creation of the value is itself fallible, the optimizer can't move the allocation to happen first because that would be an observable change.

Unfortunately this doesn't quite work if the closure can't be inlined sufficiently due to optimizer limitations. But hopefully this at least shows a way forward that might be interesting to others.

Secondly, I use this `Box::new_in_place()` API to optimize `Box::default()`. I'm not totally sure this is a good idea to actually merge, as currently this is kind of an observable change in behavior for things like large arrays that would otherwise blow up the stack; people might rely on it, and currently the optimization is unreliable.

Finally, this of course this could be extended to `Rc` and `Arc` as well in much the same fashion.
@bors
Copy link
Contributor

bors commented Sep 5, 2019

☀️ Try build successful - checks-azure
Build commit: a683cc0

@Centril
Copy link
Contributor

Centril commented Sep 5, 2019

@rust-timer build a683cc0

@rust-timer
Copy link
Collaborator

Success: Queued a683cc0 with parent f257c40, comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit a683cc0, comparison URL.

@petertodd
Copy link
Contributor Author

Heh, I wonder if that benchmark suit even has a single case of Rc::default() or Arc::default() being used?

Anyway, might be too rare to be worth optimizing.

@Centril
Copy link
Contributor

Centril commented Sep 5, 2019

@petertodd "The benchmark suite" is not those crates having Arc/Rc::default() but rather it is the performance of the compiler type checking, generating code, etc. for those crates. In other words, we are measuring compile time perf.

@petertodd
Copy link
Contributor Author

@petertodd "The benchmark suite" is not those crates having Arc/Rc::default() but rather it is the performance of the compiler type checking, generating code, etc. for those crates. In other words, we are measuring compile time perf.

Ah ok, so basically my question boils down to whether or not the compiler uses Rc/Arc default()

Anyway, if you guys think this is worth merging, let me know and I'll fix up the comments appropriately.

@Centril
Copy link
Contributor

Centril commented Sep 10, 2019

Ah ok, so basically my question boils down to whether or not the compiler uses Rc/Arc default()

Basically yeah.

Anyway, if you guys think this is worth merging, let me know and I'll fix up the comments appropriately.

I'd drop the Box changes but potentially there's an improvement for Rc and Arc but it could also just be noise.

r? @nnethercote

@nnethercote
Copy link
Contributor

nnethercote commented Sep 11, 2019

The general comments I can make are as follows.

  • The performance improvement on the benchmarks is vanishingly small, right at the edge of what can be reliably detected.
  • The lack of certainty about whether the optimization is even working as expected is discouraging.
  • The commits introduce multiple uses of unsafe.

Based on those observations, my inclination is that these commits do not have enough of a benefit to warrant landing.

I am not comfortable reviewing the specifics of the code, because my understanding of things like MaybeUninit is almost zero. @pnkfelix might be a better reviewer, if you want to push forward.

r? @pnkfelix

@pietroalbini
Copy link
Member

r? @pietroalbini

Just testing highfive.

@pietroalbini
Copy link
Member

r? @pnkfelix

@pietroalbini
Copy link
Member

Sorry y'all for the useless pings, trying to figure out why highfive didn't work earlier.

r? @nnethercote

@pietroalbini
Copy link
Member

Ok, apparently it's working now? The weird thing is I can't find any errors that happened when the last two r? failed, everything is green...

r? @pnkfelix

@JohnCSimon
Copy link
Member

ping from triage -
@pnkfelix @Centril @nnethercote
CC: @petertodd
Hi! This has sat idle for ten days. Is this ready for review?

Thanks

@Centril
Copy link
Contributor

Centril commented Sep 28, 2019

@pnkfelix Now that you're back, can you have a look at this please? // triage

@JohnCSimon
Copy link
Member

Pinging again from triage
@pnkfelix ?
Thanks

@JohnCSimon
Copy link
Member

This is still waiting on review
@pnkfelix @Centril
Thanks

@pnkfelix
Copy link
Member

after reading @nnethercote 's evaluation above, I am closing this PR.

Some extra notes. I'll leave it up to the reader to decide how relevant they are.

@pnkfelix pnkfelix closed this Oct 24, 2019
@petertodd
Copy link
Contributor Author

@pnkfelix Seems reasonable. Thanks for the notes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.