Bad codegen for non-copy-derived struct with all Copy derived fields #128081

Closed
CrazyboyQCD opened this issue Jul 23, 2024 · 11 comments · Fixed by #128299
Labels: A-LLVM, C-optimization, I-heavy, I-slow, T-compiler

Comments

@CrazyboyQCD
CrazyboyQCD commented Jul 23, 2024

Godbolt Link
As the asm output shows, even with opt-level = 3, a struct whose fields are all Copy but which does not itself derive Copy gets a clone() with many more mov instructions, and cloning a large struct is not lowered to a memcpy.
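
A minimal sketch of the kind of comparison being described, assuming two hypothetical structs (`Plain` and `Bitwise`, names not taken from the Godbolt link) with identical `Copy` fields where only the derive list differs. Both functions return the same value; the report is about how differently they were compiled.

```rust
// Hypothetical reproduction; type and function names are illustrative only.

/// Every field is `Copy`, but the struct itself only derives `Clone`.
#[derive(Clone, Debug, PartialEq)]
pub struct Plain {
    pub a: u32,
    pub b: u64,
    pub c: [u8; 64],
}

/// Same fields, but the struct also derives `Copy`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Bitwise {
    pub a: u32,
    pub b: u64,
    pub c: [u8; 64],
}

/// Per the report, this clone was lowered field by field.
pub fn clone_plain(x: &Plain) -> Plain {
    x.clone()
}

/// With `Copy` derived, cloning is a bitwise copy of the whole value.
pub fn clone_bitwise(x: &Bitwise) -> Bitwise {
    x.clone()
}
```

Comparing the assembly of `clone_plain` and `clone_bitwise` at the same optimization level is one way to check whether a given compiler version still shows the difference.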

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 23, 2024
@tgross35 tgross35 added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-heavy Issue: Problems and improvements with respect to binary size of generated code. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jul 23, 2024
@tgross35
Contributor

There was some brief discussion on Zulip about changing Clone usage to Copy as a MIR opt in cases where they are known to be the same. Don't remember details here and could very well be misremembering, but @scottmcm I think you might have been the one to bring it up?

@scottmcm
Member

For primitives we already change .clone() calls to copies (#94276) exactly so that the ones in derived Clone implementations become copies instead. As such, if you check the MIR (https://godbolt.org/z/T131P5YP1) you'll see:

        StorageLive(_45);
        _45 = ((*_1).43: u8);
        StorageLive(_46);
        _46 = ((*_1).44: u8);
        StorageLive(_47);
        _47 = ((*_1).45: u8);
        StorageLive(_48);
        _48 = ((*_1).46: u8);
        StorageLive(_49);
        _49 = ((*_1).47: u8);
        StorageLive(_50);
        _50 = ((*_1).48: u8);
        StorageLive(_51);
        _51 = ((*_1).49: u8);
        StorageLive(_52);
        _52 = ((*_1).50: u8);
        StorageLive(_53);
        _53 = ((*_1).51: u8);
        StorageLive(_54);
        _54 = ((*_1).52: [Dav1dSequenceHeaderOperatingParameterInfo; 32]);
        _0 = Dav1dSequenceHeader { profile: move _2, max_width: move _3, max_height: move _4, layout: move _5, pri: move _6, trc: move _7, mtrx: move _8, chr: move _9, hbd: move _10, color_range: move _11, num_operating_points: move _12, operating_points: move _13, still_picture: move _14, reduced_still_picture_header: move _15, timing_info_present: move _16, num_units_in_tick: move _17, time_scale: move _18, equal_picture_interval: move _19, num_ticks_per_picture: move _20, decoder_model_info_present: move _21, encoder_decoder_buffer_delay_length: move _22, num_units_in_decoding_tick: move _23, buffer_removal_delay_length: move _24, frame_presentation_delay_length: move _25, display_model_info_present: move _26, width_n_bits: move _27, height_n_bits: move _28, frame_id_numbers_present: move _29, delta_frame_id_n_bits: move _30, frame_id_n_bits: move _31, sb128: move _32, filter_intra: move _33, intra_edge_filter: move _34, inter_intra: move _35, masked_compound: move _36, warped_motion: move _37, dual_filter: move _38, order_hint: move _39, jnt_comp: move _40, ref_frame_mvs: move _41, screen_content_tools: move _42, force_integer_mv: move _43, order_hint_n_bits: move _44, super_res: move _45, cdef: move _46, restoration: move _47, ss_hor: move _48, ss_ver: move _49, monochrome: move _50, color_description_present: move _51, separate_uv_delta_q: move _52, film_grain_present: move _53, operating_parameter_info: move _54 };

which is copying the fields, then moving them into the aggregate.
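
Roughly, and assuming a hypothetical two-field struct `Pair` rather than the real `Dav1dSequenceHeader`, the derived impl behaves like the hand-written one below: each field is cloned on its own and the results are assembled into a new value, which is the shape the MIR above shows.

```rust
// Illustrative only: `Pair` is a stand-in, and this approximates what
// `#[derive(Clone)]` generates rather than reproducing its exact expansion.
#[derive(Debug, PartialEq)]
pub struct Pair {
    pub a: u8,
    pub b: [u16; 4],
}

impl Clone for Pair {
    fn clone(&self) -> Self {
        // Each field is cloned separately (a plain copy for these types),
        // then moved into the new aggregate.
        Pair {
            a: self.a.clone(),
            b: self.b.clone(),
        }
    }
}
```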

https://rust-lang.github.io/rfcs/1521-copy-clone-semantics.html lets the standard library do a plain copy instead of calling Clone::clone on things, but if (like here) the whole type isn't Copy, that can't apply.

My instinct is that this should be filed to LLVM, because it's much better positioned to look at all the loads and stores we give it https://godbolt.org/z/4W9PP8nTW and coalesce them into something smaller.

And types should be marked Copy where possible because that allows RFC1521 to skip clones in lots of places in the standard library. Is there a reason that this one wasn't?

@tgross35
Contributor

tgross35 commented Jul 23, 2024

Could we do something like run the #[derive(Copy)] check (i.e. see if all members are Copy) on everything Clone, and then make it get the same clone -> copy transformation if Copy could apply?

There are some good reasons not to use Copy even when it would be allowed: once a type is Copy, adding a new private non-Copy field is an API-breaking change. And often it's not great to have the implicit duplication in your code (e.g. I think it's pretty common to turn off automatic #[derive(Copy)] in bindgen).

(That being said, it does seem like minimizing and opening an LLVM issue would be good since there is something it's not seeing through)

@kkysen
kkysen commented Jul 23, 2024

> And types should be marked Copy where possible because that allows RFC1521 to skip clones in lots of places in the standard library. Is there a reason that this one wasn't?

We didn't because the types are fairly large, so we want to avoid accidental copies. I was expecting a .clone() that is identical to what a Copy would do to be optimized the same way.

We may switch to #[derive(Copy)] now anyway, because the optimization is much better and perf is very important for us, but we'd definitely prefer not to, because these types aren't meant to be automatically copied.

It does seem like there should be a better way for std to detect this other than Copy, one that decouples "bitwise-copyable" from "variables are copied automatically".
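
One possible way to approximate that split in user code today, sketched under the assumption that the fields can live in a private inner struct (`HeaderData` and `Header` are hypothetical names, not the actual types discussed here): the inner struct derives `Copy`, while the public wrapper only implements `Clone`, so duplication stays explicit at call sites while the clone itself is written as a whole-value copy.

```rust
// Hypothetical workaround; not something proposed in this thread.

/// Private payload: `Copy`, so duplicating it is a bitwise copy.
#[derive(Clone, Copy, Debug, PartialEq)]
struct HeaderData {
    width: u32,
    height: u32,
    params: [u8; 32],
}

/// Public wrapper: deliberately not `Copy`, so callers must write `.clone()`.
#[derive(Debug, PartialEq)]
pub struct Header(HeaderData);

impl Header {
    pub fn new(width: u32, height: u32) -> Self {
        Header(HeaderData { width, height, params: [0; 32] })
    }
}

impl Clone for Header {
    fn clone(&self) -> Self {
        // `self.0` is `Copy`, so the payload is copied as one value.
        Header(self.0)
    }
}
```

The trade-off is an extra layer of indirection in the type definition; whether it is worth it depends on how much the per-field codegen matters for the type in question.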

> My instinct is that this should be filed to LLVM, because it's much better positioned to look at all the loads and stores we give it https://godbolt.org/z/4W9PP8nTW and coalesce them into something smaller.

@CrazyboyQCD, I think it'd be good to file this against LLVM, too. I would think LLVM should be able to optimize this without help from rustc.

@CrazyboyQCD
Author

CrazyboyQCD commented Jul 24, 2024

@scottmcm, would you mind doing this for LLVM? I'm not quite sure how to describe this clearly.

@tgross35
Contributor

tgross35 commented Jul 24, 2024

Can you minimize the code example as much as possible? Remove fields, manually inline function calls, delete irrelevant code, etc., as long as the issue still shows up.

If you do that, you can more or less just post the LLVM IR with Copy and the one without Copy to an issue, as long as you link the original godbolt. You can view the IR by clicking "add new" and then "LLVM IR" at the assembly tab. The goal is to show a missed optimization, i.e. "code A should be equivalent to code B but LLVM can't see it".

It's better yet if you can get something that reproduces with llc. I don't have a great process for this, but usually I copy the LLVM IR from the Rust godbolt to an llc godbolt (I just use llvm.godbolt.org and set the input language to "LLVM IR") and try to delete more stuff there. Note you might need to manually demangle the function names so it actually compiles.

Scott is definitely far more in the know here than I am and can probably give some better suggestions, but if you can minimize it a bit then that's a great start :)

@tgross35 tgross35 reopened this Jul 24, 2024
@tgross35
Contributor

Sorry about that, GitHub decided to click a button for me.

@CrazyboyQCD
Author

@tgross35, just minimized the examples and pasted them.

@DianQK
Member

DianQK commented Jul 24, 2024

I want to see if I can use #94276 to complete this in codegen. :) Having LLVM recognize this pattern might take up a lot of compile time.

@rustbot claim

@DianQK
Member

DianQK commented Jul 25, 2024

Found a "future" regression: clone is better than copy when some values are modified afterwards (https://godbolt.org/z/xn1sKbs64). LLVM has more optimizations for stores than for memcpy.
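
The linked example is not reproduced here; a hypothetical function of the shape described (clone a value, then overwrite part of it) might look like the sketch below. `Foo` and `with_a` are assumed names. The concern, as stated, is that turning the clone into one block copy followed by a store could optimize worse than the individual field stores.

```rust
// Hypothetical illustration of "clone, then modify some values".
#[derive(Clone, Debug, PartialEq)]
pub struct Foo {
    pub a: i32,
    pub b: u64,
    pub c: [i8; 3],
}

/// Clones `src`, then overwrites one field of the result.
pub fn with_a(src: &Foo, a: i32) -> Foo {
    let mut out = src.clone();
    // The value just cloned into `out.a` is immediately overwritten here.
    out.a = a;
    out
}
```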

@rustbot label +A-LLVM

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jul 25, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 27, 2024
…=<try>

 Perform instsimplify before inline to eliminate some trivial calls

I am currently working on rust-lang#128081. In the current pipeline, we can get the following clone statements ([godbolt](https://rust.godbolt.org/z/931316fhP)):

```
    bb0: {
        StorageLive(_2);
        _2 = ((*_1).0: i32);
        StorageLive(_3);
        _3 = ((*_1).1: u64);
        _0 = Foo { a: move _2, b: move _3 };
        StorageDead(_3);
        StorageDead(_2);
        return;
    }
```

Analyzing such statements will be simple and fast. We don't need to consider branches or some interfering statements. However, this requires us to run `InstSimplify`, `ReferencePropagation`, and `SimplifyCFG` at least once. I can introduce a new pass, but I think the best place for it would be within `InstSimplify`.

I put `InstSimplify` before `Inline`, which takes some of the burden away from `Inline`.

r? `@saethlin`
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 28, 2024
Simplify the canonical clone method and the copy-like forms to copy

Fixes rust-lang#128081. Currently being blocked by rust-lang#128265.

`@rustbot` label +S-blocked

r? `@saethlin`
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 31, 2024
Simplify the canonical clone method and the copy-like forms to copy

Fixes rust-lang#128081.

r? `@cjgillot`
@theemathas
Contributor

Another example of the issue: Godbolt link

bors added a commit to rust-lang-ci/rust that referenced this issue Sep 14, 2024
Simplify the canonical clone method and the copy-like forms to copy

Fixes rust-lang#128081.

The optimized clone method ends up as the following MIR:

```
_2 = copy ((*_1).0: i32);
_3 = copy ((*_1).1: u64);
_4 = copy ((*_1).2: [i8; 3]);
_0 = Foo { a: move _2, b: move _3, c: move _4 };
```

We can transform this to:

```
_0 = copy (*_1);
```

r? `@cjgillot`
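
In source terms, and assuming a `Foo` with the field types shown in the MIR above, the two forms correspond roughly to the functions below. `clone_fieldwise` and `clone_whole` are hypothetical names; `std::ptr::read` is used only to spell out a whole-value bitwise copy for a type that is not `Copy`, and it is sound here only because every field is `Copy` and `Foo` has no `Drop` impl.

```rust
use std::ptr;

// Field types taken from the MIR above; everything else is illustrative.
#[derive(Debug, PartialEq)]
pub struct Foo {
    pub a: i32,
    pub b: u64,
    pub c: [i8; 3],
}

/// Mirrors the first MIR snippet: copy each field, then build the aggregate.
pub fn clone_fieldwise(src: &Foo) -> Foo {
    Foo { a: src.a, b: src.b, c: src.c }
}

/// Mirrors `_0 = copy (*_1)`: duplicate the whole value in one step.
pub fn clone_whole(src: &Foo) -> Foo {
    // SAFETY: `src` is a valid, aligned reference, all fields are `Copy`,
    // and `Foo` has no destructor, so a bitwise duplicate is a valid `Foo`.
    unsafe { ptr::read(src) }
}
```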
@bors bors closed this as completed in e7386b3 Sep 14, 2024