Remove my `scalar_copy_backend_type` optimization attempt #123185

scottmcm · 2024-03-29T07:05:27Z

I added this back in #111999, but I no longer think it's a good idea:

- It had to get scaled back to only power-of-two things to not break a bunch of targets
- LLVM seems to be getting better at memcpy removal anyway
- Introducing vector instructions has seemed to sometimes (optimize zipping over array iterators #115515 (comment)) make autovectorization worse

So this removes it from the codegen crates entirely, and just tries to use `typed_place_copy` (https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/builder/trait.BuilderMethods.html#method.typed_place_copy) rather than a direct `memcpy`, so things will still use load/store when a type isn't `OperandValue::Ref`.
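As a rough illustration of the last sentence, here is a toy model of the decision it describes: by-reference values are copied with `memcpy`, everything else with a load followed by a store. The names `ValueRepr`, `CopyStrategy` and `choose_copy_strategy` are hypothetical and exist only for this sketch; they are not rustc's types, and the real `typed_place_copy` may consider more cases than shown here.

```rust
// Toy model only: these enums are illustrative stand-ins, not rustc's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValueRepr {
    /// The value fits in a single backend scalar.
    Immediate,
    /// The value is represented as two scalars.
    Pair,
    /// The value lives in memory and is handled by reference.
    ByRef,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CopyStrategy {
    /// Emit a typed load from the source and a typed store to the destination.
    LoadStore,
    /// Emit a byte copy of the whole value.
    Memcpy,
}

/// Picks a copy strategy following only the wording of the PR description:
/// load/store unless the value is handled by reference.
pub fn choose_copy_strategy(repr: ValueRepr) -> CopyStrategy {
    match repr {
        ValueRepr::ByRef => CopyStrategy::Memcpy,
        ValueRepr::Immediate | ValueRepr::Pair => CopyStrategy::LoadStore,
    }
}
```

Under these assumptions, `choose_copy_strategy(ValueRepr::Pair)` yields `CopyStrategy::LoadStore`, while a by-reference value falls back to `CopyStrategy::Memcpy`.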

rustbot · 2024-03-29T07:05:35Z

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

scottmcm · 2024-03-29T07:07:56Z

tests/codegen/array-optimized.rs

-    // CHECK: %[[TEMP:.+]] = load <2 x i8>, ptr %a, align 1
-    // CHECK: store <2 x i8> %[[TEMP]], ptr %p, align 1
+    // CHECK: %[[TEMP:.+]] = load i16, ptr %a, align 1
+    // CHECK: store i16 %[[TEMP]], ptr %p, align 1


Note that while we now generate an alloca and memcpys for this (as seen in the array-codegen file), LLVM is able to remove them.

If LLVM picks i16 for this (and not <2 x i8>), then great, let's do that.
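For context, a minimal sketch of the kind of function such CHECK lines would apply to, assuming the test copies a two-byte array between two references named `a` and `p` as the IR suggests. The function name is hypothetical and the actual test file may differ.

```rust
// Hypothetical name; the parameter names mirror %a and %p in the CHECK lines.
pub fn array_copy_2_elements(a: &[u8; 2], p: &mut [u8; 2]) {
    // A plain by-value copy of a 2-byte array. According to the comment above,
    // optimized builds now lower this to a single i16 load and store
    // rather than a <2 x i8> vector load and store.
    *p = *a;
}
```

The source-level behavior is the same either way; only the type LLVM ends up using for the load and store changes.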

scottmcm · 2024-03-29T07:39:10Z

@bors try @rust-timer queue

bors · 2024-03-29T07:40:20Z

⌛ Trying commit b72e5ad with merge ab08738...

bors · 2024-03-29T09:12:46Z

☀️ Try build successful - checks-actions
Build commit: ab08738 (ab08738b5d1c784475d9fd734165d671e5617689)

rust-timer · 2024-03-29T11:34:28Z

Finished benchmarking commit (ab08738): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.1% | [1.1%, 1.1%] | 1 |
| Improvements ✅ (primary) | -0.6% | [-0.7%, -0.6%] | 4 |
| Improvements ✅ (secondary) | -1.4% | [-1.4%, -1.4%] | 1 |
| All ❌✅ (primary) | -0.6% | [-0.7%, -0.6%] | 4 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.3% | [-2.7%, -1.4%] | 4 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.0%] | 13 |
| Improvements ✅ (secondary) | -0.9% | [-1.3%, -0.0%] | 36 |
| All ❌✅ (primary) | -0.1% | [-0.1%, -0.0%] | 13 |

Bootstrap: 667.865s -> 668.349s (0.07%)
Artifact size: 315.66 MiB -> 315.57 MiB (-0.03%)

scottmcm · 2024-03-29T12:10:05Z

One small secondary regression; everything else an improvement -- even the runtime benchmarks improved.
@rustbot label: +perf-regression-triaged

compiler-errors · 2024-03-29T15:45:19Z

compiler/rustc_codegen_ssa/src/mir/operand.rs

@@ -419,7 +418,14 @@ impl<'a, 'tcx, V: CodegenObject> OperandValue<V> {
                    bx.store_with_flags(val, dest.llval, dest.align, flags);
                    return;
                }
-                base::memcpy_ty(bx, dest.llval, dest.align, r, source_align, dest.layout, flags)
+                bx.memcpy_known_size(


🤔 why isn't this also just using typed_place_copy?

also, memcpy_known_size has one callsite (this one) -- you should inline it. i don't really see the value of exposing it if it's only being used once, seems to add more confusion to the api imo.

scottmcm · 2024-03-29 (reply)

You know, that's a really good observation. I thought the answer was that it would be recursion -- that typed_place_copy ends up here -- but now that I look, it doesn't, so I'll make that change 👍

And yup, I'll inline memcpy_known_size too. Looks like I forgot to think about whether it was useful after I undid a couple of other changes that didn't work out.

scottmcm · 2024-03-29T20:07:38Z

Well MemFlags made that more complicated than expected, but done. Here's the diff since your last review, CE: https://github.com/rust-lang/rust/compare/b72e5ad9062f1192dfc60645050ed04836cd2cd9..ac20c35d5562dce0530ac2972268bc807ec41190

@rustbot ready

fee1-dead · 2024-03-30T02:26:20Z

r? compiler

compiler-errors · 2024-03-30T02:27:29Z

I already started this review anyways

r? compiler-errors

bors · 2024-04-09T20:46:14Z

📌 Commit 556b47e has been approved by compiler-errors

It is now in the queue for this repository.

DianQK

> LLVM seems to be getting better at memcpy removal anyway.

I think we can expect to see more significant changes after llvm/llvm-project#87190.

DianQK · 2024-04-10T01:33:27Z

tests/codegen/issues/issue-122805.rs

+// OPT3LINX64-NEXT: store <8 x i16>
+// OPT3WINX64: load <8 x i16>
+// OPT3WINX64-NEXT: call <8 x i16> @llvm.bswap
+// OPT3WINX64-NEXT: store <8 x i16>
 // CHECK-NEXT: ret void


Ah, thanks! I hadn't noticed that different optimization levels yield different optimization effects.
I believe it makes sense to differentiate between O2 and O3 here.

The changes to O2 seem fine. I'll check again on this later.
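The CHECK lines above mention `llvm.bswap` applied to `<8 x i16>`, which is the shape one would expect when eight `u16` values are byte-swapped together. A minimal sketch of source code with that shape follows; the function name and signature are assumptions made for illustration, not the contents of the actual test file.

```rust
// Hypothetical function: byte-swaps each of eight u16 values.
// Whether an optimizer turns this into one vector bswap depends on the target
// and optimization level, which is why the test distinguishes O2 from O3.
pub fn swap_all(value: [u16; 8]) -> [u16; 8] {
    value.map(u16::swap_bytes)
}
```

For example, `swap_all([0x0102; 8])` produces an array whose elements are all `0x0201`.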

bors · 2024-04-10T06:51:09Z

⌛ Testing commit 556b47e with merge ba787ee...

bors · 2024-04-10T07:20:19Z

💔 Test failed - checks-actions

scottmcm · 2024-04-10T16:20:53Z

I got the compile-test directives in the test wrong 🤦 Filed #123730

@bors r=compiler-errors

bors · 2024-04-10T16:20:56Z

📌 Commit 593e900 has been approved by compiler-errors

It is now in the queue for this repository.

bors · 2024-04-10T16:32:44Z

⌛ Testing commit 593e900 with merge c2239bc...

bors · 2024-04-10T18:51:40Z

☀️ Test successful - checks-actions
Approved by: compiler-errors
Pushing c2239bc to master...

rust-timer · 2024-04-10T20:34:42Z

Finished benchmarking commit (c2239bc): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.7% | [-0.7%, -0.7%] | 4 |
| Improvements ✅ (secondary) | -2.0% | [-2.7%, -1.2%] | 2 |
| All ❌✅ (primary) | -0.7% | [-0.7%, -0.7%] | 4 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.5% | [-1.5%, -1.5%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.2%, -0.0%] | 14 |
| Improvements ✅ (secondary) | -0.9% | [-1.4%, -0.0%] | 36 |
| All ❌✅ (primary) | -0.1% | [-0.2%, -0.0%] | 14 |

Bootstrap: 675.526s -> 675.28s (-0.04%)
Artifact size: 318.49 MiB -> 318.45 MiB (-0.01%)

scottmcm · 2024-04-11T19:50:47Z

Wow, I think that might be the happiest perf has ever been with one of my PRs. Even instruction wins on the runtime benchmarks.

rustbot assigned fee1-dead Mar 29, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 29, 2024

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 29, 2024

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 29, 2024

rustbot added the perf-regression-triaged The performance regression has been triaged. label Mar 29, 2024

compiler-errors reviewed Mar 29, 2024

scottmcm force-pushed the more-typed-copy branch from b72e5ad to 120cb65 Compare March 29, 2024 18:52

scottmcm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 29, 2024

scottmcm force-pushed the more-typed-copy branch 2 times, most recently from 3e5d267 to ac20c35 Compare March 29, 2024 20:04

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 29, 2024

rustbot assigned cjgillot and unassigned fee1-dead Mar 30, 2024

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 9, 2024

DianQK reviewed Apr 10, 2024

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 10, 2024

Update 122805 test for PR 123185

593e900

scottmcm mentioned this pull request Apr 10, 2024

Add a check against multiple only- directives in the same line #123730

Closed

scottmcm force-pushed the more-typed-copy branch from 556b47e to 593e900 Compare April 10, 2024 15:39

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 10, 2024

bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 10, 2024

bors merged commit c2239bc into rust-lang:master Apr 10, 2024
12 checks passed

rustbot added this to the 1.79.0 milestone Apr 10, 2024

scottmcm deleted the more-typed-copy branch April 10, 2024 19:10

rustbot removed the perf-regression Performance regression. label Apr 10, 2024

scottmcm mentioned this pull request Apr 23, 2024

-C linker-plugin-lto prevents vectorization of mem::swap at -C opt-level=3 #124234

Closed

This was referenced May 4, 2024

Copy 1-/2-element arrays as scalars, not vectors #116479

Closed

Avoid memcpy in codegen for more types, notably Vec #112733

Closed

scottmcm mentioned this pull request Aug 2, 2024

missed(?) optimization with a const array of same item #107208

Open
