Lazify more work in builtins targets #122703

Urgau · 2024-03-18T18:19:12Z

This PR makes some fields in Target and TargetOptions lazy.

This is done in order to reduce the impact of check-cfg, which needs to load all the built-in targets but doesn't need all the fields.

In particular, this PR introduces a MaybeLazy<T> struct to support borrowed, owned and lazy states.
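For readers unfamiliar with the idea, here is a minimal sketch of what a three-state container like this could look like. It is not the code from this PR: the enum layout, the variant names and the constructor names `borrowed` and `owned` are assumptions made for illustration (only `MaybeLazy::lazy` is mentioned in this thread), and it relies on `std::sync::LazyLock`, which is stable since Rust 1.80.

```rust
use std::borrow::Borrow;
use std::ops::Deref;
use std::sync::LazyLock;

/// Hypothetical three-state container: a `'static` borrow, an owned value,
/// or a value computed on first access. Layout and names are assumptions.
pub enum MaybeLazy<T: 'static + ToOwned + ?Sized> {
    Borrowed(&'static T),
    Owned(T::Owned),
    Lazy(LazyLock<T::Owned, fn() -> T::Owned>),
}

impl<T: 'static + ToOwned + ?Sized> MaybeLazy<T> {
    /// Wraps data that already lives for `'static` (no allocation).
    pub fn borrowed(value: &'static T) -> Self {
        MaybeLazy::Borrowed(value)
    }

    /// Wraps a value that has already been computed.
    pub fn owned(value: T::Owned) -> Self {
        MaybeLazy::Owned(value)
    }

    /// Defers the computation until the value is first dereferenced.
    pub fn lazy(init: fn() -> T::Owned) -> Self {
        MaybeLazy::Lazy(LazyLock::new(init))
    }
}

impl<T: 'static + ToOwned + ?Sized> Deref for MaybeLazy<T> {
    type Target = T;

    fn deref(&self) -> &T {
        match self {
            MaybeLazy::Borrowed(value) => *value,
            MaybeLazy::Owned(value) => value.borrow(),
            MaybeLazy::Lazy(lazy) => {
                // Runs the initializer at most once, then reuses the result.
                let value: &T::Owned = LazyLock::force(lazy);
                value.borrow()
            }
        }
    }
}
```

With a shape like this, a consumer that never dereferences a `Lazy` field never pays for the environment access or the allocation behind it, which is the effect check-cfg is after.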

~~The fields that are changed are:~~

~~LinkArgs fields (access to env + allocate): 54 023 236 ins¹ -> 53 478 072 ins~~
~~CrtObjects fields (allocate): 53 478 072 ins -> 53 127 832 ins~~
~~llvm_name field (access to env + allocate): 53 127 832 ins -> 53 057 110 ins~~

~~This brings around -1 000 000 instructions and -0.2/0.3ms of less wall-time (with some measurement even having the same minimum wall-time with or without check-cfg) 🎉.~~

See the latest perf run for the actual improvements, #122703 (comment).

This PR is also a step further towards completely static-fying all the built-in targets, if we one day decide to do it, but that's a separate conversation.

r? @petrochenkov

¹ This is the total number of instructions as measured with cachegrind for the entire rustc invocation on a cargo --lib hello world; variance was really low (<15 000 ins).
On the scale of the check-cfg feature, the last time I measured it, CheckCfg::fill_well_known was around 2.8 million ins.

rustbot · 2024-03-18T18:19:21Z

These commits modify compiler targets.
(See the Target Tier Policy.)

Noratrieb · 2024-03-19T08:32:52Z

0.2ms is not a lot, even if you amortize it across hundreds of rustc invocations, and this does make the code more complicated. Is this really worth it?

Urgau · 2024-03-19T09:36:07Z

Actually, now that I have remeasured it, the difference is bigger than I previously thought. With stage1 the minimum wall-time measured is now the same with or without check-cfg, while before, with nightly (so with far more aggressive optimization), the difference in minimum wall-time was never below 0.4ms; therefore this PR has the potential to nearly eliminate the cost of loading all the well-known cfgs.

So yes, I definitely think it's worth it.

Note that the measurements were done on a laptop where the noise is non-negligible, which is why I used 50 rounds of warmup, so the wall-time has to be taken with some caution; but even with the noise, the decrease in wall-time is clear.

$ hyperfine --warmup 50 "rustc +stage1 --crate-type=lib src/lib.rs" "rustc +stage1 --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs"
Benchmark 1: rustc +stage1 --crate-type=lib src/lib.rs
  Time (mean ± σ):      20.3 ms ±   1.5 ms    [User: 11.3 ms, System: 8.0 ms]
  Range (min … max):    17.9 ms …  24.9 ms    132 runs

Benchmark 2: rustc +stage1 --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs
  Time (mean ± σ):      20.7 ms ±   1.7 ms    [User: 12.7 ms, System: 6.9 ms]
  Range (min … max):    17.9 ms …  25.5 ms    133 runs

Summary
  rustc +stage1 --crate-type=lib src/lib.rs ran
    1.02 ± 0.11 times faster than rustc +stage1 --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs

$ hyperfine --warmup 50 "rustc +nightly --crate-type=lib src/lib.rs" "rustc +nightly --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs"
Benchmark 1: rustc +nightly --crate-type=lib src/lib.rs
  Time (mean ± σ):      17.8 ms ±   1.3 ms    [User: 8.2 ms, System: 8.6 ms]
  Range (min … max):    15.8 ms …  26.5 ms    159 runs

Benchmark 2: rustc +nightly --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs
  Time (mean ± σ):      18.3 ms ±   1.3 ms    [User: 9.1 ms, System: 8.0 ms]
  Range (min … max):    16.3 ms …  25.5 ms    148 runs

Summary
  rustc +nightly --crate-type=lib src/lib.rs ran
    1.03 ± 0.10 times faster than rustc +nightly --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs

Regarding the code complexity, the vast majority of it comes from adapting the fields in each and every target (i.e. adding MaybeLazy::lazy calls), while there are no code changes outside rustc_target. The code complexity increase is IMO limited and contained.

Adding new targets shouldn't be more difficult than before.

petrochenkov · 2024-03-19T14:20:10Z

I dislike this for the same reason as #122207, but at much larger scale.
I'd also prefer to not do this, unless cfg checking is enabled by default, blocking on rust-lang/cargo#13571.
Maybe we'll have to do this, but it won't be a good day for target specs.
@rustbot blocked

Urgau · 2024-03-19T14:41:35Z

> I dislike this for the same reason as #122207, but at much larger scale.

I understand why you don't like #122207; I don't like it much either. I mainly proposed that one for discussion and a perf test.

However, I think this PR is quite different: it goes in the direction of static target definitions, which IMO is sufficient in itself, even if it doesn't bring any perf improvements. @petrochenkov Could you elaborate on why you don't like this PR?

EDIT: To elaborate a bit more, I always found it weird that the target specifications were not immutable; for the most part they are, except for some Apple shenanigans. If they were, and const-eval was a bit more permissive, we could have them in a simple static, which IMO would be cleaner and, as a bonus, would be better for check-cfg.

EDIT2: Yes, I realize I'm going into the static discussion, even though I said I wouldn't; but I really think this goes in a good direction by permitting more data to be static.
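To make the "simple static" idea above concrete, here is a rough sketch of a target-like definition stored directly in a `static`. The struct `TargetSketch`, its fields and the values are all hypothetical and far simpler than the real `Target`; the point is only that borrowed data can sit in a `static` as-is, while data that needs allocation can be deferred with `std::sync::LazyLock`.

```rust
use std::sync::LazyLock;

/// Hypothetical, heavily simplified stand-in for a target definition.
pub struct TargetSketch {
    /// Plain borrowed data: usable directly in a `static`.
    pub llvm_target: &'static str,
    /// Data that needs allocation: deferred until first access.
    pub pre_link_args: LazyLock<Vec<String>>,
}

/// Illustrative values only, not taken from any real target spec.
pub static EXAMPLE_TARGET: TargetSketch = TargetSketch {
    llvm_target: "example-unknown-none",
    // `LazyLock::new` is a `const fn`, so this initializer is allowed in a
    // `static`; the closure itself only runs when the field is first read.
    pre_link_args: LazyLock::new(|| vec![String::from("-m64")]),
};
```

A caller that only reads `EXAMPLE_TARGET.llvm_target` never triggers the allocation for `pre_link_args`.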

lqd · 2024-05-06T13:16:46Z

@bors try @rust-timer queue

bors · 2024-05-06T13:17:57Z

⌛ Trying commit 2a57448 with merge c3871b5...

bors · 2024-05-06T14:55:12Z

☀️ Try build successful - checks-actions
Build commit: c3871b5 (c3871b59880d404e451097b32bf6f68bcbd565f1)

rust-timer · 2024-05-06T23:40:35Z

Finished benchmarking commit (c3871b5): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.7%	[0.2%, 1.4%]	4
Improvements ✅ (primary)	-1.3%	[-2.2%, -0.3%]	4
Improvements ✅ (secondary)	-0.8%	[-1.9%, -0.2%]	18
All ❌✅ (primary)	-1.3%	[-2.2%, -0.3%]	4

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.1%	[2.1%, 2.1%]	1
Regressions ❌ (secondary)	6.8%	[5.5%, 8.0%]	3
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-5.1%	[-5.1%, -5.1%]	1
All ❌✅ (primary)	2.1%	[2.1%, 2.1%]	1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.8%	[2.4%, 3.4%]	3
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.2%	[-2.2%, -2.2%]	1
All ❌✅ (primary)	-	-	0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 676.654s -> 674.986s (-0.25%)
Artifact size: 315.91 MiB -> 315.95 MiB (0.01%)

Urgau · 2024-05-12T15:36:13Z

rust-lang/cargo#13571 has been merged, the PR has been rebased and the latest perf run shows some clear improvements across multiple benchmarks 🎉.

@rustbot labels -S-blocked +S-waiting-on-review

petrochenkov · 2024-05-28T17:23:14Z

@Urgau
To clarify the status, the default cfg checking is now in both stable and bootstrap rustc and cargo, and is not going to be reverted due to issues like #124800 and similar, right?

Could you give a link to the PR with perf regressions that first enables cfg checking in a way observable to perf checking on merge?

bors · 2024-06-18T15:05:05Z

⌛ Trying commit d7ef5e6 with merge d013d1a...

bors · 2024-06-18T16:43:26Z

☀️ Try build successful - checks-actions
Build commit: d013d1a (d013d1af6909ef48ddac409375e4f0746655c5ad)

Urgau · 2024-06-18T21:47:19Z

> Standard LazyLock already supports stateful fn objects. I was able to avoid all MaybeLazy::lazy calls for CrtObjects this way - 2eefb88.

Indeed. This has now been done for CRT objects and link args. (Pushed as separate commits, for easier review and easier maintenance for me.)
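As a rough illustration of the "stateful fn objects" point, the sketch below uses `std::sync::LazyLock` with a boxed closure as its initializer, so the initializer can capture state instead of being a bare `fn` pointer. The alias `LazyObjects` and the helper `lazy_objects` are hypothetical names for this example and are not the code from 2eefb88.

```rust
use std::sync::LazyLock;

/// Hypothetical alias: a lazily built list of object names whose
/// initializer is a boxed closure, so it can capture state.
pub type LazyObjects = LazyLock<Vec<String>, Box<dyn FnOnce() -> Vec<String> + Send>>;

/// Builds a lazy value that holds on to `prefix` and `names` and only
/// allocates the final strings on first access.
pub fn lazy_objects(prefix: &'static str, names: &'static [&'static str]) -> LazyObjects {
    let init: Box<dyn FnOnce() -> Vec<String> + Send> = Box::new(move || {
        names.iter().map(|name| format!("{}{}", prefix, name)).collect()
    });
    LazyLock::new(init)
}
```

Because the state lives inside the closure, one helper can serve many targets, where a plain `fn() -> T` initializer would need a separate function per target.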

The latest (and most accurate) perf run doesn't show as much perf improvement as the first one; I tried to improve it further, but that didn't lead anywhere.

(it's worth noting that they are not directly comparable given that they are more than a month apart - maybe worth redoing a perf run without the latest changes?)

@rustbot ready

compiler/rustc_target/src/spec/maybe_lazy.rs

compiler/rustc_target/src/spec/mod.rs

compiler/rustc_target/src/spec/base/aix.rs

petrochenkov · 2024-06-19T11:23:23Z

Let's maybe split these changes and try to benchmark and land them separately.
MaybeLazy<str> first, then CRT objects, then link args.
@rustbot author

Lazify `Target::llvm_target` field: This PR lazifies the `Target::llvm_target` field by introducing `MaybeLazy`, a 3-way lazy container (borrowed, owned and lazy states). Split from rust-lang#122703. r? `@petrochenkov`

Lazify CRT objects fields initilization: This PR lazifies the CRT objects by introducing `MaybeLazy`, a 3-way lazy container (borrowed, owned and lazy states). Split from rust-lang#122703. r? `@petrochenkov`

bors · 2024-06-27T08:16:43Z

☔ The latest upstream changes (presumably #126907) made this pull request unmergeable. Please resolve the merge conflicts.

Lazify `TargetOptions::*link_args` fields: This PR lazifies the link args by introducing `MaybeLazy`, a 3-way lazy container (borrowed, owned and lazy states). Split from rust-lang#122703. r? `@petrochenkov`

petrochenkov · 2024-07-23T12:11:04Z

#127992 is closed so closing this as well.

rustbot assigned petrochenkov Mar 18, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 18, 2024

rustbot added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 19, 2024

Urgau force-pushed the lazy-targets branch from c1412c2 to 2a57448 Compare May 6, 2024 11:46

rustbot added the O-macos Operating system: macOS label May 6, 2024

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 6, 2024

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels May 6, 2024

Urgau mentioned this pull request May 7, 2024

Update cargo #124684

Merged

Urgau force-pushed the lazy-targets branch from 2a57448 to 565f328 Compare May 12, 2024 14:35

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels May 12, 2024

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 18, 2024

Urgau force-pushed the lazy-targets branch from d7ef5e6 to 3f906c3 Compare June 18, 2024 21:11

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 18, 2024

petrochenkov reviewed Jun 19, 2024

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 19, 2024

Urgau added 6 commits June 19, 2024 19:25

MaybeLazy: Permit non function-pointer as lazy input

1973e9d

Move CRT Objects to stateful MaybeLazy

1dffc91

Make TargetOptions::link_args stateful lazy - part 1

b75347b

Make TargetOptions::link_args stateful lazy - part 2

aed211a

by adding `TargetOptions::link_args_list` to handle multi link args

Make TargetOptions::link_args stateful lazy - part 3

e240466

Handle Apple targets

Add missing #[inline] to TargetOptions::link_args

95df404

Urgau force-pushed the lazy-targets branch from 3f906c3 to 95df404 Compare June 19, 2024 17:25

Urgau mentioned this pull request Jun 19, 2024

Lazify Target::llvm_target field #126702

Closed

Urgau mentioned this pull request Jun 23, 2024

Lazify CRT objects fields initilization #126860

Closed

Urgau mentioned this pull request Jul 19, 2024

Lazify TargetOptions::*link_args fields #127992

Closed

petrochenkov closed this Jul 23, 2024
