
Morph layout_raw query into layout_of. #88308

Merged (1 commit) on Aug 26, 2021


eddyb (Member) commented Aug 24, 2021

Before this PR, LayoutCx::layout_of wrapped the layout_raw query, to:

  • normalize the type, before attempting to compute the layout
  • pass the layout to record_layout_for_printing, for -Zprint-type-sizes
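To make the pre-PR layering concrete, here is a toy Rust model, not rustc's actual code. `Ty`, `Layout`, `Ctx`, `normalize`, `compute_layout`, and the `lookups` counter are all hypothetical stand-ins. `layout_of` normalizes through a memoized lookup, calls a memoized `layout_raw`-style query on the normalized type, and records the result when a stand-in for `-Zprint-type-sizes` is on. All three steps therefore run on every call, even when the layout is already cached.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

// Hypothetical stand-in for a compiler type; `Alias` models a type that
// must be normalized before its layout can be computed.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    U8,
    U32,
    Alias(Box<Ty>),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Layout {
    size: u64,
    align: u64,
}

// Strip aliases until a concrete type is reached.
fn normalize(ty: &Ty) -> Ty {
    match ty {
        Ty::Alias(inner) => normalize(inner),
        other => other.clone(),
    }
}

// The expensive part; only ever called on normalized types.
fn compute_layout(ty: &Ty) -> Layout {
    match ty {
        Ty::U8 => Layout { size: 1, align: 1 },
        Ty::U32 => Layout { size: 4, align: 4 },
        Ty::Alias(_) => unreachable!("layout requested for an unnormalized type"),
    }
}

#[derive(Default)]
struct Ctx {
    normalize_cache: RefCell<HashMap<Ty, Ty>>,
    layout_raw_cache: RefCell<HashMap<Ty, Layout>>,
    lookups: Cell<u32>,      // cache lookups performed, hits and misses alike
    print_type_sizes: bool,  // stands in for -Zprint-type-sizes
    recorded: RefCell<Vec<(Ty, Layout)>>,
}

impl Ctx {
    fn bump(&self) {
        self.lookups.set(self.lookups.get() + 1);
    }

    // Memoized normalization; already-normalized types skip the cache.
    fn normalize_erasing(&self, ty: &Ty) -> Ty {
        if !matches!(ty, Ty::Alias(_)) {
            return ty.clone();
        }
        self.bump();
        self.normalize_cache
            .borrow_mut()
            .entry(ty.clone())
            .or_insert_with(|| normalize(ty))
            .clone()
    }

    // The `layout_raw`-style query, keyed on the already-normalized type.
    fn layout_raw(&self, ty: &Ty) -> Layout {
        self.bump();
        *self
            .layout_raw_cache
            .borrow_mut()
            .entry(ty.clone())
            .or_insert_with(|| compute_layout(ty))
    }

    // Pre-PR wrapper: normalize, query, record, on every call, cached or not.
    fn layout_of(&self, ty: &Ty) -> (Ty, Layout) {
        let ty = self.normalize_erasing(ty);
        let layout = self.layout_raw(&ty);
        if self.print_type_sizes {
            self.recorded.borrow_mut().push((ty.clone(), layout));
        }
        (ty, layout)
    }
}
```

For example, two calls on `Ty::Alias(Box::new(Ty::U32))` both return `(Ty::U32, Layout { size: 4, align: 4 })`, cost two lookups each (four in total), and record the layout twice.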

Moving those two responsibilities into the query may reduce overhead (due to cached calls skipping those steps), but I want to do a perf run to know.

One of the changes I had to make was to the return type of the query, so that it can both return the type produced by normalizing inside the query and match the signature of the old TyCtxt::layout_of. This change may be worse, perf-wise, so that's another reason I want to check.
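A matching toy model of the shape described here, again hypothetical rather than the PR's code: the query is keyed on the type exactly as passed in, normalizes and records only on a cache miss, and returns a `TyAndLayout` so callers still get the normalized type. On a miss for an unnormalized type it recurses once, so the layout is also cached under the normalized type. Normalization is modeled as memoized, with already-normalized types skipping the cache; `Ty`, `Layout`, `Ctx`, and `lookups` are illustrative names.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

// Hypothetical stand-ins; `Alias` models a type that needs normalizing.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    U8,
    U32,
    Alias(Box<Ty>),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Layout {
    size: u64,
    align: u64,
}

// The query returns the normalized type together with its layout.
#[derive(Clone, Debug, PartialEq, Eq)]
struct TyAndLayout {
    ty: Ty,
    layout: Layout,
}

fn normalize(ty: &Ty) -> Ty {
    match ty {
        Ty::Alias(inner) => normalize(inner),
        other => other.clone(),
    }
}

fn compute_layout(ty: &Ty) -> Layout {
    match ty {
        Ty::U8 => Layout { size: 1, align: 1 },
        Ty::U32 => Layout { size: 4, align: 4 },
        Ty::Alias(_) => unreachable!("layout requested for an unnormalized type"),
    }
}

#[derive(Default)]
struct Ctx {
    normalize_cache: RefCell<HashMap<Ty, Ty>>,
    layout_cache: RefCell<HashMap<Ty, TyAndLayout>>,
    lookups: Cell<u32>,      // cache lookups performed, hits and misses alike
    print_type_sizes: bool,  // stands in for -Zprint-type-sizes
    recorded: RefCell<Vec<TyAndLayout>>,
}

impl Ctx {
    fn bump(&self) {
        self.lookups.set(self.lookups.get() + 1);
    }

    // Memoized normalization; already-normalized types skip the cache.
    fn normalize_erasing(&self, ty: &Ty) -> Ty {
        if !matches!(ty, Ty::Alias(_)) {
            return ty.clone();
        }
        self.bump();
        self.normalize_cache
            .borrow_mut()
            .entry(ty.clone())
            .or_insert_with(|| normalize(ty))
            .clone()
    }

    // Keyed on the type as given: a repeat call is a single lookup, and
    // normalizing and recording happen only on a miss.
    fn layout_of(&self, ty: &Ty) -> TyAndLayout {
        self.bump();
        if let Some(hit) = self.layout_cache.borrow().get(ty) {
            return hit.clone();
        }
        let normalized = self.normalize_erasing(ty);
        let result = if normalized != *ty {
            // Recurse so the layout is also cached under the normalized type.
            self.layout_of(&normalized)
        } else {
            let result = TyAndLayout { layout: compute_layout(&normalized), ty: normalized };
            if self.print_type_sizes {
                self.recorded.borrow_mut().push(result.clone());
            }
            result
        };
        self.layout_cache.borrow_mut().insert(ty.clone(), result.clone());
        result
    }
}
```

The first call on `Ty::Alias(Box::new(Ty::U8))` costs three lookups (the miss, the normalization, and the nested query for `Ty::U8`); each later call on either the alias or `Ty::U8` is a single hit, and the layout is recorded only once.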

r? @nagisa cc @oli-obk

rust-highfive added the S-waiting-on-review label on Aug 24, 2021

eddyb (Member Author) commented Aug 24, 2021

@bors try @rust-timer queue

rust-timer (Collaborator) commented:

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rustbot added the S-waiting-on-perf label on Aug 24, 2021

bors (Contributor) commented Aug 24, 2021

⌛ Trying commit edb4b2d with merge ef5a2f51e873d28ef516ea93637b800575586660...

bors (Contributor) commented Aug 24, 2021

☀️ Try build successful - checks-actions
Build commit: ef5a2f51e873d28ef516ea93637b800575586660

rust-timer (Collaborator) commented:

Queued ef5a2f51e873d28ef516ea93637b800575586660 with parent b5fe3bc, future comparison URL.

inquisitivecrystal added the T-compiler label on Aug 24, 2021

rust-timer (Collaborator) commented:

Finished benchmarking try commit (ef5a2f51e873d28ef516ea93637b800575586660): comparison url.

Summary: This change led to very large relevant improvements 🎉 in compiler performance.

  • Very large improvement in instruction counts (up to -11.6% on incr-full builds of ctfe-stress-4)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

rustbot removed the S-waiting-on-perf label on Aug 25, 2021
eddyb marked this pull request as ready for review on August 25, 2021 06:22
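
Review thread on this excerpt from the diff:

```rust
fn layout_of(&self, ty: Ty<'tcx>) -> Self::TyAndLayout {
    let param_env = self.param_env.with_reveal_all_normalized(self.tcx);
    let ty = self.tcx.normalize_erasing_regions(param_env, ty);
    // ... (rest of the diff hunk not shown)
}
```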

A member commented:

I think the reasoning behind having this outside the query is that it increases the amount of cache hits.

eddyb (Member Author) replied:

Yes, but you're doing 2 cache hits (normalize + layout) instead of 1 (just layout).

This PR only penalizes the initial cache miss (layout gets cached for both normalized and unnormalized versions of the type), but benefits all further hits (since it reduces them to a single lookup).
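The lookup arithmetic above, written out as a simplified cost model for `n` calls of `layout_of` on one type that still needs normalizing. The assumption, not a measurement, is one cache lookup per query call (hit or miss) and no lookup when normalizing an already-normalized type; the function names are illustrative.

```rust
// Simplified cost model (assumption): one cache lookup per query call,
// for `n` calls of layout_of on the same not-yet-normalized type.

/// Before: every call does a normalization lookup plus a layout_raw lookup.
fn lookups_before(n: u64) -> u64 {
    2 * n
}

/// After: the first call misses, then does a normalization lookup and a
/// nested layout_of lookup for the normalized type; every later call is a
/// single hit on the unnormalized key.
fn lookups_after(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        3 + (n - 1)
    }
}
```

Under this model the first call costs 3 lookups instead of 2, the two designs break even at two calls (4 each), and every further call saves one lookup, e.g. 102 versus 200 lookups for 100 calls.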

nagisa (Member) commented Aug 26, 2021

@bors r+

bors (Contributor) commented Aug 26, 2021

📌 Commit edb4b2d has been approved by nagisa

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label on Aug 26, 2021

bors (Contributor) commented Aug 26, 2021

⌛ Testing commit edb4b2d with merge 4b9f4b2...

bors (Contributor) commented Aug 26, 2021

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 4b9f4b2 to master...

bors added the merged-by-bors label on Aug 26, 2021
bors merged commit 4b9f4b2 into rust-lang:master on Aug 26, 2021
rustbot added this to the 1.56.0 milestone on Aug 26, 2021
eddyb deleted the cooked-layouts branch on August 26, 2021 18:32

the8472 (Member) commented Sep 17, 2021

While the instruction counts look good, this has regressed the wall times of await-call-tree, deeply-nested-closures and deeply-nested-async by up to 30%.

The timeline graphs show that this isn't noise.

They're synthetic benchmarks so this might be acceptable.

the8472 added the perf-regression label on Sep 17, 2021

nagisa (Member) commented Sep 18, 2021

Both instructions and cycles are significantly down, so the wall-time regression would mean that the perf machines were running at a low frequency for the test, that there are significantly more stalls, or that we're getting descheduled more often (e.g. due to holding a lock). Sadly, we do not track context switches in perf :(
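A back-of-the-envelope model of this reasoning, assuming wall time is roughly the cycle count divided by the clock frequency, plus any time the process spends descheduled. It covers only the frequency and descheduling explanations, not stalls, and the figures below are illustrative, not taken from the perf runs.

```rust
/// Simplified model (an assumption, not how rustc-perf measures time):
/// wall time = time executing on the CPU (cycles / clock frequency)
///           + time spent descheduled, off the CPU.
fn wall_time_secs(cycles: f64, freq_hz: f64, off_cpu_secs: f64) -> f64 {
    cycles / freq_hz + off_cpu_secs
}
```

With 10% fewer cycles, wall time still rises from 2.5 s (`wall_time_secs(10e9, 4e9, 0.0)`) to 3.0 s if the clock drops from 4 GHz to 3 GHz (`wall_time_secs(9e9, 3e9, 0.0)`) or if 0.75 s is spent descheduled (`wall_time_secs(9e9, 4e9, 0.75)`).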

eddyb (Member Author) commented Sep 18, 2021

(EDIT: keep in mind that I wrote the comment below before @nagisa posted their more helpful one, but only posted mine afterwards.)

Eugh, this is probably that memory-latency-shaped gap in our performance tracking: I'm guessing the fewer instructions now have to wait on more cache misses or the like. Though 30% is a lot.

Wait, but a lot of these are... -check? What is going on? That makes no sense: I'm only touching an API that is used only by codegen and miri (for CTFE).

Looking at the query data for await-call-tree-check/incr-full, there are 70 layout_of executions, but they seem unrelated to the regression, which is in some other (arbitrary) part of the code.

Looking at the diff again, the only cascade effect I can think of would be the size of the HashMap holding the results of the query, and maybe that allocation affecting other allocations, but that kind of thing tends to get lost in the noise most of the time.

erikdesjardins (Contributor) commented:

I believe the wall-time regression was caused by the CPU governor on the perf machine, as mentioned here.

For more details see #83698 (comment), a (big) comment on the PR that was merged (and perf-tested) immediately before this one, and thus shared in the blame for this regression.

Labels: merged-by-bors, perf-regression, S-waiting-on-bors, T-compiler
10 participants