Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generator optimization: Overlap locals that never have storage live at the same time #60187

Merged
merged 11 commits into from
Jun 12, 2019

Conversation

tmandry
Copy link
Member

@tmandry tmandry commented Apr 23, 2019

The specific goal of this optimization is to optimize async fns which use await!. Notably, await! has an enclosing scope around the futures it awaits (definition), which we rely on to implement the optimization.

More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. We care about storage-liveness when computing the layout, because knowing a field is StorageDead is the only way to prove it will not be accessed, either directly or through a reference.

To determine whether we can overlap two locals in the generator layout, we look at whether they might both be StorageLive at any point in the MIR. We use the MaybeStorageLive dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with.

Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout.

For the layout computation, we use a simplified approach.

  1. Start with the set of locals assigned to only one variant. The rest are disqualified.
  2. For each pair of locals which may conflict and are not assigned to the same variant, we pick one local to disqualify from overlapping.

Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".

So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:

You can think of a generator as an enum, where some fields are shared between variants. e.g.

enum Generator {
  Unresumed,
  Poisoned,
  Returned,
  Suspend0(A, B, X),
  Suspend1(B),
  Suspend2(A, Y, Z),
}

where every mention of A and B refer to the same field, which does not move when changing variants. Note that A and B would automatically be sent to the prefix in this example. Assuming that X is never StorageLive at the same time as either Y or Z, it would be allowed to overlap with them.

Note that if two locals (Y and Z in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.


Depends on:

Before merging:

  • Wrap the types of all generator fields in MaybeUninitialized in layout::ty::field (opened Don't ignore generator fields in miri #60889)
  • Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
  • Clean up TODO
  • Fix the layout code to enforce that the same field never moves around in the generator
  • Add tests for async/await
  • Reduce # bits we store by half, since the conflict relation is symmetric (note: decided not to do this, for simplicity)
  • Store liveness information for each yield point in our GeneratorLayout, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at Multi-variant layouts for generators #59897 (comment))

@rust-highfive
Copy link
Collaborator

r? @davidtwco

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 23, 2019
@tmandry tmandry changed the title Generator optimization: Overlap locals that never have storage live at the same time [WIP] Generator optimization: Overlap locals that never have storage live at the same time Apr 23, 2019
@Zoxc
Copy link
Contributor

Zoxc commented Apr 23, 2019

@bors try

@bors
Copy link
Contributor

bors commented Apr 23, 2019

⌛ Trying commit e27b6fd with merge faed5e2...

bors added a commit that referenced this pull request Apr 23, 2019
[WIP] Generator optimization: Overlap locals that never have storage live at the same time

This depends on #59897 (which is currently failing). Only the commits from "Emit StorageDead for all locals" and on are relevant.

The reason I'm opening this now is so we can get some perf numbers and discuss the approach.
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:155d63f6:start=1555982977591402820,finish=1555982978514722160,duration=923319340
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:03:37] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:37] tidy error: /checkout/src/librustc_mir/transform/generator.rs:561: TODO is deprecated; use FIXME
[00:03:39] some tidy checks failed
[00:03:39] 
[00:03:39] 
[00:03:39] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:39] 
[00:03:39] 
[00:03:39] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:39] Build completed unsuccessfully in 0:00:45
[00:03:39] Build completed unsuccessfully in 0:00:45
[00:03:39] make: *** [tidy] Error 1
[00:03:39] Makefile:67: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0094b6c6
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Tue Apr 23 01:33:29 UTC 2019
---
travis_time:end:15be1d78:start=1555983210069161032,finish=1555983210073859753,duration=4698721
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:07c2e027
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:11effaa0
travis_time:start:11effaa0
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:078d6cef
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors
Copy link
Contributor

bors commented Apr 23, 2019

☀️ Try build successful - checks-travis
Build commit: faed5e2

@Zoxc
Copy link
Contributor

Zoxc commented Apr 23, 2019

@rust-timer build faed5e2

@rust-timer
Copy link
Collaborator

Success: Queued faed5e2 with parent 004c549, comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit faed5e2

@tmandry
Copy link
Member Author

tmandry commented Apr 23, 2019

Benchmarking results do not look good. Next I'm trying an approach that doesn't create more basic blocks in the MIR. Instead it propagates StorageDeads forward along the cleanup path, which means we may hit some extra StorageDeads along the way. This shouldn't cause any problems, but I will have to update the analysis pass of the optimization if we keep this.

If the benchmarks still do not look good, I'll try a smarter approach of assembling the StorageDead statements than putting them in a vec and reversing them. At this point I suspect the larger performance hit is due to new basic blocks, but I'm not sure.

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:20ec75c0:start=1556049886727429768,finish=1556049981616995594,duration=94889565826
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:03:52] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:52] tidy error: /checkout/src/librustc_mir/transform/generator.rs:561: TODO is deprecated; use FIXME
[00:03:54] some tidy checks failed
[00:03:54] 
[00:03:54] 
[00:03:54] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:54] 
[00:03:54] 
[00:03:54] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:54] Build completed unsuccessfully in 0:00:44
[00:03:54] Build completed unsuccessfully in 0:00:44
[00:03:54] Makefile:67: recipe for target 'tidy' failed
[00:03:54] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:02451030
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Tue Apr 23 20:10:25 UTC 2019
---
travis_time:end:00da78d6:start=1556050226138966140,finish=1556050226150654242,duration=11688102
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:25678716
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:00be7c32
travis_time:start:00be7c32
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:15f7112a
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@cramertj
Copy link
Member

@bors try

@bors
Copy link
Contributor

bors commented Apr 23, 2019

⌛ Trying commit 8bbe3449442738b1a545953a5bcf2a3ec20cdbd4 with merge 32434c7ac565769649a967e525da3d6dabfaca0a...

@bors
Copy link
Contributor

bors commented Apr 23, 2019

☀️ Try build successful - checks-travis
Build commit: 32434c7ac565769649a967e525da3d6dabfaca0a

@cramertj
Copy link
Member

cramertj commented Apr 23, 2019

@rust-timer build 32434c7ac565769649a967e525da3d6dabfaca0a

@rust-timer
Copy link
Collaborator

Success: Queued 32434c7ac565769649a967e525da3d6dabfaca0a with parent 4eff852, comparison URL.

@rust-lang rust-lang deleted a comment from rust-timer Apr 23, 2019
@rust-lang rust-lang deleted a comment from rust-timer Apr 23, 2019
@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 32434c7ac565769649a967e525da3d6dabfaca0a

@Zoxc
Copy link
Contributor

Zoxc commented Apr 24, 2019

@tmandry Only changing StorageDead generation for generators should get rid of the performance regressions.

@tmandry
Copy link
Member Author

tmandry commented Apr 24, 2019

@tmandry Only changing StorageDead generation for generators should get rid of the performance regressions.

@Zoxc yep, that's what I'll fall back to, if needed!

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 32434c7ac565769649a967e525da3d6dabfaca0a

@tmandry tmandry force-pushed the generator-optimization branch 2 times, most recently from 7294f02 to dcd5da8 Compare April 25, 2019 01:40
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:1f485cf0:start=1556156481484382630,finish=1556156482401242044,duration=916859414
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:04:11] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:11] tidy error: /checkout/src/librustc_mir/transform/generator.rs:561: TODO is deprecated; use FIXME
[00:04:13] some tidy checks failed
[00:04:13] 
[00:04:13] 
[00:04:13] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:13] 
[00:04:13] 
[00:04:13] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:13] Build completed unsuccessfully in 0:00:45
[00:04:13] Build completed unsuccessfully in 0:00:45
[00:04:13] make: *** [tidy] Error 1
[00:04:13] Makefile:67: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:201bfb6e
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Thu Apr 25 01:45:47 UTC 2019
---
travis_time:end:02be76d9:start=1556156747911994846,finish=1556156747916916594,duration=4921748
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0e908098
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:084779d0
travis_time:start:084779d0
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:0f7238cd
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors
Copy link
Contributor

bors commented Apr 25, 2019

☔ The latest upstream changes (presumably #60250) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Contributor

bors commented Jun 11, 2019

📌 Commit aeefbc4 has been approved by eddyb

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 11, 2019
@bors
Copy link
Contributor

bors commented Jun 12, 2019

⌛ Testing commit aeefbc4 with merge 05083c2...

bors added a commit that referenced this pull request Jun 12, 2019
Generator optimization: Overlap locals that never have storage live at the same time

The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization.

More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.**

To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with.

Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout.

For the layout computation, we use a simplified approach.

1. Start with the set of locals assigned to only one variant. The rest are disqualified.
2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping.

Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".

So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:

You can think of a generator as an enum, where some fields are shared between variants. e.g.

```rust
enum Generator {
  Unresumed,
  Poisoned,
  Returned,
  Suspend0(A, B, X),
  Suspend1(B),
  Suspend2(A, Y, Z),
}
```

where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them.

Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.

---

Depends on:
- [x] #59897 Multi-variant layouts for generators
- [x] #60840 Preserve local scopes in generator MIR
- [x] #61373 Emit StorageDead along unwind paths for generators

Before merging:

- [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened #60889)
- [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
- [x] Clean up TODO
- [x] Fix the layout code to enforce that the same field never moves around in the generator
- [x] Add tests for async/await
- [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity)
- [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at #59897 (comment))
Centril added a commit to Centril/rust that referenced this pull request Jun 12, 2019
…ddyb

Generator optimization: Overlap locals that never have storage live at the same time

The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization.

More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.**

To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with.

Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout.

For the layout computation, we use a simplified approach.

1. Start with the set of locals assigned to only one variant. The rest are disqualified.
2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping.

Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".

So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:

You can think of a generator as an enum, where some fields are shared between variants. e.g.

```rust
enum Generator {
  Unresumed,
  Poisoned,
  Returned,
  Suspend0(A, B, X),
  Suspend1(B),
  Suspend2(A, Y, Z),
}
```

where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them.

Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.

---

Depends on:
- [x] rust-lang#59897 Multi-variant layouts for generators
- [x] rust-lang#60840 Preserve local scopes in generator MIR
- [x] rust-lang#61373 Emit StorageDead along unwind paths for generators

Before merging:

- [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened rust-lang#60889)
- [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
- [x] Clean up TODO
- [x] Fix the layout code to enforce that the same field never moves around in the generator
- [x] Add tests for async/await
- [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity)
- [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at rust-lang#59897 (comment))
@bors
Copy link
Contributor

bors commented Jun 12, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: eddyb
Pushing 05083c2 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 12, 2019
bors added a commit that referenced this pull request Jun 12, 2019
Rollup of 9 pull requests

Successful merges:

 - #60187 (Generator optimization: Overlap locals that never have storage live at the same time)
 - #61348 (Implement Clone::clone_from for Option and Result)
 - #61568 (Use Symbol, Span in libfmt_macros)
 - #61632 (ci: Collect CPU usage statistics on Azure)
 - #61654 (use pattern matching for slices destructuring)
 - #61671 (implement nth_back for Range(Inclusive))
 - #61688 (is_fp and is_floating_point do the same thing, remove the former)
 - #61705 (Pass cflags rather than cxxflags to LLVM as CMAKE_C_FLAGS)
 - #61734 (Migrate rust-by-example to MdBook2)

Failed merges:

r? @ghost
@bors bors merged commit aeefbc4 into rust-lang:master Jun 12, 2019
@tmandry tmandry deleted the generator-optimization branch June 14, 2019 00:47
bors added a commit that referenced this pull request Jun 26, 2019
rustc: correctly transform memory_index mappings for generators.

Fixes #61793, closes #62011 (previous attempt at fixing #61793).

During #60187, I made the mistake of suggesting that the (re-)computation of `memory_index` in `ty::layout`, after generator-specific logic split/recombined fields, be done off of the `offsets` of those fields (which needed to be computed anyway), as opposed to the `memory_index`.

`memory_index` maps each field to its in-memory order index, which ranges over the same `0..n` values as the fields themselves, making it a bijective mapping, and more specifically a permutation (indeed, it's the permutation resulting from field reordering optimizations).

Each field has an unique "memory index", meaning a sort based on them, even an unstable one, will not put them in the wrong order. But offsets don't have that property, because of ZSTs (which do not increase the offset), so sorting based on the offset of fields alone can (and did) result in wrong orders.

Instead of going back to sorting based on (slices/subsets of) `memory_index`, or special-casing ZSTs to make sorting based on offsets produce the right results (presumably), as #62011 does, I opted to drop sorting altogether and focus on `O(n)` operations involving *permutations*:
* a permutation is easily inverted (see the `invert_mapping` `fn`)
  * an `inverse_memory_index` was already employed in other parts of the `ty::layout` code (that is, a mapping from memory order to field indices)
  * inverting twice produces the original permutation, so you can invert, modify, and invert again, if it's easier to modify the inverse mapping than the direct one
* you can modify/remove elements in a permutation, as long as the result remains dense (i.e. using every integer in `0..len`, without gaps)
  * for splitting a `0..n` permutation into disjoint `0..x` and `x..n` ranges, you can pick the elements based on a `i < x` / `i >= x` predicate, and for the latter, also subtract `x` to compact the range to `0..n-x`
  * in the general case, for taking an arbitrary subset of the permutation, you need a renumbering from that subset to a dense `0..subset.len()` - but notably, this is still `O(n)`!
* you can merge permutations, as long as the result remains disjoint (i.e. each element is unique)
  * for concatenating two `0..n` and `0..m` permutations, you can renumber the elements in the latter to `n..n+m`
* some of these operations can be combined, and an inverse mapping (be it a permutation or not) can still be used instead of a forward one by changing the "domain" of the loop performing the operation

I wish I had a nicer / more mathematical description of the recombinations involved, but my focus was to fix the bug (in a way which preserves information more directly than sorting would), so I may have missed potential changes in the surrounding generator layout code, that would make this all more straight-forward.

r? @tmandry
tmandry added a commit to tmandry/rust that referenced this pull request Aug 6, 2019
I tested the generator optimizations in rust-lang#60187 and rust-lang#61922 on the Fuchsia
build, and noticed that some small generators (about 8% of the async fns
in our build) increased in size slightly.

This is because in rust-lang#60187 we split the fields into two groups, a
"prefix" non-overlap region and an overlap region, and lay them out
separately. This can introduce unnecessary padding bytes between the two
groups.

In every single case in the Fuchsia build, it was due to there being
only a single variant being used in the overlap region. This means that
we aren't doing any overlapping, period. So it's better to combine the
two regions into one and lay out all the fields at once, which is what
this change does.
Centril added a commit to Centril/rust that referenced this pull request Aug 6, 2019
…ssions, r=cramertj

Fix generator size regressions due to optimization

I tested the generator optimizations in rust-lang#60187 and rust-lang#61922 on the Fuchsia
build, and noticed that some small generators (about 8% of the async fns
in our build) increased in size slightly.

This is because in rust-lang#60187 we split the fields into two groups, a
"prefix" non-overlap region and an overlap region, and lay them out
separately. This can introduce unnecessary padding bytes between the two
groups.

In every single case in the Fuchsia build, it was due to there being
only a single variant being used in the overlap region. This means that
we aren't doing any overlapping, period. So it's better to combine the
two regions into one and lay out all the fields at once, which is what
this change does.

r? @cramertj
cc @eddyb @Zoxc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants