Only break critical edges where actually needed #33544

Merged: 1 commit merged on May 14, 2016


dotdash
Contributor

@dotdash dotdash commented May 10, 2016

Currently, to prepare for MIR trans, we break all critical edges,
although we only actually need to do this for edges originating from a
call that gets translated to an invoke instruction in LLVM.
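
For readers unfamiliar with the term: an edge is critical when its source block has more than one successor and its target block has more than one predecessor. The sketch below models the selective approach on a toy CFG. The `Block` struct, its `is_invoke` flag, and `break_critical_edges` are hypothetical names for illustration, not the actual rustc types or pass.

```rust
/// Toy basic block: successor indices plus a flag marking a call that
/// would become an LLVM `invoke` (hypothetical model, not rustc's MIR).
#[derive(Debug, Clone)]
struct Block {
    succs: Vec<usize>,
    is_invoke: bool,
}

/// Count how many edges enter each block.
fn pred_counts(blocks: &[Block]) -> Vec<usize> {
    let mut counts = vec![0; blocks.len()];
    for b in blocks {
        for &s in &b.succs {
            counts[s] += 1;
        }
    }
    counts
}

/// Split critical edges, but only those leaving an invoke-like block.
/// Returns the number of blocks that were inserted.
fn break_critical_edges(blocks: &mut Vec<Block>) -> usize {
    let preds = pred_counts(blocks.as_slice());
    let original_len = blocks.len();
    let mut added = 0;
    for i in 0..original_len {
        if !blocks[i].is_invoke || blocks[i].succs.len() < 2 {
            continue;
        }
        for j in 0..blocks[i].succs.len() {
            let target = blocks[i].succs[j];
            if preds[target] > 1 {
                // Insert an empty block that just jumps to the old target.
                let new_idx = blocks.len();
                blocks.push(Block { succs: vec![target], is_invoke: false });
                blocks[i].succs[j] = new_idx;
                added += 1;
            }
        }
    }
    added
}

fn main() {
    // Block 0 is a plain two-way branch, block 1 is an invoke-like call;
    // both have an edge into block 2, so both edges are critical.
    let mut cfg = vec![
        Block { succs: vec![1, 2], is_invoke: false },
        Block { succs: vec![2, 3], is_invoke: true },
        Block { succs: vec![], is_invoke: false },
        Block { succs: vec![], is_invoke: false },
    ];
    let added = break_critical_edges(&mut cfg);
    println!("inserted {} block(s), {} total", added, cfg.len());
}
```

In this toy graph only the edge leaving the invoke-like block is split; the equally critical edge from the plain branch in block 0 is left alone, which is the behaviour the description argues for.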

This has the unfortunate effect of undoing a bunch of the things that
SimplifyCfg has done. A particularly bad case arises when you have a
C-like enum with N variants and a derived PartialEq implementation.

In that case, the match on the (&lhs, &rhs) tuple gets translated into
nested matches with N arms each and a basic block each, resulting in N²
basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but
breaking the critical edges means that we go back to N².

In nickel.rs, there is such an enum with roughly N=800. So we get about
640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to
reduce that to the final "disr_a == disr_b".
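
As a small illustration of the shape being discussed, here is a C-like enum with a derived `PartialEq` next to a hand-written discriminant comparison. The enum `Status` and the helper `same_discriminant` are made up for this example; the real enum in nickel.rs has roughly 800 variants.

```rust
/// A C-like enum: no variant carries data (hypothetical example type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status {
    Ok,
    NotFound,
    ServerError,
}

/// What the derived comparison boils down to for such an enum:
/// comparing the two discriminants as integers.
fn same_discriminant(a: Status, b: Status) -> bool {
    a as u32 == b as u32
}

fn main() {
    let all = [Status::Ok, Status::NotFound, Status::ServerError];
    for &a in &all {
        for &b in &all {
            // The derived `==` and the discriminant check always agree.
            assert_eq!(a == b, same_discriminant(a, b));
        }
    }
    println!("derived PartialEq matches discriminant comparison");
}
```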

So before this patch, we had 2.5M lines of IR with 640K basic blocks,
which took about 3.6s in LLVM to get optimized and translated.
After this patch, we get about 650K lines with about 1.6K basic blocks
and spend a little less than 0.2s in LLVM.
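
The block counts quoted here follow directly from N = 800. A quick check, using the approximations from the description (N² blocks when every critical edge is broken, roughly 2*N after SimplifyCfg); the function names are illustrative only:

```rust
/// Approximate block count when every pair of arms gets its own block.
fn blocks_all_edges_broken(n: u64) -> u64 {
    n * n
}

/// Approximate block count after SimplifyCfg, per the PR description.
fn blocks_simplified(n: u64) -> u64 {
    2 * n
}

fn main() {
    let n = 800;
    println!("all edges broken: {}", blocks_all_edges_broken(n)); // 640000
    println!("simplified:       {}", blocks_simplified(n)); // 1600
}
```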

cc #33111

r? @Aatch

let term_span = term.span;
let term_scope = term.scope;
let succs = term.successors_mut();
// if succs.len() > 1 || (succs.len() > 0 && is_invoke) {
Contributor

Why not just delete this line?

Contributor Author

@dotdash dotdash May 10, 2016

Because I have feelings for this line... Nah, actually I'm just bad at double checking my stuff.

@dotdash dotdash force-pushed the baby_dont_break_me_no_more branch from b16904b to 3a475e4 Compare May 10, 2016 19:41
// Returns true if the terminator is a call that would use an invoke in LLVM.
fn term_is_invoke(term: &Terminator) -> bool {
match term.kind {
TerminatorKind::Call { cleanup: Some(_), .. } |
Contributor

why does this always return false?

@dotdash dotdash force-pushed the baby_dont_break_me_no_more branch 2 times, most recently from 2cd4ca9 to 6e04944 Compare May 10, 2016 20:34
@Aatch
Contributor

Aatch commented May 10, 2016

r=me if travis passes.

@dotdash
Contributor Author

dotdash commented May 11, 2016

@bors r=Aatch

@bors
Contributor

bors commented May 11, 2016

📌 Commit 6e04944 has been approved by Aatch

@bors
Contributor

bors commented May 11, 2016

☔ The latest upstream changes (presumably #33425) made this pull request unmergeable. Please resolve the merge conflicts.

@dotdash dotdash force-pushed the baby_dont_break_me_no_more branch from 6e04944 to 00f6513 Compare May 11, 2016 16:40
@dotdash
Contributor Author

dotdash commented May 11, 2016

@bors r=Aatch

@bors
Contributor

bors commented May 11, 2016

📌 Commit 00f6513 has been approved by Aatch

dotdash added a commit to dotdash/rust that referenced this pull request May 11, 2016
Currently, all switches in MIR are exhaustive, meaning that we can have
a lot of arms that all go to the same basic block, the extreme case
being an if-let expression which results in just 2 possible cases, but
might end up with hundreds of arms for large enums.

To improve this situation and give LLVM less code to chew on, we can
detect whether there's a predominant target basic block in a switch
and then promote this to be the default target, not translating the
corresponding arms at all.
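
A minimal sketch of that idea, assuming a switch is represented as a plain list of target block indices, one per arm. `predominant_target` and `lower_switch` are hypothetical helpers, not the functions used in the actual commit, and picking simply the most frequent target (with ties going to the lowest block index) is an assumption made for this example:

```rust
use std::collections::HashMap;

/// Return the target that the most arms jump to, if any arm exists.
/// Ties are resolved in favour of the lowest block index (an arbitrary
/// choice made for this sketch).
fn predominant_target(targets: &[usize]) -> Option<usize> {
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &t in targets {
        *counts.entry(t).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)))
        .map(|(target, _)| target)
}

/// Split a switch into explicit (arm value, target) pairs plus a default,
/// dropping every arm that goes to the predominant target.
fn lower_switch(targets: &[usize]) -> (Vec<(usize, usize)>, Option<usize>) {
    let default = predominant_target(targets);
    let explicit = targets
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_, t)| Some(t) != default)
        .collect();
    (explicit, default)
}

fn main() {
    // Five arms; four of them go to block 7, as in an `if let` on a big enum.
    let (explicit, default) = lower_switch(&[7, 7, 3, 7, 7]);
    println!("explicit arms: {:?}, default: {:?}", explicit, default);
}
```

With the input shown, only the single arm that goes to block 3 would need to be emitted explicitly; the four arms sharing block 7 collapse into the default.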

In combination with rust-lang#33544 this makes unoptimized MIR trans of
nickel.rs as fast as using old trans and greatly improves the times for
optimized builds, which are only 30-40% slower instead of ~300%.

cc rust-lang#33111
Manishearth added a commit to Manishearth/rust that referenced this pull request May 12, 2016
… r=Aatch

Only break critical edges where actually needed

Manishearth added a commit to Manishearth/rust that referenced this pull request May 12, 2016
[MIR trans] Optimize trans for biased switches

eddyb added a commit to eddyb/rust that referenced this pull request May 12, 2016
… r=Aatch

Only break critical edges where actually needed

eddyb added a commit to eddyb/rust that referenced this pull request May 12, 2016
[MIR trans] Optimize trans for biased switches

bors added a commit that referenced this pull request May 12, 2016
eddyb added a commit to eddyb/rust that referenced this pull request May 13, 2016
… r=Aatch

Only break critical edges where actually needed

eddyb added a commit to eddyb/rust that referenced this pull request May 13, 2016
[MIR trans] Optimize trans for biased switches

@bors
Contributor

bors commented May 14, 2016

⌛ Testing commit 00f6513 with merge ab08af1...

Manishearth added a commit to Manishearth/rust that referenced this pull request May 14, 2016
… r=Aatch

Only break critical edges where actually needed

Manishearth added a commit to Manishearth/rust that referenced this pull request May 14, 2016
[MIR trans] Optimize trans for biased switches

@bors
Contributor

bors commented May 14, 2016

💔 Test failed - auto-mac-64-opt-rustbuild

bors added a commit that referenced this pull request May 14, 2016
Rollup of 9 pull requests

- Successful merges: #33544, #33552, #33554, #33555, #33560, #33566, #33572, #33574, #33576
- Failed merges:
@bors bors merged commit 00f6513 into rust-lang:master May 14, 2016
@dotdash dotdash deleted the baby_dont_break_me_no_more branch May 17, 2016 03:43