Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to -Z plt=yes #106380

Closed
wants to merge 1 commit into from
Closed

Default to -Z plt=yes #106380

wants to merge 1 commit into from

Conversation

MaskRay
Copy link
Contributor

@MaskRay MaskRay commented Jan 2, 2023

6009da0 defaulted to -Z plt=no (like
clang -fno-plt) which a not a useful default[1].

On x86-64, if the target symbol is preemptible, there is an
R_X86_64_GLOB_DAT relocation, and the (very minor) optimization works as
intended. However, if the target is non-preemptible, i.e. the target is
resolved to the same component, this is actually a pessimization due to
the longer instruction.

On RISC architectures, there is typically no single instruction which
can load a GOT entry and perform an indirect call. -fno-plt has a longer
code sequence. For example, AArch64 needs 3 instructions:

adrp    x0, _GLOBAL_OFFSET_TABLE_
ldr     x0, [x0, #:gotpage_lo15:bar]
br      x0

This does not end up with a serious code size issue, because LLVM
"RtLibUseGOT" is not implemented for non-x86 targets.

On x86-32, very new lld[2] (2022-12-31) is needed to support
general-dynamic/local-dynamic TLS models.

-Z plt=no is not an appropriate default, so just default to true for
all targets.

[1] https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table#fno-plt
[2] llvm/llvm-project@8dc7366

@rustbot
Copy link
Collaborator

rustbot commented Jan 2, 2023

r? @oli-obk

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 2, 2023
@rustbot
Copy link
Collaborator

rustbot commented Jan 2, 2023

These commits modify compiler targets.
(See the Target Tier Policy.)

@MaskRay
Copy link
Contributor Author

MaskRay commented Jan 2, 2023

@GabrielMajeri FYI

@GabrielMajeri
Copy link
Contributor

For the record, I'll leave a link to the original pull request which enabled -fno-plt by default: #54592. There's more discussion there, and a few synthetic benchmarks.

As for this:

However, if the target is non-preemptible, i.e. the target is
resolved to the same component, this is actually a pessimization due to
the longer instruction.

I'm afraid I'm not familiar enough with CPU (micro)architectures to counter this claim. On x86-64, at the very least, the -fno-plt optimization seemed to do more good than harm, and it was enabled by default for all arches due to an implicit assumption that it would help them as well (although, again, I haven't benchmarked the changes for anything except x86).

@rust-log-analyzer

This comment has been minimized.

@nikic
Copy link
Contributor

nikic commented Jan 2, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 2, 2023
@bors
Copy link
Contributor

bors commented Jan 2, 2023

⌛ Trying commit 5a3791f76dde2490094ebeb8977614a55a72fc25 with merge 5f4f27a0a98c60c9218675e35d2cc60c47546394...

@rust-log-analyzer

This comment has been minimized.

@MaskRay MaskRay force-pushed the needs_plt branch 3 times, most recently from ad22866 to 043b307 Compare January 3, 2023 00:59
@rust-log-analyzer

This comment has been minimized.

@tmiasko
Copy link
Contributor

tmiasko commented Jan 3, 2023

@rust-timer build 5f4f27a0a98c60c9218675e35d2cc60c47546394

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (5f4f27a0a98c60c9218675e35d2cc60c47546394): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.4% [0.2%, 1.0%] 28
Regressions ❌
(secondary)
0.4% [0.2%, 0.5%] 24
Improvements ✅
(primary)
-1.1% [-2.3%, -0.4%] 34
Improvements ✅
(secondary)
-1.2% [-2.0%, -0.6%] 11
All ❌✅ (primary) -0.4% [-2.3%, 1.0%] 62

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.2% [-2.3%, -0.1%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.2% [-2.3%, -0.1%] 2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.9% [0.9%, 2.8%] 15
Regressions ❌
(secondary)
2.2% [1.5%, 3.5%] 10
Improvements ✅
(primary)
-1.7% [-2.5%, -1.1%] 13
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.2% [-2.5%, 2.8%] 28

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jan 3, 2023
@krasimirgg
Copy link
Contributor

@rustbot label: +llvm-main

@rustbot rustbot added the llvm-main Marks PRs that are making Rust work with LLVM main (this label is consumed by CI tooling) label Jan 5, 2023
@nikic
Copy link
Contributor

nikic commented Jan 5, 2023

@krasimirgg Is that because llvm/llvm-project@2679e8b broke the Rust build? If so, consider this PR blocked on that change getting reverted. We can evaluate changing our defaults, but it's not okay to break LLVM and force us to change defaults because of that.

6009da0 defaulted to `-Z plt=no` (like
`clang -fno-plt`) which a not a useful default[1].

On x86-64, if the target symbol is preemptible, there is an
`R_X86_64_GLOB_DAT` relocation, and the (very minor) optimization works as
intended. However, if the target is non-preemptible, i.e. the target is
resolved to the same component, this is actually a pessimization due to
the longer instruction.

On RISC architectures, there is typically no single instruction which
can load a GOT entry and perform an indirect call. `-fno-plt` has a longer
code quence. For example, AArch64 needs 3 instructions:

    adrp    x0, _GLOBAL_OFFSET_TABLE_
    ldr     x0, [x0, #:gotpage_lo15:bar]
    br      x0

This does not end up with a serious code size issue, because LLVM
"RtLibUseGOT" is not implemented for non-x86 targets.

On x86-32, very new lld[2] (2022-12-31) is needed to support
general-dynamic/local-dynamic TLS models.

`-Z plt=no` is not an appropriate default, so just default to true for
all targets.

[1] https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table#fno-plt
[2] llvm/llvm-project@8dc7366
@MaskRay
Copy link
Contributor Author

MaskRay commented Jan 11, 2023

Rebase

@nikic nikic added the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Jan 21, 2023
@nikic
Copy link
Contributor

nikic commented Jan 21, 2023

Nominating for discussion at the next compiler meeting. The proposal here is to change the current -Z plt=no default to -Z plt=yes (ignoring some details about relro levels and target-specific overrides here). The blog post https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table has some background information on what GOT and PLT are.

In short, the basic idea is that if you call a function from a different shared object (say from libc.so), what actually gets called is a PLT stub, which then looks up the function to call in the GOT and calls it. With -Z plt=no, we instead directly look up the function in the GOT and call it. On x86-64, this can be done in a single instruction, which is one byte longer than the call to the PLT stub. This means we trade off one less level of call indirection for one byte more code size. (The PLT stub can also be used to perform lazy binding, but this is irrelevant in the context of this discussion.)

If the called function is in the same shared object / executable, it will get resolved to a direct call. In this case, the longer -Z plt=no encoding is a pure code size loss.

On non-x86 targets, the encoding for a GOT call rather than PLT call may be substantially larger. Rather than going from 5 to 6 bytes, it may go from 4 to 12 bytes (I guess that would be the case for the AArch64 example).

Relevant results from the perf run above are:

  • cycles: About 2% regressions on doc builds, i.e. generated code becomes 2% slower, for this workload. About 2% improvement on debug builds, which indicates that generating PLT binaries is faster.
  • binary size: Improvements in the 1% range.

I believe these are the relevant facts. My personal recommendation based on my understanding of the tradeoffs here would be to default to -Z plt=no for x86-64 only and use -Z plt=yes for all other targets. The current default was only ever performance evaluated for x86-64 (see #54592 for the original change), and the tradeoffs are clearly very different on other targets. However, for x86-64 there do seem to be clear performance benefits in typical usage scenarios, so I'm not convinced by the proposal to use -Z plt=yes for all targets, including x86-64.

I'd like some broader input from T-compiler on what to do here. Also cc @nagisa who reviewed the original PR.

Finally, I think whatever we do here, it probably makes sense to stabilize -C plt, so people can override this.

@pnkfelix
Copy link
Member

I have filed an MCP for the plan regarding the change in defaults outlined by @nikic; that is rust-lang/compiler-team#581

(It probably would be good to expose control of the PLT via a proper -C flag as @nikic suggests, but I opted not to fold that into rust-lang/compiler-team#581.)

@apiraino apiraino removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Jan 26, 2023
@bors
Copy link
Contributor

bors commented Feb 21, 2023

☔ The latest upstream changes (presumably #108301) made this pull request unmergeable. Please resolve the merge conflicts.

@durin42
Copy link
Contributor

durin42 commented Feb 21, 2023

@rustbot label: -llvm-main

@rustbot rustbot removed the llvm-main Marks PRs that are making Rust work with LLVM main (this label is consumed by CI tooling) label Feb 21, 2023
@workingjubilee workingjubilee added the A-cli Area: Command line interface to the compiler. label Mar 5, 2023
@apiraino
Copy link
Contributor

apiraino commented Mar 9, 2023

MCP has been approved compiler-team#581

@pnkfelix
Copy link
Member

@MaskRay can you revise this PR to reflect the refinement described in rust-lang/compiler-team#581 (namely, that the default should be PLT=yes for everything except x86_64 ?)

@rustbot label: -S-waiting-on-review +S-waiting-on-author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 23, 2023
@MaskRay
Copy link
Contributor Author

MaskRay commented Mar 26, 2023

@MaskRay can you revise this PR to reflect the refinement described in rust-lang/compiler-team#581 (namely, that the default should be PLT=yes for everything except x86_64 ?)

@rustbot label: -S-waiting-on-review +S-waiting-on-author

I disagree with keeping -Z plt=no for x86-64. We should just use -Z plt=yes for all architectures.

@nekopsykose
Copy link

I disagree with keeping -Z plt=no for x86-64. We should just use -Z plt=yes for all architectures.

for the sake of nonperfection, could you make the change anyway?

i fear otherwise this change would just get lost in a cyclic argument, because it is correct from both arguments sides in some capacity (on x86 specifically, in the most common case of no special user-optimised static linking overrides (few people are doing this in rust to my knowledge), plt=no is generally a small gain (the rustc-itself benchmarks), by everything established above). i feel like everyone here has already discussed this to the full capacity possible, and there is no new information to really take in on the subject.

but it would be nice to at least indeed change it everywhere else (as acked via MCP above), and people then would get a real gain.

@Dylan-DPC
Copy link
Member

@MaskRay any updates on this?

durin42 added a commit to durin42/rust that referenced this pull request Jun 22, 2023
Per the discussion in rust-lang#106380 plt=no isn't a great default, and
rust-lang/compiler-team#581 decided that the default should be PLT=yes
for everything except x86_64. Not everyone agrees about the x86_64 part
of this change, but this at least is an improvement in the state of
things without changing the x86_64 situation, so I've attempted making
this change in the name of not letting the perfect be the enemy of the
good.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 23, 2023
rustc_session: default to -Z plt=yes on non-x86_64

Per the discussion in rust-lang#106380 plt=no isn't a great default, and rust-lang/compiler-team#581 decided that the default should be PLT=yes for everything except x86_64. Not everyone agrees about the x86_64 part of this change, but this at least is an improvement in the state of things without changing the x86_64 situation, so I've attempted making this change in the name of not letting the perfect be the enemy of the good.

Please let me know if I've messed this up somehow - I'm not wholly confident I got this right.

r? `@nikic`
@nikic
Copy link
Contributor

nikic commented Jun 27, 2023

Superseded by #109982.

@nikic nikic closed this Jun 27, 2023
@MaskRay MaskRay deleted the needs_plt branch August 27, 2023 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cli Area: Command line interface to the compiler. perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.