Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--remap-path-prefix behavior regression resulting in bad paths in 1.50.0 #82074

Closed
bobbobbio opened this issue Feb 13, 2021 · 6 comments · Fixed by #82102
Closed

--remap-path-prefix behavior regression resulting in bad paths in 1.50.0 #82074

bobbobbio opened this issue Feb 13, 2021 · 6 comments · Fixed by #82102
Assignees
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-reproducibility Area: Reproducible / Deterministic builds C-bug Category: This is a bug. P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@bobbobbio
Copy link

On Rust 1.50.0 --remap-path-prefix is behaving weirdly

On 1.49.0 it works as expected and gives a path relative to what is passed

$ rustc --version                                                                                                  
rustc 1.49.0 (e1884a8e3 2020-12-29)                                                                                                        
$ rustc -g main.rs -o bar/main --remap-path-prefix='/home/remi/foo2/='                                             
$ llvm-dwarfdump bar/main | grep DW_AT_decl_file | grep 'main.rs'                                                  
                  DW_AT_decl_file       ("main.rs")

On 1.50.0 it does not work as expected, and gives a path to a non-existent file

$ rustc --version
rustc 1.50.0 (cb75ad5db 2021-02-10)
$ rustc -g main.rs -o bar/main --remap-path-prefix='/home/remi/foo2/='                                             
$ llvm-dwarfdump bar/main | grep DW_AT_decl_file | grep 'main.rs'                                                  
                  DW_AT_decl_file       ("bar/main.rs")

In terms of prioritization of this issue, I'd like to make a plea for the importance of it, because at first glance it seems not a huge deal but it causes us huge problems.

We need this --remap-path-prefix to work so that the build artifacts are reproducible from machine to machine and for our distributed build cache to work successfully, and we also need the debugging-symbols paths to be correct so debuggers work correctly.

@bobbobbio bobbobbio added the C-bug Category: This is a bug. label Feb 13, 2021
@jonas-schievink jonas-schievink added A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-reproducibility Area: Reproducible / Deterministic builds regression-from-stable-to-stable Performance or correctness regression from one stable version to another. labels Feb 13, 2021
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 13, 2021
@jyn514
Copy link
Member

jyn514 commented Feb 13, 2021

It took me a second to realize at first - the bug is that bar/ shouldn't be added; bar/ is an output path, not an input, and shouldn't be used for debug symbols.

@nagisa nagisa added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 13, 2021
@nagisa
Copy link
Member

nagisa commented Feb 14, 2021

Bisection says the issue is in 2ba7ca2 which seems very plausible.

cc @davidtwco

@nagisa nagisa self-assigned this Feb 14, 2021
@nagisa
Copy link
Member

nagisa commented Feb 14, 2021

So, after further investigation I can see exactly how this issue started, and why it happened. Warning: 3am rambling ahead.

In particular there are a couple of issues here (I'm going to use LLVM-IR primarily here, because that's I think more convenient to demonstrate the causations, -Ccodegen-units=1 everywhere for simplicity):

When we compile a fn main() {} without --remap-path-prefix and with split debug info we'll end up emitting the following metadata:

!llvm.dbg.cu = !{!18}
!18 = distinct !DICompileUnit(language: DW_LANG_Rust, file: !19, producer: "clang LLVM (rustc version 1.52.0-nightly (3f5aee2d5 2021-02-12))", isOptimized: false, runtimeVersion: 0, splitDebugFilename: "main.test.7rcbfp3g-cgu.0.rcgu.dwo", emissionKind: FullDebug, enums: !4, globals: !20)
!19 = !DIFile(filename: "/tmp/test.rs", directory: "outdir")

Adding directory: "outdir" here is necessary for debuggers to find split debug info that we'd place alongside the binaries we output when gdb is invoked from the source directory. The filename here is the "name" of the codegen unit, and just so happens that we chose to name it after the source filename, but it could be pretty much any random string.

Now, as the issue is written (with the output path being relative), this does not necessarily present a reproducibility problem. rustc will encode a path that's "as given" to its -o flag. The reproducibility problem starts cropping up more prominently when you provide an output path that's absolute (-o /tmp/outdir/result). Then you get metadata as such (note the path to source directory encoded in the codegen unit metadata):

!18 = distinct !DICompileUnit(language: DW_LANG_Rust, file: !19, producer: "clang LLVM (rustc version 1.52.0-nightly (3f5aee2d5 2021-02-12))", isOptimized: false, runtimeVersion: 0, splitDebugFilename: "main.test.7rcbfp3g-cgu.0.rcgu.dwo", emissionKind: FullDebug, enums: !4, globals: !20)
!19 = !DIFile(filename: "test.rs", directory: "/tmp/outdir")

Turns out this directory is not affected by --remap-path-prefix and that's ultimately where most of the problem lies, I think.

I have a solution in mind, I think, but I want to look at what clang does, exactly, first.

@nagisa
Copy link
Member

nagisa commented Feb 14, 2021

So, this is what clang does:

  1. It sets the codegen unit directory to the absolute path, no matter what:

    clang -g test.c -o banana/out

    produces

    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.1", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, splitDebugInlining: false, nameTableKind: None)
    !1 = !DIFile(filename: "test.c", directory: "/tmp")

    This differs from the behaviour rustc implements which sets the directory relative to the
    compiler working directory. It also differs in the way that this does point to the source
    file rather than the output directory.

  2. It will not normalize the filename and directory, at all:

    clang -gsplit-dwarf banana/test.c -o banana/out

    produces

    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.1", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, splitDebugInlining: false, nameTableKind: None)
    !1 = !DIFile(filename: "banana/test.c", directory: "/tmp")

    rather than

    !1 = !DIFile(filename: "test.c", directory: "/tmp/banana")

    Note how the filename contains path components.

  3. When building with -gsplit-dwarf, the way it handles DIFile does not differ from above.
    Instead to correctly point the debugger at the correct location it will use a path in
    splitDebugFilename:

    clang -gsplit-dwarf test.c -o banana/out -c

    will produce

    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.1", isOptimized: true, runtimeVersion: 0, splitDebugFilename: "banana/out.dwo", emissionKind: FullDebug, enums: !2, splitDebugInlining: false, nameTableKind: GNU)
    !1 = !DIFile(filename: "test.c", directory: "/tmp")
  4. It applies the -ffile-prefix-map and -fdebug-prefix-map to the codegen unit, but only when
    it matches fully. So, for example:

    clang -g banana/test.c -o banana/out -ffile-prefix-map=/tmp=

    Will do exactly what you'd expect and remove the /tmp prefix:

    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.1", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, splitDebugInlining: false, nameTableKind: None)
    !1 = !DIFile(filename: "banana/test.c", directory: "")

    (Same holds for the -fdebug-prefix-map)

    However for this invocation:

    clang -g banana/test.c -o banana/out -ffile-prefix-map=/tmp/banana= -S -emit-llvm
    

    clang will fail to match any paths, and not replace anything:

    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.1", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, splitDebugInlining: false, nameTableKind: None)
    !1 = !DIFile(filename: "banana/test.c", directory: "/tmp")

    And same holds for the split-dwarf case:

    clang -gsplit-dwarf banana/test.c -o banana/out -ffile-prefix-map=/tmp/banana= -c
    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.1", isOptimized: true, runtimeVersion: 0, splitDebugFilename: "banana/out.dwo", emissionKind: FullDebug, enums: !2, splitDebugInlining: false, nameTableKind: GNU)
    !1 = !DIFile(filename: "banana/test.c", directory: "/tmp")

With all these in mind, I see a couple of things to adjust on the rustc side, and we can probably
make it behave better than clang too, but ultimately the only thing I see that really hurts
reproducibility right now is that we don't apply --remap-prefix-path to the path that we put into
directory at all.

@apiraino
Copy link
Contributor

@rustbot label -I-prioritize +P-high

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Feb 17, 2021
@apiraino
Copy link
Contributor

Assigning P-high as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 23, 2021
Set path of the compile unit to the source directory

As part of the effort to implement split dwarf debug info, we ended up
setting the compile unit location to the output directory rather than
the source directory. Furthermore, it seems like we failed to remap the
prefixes for this as well!

The desired behaviour is to instead set the `DW_AT_GNU_dwo_name` to a
path relative to compiler's working directory. This still allows
debuggers to find the split dwarf files, while not changing the
behaviour of the code that is compiling with regular debug info, and not
changing the compiler's behaviour with regards to reproducibility.

Fixes rust-lang#82074

cc `@alexcrichton` `@davidtwco`
@bors bors closed this as completed in 16c7188 Feb 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-reproducibility Area: Reproducible / Deterministic builds C-bug Category: This is a bug. P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants