Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lld support #58959

Merged
merged 1 commit into from
Sep 12, 2021
Merged

Fix lld support #58959

merged 1 commit into from
Sep 12, 2021

Conversation

janvorli
Copy link
Member

I have found that when I've made a change to use the lld linker in July,
I've missed to push part of the change that was supposed to go to Arcade.

Without this change, the lld enabling flag is not used and we end up using
the default linker.

I am checking it to the runtime repo to ensure that it is fixed for the
RC2 release. I will make the same change in the Arcade repo.

I have found that when I've made a change to use the lld linker in July,
I've missed to push part of the change that was supposed to go to Arcade.

I am checking it to the runtime repo to ensure that it is fixed for the
RC2 release. I will make the same change in the Arcade repo.
@janvorli janvorli added this to the 6.0.0 milestone Sep 10, 2021
@janvorli janvorli self-assigned this Sep 10, 2021
@ghost
Copy link

ghost commented Sep 10, 2021

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Issue Details

I have found that when I've made a change to use the lld linker in July,
I've missed to push part of the change that was supposed to go to Arcade.

Without this change, the lld enabling flag is not used and we end up using
the default linker.

I am checking it to the runtime repo to ensure that it is fixed for the
RC2 release. I will make the same change in the Arcade repo.

Author: janvorli
Assignees: janvorli
Labels:

area-Infrastructure-coreclr

Milestone: 6.0.0

@janvorli janvorli mentioned this pull request Sep 10, 2021
@@ -138,8 +138,8 @@ function(add_toolchain_linker_flag Flag)
if (NOT Config STREQUAL "")
Copy link
Member

@am11 am11 Sep 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this file will be overwrritten by the next arcade update, this would need to go into arcade.

(proposal dotnet/arcade#6768 was based on this very practical issue, as I myself including 10s of other people have forgotten about it in past, in various repos on several occasions .. a simple GitHub action can probably help 🙂)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, I have written that in the description above. We need to get this in very quickly to get it into the RC2, so I am making the same change in arcade and here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, missed that. 👍
just noticed that most recently bot silently overwrote updates made in eng/common ~10 days ago: a539321. Overall this automation seems semi-complete.

@janvorli
Copy link
Member Author

The arcade clone of this change is dotnet/arcade#7883

@janvorli janvorli closed this Sep 11, 2021
@janvorli janvorli reopened this Sep 11, 2021
@janvorli janvorli merged commit f29484a into dotnet:main Sep 12, 2021
@crummel
Copy link
Contributor

crummel commented Sep 14, 2021

I checked out f29484a and built this on a CentOS 8 ARM64 Raspberry Pi with lld available and I'm still seeing:

[chris@crummel-rpi4-centos runtime]$ git show --oneline
f29484a8681 (HEAD) Fix lld support (#58959)
[chris@crummel-rpi4-centos runtime]$ lld --version
lld is a generic driver.
Invoke ld.lld (Unix), ld64.lld (macOS), lld-link (Windows), wasm-ld (WebAssembly) instead
[chris@crummel-rpi4-centos runtime]$ ld.lld --version
LLD 11.0.0 (compatible with GNU linkers)
[chris@crummel-rpi4-centos runtime]$ readelf -Wl ./artifacts/bin/coreclr/Linux.arm64.Debug/libcoreclr.so

Elf file type is DYN (Shared object file)
Entry point 0x93f80
There are 8 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0xc0e6cc 0xc0e6cc R E 0x10000
  LOAD           0xc10d88 0x0000000000c20d88 0x0000000000c20d88 0x04ab78 0x0cb2c0 RW  0x10000
  DYNAMIC        0xc3c620 0x0000000000c4c620 0x0000000000c4c620 0x0002c0 0x0002c0 RW  0x8
  NOTE           0x000200 0x0000000000000200 0x0000000000000200 0x000024 0x000024 R   0x4
  TLS            0xc10d88 0x0000000000c20d88 0x0000000000c20d88 0x000000 0x0000f4 R   0x8
  GNU_EH_FRAME   0xa5e830 0x0000000000a5e830 0x0000000000a5e830 0x052c4c 0x052c4c R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0xc10d88 0x0000000000c20d88 0x0000000000c20d88 0x02f278 0x02f278 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata __tracepoints_strings .eh_frame_hdr .eh_frame .gcc_except_table
   01     .init_array .fini_array .data.rel.ro .dynamic .got .data __tracepoints __tracepoints_ptrs .bss
   02     .dynamic
   03     .note.gnu.build-id
   04     .tbss
   05     .eh_frame_hdr
   06
   07     .init_array .fini_array .data.rel.ro .dynamic .got

i.e. .text and .rodata are still in the same section. I'm currently trying this on a RHEL ARM64 machine but wanted to let you know this is what I was seeing right now.

cc @janvorli @tmds

@janvorli
Copy link
Member Author

@crummel have you made a clean build? This is what I get when building it on my arm64 Ubuntu 18.04:

janvorli@odroid:/mnt/ext/git/runtime2$ readelf -Wl artifacts/bin/coreclr/Linux.arm64.Debug/libcoreclr.so

Elf file type is DYN (Shared object file)
Entry point 0x350000
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R   0x8
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x347324 0x347324 R   0x10000
  LOAD           0x350000 0x0000000000350000 0x0000000000350000 0x952c30 0x952c30 R E 0x10000
  LOAD           0xcb0000 0x0000000000cb0000 0x0000000000cb0000 0x02f278 0x02f278 RW  0x10000
  LOAD           0xce0000 0x0000000000ce0000 0x0000000000ce0000 0x01baf0 0x09c2e8 RW  0x10000
  TLS            0xca2c30 0x0000000000ca2c30 0x0000000000ca2c30 0x000000 0x0000f4 R   0x8
  DYNAMIC        0xcdb918 0x0000000000cdb918 0x0000000000cdb918 0x000240 0x000240 RW  0x8
  GNU_RELRO      0xcb0000 0x0000000000cb0000 0x0000000000cb0000 0x030000 0x030000 R   0x1
  GNU_EH_FRAME   0x1ae014 0x00000000001ae014 0x00000000001ae014 0x0533cc 0x0533cc R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x0002a8 0x00000000000002a8 0x00000000000002a8 0x000024 0x000024 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .note.gnu.build-id .dynsym .gnu.version .gnu.version_d .gnu.version_r .gnu.hash .dynstr .rela.dyn .rela.plt .gcc_except_table .rodata __tracepoints_strings .eh_frame_hdr .eh_frame
   02     .text .init .fini .plt
   03     .fini_array .init_array .data.rel.ro .dynamic .got .got.plt
   04     .data .tm_clone_table __tracepoints __tracepoints_ptrs .bss
   05     .tbss
   06     .dynamic
   07     .fini_array .init_array .data.rel.ro .dynamic .got .got.plt
   08     .eh_frame_hdr
   09
   10     .note.gnu.build-id

And I got the same from a cross build on my x64 Ubuntu.

Can you please check the output of the (clean) build, close to the beginning, it should say something like:

Check for working C compiler: /usr/bin/clang-9

We only look for lld with the same version syntax as the clang. So in my case, it was lld-9. Can you check if the lld is available in such a form?

@omajid
Copy link
Member

omajid commented Oct 4, 2021

Do you know if there's a Microsoft-built arm64 SDK build with this fix included? I am using dotnet-sdk-6.0.100-rc.2.21426.20-linux-arm64.tar.gz and I see:

$ readelf -Wl ./.dotnet/shared/Microsoft.NETCore.App/6.0.0-rc.2.21423.6/libcoreclr.so

Elf file type is DYN (Shared object file)
Entry point 0x79f80
There are 8 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x64de98 0x64de98 R E 0x10000
  LOAD           0x64e750 0x000000000065e750 0x000000000065e750 0x03f9f0 0x0a4b30 RW  0x10000
  DYNAMIC        0x6705e0 0x00000000006805e0 0x00000000006805e0 0x0002b0 0x0002b0 RW  0x8
  NOTE           0x000200 0x0000000000000200 0x0000000000000200 0x000024 0x000024 R   0x4
  TLS            0x64e750 0x000000000065e750 0x000000000065e750 0x000000 0x0000e9 R   0x10
  GNU_EH_FRAME   0x564b3c 0x0000000000564b3c 0x0000000000564b3c 0x01fc9c 0x01fc9c R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x64e750 0x000000000065e750 0x000000000065e750 0x0248b0 0x0248b0 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata __tracepoints_strings .eh_frame_hdr .eh_frame .gcc_except_table 
   01     .init_array .fini_array .jcr .data.rel.ro .dynamic .got .data __tracepoints __tracepoints_ptrs .bss 
   02     .dynamic 
   03     .note.gnu.build-id 
   04     .tbss 
   05     .eh_frame_hdr 
   06     

@omajid
Copy link
Member

omajid commented Oct 4, 2021

Can you check if the lld is available in such a form?

It doesn't look like there is. There's a clang-11, but no lld-11:

$ rpm -ql clang
/usr/bin/clang
/usr/bin/clang++
/usr/bin/clang++-11
/usr/bin/clang-11
/usr/bin/clang-cl
/usr/bin/clang-cpp
/usr/lib/.build-id
/usr/lib/.build-id/f1
/usr/lib/.build-id/f1/61329902b127ec8a6efefd3e26c1cdb4a35c4b
/usr/share/licenses/clang
/usr/share/licenses/clang/LICENSE.TXT
/usr/share/man/man1/clang++-11.1.gz
/usr/share/man/man1/clang++.1.gz
/usr/share/man/man1/clang-11.1.gz
/usr/share/man/man1/clang.1.gz
$ rpm -ql lld
/usr/bin/ld.lld
/usr/bin/ld64.lld
/usr/bin/lld
/usr/bin/lld-link
/usr/bin/wasm-ld
/usr/lib/.build-id
/usr/lib/.build-id/08
/usr/lib/.build-id/08/1854999505ea9ad225b29fca6e451841022496
/usr/share/licenses/lld
/usr/share/licenses/lld/LICENSE.TXT

@am11
Copy link
Member

am11 commented Oct 4, 2021

I copied a link from https://github.com/dotnet/installer#installers-and-binaries Release/6.0.1XX (6.0.x Runtime):

$ mkdir foo
$ curl -sSL https://aka.ms/dotnet/6.0.1xx/daily/dotnet-sdk-linux-arm64.tar.gz | tar xzf - -C foo
$ readelf -Wl foo/shared/Microsoft.NETCore.App/6.0.0-rtm.21504.6/libcoreclr.so

Elf file type is DYN (Shared object file)
Entry point 0x1b0000
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R   0x8
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x1a58bc 0x1a58bc R   0x10000
  LOAD           0x1b0000 0x00000000001b0000 0x00000000001b0000 0x4ab780 0x4ab780 R E 0x10000
  LOAD           0x660000 0x0000000000660000 0x0000000000660000 0x024848 0x024848 RW  0x10000
  LOAD           0x690000 0x0000000000690000 0x0000000000690000 0x01b140 0x080280 RW  0x10000
  TLS            0x65b780 0x000000000065b780 0x000000000065b780 0x000000 0x0000e9 R   0x10
  DYNAMIC        0x681eb8 0x0000000000681eb8 0x0000000000681eb8 0x000240 0x000240 RW  0x8
  GNU_RELRO      0x660000 0x0000000000660000 0x0000000000660000 0x025000 0x025000 R   0x1
  GNU_EH_FRAME   0x0fd9e8 0x00000000000fd9e8 0x00000000000fd9e8 0x01fe24 0x01fe24 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x0002a8 0x00000000000002a8 0x00000000000002a8 0x000024 0x000024 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .note.gnu.build-id .dynsym .gnu.version .gnu.version_d .gnu.version_r .gnu.hash .dynstr .rela.dyn .rela.plt .gcc_except_table .rodata __tracepoints_strings .eh_frame_hdr .eh_frame
   02     .text .init .fini .plt
   03     .jcr .fini_array .init_array .data.rel.ro .dynamic .got .got.plt
   04     .data .tm_clone_table __tracepoints __tracepoints_ptrs .bss
   05     .tbss
   06     .dynamic
   07     .jcr .fini_array .init_array .data.rel.ro .dynamic .got .got.plt
   08     .eh_frame_hdr
   09
   10     .note.gnu.build-id

@omajid
Copy link
Member

omajid commented Oct 4, 2021

I had to do this to make it work for me:

diff --git a/eng/common/native/init-compiler.sh b/eng/common/native/init-compiler.sh
index 28f5145a6f7..3247258d789 100644
--- a/eng/common/native/init-compiler.sh
+++ b/eng/common/native/init-compiler.sh
@@ -112,7 +112,7 @@ if [[ -z "$CC" ]]; then
 fi
 
 if [[ "$compiler" == "clang" ]]; then
-    if command -v "lld$desired_version" > /dev/null; then
+    if command -v "lld" || command -v "lld$desired_version" > /dev/null; then
         # Only lld version >= 9 can be considered stable
         if [[ "$majorVersion" -ge 9 ]]; then
             LDFLAGS="-fuse-ld=lld"

The result:

$ readelf -Wl ./artifacts/bin/coreclr/Linux.arm64.Debug/libcoreclr.so

Elf file type is DYN (Shared object file)
Entry point 0x352f40
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R   0x8
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x342f0c 0x342f0c R   0x10000
  LOAD           0x342f40 0x0000000000352f40 0x0000000000352f40 0x8f6fb0 0x8f6fb0 R E 0x10000
  LOAD           0xc39ef0 0x0000000000c59ef0 0x0000000000c59ef0 0x02f198 0x02f198 RW  0x10000
  LOAD           0xc69100 0x0000000000c99100 0x0000000000c99100 0x01b9f0 0x09c198 RW  0x10000
  TLS            0xc39ef0 0x0000000000c49ef0 0x0000000000c49ef0 0x000000 0x0000f4 R   0x8
  DYNAMIC        0xc65718 0x0000000000c85718 0x0000000000c85718 0x000250 0x000250 RW  0x8
  GNU_RELRO      0xc39ef0 0x0000000000c59ef0 0x0000000000c59ef0 0x02f198 0x030110 R   0x1
  GNU_EH_FRAME   0x1ac3cc 0x00000000001ac3cc 0x00000000001ac3cc 0x052cac 0x052cac R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x0002a8 0x00000000000002a8 0x00000000000002a8 0x000024 0x000024 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .note.gnu.build-id .dynsym .gnu.version .gnu.version_d .gnu.version_r .gnu.hash .hash .dynstr .rela.dyn .rela.plt .gcc_except_table .rodata __tracepoints_strings .eh_frame_hdr .eh_frame 
   02     .text .init .fini .plt 
   03     .data.rel.ro .fini_array .init_array .dynamic .got .got.plt 
   04     .data .tm_clone_table __tracepoints __tracepoints_ptrs .bss 
   05     .tbss 
   06     .dynamic 
   07     .data.rel.ro .fini_array .init_array .dynamic .got .got.plt 
   08     .eh_frame_hdr 
   09     
   10     .note.gnu.build-id 

@am11
Copy link
Member

am11 commented Oct 4, 2021

That means your lld and clang are not the same versions. We generally expect the whole toolchain at the same version. The exception is objcopy, which has a bug in llvm9 (fixed in llvm10 onwards). See the ubuntu 16 cross build containers targetting arm64; how it is set up.

@omajid
Copy link
Member

omajid commented Oct 4, 2021

That means your lld and clang are not the same versions

But they are:

$ rpm -q clang llvm lld
clang-11.0.0-1.module+el8.4.0+8598+a071fcd5.aarch64
llvm-11.0.0-2.module+el8.4.0+8598+a071fcd5.aarch64
lld-11.0.0-3.module+el8.4.0+8719+b61528dc.aarch64
$ lld --version
lld is a generic driver.
Invoke ld.lld (Unix), ld64.lld (macOS), lld-link (Windows), wasm-ld (WebAssembly) instead
$ ld.lld --version
LLD 11.0.0 (compatible with GNU linkers)
$ clang  --version
clang version 11.0.0 (Red Hat 11.0.0-1.module+el8.4.0+8598+a071fcd5)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

We generally expect the whole toolchain at the same version.

Sure, but the binaries named with a suffix is a Debian/Ubuntu convention, right? It's not true for RHEL.

@am11
Copy link
Member

am11 commented Oct 5, 2021

Yes we use naming convention for compiler in init-compiler which is how the packaging works across many operating systems and distros. For tools lld is the exception that was made by this PR; we typically add the introspection for tools in https://github.com/dotnet/runtime/blob/7afb66262c287b7832b872473d1e668038aa2496/eng/native/configuretools.cmake, locate_toolchain_exec which uses cmake's find_program which is known to work in all cases. Should we use the same mechanism for lld, @janvorli?

@janvorli janvorli deleted the fix-lld-support-2 branch October 5, 2021 07:24
@janvorli
Copy link
Member Author

janvorli commented Oct 5, 2021

@am11 the code in the init-tools.sh is used just to verify that the lld available is of high enough version. LLD before version 9 was not working properly. And IIRC, the version of LLD to use is picked by clang to match it when the -fuse-ld=lld option is passed. So I'd prefer fixing the init-tools.sh by what @omajid has added. And maybe we can check the lld version by parsing the output of clang$desired_version -fuse-ld=lld -Wl,--version there instead to cover the case when the desired_version is empty.

@am11
Copy link
Member

am11 commented Oct 5, 2021

@janvorli, previously cmake's find_program was chosen to verify six other tools in configuretools.cmake, and we throw a consistent error when find_program fails: Unable to find toolchain executable. Name: ${exec}, Prefix: ${TOOLSET_PREFIX}. We even use the same mechanism in toolchain.cmake (for cross-compilation). The one-off exception for lld just makes it inconsistent.

@janvorli
Copy link
Member Author

janvorli commented Oct 5, 2021

But the lld is different from all the other tools. It is never invoked directly in any way, the compiler invokes it (and locates it) by itself based on the -fuse-ld=lld option. We don't have a way to tell the compiler which one to use if there are multiple ones. That's why it is handled differently.

@am11
Copy link
Member

am11 commented Oct 5, 2021

Thanks. I forgot about that difference. We could, however, still use find_program and set executable:

# same for CXX
SET(CMAKE_C_LINK_EXECUTABLE "${CMAKE_LINKER_found-by-locate_toolchain_exec}
    <OBJECTS> -o  <TARGET> <CMAKE_C_LINK_FLAGS> <LINK_FLAGS> <LINK_LIBRARIES>")

(supported in cmake 3.6, our lowest supported version)

@janvorli
Copy link
Member Author

janvorli commented Oct 5, 2021

I would prefer to leave the decision on which lld to use on the compiler. I believe it would always pick one that matches the compiler version (based on my experiments) and I don't see a benefit of forcing a mismatching version.

@am11
Copy link
Member

am11 commented Oct 5, 2021

Right, we don't have to force set it explicitly, we could use existing locate_toolchain_exec just to validate whether the tool exists (locate_toolchain_exec(ld DISCARD_RESULT)) which already handles various cases.

@omajid
Copy link
Member

omajid commented Oct 5, 2021

Hey, @janvorli , even aside from fixing the build on RHEL-like environments, the actual fixes haven't landed in a 6.0 SDK yet, have they? I tried a recent RC2 and didn't get anywhere: #58959 (comment)

@am11
Copy link
Member

am11 commented Oct 5, 2021

@omajid, this fix has landed in RTM branch: #58959 (comment).

@janvorli
Copy link
Member Author

janvorli commented Oct 5, 2021

Hmm, it is strange, the clone of this fix that I've made in the arcade repo is in the release/6.0-rc2 branch. I wonder how come the rc2 binaries don't have it.

@janvorli
Copy link
Member Author

janvorli commented Oct 5, 2021

@am11

just to validate whether the tool exists (locate_toolchain_exec(ld DISCARD_RESULT))

The lld linker usage was supposed to be optional (=> use it if installed, use standard ld if not). The build that doesn't use lld works fine on ARM64 devices with 4kB large memory pages, it only fails when the page size is larger and so the page where the gs cookie is located happens to be with code. So maybe we should enforce lld usage on arm64 linux.

@am11
Copy link
Member

am11 commented Oct 5, 2021

Yup, the fix is also in the latest RC2, downloaded from https://aka.ms/dotnet/6.0.1XX-rc2/daily/dotnet-sdk-linux-arm64.tar.gz (6.0.0-rc.2.21470.23)).

@omajid
Copy link
Member

omajid commented Oct 11, 2021

Thanks. I can confirm that that a recent rc2 build indeed fixes things for me.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants