.NET Native Library Packaging (RuntimeIdentifiers, build, testing, VS etc.) #33845

Open
nietras opened this issue Jul 7, 2023 · 17 comments
Labels: Area-External, native-assets, untriaged

Comments

nietras commented Jul 7, 2023

cc: @tannergooding @richlander @jkotas

This is yet another issue regarding how to best author native library nuget packages and how to define, build, test, publish, and deploy applications that consume them. I have tried hard to wrap my head around this by reading many issues and studying existing packages. I have a particular need that is similar to TorchSharp: massive native libraries that not only need to be split into fragments, but where it would also be best to "download" only the runtime identifier (RID) specific packages needed for local development. (On Windows, though, local development often means BOTH x86 and x64 in our case.)

Below is a walk-through of using ClangSharp (in excessive detail, for reference) and the many questions it raised for me compared to how I am used to working. Our own native library packages explicitly copy their contents to sub-directories (x64, x86) alongside the exe, and at runtime the directory matching the process arch/OS is added to the DLL directories, i.e. via AddDllDirectory. Having something "custom" is a maintenance issue of course, but also an on-boarding issue. Using documented best practices would be best, but as far as I can tell there are none?

In any case, at the end of the walk-through I encounter the problem that when specifying multiple RIDs i.e.

<RuntimeIdentifiers>win-x64;win-x86</RuntimeIdentifiers>

then the runtime.json trick does not appear to work when running unit tests from inside Visual Studio. I have to explicitly add the RID specific nuget packages anyway, so I wonder how exactly one is supposed to author nuget packages that support running multiple RIDs (in this case solely win-x86 and win-x64 for now) with full support in VS and other tools. We need to be able to debug and run from VS.

And how do you switch which RID you run with when F5 running in VS?

Should I simply accept that the runtime.json way is too flawed and explicitly reference all needed nuget packages? Would this then avoid the need to specify RIDs? Which also has issues with "forcing" self-contained (we don't want that), in fact we'd like to simply be able to deploy/copy-paste build output as something like:

App.exe
win-x64\
  // win-x64 specific native libraries
win-x86\
  // win-x86 specific native libraries

where the app is not RID specific (framework-dependent of course). And this should work on both win-x86 and win-x64. This is what we have now and what works. Our developers are used to this. But it's based on native library nuget packages that explicitly copy their native library contents to those folders and, of course, on referencing all those RID specific ones. I had hoped perhaps one could avoid the RID specific referencing, but that does not seem to work "smoothly". Which I'd guess then means the whole runtime.json approach is not the way to go.
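For reference, one way such a layout could be consumed at runtime is sketched below. This is only an illustration under assumptions, not how any package in this issue works: the `ArchFolderResolver` class and its method names are hypothetical, the folder names are taken from the layout above, and the `.dll` suffix assumes Windows. It uses the documented `NativeLibrary.SetDllImportResolver` and `NativeLibrary.TryLoad` APIs.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of any package discussed in this issue.
public static class ArchFolderResolver
{
    // Maps a process architecture to the folder names used in the layout above.
    public static string GetFolderName(Architecture arch) => arch switch
    {
        Architecture.X64 => "win-x64",
        Architecture.X86 => "win-x86",
        _ => throw new PlatformNotSupportedException($"No native folder for {arch}")
    };

    // Registers a resolver so that P/Invokes declared in 'assembly' first probe
    // <app dir>\<folder>\<library name>.dll.
    public static void Register(Assembly assembly)
    {
        var folder = Path.Combine(
            AppContext.BaseDirectory,
            GetFolderName(RuntimeInformation.ProcessArchitecture));

        NativeLibrary.SetDllImportResolver(assembly, (name, _, _) =>
        {
            var candidate = Path.Combine(folder, name + ".dll");
            // Returning IntPtr.Zero lets the runtime fall back to its default probing.
            return NativeLibrary.TryLoad(candidate, out var handle) ? handle : IntPtr.Zero;
        });
    }
}
```

It would be called once at startup, before the first P/Invoke, for example `ArchFolderResolver.Register(typeof(ClangSharp.Interop.CXIndex).Assembly);`. Only one resolver can be registered per assembly, and the resolver only sees libraries requested through P/Invoke from that assembly, so dependencies that a native library loads on its own are not covered by it.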

Secondly, I think I read somewhere (can't find or remember where) that for .NET 8 it is considered to force a specific RID on build? I can see, given my experience below, why one might consider doing that, but it would raise other issues, such as losing what used to be a core tenet (IMHO) of .NET: that a build output (not publish) is RID agnostic. Would that be lost then?

All in all, to solve these issues I have to author my own little tool for packaging the native libraries, consider all the issues around consumption, testing etc. And after going through all this I am still left with feeling rather lost 😅 I still don't know exactly what is the best solution here. And the packages I am creating are intended to be published for the public, e.g. so I can publish the revived CNTK packages I've made on nuget.org for example.

On top of this we still want to support publishing RID specific applications, but then we don't want native libraries embedded in single file, there is an option for that which is great, but then we want those dlls in sub-folder, not directly next to the exe, which means we have to hack around that in MSBuild and then face issues with mixed-mode assemblies etc. Yes, we also have those which also makes things very interesting.

ML/AI isn't going away. For each new CUDA or whatever release the native libraries double in size (minimum!). Easy authoring and consumption of those would be great, but I am sure also won't be solved in the immediate future, I need to know what to do now?

The walk-through will come as the next comment.


@dotnet-issue-labeler dotnet-issue-labeler bot added Area-External untriaged Request triage from a team member labels Jul 7, 2023

nietras commented Jul 7, 2023

ClangSharp/libclang Walkthrough

Create simple console application in for example a Tester directory.

dotnet new console

Add package reference to ClangSharp package so csproj looks like:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="ClangSharp" Version="16.0.0" />
  </ItemGroup>
  
</Project>

Run dotnet restore -verbosity:detailed > restore.txt on project. Verbosity set
to be able to check what happens. Nothing of worth here. Look in .nuget
package cache to see what is downloaded:

"C:\Users\<USERNAME>\.nuget\packages\clangsharp"
"C:\Users\<USERNAME>\.nuget\packages\clangsharp.interop"
"C:\Users\<USERNAME>\.nuget\packages\libclang"
"C:\Users\<USERNAME>\.nuget\packages\libclangsharp"

What's interesting here is no RID specific packages appear to be
downloaded (yet).
The ClangSharp package has a nuspec file with:

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">
  <metadata minClientVersion="4.3">
    <id>ClangSharp</id>
    <version>16.0.0</version>
    <authors>.NET Foundation and Contributors</authors>
    <requireLicenseAcceptance>true</requireLicenseAcceptance>
    <license type="expression">MIT</license>
    <licenseUrl>https://licenses.nuget.org/MIT</licenseUrl>
    <projectUrl>https://github.com/dotnet/clangsharp/</projectUrl>
    <description>ClangSharp are strongly-typed safe Clang bindings written in C# for .NET and Mono, tested on Linux and Windows.</description>
    <copyright>Copyright © .NET Foundation and Contributors</copyright>
    <repository type="git" url="https://github.com/dotnet/clangsharp/" commit="1c5588c84a5d22d2ddab41dbf7854667bf722332" />
    <dependencies>
      <group targetFramework="net6.0">
        <dependency id="ClangSharp.Interop" version="16.0.0" exclude="Build,Analyzers" />
      </group>
      <group targetFramework="net7.0">
        <dependency id="ClangSharp.Interop" version="16.0.0" exclude="Build,Analyzers" />
      </group>
      <group targetFramework=".NETStandard2.0">
        <dependency id="ClangSharp.Interop" version="16.0.0" exclude="Build,Analyzers" />
      </group>
    </dependencies>
  </metadata>
</package>

Jumping over the interop package and looking at libClang this nuspec has:

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2013/01/nuspec.xsd">
  <metadata minClientVersion="2.12">
    <id>libclang</id>
    <version>16.0.6</version>
    <authors>.NET Foundation and Contributors</authors>
    <owners>.NET Foundation and Contributors</owners>
    <requireLicenseAcceptance>true</requireLicenseAcceptance>
    <license type="expression">Apache-2.0 WITH LLVM-exception</license>
    <licenseUrl>https://licenses.nuget.org/Apache-2.0%20WITH%20LLVM-exception</licenseUrl>
    <projectUrl>https://github.com/dotnet/clangsharp</projectUrl>
    <description>Multi-platform native library for libclang.</description>
    <copyright>Copyright © LLVM Project</copyright>
    <repository type="git" url="https://github.com/llvm/llvm-project" branch="llvmorg-16.0.6" />
    <dependencies>
      <group targetFramework=".NETStandard2.0" />
    </dependencies>
  </metadata>
</package>

That's interesting given it has no dependencies and contains no libraries:

"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\.nupkg.metadata"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\.signature.p7s"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\libclang.16.0.6.nupkg"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\libclang.16.0.6.nupkg.sha512"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\libclang.nuspec"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\LICENSE.TXT"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\runtime.json"

But what's in the runtime.json file:

{
  "runtimes": {
    "linux-arm64": {
      "libclang": {
        "libclang.runtime.linux-arm64": "16.0.6"
      }
    },
    "linux-x64": {
      "libclang": {
        "libclang.runtime.linux-x64": "16.0.6"
      }
    },
    "osx-arm64": {
      "libclang": {
        "libclang.runtime.osx-arm64": "16.0.6"
      }
    },
    "osx-x64": {
      "libclang": {
        "libclang.runtime.osx-x64": "16.0.6"
      }
    },
    "win-arm64": {
      "libclang": {
        "libclang.runtime.win-arm64": "16.0.6"
      }
    },
    "win-x64": {
      "libclang": {
        "libclang.runtime.win-x64": "16.0.6"
      }
    },
    "win-x86": {
      "libclang": {
        "libclang.runtime.win-x86": "16.0.6"
      }
    }
  }
}

Ah, that appears to map RIDs to runtime specific packages. But none were
downloaded, so what happens when we build the project? Run dotnet build -verbosity:detailed > build.txt on project. Examining the build output and the
.nuget cache, none of those runtime specific packages appear to be downloaded
(yet). Let's try running the project with some dummy code in Program.cs.

using ClangSharp.Interop;

using var index = CXIndex.Create();

It runs, but still no runtime specific packages downloaded nor any native
libraries in build output. Let's try a more involved example copied from a unit
test in ClangSharp.

// https://github.com/dotnet/ClangSharp/blob/main/tests/ClangSharp.UnitTests/CXTranslationUnitTest.cs
using ClangSharp.Interop;
using static ClangSharp.Interop.CXTranslationUnit_Flags;

var name = "basic";
var dir = Path.GetRandomFileName();
_ = Directory.CreateDirectory(dir);

try
{
    // Create a file with the right name
    var file = new FileInfo(Path.Combine(dir, name + ".c"));
    File.WriteAllText(file.FullName, "int main() { return 0; }");

    using var index = CXIndex.Create();
    using var translationUnit = CXTranslationUnit.Parse(
        index, file.FullName, Array.Empty<string>(),
        Array.Empty<CXUnsavedFile>(), CXTranslationUnit_None);
    var clangFile = translationUnit.GetFile(file.FullName);
}
finally
{
    Directory.Delete(dir, true);
}

This runs fine. But still no runtime specific packages downloaded nor any
native libraries in build output. Let's try running the code in Visual
Studio with native debugging enabled, that is, add launch settings with
"nativeDebugging": true. This is just a quick way to look at which native
libraries are loaded and from where. There are many ways of doing that;
Visual Studio is just quick and easy. In the Debug window one can see:

(Win32): Loaded '\bin\Debug\net7.0\ClangSharp.Interop.dll'. 
(CoreCLR: clrhost): Loaded '\bin\Debug\net7.0\ClangSharp.Interop.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
(Win32): Loaded 'C:\Program Files\LLVM\bin\libclang.dll'. Module was built without symbols.

Ah, turns out I have LLVM with clang installed 🤷 So this must be on the
PATH environment variable, and it turns out it is: C:\Program Files\LLVM\bin. Let's try removing that and restarting all consoles and applications
in use.

Running the example program again will then fail with exception:

System.DllNotFoundException: 'Unable to load DLL 'libclang' or one of its dependencies: 
The specified module could not be found. (0x8007007E)'

Hmm, so the libclang native library is not available and the package is not
downloaded automatically? How does runtime.json then work?
Let's try running the application with a runtime identifier defined:

dotnet run -r win-x64 > run.txt

This takes a while, and only output is:

C:\Program Files\dotnet\sdk\7.0.400-preview.23274.1\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.Sdk.targets(1142,5): 
  warning NETSDK1179: One of '--self-contained' or '--no-self-contained' options are required when '--runtime' is used. 
  [Tester.csproj]
Tester\4lzfbeoi.214\basic.c

but the program runs fine. Looking in .nuget and we can see the runtime
specific packages have actually been downloaded now.

"C:\Users\<USERNAME>\.nuget\packages\libclangsharp.runtime.win-x64"
"C:\Users\<USERNAME>\.nuget\packages\libclang.runtime.win-x64"

so does this mean we cannot actually define and run the application without
specifying a runtime identifier? That seems problematic if we want to use this
as a framework-dependent AnyCPU application... in fact, if we run the application
from Visual Studio again, it fails with the same exception as before.

Use tree /F to see the files in the bin output, which shows all the native
libraries related to libclang for win-x64 (and others).

├───bin
│   └───Debug
│       └───net7.0
│           │   ClangSharp.dll
│           │   ClangSharp.Interop.dll
│           │   Tester.deps.json
│           │   Tester.dll
│           │   Tester.exe
│           │   Tester.pdb
│           │   Tester.runtimeconfig.json
│           │
│           ├───egfakait.om3
│           │       basic.c
│           │
│           └───win-x64
│                   ClangSharp.dll
│                   ClangSharp.Interop.dll
│                   clretwrc.dll
│                   clrgc.dll
│                   clrjit.dll
│                   coreclr.dll
│                   createdump.exe
│                   hostfxr.dll
│                   hostpolicy.dll
│                   libclang.dll
│                   libClangSharp.dll
│                   Microsoft.CSharp.dll
│                   Microsoft.DiaSymReader.Native.amd64.dll
│                   Microsoft.VisualBasic.Core.dll
│                   Microsoft.VisualBasic.dll
│                   Microsoft.Win32.Primitives.dll
│                   Microsoft.Win32.Registry.dll
│                   mscordaccore.dll
│                   mscordaccore_amd64_amd64_7.0.523.17405.dll
│                   mscordbi.dll
│                   mscorlib.dll
│                   mscorrc.dll
│                   msquic.dll
│                   Tester.deps.json
│                   Tester.dll
│                   Tester.exe
│                   Tester.pdb
│                   Tester.runtimeconfig.json
│                   netstandard.dll
│                   // Almost all System.*dlls follow here
│                   System.*.dll

Note how this has an exe under the specific runtime folder and all the dlls
next to it.

As far as I can tell this means the runtime.json way of mapping runtime
identifier specific packages only works if you define a hard-coded specific
runtime identifier in the program you want to run. Which is incredibly annoying
if you want to build and deploy runtime agnostic applications. E.g. if we wanted
to deploy a win-x86 + win-x64 single exe. How is that supposed to work then?
Am I getting this wrong?

Let's try a hack. Adding the RID specific package to the project. That is add
<PackageReference Include="libclang.runtime.win-x64" Version="16.0.0" /> to
the project. Run it from VS and then it now runs fine. Right, so in some ways
this works fine if we add the RID specific packages explicitly.

Still, how does this work with regards to testing, if you use MSTest for both
x86 and x64 testing? Let's add a unit test project that references the tester
console project, and copy the unit test code from Program.cs above into this
project. Now we run the unit test with "Processor Architecture for AnyCPU
Projects" set to Auto. If we change this to x86, it fails with
the same exception as before:

System.DllNotFoundException: Unable to load DLL 'libclang' or one of its dependencies: 
The specified module could not be found. (0x8007007E)

Interestingly, in the output we will get:

*****IMPORTANT*****
Failed to resolve libclang.
If you are running as a dotnet tool, you may need to manually copy the appropriate DLLs 
from NuGet due to limitations in the dotnet tool support. 
Please see https://github.com/dotnet/clangsharp for more details.
*****IMPORTANT*****

Note that the RID is win10-x86 in this case if logged e.g. with
log(RuntimeInformation.RuntimeIdentifier);. If we select x64 it is
win10-x64 and the test succeeds, but only because we added the RID specific
libclang.runtime.win-x64 package to the project.

In
dotnet/ClangSharp#118 (comment)
this issue is expanded upon with the comment by Tanner Gooding:

The simple fix for now is to add <RuntimeIdentifier Condition="'$(RuntimeIdentifier)' == '' AND '$(PackAsTool)' != 'true'">$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
to your project
(under a PropertyGroup), unfortunately because of the way NuGet restore works,
we can't just add this to a build/*.targets in the ClangSharp nuget package.

The issue is essentially that libClang and libClangSharp just contain a
runtime.json file which point to the real packages. This was done to avoid
users needing to download hundreds of megabytes just to consume ClangSharp
(when they only need one of the native binaries most often). You can see some
more details on the sizes here: #46 (comment), noting that that is the size of
the compressed NuGet.

I had thought this was working for dev scenarios where the RID wasn't
specified, but it apparently isn't. I'll log an issue on NuGet to see if this
is something that can be improved.

I wonder whether this actually works for the case of switching processor
architecture in VS or similar? Let's try adding it to the unit tests project and
remove the RID specific package from the console project. Hence we have console
project:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="ClangSharp" Version="16.0.0" />
  </ItemGroup>
  
</Project>

and unit test project:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>

    <IsPackable>false</IsPackable>

    <RuntimeIdentifier Condition="'$(RuntimeIdentifier)' == '' AND '$(PackAsTool)' != 'true'">$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.3.2" />
    <PackageReference Include="MSTest.TestAdapter" Version="2.2.10" />
    <PackageReference Include="MSTest.TestFramework" Version="2.2.10" />
    <PackageReference Include="coverlet.collector" Version="3.1.2" />
  </ItemGroup>

  <ItemGroup>
    <ProjectReference Include="..\Tester\Tester.csproj" />
  </ItemGroup>

</Project>

First time you then try to build this you will get a well-known error:

Assets file 'TesterUnitTests\obj\project.assets.json' doesn't have a target for 'net7.0/win-x64'. 
Ensure that restore has run and that you have included 'net7.0' in the TargetFrameworks for your project. 
You may also need to include 'win-x64' in your project's RuntimeIdentifiers.

So restore and build again. Let's try running x86 unit tests in VS. This
succeeds but the RID is actually now win10-x64, so we can now no longer run or
debug x86 tests from Visual Studio?

Let's first try to define test running via a script test-x86-x64.ps1:

#!/usr/bin/env pwsh
Write-Host "Testing Debug X86"
dotnet test --nologo -c Debug -- RunConfiguration.TargetPlatform=x86
Write-Host "Testing Release X86"
dotnet test --nologo -c Release -- RunConfiguration.TargetPlatform=x86
Write-Host "Testing Debug X64"
dotnet test --nologo -c Debug -- RunConfiguration.TargetPlatform=x64
Write-Host "Testing Release X64"
dotnet test --nologo -c Release -- RunConfiguration.TargetPlatform=x64

For x86 this will then fail with:

Test run detected DLL(s) which would use different framework and platform versions. Following DLL(s) do not match current settings, which are .NETCoreApp,Version=v7.0 framework and X86 platform.
TesterUnitTests.dll would use Framework .NETCoreApp,Version=v7.0 and Platform X64.

again this isn't great. We need to be able to run both x64 and x86 without
having to go through hoops.

Perhaps if we add both win-x64 and win-x86 to a RuntimeIdentifiers
property instead? So change

    <RuntimeIdentifier Condition="'$(RuntimeIdentifier)' == '' AND '$(PackAsTool)' != 'true'">$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
    <RuntimeIdentifiers>win-x64;win-x86</RuntimeIdentifiers>

then run test-x86-x64.ps1. Now everything fails with the same exception:

System.DllNotFoundException: Unable to load DLL 'libclang' or one of its dependencies: 
The specified module could not be found. (0x8007007E)

According to
https://learn.microsoft.com/en-us/dotnet/core/project-sdk/msbuild-props#runtimeidentifiers
I should have defined the RIDs correctly. An example from there is:

<PropertyGroup>
  <RuntimeIdentifiers>win10-x64;osx.10.11-x64;ubuntu.16.04-x64</RuntimeIdentifiers>
</PropertyGroup>

Okay, perhaps running tests then needs to be done differently and not with
the RunConfiguration.TargetPlatform property? Let's try to run the tests
with --runtime instead in a new script test-x86-x64-rid.ps1:

#!/usr/bin/env pwsh
Write-Host "Testing Debug win-x86"
dotnet test --nologo -c Debug --runtime win-x86
Write-Host "Testing Release win-x86"
dotnet test --nologo -c Release --runtime win-x86
Write-Host "Testing Debug win-x64" 
dotnet test --nologo -c Debug --runtime win-x64
Write-Host "Testing Release win-x64"
dotnet test --nologo -c Release --runtime win-x64

Then the tests succeed, albeit with the annoying warnings below.

C:\Program Files\dotnet\sdk\7.0.400-preview.23274.1\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.Sdk.targets(1142,5): 
warning NETSDK1179: One of '--self-contained' or '--no-self-contained' options are required when '--runtime' is used. [TesterUnitTests\TesterUnitTests.csproj]
C:\Program Files\dotnet\sdk\7.0.400-preview.23274.1\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.Sdk.targets(1142,5): 
warning NETSDK1179: One of '--self-contained' or '--no-self-contained' options are required when '--runtime' is used. [Tester.csproj]

why do I need to specify whether to be self-contained or not when I am just
running tests? I am not publishing?

And are the tests really running x86 as expected? To test this I add two simple tests:

    [TestMethod]
    public void X86() => Assert.AreEqual("win10-x86", RuntimeInformation.RuntimeIdentifier);
    [TestMethod]
    public void X64() => Assert.AreEqual("win10-x64", RuntimeInformation.RuntimeIdentifier);

and run the tests again. On win-x86 the X64 test fails as expected:

Assert.AreEqual failed. Expected:<win10-x64>. Actual:<win10-x86>.

and vice versa on win-x64:

Assert.AreEqual failed. Expected:<win10-x86>. Actual:<win10-x64>.

so at least that works as expected.

Let's try running these tests from Visual Studio again. First, by setting
processor architecture to x86. All tests except X86 fail, so this does switch the
runtime identifier to win10-x86, but it does not fix the libclang problem.

System.DllNotFoundException: Unable to load DLL 'libclang' or one of its dependencies: 
The specified module could not be found. (0x8007007E)

so even though RIDs are now specified this doesn't work when running tests from
VS? Switching to x64 in VS, only the X64 test passes, and still the
libclang dll cannot be found, so now this doesn't work either. The difference
apparently being that there are now multiple RIDs, not just one.

The only way I think this can be resolved is to explicitly add those
RID specific runtime packages after all, so the console project looks like:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <!--<PackageReference Include="libtorch-cuda-11.7-win-x64" Version="2.0.1.1" />-->
    <PackageReference Include="ClangSharp" Version="16.0.0" />
    <PackageReference Include="libclang.runtime.win-x64" Version="16.0.6" />
    <PackageReference Include="libclang.runtime.win-x86" Version="16.0.6" />
  </ItemGroup>
  
</Project>

Re-running the unit tests and now libclang can be loaded and that test
succeeds. Let's try command line too and it's the same.

So after all this, it seems like the runtime.json way of packaging native
libraries has its own set of challenges: you basically end up having to explicitly add
the RID specific packages anyway if you target multiple RIDs. In the process you
then end up implicitly forcing the Any CPU build to no longer be framework-dependent
but self-contained? This is all very confusing and hard to understand, and not
least hard to convey to other developers.

tannergooding commented

This entire space has a large number of issues and there isn't any good or "official" way to do things. Even runtime.json is itself a largely undocumented feature.

ClangSharp is doing it the way it is primarily because of NuGet package size limits, but also because no one wants to download a single 256MB or larger package when they only need a 32MB subset of it.

Multiple issues, many of which you linked to in the OP, exist that track the general problem space.


jkotas commented Jul 8, 2023

Thank you for filing the issue. I hope that the volume of these issues will make us do something about the native dependencies packaging scenario. I agree that the current experience is very poor.

the runtime.json trick does not appear to work when running unit tests from inside Visual Studio

This looks like a bug to me. https://github.com/microsoft/vstest/ would be a better place to discuss this specific issue.

I read somewhere (can't find or remember where) that for .NET 8 it is considered to force a specific RID on build?

#23540 is the main tracking issue for this change in the default behavior. The change was mentioned in .NET blog posts where you have probably seen it. It addresses the confusing coupling of RID-specific and self-contained that you have touched on.


jkotas commented Jul 8, 2023

@mhutch We have discussed the poor experience of using NuGet to distribute native dependencies some time ago. Do you have any updates that you can share?


nietras commented Jul 19, 2023

@jkotas @tannergooding quick question: when using runtimes/<RID>/native defined packages, the native libraries are copied to this folder in the output (on build, not publish). To then actually be able to run and have these libraries (or their dependencies) loaded, it appears I have to manually add this directory to the PATH environment variable (before they are loaded). Does that seem right?

It does not appear to be needed for libclang, so this might be specific to the library I am using. The question then is: does .NET in any way set up or ensure that the runtimes/<RID>/native directory is added to the path or DLL directories (e.g. Set/AddDllDirectory)? Note that AddDllDirectory did not actually work for this library. PATH was the only thing I could get working. The failure occurs when the library tries to load its dependencies and so on.

When I use NativeLibrary.Load with DllImportSearchPath.SafeDirectories then AddDllDirectory works, but I am not in charge of loading the native library here. In any case, it seems one has to do something here, and my issue is then that I have to "guess" which "RIDs" to add to PATH. I can't just use RuntimeInformation.RuntimeIdentifier directly (i.e. it is win10-x64). And there is some clear guidance saying one should not try to parse or break up the runtime identifier oneself... CORRECTION: AddDllDirectory is not needed when manually loading via NativeLibrary.Load. But as I say, I am not in charge of loading these library dependencies; they are loaded from the initial native library. How is one to then ensure the correct RIDs are added to path or similar? (I can't just use RuntimeInformation.RuntimeIdentifier directly, i.e. it is win10-x64, but the library is in win-x64.)


jkotas commented Jul 20, 2023

If there are multiple native libraries that depend on each other, it is up to them to make that work. Linking these libraries with the correct /DEPENDENTLOADFLAG is the best option. DllImportSearchPath.UseDllDirectoryForDependencies and AddDllDirectory in the calling code work too. I would stay away from modifying PATH to make this work.


jkotas commented Jul 20, 2023

How is one to then ensure the correct RIDs are added to path or similar?

You can use the list of paths from AppDomain.CurrentDomain.GetData("NATIVE_DLL_SEARCH_DIRECTORIES") to get the list of directories where native libraries are located.
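A minimal sketch of reading that value is shown below. The `NativeSearchDirectories` class is a hypothetical name, and the code assumes the host stores the directories as a single string separated by `Path.PathSeparator` (`;` on Windows); the property may be absent depending on how the app is hosted, in which case the list is empty.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for inspecting the host-provided native search directories.
public static class NativeSearchDirectories
{
    // Splits a raw path list into individual directories.
    // Assumption: entries are separated by Path.PathSeparator.
    public static IReadOnlyList<string> Split(string? raw) =>
        string.IsNullOrEmpty(raw)
            ? Array.Empty<string>()
            : raw.Split(Path.PathSeparator, StringSplitOptions.RemoveEmptyEntries);

    // Reads the list for the current app; empty if the host did not set the property.
    public static IReadOnlyList<string> Get() =>
        Split(AppDomain.CurrentDomain.GetData("NATIVE_DLL_SEARCH_DIRECTORIES") as string);
}
```

Printing the result of `NativeSearchDirectories.Get()` at startup is a quick way to check whether the runtimes/<RID>/native directory of a package actually made it into the list for a given build or test run.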


nietras commented Jul 21, 2023

@jkotas thanks for the replies.

Linking these libraries with the correct /DEPENDENTLOADFLAG is the best option.

I am not the builder of these libraries, merely the packager. Hence, have no control over linker options. Or library behavior.

DllImportSearchPath.UseDllDirectoryForDependencies and AddDllDirectory

I am not directly in charge of loading these libraries and would very much like to avoid it. The situation should be somewhat like the below. Where native libs are in runtimes/RID/native where RID could be different. Problem is NtvLibA loads dependent libraries maybe manually. I have no control over this.

flowchart LR
    A[MgdExe] -->|Uses| B[MgdLib]
    B -->|P/Invoke| C(NtvLibA)
    C -->|LoadLibrary| D[NtvLibB]

Since AddDllDirectory does not appear to cascade this does not work for this case (without manually loading the native dlls myself). SetDllDirectory does but this only allows one directory and has the well known issues of overriding previous calls.

Hence, as far as I can tell, I am left with just one option: changing PATH. There are some mentions of changing the app manifest, but I could not find any resource on how this could work for these native libraries; if that is an option, I'm happy to hear about it. But that would then also be a Windows-only solution.

Links
https://stackoverflow.com/questions/44588618/setdlldirectory-does-not-cascade-so-dependency-dlls-cannot-be-loaded


nietras commented Jul 21, 2023

Actually, AddDllDirectory combined with SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS) appears to work. (Had some issue with manifest getting in the way). Perhaps, the best option so far.
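For anyone wanting to try the same combination, a Windows-only sketch could look like the following. The `DllDirectories` class and its `Add` method are hypothetical names; the two kernel32 functions and the flag value `0x00001000` come from the Win32 headers. This is an illustration under those assumptions, not the exact code used in our application.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

// Windows-only sketch; class and method names are hypothetical.
public static class DllDirectories
{
    // Value from libloaderapi.h.
    private const uint LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000;

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool SetDefaultDllDirectories(uint directoryFlags);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr AddDllDirectory(string newDirectory);

    // Adds 'directory' to the process-wide DLL search path. No-op on non-Windows.
    public static void Add(string directory)
    {
        if (!OperatingSystem.IsWindows()) return;

        // AddDllDirectory requires an absolute path.
        var fullPath = Path.GetFullPath(directory);

        // Switch the process to the search mode that honors AddDllDirectory entries.
        if (!SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS))
            throw new Win32Exception(Marshal.GetLastWin32Error());

        // A zero cookie signals failure.
        if (AddDllDirectory(fullPath) == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}
```

A call such as `DllDirectories.Add(Path.Combine(AppContext.BaseDirectory, "win-x64"));` would have to run before the first native library is loaded. Note that with LOAD_LIBRARY_SEARCH_DEFAULT_DIRS the default search covers the application directory, System32, and the directories added this way, so libraries that were previously found via PATH or the current directory may stop resolving.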


nietras commented Jul 21, 2023

For a published WPF app (win-x64 for example), we still have an issue with the WPF native dlls, which have been moved to a sub-directory: they have to be manually loaded at program start or the app will crash with DllNotFoundException. This is despite having set up AddDllDirectory and SetDefaultDllDirectories, and I am wondering why this is. (Perhaps a bit off topic, but it relates to native library packaging and dependencies.)

        // Try manually loading the .NET WPF native library dependencies
        TryManuallyLoad("vcruntime140_cor3");
        TryManuallyLoad("wpfgfx_cor3");
        TryManuallyLoad("PresentationNative_cor3");
        TryManuallyLoad("D3DCompiler_47_cor3");

Author

nietras commented Jul 21, 2023

Sorry to spam and bother you guys again, but I have yet another issue that I am scratching my head over. In the application (WPF, .NET 6) where this is to be used, we also have a dependency on an old .NET Framework assembly that is located in the GAC. We find and load this via fusion, i.e. we load

        var fusionFullPath = Environment.Is64BitProcess
            ? @"C:\Windows\Microsoft.NET\Framework64\v4.0.30319\fusion.dll"
            : @"C:\Windows\Microsoft.NET\Framework\v4.0.30319\fusion.dll";
        NativeLibrary.Load(fusionFullPath);

then use something like:

    /// <summary>
    /// Gets an assembly path from the GAC given a partial name.
    /// </summary>
    /// <param name="name">An assembly partial name. May not be null.</param>
    /// <returns>
    /// The assembly path if found; otherwise null;
    /// </returns>
    public static string GetAssemblyPath(string name)
    {
        if (name == null)
        { throw new ArgumentNullException(nameof(name)); }

        var hr = CreateAssemblyCache(out var assemblyCache, 0);
        if (hr >= 0)
        {
            var assemblyInfo = new AssemblyInfo();
            assemblyInfo.cchBuf = 1024; // should be fine...
            assemblyInfo.currentAssemblyPath = new string('\0', assemblyInfo.cchBuf);

            hr = assemblyCache.QueryAssemblyInfo(0, name, ref assemblyInfo);
            if (hr >= 0)
            {
                return assemblyInfo.currentAssemblyPath;
            }
        }
        return null;
    }

    [ComImport, InterfaceType(ComInterfaceType.InterfaceIsIUnknown), Guid("e707dcde-d1cd-11d2-bab9-00c04f8eceae")]
    interface IAssemblyCache
    {
        void Reserved0();

        [PreserveSig]
        int QueryAssemblyInfo(int flags, [MarshalAs(UnmanagedType.LPWStr)] string assemblyName, ref AssemblyInfo assemblyInfo);
    }

    [StructLayout(LayoutKind.Sequential)]
    struct AssemblyInfo
    {
        public int cbAssemblyInfo;
        public int assemblyFlags;
        public long assemblySizeInKB;
        [MarshalAs(UnmanagedType.LPWStr)]
        public string currentAssemblyPath;
        public int cchBuf; // size of path buf.
    }

    // On .NET 5+ we get the following:
    // System.DllNotFoundException: Unable to load DLL 'fusion.dll' or one of its dependencies: The specified module could not be found. (0x8007007E)
    // https://github.com/dotnet/core/issues/3048
    // To fix this we use NativeLibrary.Load in the static constructor above.
    [DllImport("fusion.dll")]
    static extern int CreateAssemblyCache(out IAssemblyCache ppAsmCache, int reserved);

to find the path of that .NET assembly. We then override:

AppDomain.CurrentDomain.AssemblyResolve 

to handle this. This works fine and has worked without issue.

But after using SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS) and AddDllDirectory, this no longer works and the GAC assembly can no longer be loaded. It just fails. If I go back to changing the PATH environment variable, it works. Why is that?

This is all very involved, but such is the real world of industrial computer vision/AI, where we have a lot of dependencies that are often out of our control. External code might be old, might be mixed-mode assemblies and so on.

Author

nietras commented Jul 26, 2023

@jkotas it seems to me that NATIVE_DLL_SEARCH_DIRECTORIES is not used when an app is published as self-contained is that correct?

I still have not found a solution to the above. The problem is that calling SetDefaultDllDirectories completely disrupts the normal dynamic-link library search order, which then also means that directories added to PATH won't be searched. Since the mixed-mode assembly is a third-party library that puts its native dependencies in multiple directories found via the PATH environment variable, it can no longer be loaded.

PATH alone cannot be used because, for some reason, some Windows installations have one of the native libraries (onnxruntime.dll) in C:\Windows\system32, and that is searched before PATH.

Hence, I am stuck trying to find a good solution. SetDllDirectory would be good, since its directory is searched before C:\Windows\system32 and the normal search order still applies, so the search falls through to PATH if the library is not found earlier. But it only allows one directory, is global and so on. This is what we have used for years, but now, with nuget packages that ship native libraries per runtimes/<RID>/native, there is more than one directory when framework-dependent.

Using NATIVE_DLL_SEARCH_DIRECTORIES only works for .NET loaded native libs and only works if not published/self-contained. This still leaves loading transitive native libraries.

The more I read the more confused I get here. deps.json could be an option but docs are not understandable to me.

Member

jkotas commented Jul 26, 2023

> it seems to me that NATIVE_DLL_SEARCH_DIRECTORIES is not used when an app is published as self-contained is that correct?

That is not correct. NATIVE_DLL_SEARCH_DIRECTORIES is used for self-contained apps.

Self-contained apps are always RID specific. Portable self-contained apps do not exist.

RID specific apps (including self-contained RID specific apps) should have everything in the same directory. They should not be hitting any of the problems with dependencies spread over multiple directories.

Author

nietras commented Jul 27, 2023

> That is not correct. NATIVE_DLL_SEARCH_DIRECTORIES is used for self-contained apps.

Should I then not be able to modify this first thing at startup (in Main), like this (where archDir is an absolute path to a directory containing the dlls):

AppDomain.CurrentDomain.SetData("NATIVE_DLL_SEARCH_DIRECTORIES", archDir);

and then, per "Unmanaged (native) library probing", it should load from that before anything else? This does not appear to work (for self-contained); it will load from C:\Windows\System32.


  1. Check if the supplied library name represents an absolute or relative path.

  2. If the name represents an absolute path, use the name directly for all subsequent operations. Otherwise, use the name and create platform-defined combinations to consider. Combinations consist of platform specific prefixes (for example, lib) and/or suffixes (for example, .dll, .dylib, and .so). This is not an exhaustive list, and it doesn't represent the exact effort made on each platform. It's just an example of what is considered.

  3. The name and, if the path is relative, each combination, is then used in the following steps. The first successful load attempt immediately returns the handle to the loaded library.

    • Append it to each path supplied in the NATIVE_DLL_SEARCH_DIRECTORIES property and attempt to load.

    • If DefaultDllImportSearchPathsAttribute is either not defined on the calling assembly or p/invoke or is defined and includes DllImportSearchPath.AssemblyDirectory, append the name or combination to the calling assembly's directory and attempt to load.

    • Use it directly to load the library.

  4. Indicate that the library failed to load.


> Self-contained apps are always RID specific. Portable self-contained apps do not exist.
> RID specific apps (including self-contained RID specific apps) should have everything in the same directory. They should not be hitting any of the problems with dependencies spread over multiple directories.

I understand this, and perhaps I did not explain it very well. We need to support BOTH framework-dependent deployment via dotnet build (incl. local debugging in VS) AND self-contained deployment via dotnet publish.

And as I tried to write, for self-contained deployment it is a requirement (from us, customers etc.) that the native libraries are located in a sub-directory (e.g. x64) below the exe itself. This is not a new requirement; we have been doing this for years. We have 3+ GB of native library dependencies. Additionally, we have been deploying in a way that allows us to put multiple exes in one location, incl. both 32-bit and 64-bit executables in the same directory, with the native libraries in sub-directories.

In any case, we naturally must support developers being able to run the application from Visual Studio (or whatever) with F5 for debugging. With nuget packages following runtimes/<RID>/native, this means dlls can now be spread over many directories (for framework-dependent dotnet build). This means SetDllDirectory won't work for that scenario (due to transitive dependencies). It has been our go-to solution before, but it requires all dlls to be in one location and that nothing else calls it with a different path, which is pretty brittle.

Perhaps to make this more clear I have tried to show the layout for the different scenarios below.

Framework-dependent (dotnet build)

APP.exe                        // AnyCPU, no Platform, no RuntimeIdentifier
runtimes/win-x64/native/*.dll  // From new nuget packages
runtimes/win-x86/native/*.dll
x64/*.dll                      // From existing/"legacy" nuget packages using `.target`
x86/*.dll

Self-contained (dotnet publish -r RID --self-contained)

APP-win-x64.exe                // RID specific
APP-win-x86.exe                // RID specific
TOOL-win-x86.exe               // RID specific
x64/*.dll                      // Consolidated dlls for publishing
x86/*.dll

This is of course a demonstrative example. Both of these "layouts" can be used both locally and on production machines. For many different reasons. The framework-dependent scenario cannot be supported with just SetDllDirectory since this fails for transitive dependencies.
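To make the difference between the two layouts concrete, here is a small sketch that lists the directories a process would have to consider under each of them. The names `NativeLayout`, `ArchName` and `CandidateDirectories` are hypothetical, and only the win-x64/win-x86 cases from the layouts above are covered.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

static class NativeLayout
{
    // Maps the process architecture to the short directory names used above.
    public static string ArchName(Architecture architecture) => architecture switch
    {
        Architecture.X64 => "x64",
        Architecture.X86 => "x86",
        _ => throw new PlatformNotSupportedException(architecture.ToString()),
    };

    // Returns the candidate native directories below appDirectory:
    // first the NuGet runtimes/<RID>/native convention (dotnet build layout),
    // then the plain x64/x86 sub-directory (legacy and publish layout).
    public static IEnumerable<string> CandidateDirectories(
        string appDirectory, Architecture architecture)
    {
        var name = ArchName(architecture);
        yield return Path.Combine(appDirectory, "runtimes", "win-" + name, "native");
        yield return Path.Combine(appDirectory, name);
    }
}
```

A caller could filter the result with `Directory.Exists`, passing `AppContext.BaseDirectory` and `RuntimeInformation.ProcessArchitecture`, and then register every directory that exists. This is exactly the case where a single SetDllDirectory call falls short.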

Note this is all Windows currently, but given I am also trying to ship some of our open source dependencies in nuget packages I am also trying to play nice in the community and publish these in a way that could be used by all. Incl. trying to support the different kinds of deployments/usages so it just works.

Hence, I am trying to weigh and understand the options here. Note that we do not have full control over all our dependencies (ONNX Runtime being one example), and hence neither over DllImport definitions nor over how some native libraries load other native libraries. I fully understand this is somewhat outside the purview of .NET as such, but the way the initial native library is loaded, and with which options, can help here.

Member

jkotas commented Jul 27, 2023

NATIVE_DLL_SEARCH_DIRECTORIES property is considered read-only. Updating it using AppDomain.SetData won't be respected.

https://learn.microsoft.com/en-us/dotnet/core/dependency-loading/loading-unmanaged#pinvoke-load-library-algorithm is a more detailed description of the native library loading algorithm. You should be able to call NativeLibrary.SetDllImportResolver for the assemblies that you want to control loading the native dependencies for. It will give you full control over how the native dependency is loaded in the callback: You can call LoadLibraryEx with any flags, you can call AddDllDirectory/RemoveDllDirectory, you can return handle that you have pre-loaded earlier, ... .
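As an illustration only, a resolver along those lines might look like the sketch below. `NativeResolver`, `CandidatePath` and `Register` are hypothetical names, and the ".dll" suffix handling assumes Windows. `NativeLibrary.SetDllImportResolver` and `NativeLibrary.TryLoad` are the real APIs from System.Runtime.InteropServices.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;

static class NativeResolver
{
    // Hypothetical helper: builds the full path to try for a P/Invoke library name.
    // Assumes Windows naming, i.e. appends ".dll" when the name lacks it.
    public static string CandidatePath(string nativeDirectory, string libraryName)
    {
        var fileName = libraryName.EndsWith(".dll", StringComparison.OrdinalIgnoreCase)
            ? libraryName
            : libraryName + ".dll";
        return Path.Combine(nativeDirectory, fileName);
    }

    // Registers a resolver for P/Invokes declared in the given assembly.
    // A resolver can only be set once per assembly; a second call throws.
    public static void Register(Assembly assembly, string nativeDirectory)
    {
        NativeLibrary.SetDllImportResolver(assembly, (libraryName, _, _) =>
        {
            var candidate = CandidatePath(nativeDirectory, libraryName);
            if (File.Exists(candidate) && NativeLibrary.TryLoad(candidate, out var handle))
                return handle;

            // Returning IntPtr.Zero falls back to the default probing logic.
            return IntPtr.Zero;
        });
    }
}
```

Usage would be something like `NativeResolver.Register(typeof(SomeBindingType).Assembly, archDir);`, where `SomeBindingType` stands for any type in the assembly that declares the P/Invokes. The callback only sees libraries that .NET itself is asked to load, so it does not by itself control how a native library loads its own dependencies.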

Author

nietras commented Jul 28, 2023

> NATIVE_DLL_SEARCH_DIRECTORIES property is considered read-only. Updating it using AppDomain.SetData won't be respected.

Well that explains it of course 😅 Shouldn't it throw on set then or be documented? https://learn.microsoft.com/en-us/dotnet/core/dependency-loading/default-probing mentions that .deps.json can be used then, but are there any examples on that?

> call NativeLibrary.SetDllImportResolver for the assemblies that you want to control loading the native dependencies for. It will give you full control over how the native dependency is loaded in the callback: You can call LoadLibraryEx with any flags, you can call AddDllDirectory/RemoveDllDirectory, you can return handle that you have pre-loaded earlier, ... .

Yes, I have been and am looking into this. It just seems incredibly complicated compared to setting:

SetDllDirectory(Environment.Is64BitProcess ? @"x64" : @"x86");

which is basically what we did before, and which works in concert with "normal search order". I am concerned with ending up with the extremes that TorchSharp had to go for to get a good out-of-the-box experience (something I applaud), see:

https://github.com/dotnet/TorchSharp/blob/92822a0c57da6b51681bba3b1a8aaebb4086b5ab/src/TorchSharp/Torch.cs#L78-L254

Having to define this per assembly effectively couples a lot of things together and is harder to reason about and to configure for different kinds of use cases. It also doesn't solve transitive library dependencies, and I don't want to load things that aren't necessarily needed.

In many ways, what appears missing to me is proper OS (Windows) support for adding directories early in search order that do not then completely replace the search order like SetDefaultDllDirectories appears to do. But that's a pipe dream or concern.

Just to recap my own thoughts here, there are the following options:

  • PATH environment variable, changed first thing in Main - won't work for incorrect native libraries in C:\Windows\System32 like onnxruntime.dll, since PATH is last in the search order.
  • SetDllDirectory - works wonderfully since it is early in the search order (e.g. before System32), but only if all local dll dependencies are in one directory, which isn't (currently) the case for dotnet build/VS/framework-dependent usage. We could perhaps consolidate on runtimes/win-x64/native for dotnet build and x64 for dotnet publish (with x86 for 32-bit), but this requires ALL nuget packages with native libraries to follow the same RID, AND that SetDllDirectory is only ever called with the same directory by all code everywhere.
  • SetDefaultDllDirectories (+ AddDllDirectory) - takes over normal library loading and the dll search order, which means transitive native dependencies present only in PATH are not loaded, e.g. for third-party SDKs (such as a mixed-mode assembly). In principle one could add each PATH directory with AddDllDirectory, but AddDllDirectory does not guarantee anything with regard to order. The PATH environment variable does, as far as I know.
  • AddDllDirectory - without SetDefaultDllDirectories, only works for "first order" dependencies where one can control search path behavior.
  • NATIVE_DLL_SEARCH_DIRECTORIES - read-only and hence not usable for the self-contained/published scenario with dlls in a sub-directory. It can, though, be used to find probing directories in framework-dependent mode and add these with AddDllDirectory or similar.
  • NativeLibrary.SetDllImportResolver - again only works for native library dependencies that .NET is in control of loading. Requires setting up for all possible assemblies with p/invoke or similar, possibly forcing the load of assemblies that might not be needed (since one has to get the Assembly to call SetDllImportResolver), or having to set this up at the exact usages. The same applies to unit test scenarios or similar. (This goes for all of the options, of course.)
  • LoadLibraryEx - manually load libraries, even ones that may or may not actually be used if that is hard to determine up front (note that some native libraries have a "plugin" model). This is the TorchSharp approach. With 3+ GB of dlls it seems like a waste and incredibly coupled, brittle, hard to maintain etc.

Or any combination of the above. This is harder than it should be. 🙈
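As a sketch of the read-only use of NATIVE_DLL_SEARCH_DIRECTORIES mentioned in the list above, the property can be read and split into directories. The names `ProbingDirectories`, `Parse` and `Get` are hypothetical, and it is an assumption here that the value is a list delimited by the platform path separator.

```csharp
using System;
using System.IO;

static class ProbingDirectories
{
    // Splits a probing-directory list into individual entries.
    // Assumption: entries are delimited by the platform path separator
    // (';' on Windows, ':' elsewhere); empty entries are dropped.
    public static string[] Parse(string? value) =>
        (value ?? string.Empty).Split(Path.PathSeparator, StringSplitOptions.RemoveEmptyEntries);

    // Reads the runtime property; returns an empty array if it is not set.
    public static string[] Get() =>
        Parse(AppContext.GetData("NATIVE_DLL_SEARCH_DIRECTORIES") as string);
}
```

Each returned directory could then be passed to AddDllDirectory (or a similar mechanism) so that native libraries loading their own dependencies see the same directories that .NET probes.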

Member

jkotas commented Jul 28, 2023

> Shouldn't it throw on set then or be documented?

It is documented in AppDomain.SetData: The cache automatically contains predefined system entries that are inserted when the application domain is created. You cannot insert or modify system entries with this method. A method call that attempts to modify a system entry has no effect; the method does not throw an exception.
