
crossgen2 fails with 139 exit code on Arm64 #4007

Closed
mthalman opened this issue Jan 23, 2024 · 46 comments
Assignees
Labels
area-upstream-fix Needs a change in a contributing repo ops-monitor Issues created/handled by the source build monitor role

Comments

@mthalman
Member

This is causing a failure in the VMR for .NET 9 Preview 1 when building the installer repo. It fails in the Ubuntu2204Arm64_Offline_MsftSdk_arm64 job (example build [internal link]).

Error:

/vmr/src/installer/src/redist/targets/Crossgen.targets(163,5): error MSB6006: "crossgen2" exited with code 139. [/vmr/src/installer/src/redist/redist.csproj]

Here is a binlog:
sourcebuild.installer.zip

We need this resolved for .NET 9 Preview 1.
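For context (this is general Unix behavior, not specific to this build): an exit code above 128 conventionally means the process was killed by a signal, where signal = code − 128. So 139 is signal 11 (SIGSEGV, a segmentation fault), and 134 is signal 6 (SIGABRT, e.g. from glibc aborting on stack smashing). A minimal sketch of decoding this in shell:

```shell
# Decode a shell exit code into the terminating signal, if any.
# By convention, codes above 128 mean "killed by signal (code - 128)".
decode_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "exited normally with code $code"
  fi
}

decode_exit 139   # signal 11 = SIGSEGV (segmentation fault)
decode_exit 134   # signal 6  = SIGABRT (e.g. a stack-smashing abort)
```

This is why an exit code of 139 from crossgen2 points at a native crash rather than a managed exception.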

@mthalman mthalman added the ops-monitor Issues created/handled by the source build monitor role label Jan 23, 2024

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@mthalman
Member Author

@jkoritzinsky - Can you help investigate? This was working recently and regressed.

@jkoritzinsky
Member

I can try, but I don't have a Linux ARM64 machine. Do you know the first VMR build that this broke in? I'll see if I can narrow down where the failure came from.

@jkoritzinsky
Member

It looks like the most recent passing build was https://dev.azure.com/dnceng/internal/_build/results?buildId=2359219&view=results and the first failing build was https://dev.azure.com/dnceng/internal/_build/results?buildId=2359821&view=results.

The runtime range of commits between these two is dotnet/runtime@8accd80...e99836a

The commits that pop out to me as possibly causing crossgen2 crashes are:

I think the second is more likely than the first.

cc: @EgorBo

@EgorBo
Member

EgorBo commented Jan 23, 2024

Checking now

@EgorBo EgorBo self-assigned this Jan 23, 2024
@mthalman
Member Author

It's getting a different error now and failing with error code 134:

/vmr/src/installer/src/redist/targets/Crossgen.targets(172,5): *** stack smashing detected ***: terminated [/vmr/src/installer/src/redist/redist.csproj]

binlog

Failing build (internal link)

@mthalman
Member Author

mthalman commented Jan 24, 2024

It's consistently failing, but not always with the same error. The two errors mentioned above in this issue are the only ones I've seen so far.

@mthalman
Member Author

mthalman commented Jan 24, 2024

Interestingly, the build succeeded in this recent run (internal link). But I'm not sure I trust that the issue is resolved. Is it perhaps an intermittent issue? The source changes that exist between the two builds (failing and passing) don't include any runtime changes at all.

@jkoritzinsky
Member

Yes, it is possible that the failure is intermittent given your sample with the 134 exit code. It's likely native memory corruption somewhere in crossgen2. If this is using a live crossgen2 (which I think it is), then it could be anywhere in the JIT, GC, or CoreCLR runtime. It also means the corruption could have first appeared earlier, during the time frame when the build was failing before even getting to installer, which makes this harder to diagnose...

@mthalman
Member Author

There have been 3 other builds since the one that passed, and they've all failed.

@agocke
Member

agocke commented Jan 25, 2024

@MichaelSimons is there an easy way to test a build with a runtime change reverted? We might learn something just from looking at a build with each of the two commits that Jeremy identified reverted.

@EgorBo

This comment was marked as outdated.

@BruceForstall
Member

@mthalman I see you're testing a revert of @EgorBo 's change here: https://dev.azure.com/dnceng/internal/_build?definitionId=1219. Could you also test a revert of dotnet/runtime@205ef03 in parallel just to be comprehensive in our checking? (It seems unlikely to be the problem; it should have only deleted unused code)

@BruceForstall
Member

Oh, I guess @EgorBo triggered those runs; @EgorBo maybe you could trigger runs with Jakob's change reverted?

@oleksandr-didyk
Contributor

oleksandr-didyk commented Jan 31, 2024

The latest main VMR is red with a crossgen failure that I didn't find in this issue, so linking it here in case it's related or might help (source):

/vmr/src/installer/src/redist/targets/Crossgen.targets(181,5): error MSB6006: "crossgen2" exited with code 139. [/vmr/src/installer/src/redist/redist.csproj]
/vmr/src/installer/src/redist/targets/Crossgen.targets(181,5): error MSB4018: The "Crossgen" task failed unexpectedly. [/vmr/src/installer/src/redist/redist.csproj]
/vmr/src/installer/src/redist/targets/Crossgen.targets(181,5): error MSB4018: System.IO.IOException: Directory not empty : '/tmp/xinmgcn5.ilw' [/vmr/src/installer/src/redist/redist.csproj]
/vmr/src/installer/src/redist/targets/Crossgen.targets(181,5): error MSB4018:    at System.IO.FileSystem.RemoveEmptyDirectory(String fullPath, Boolean topLevel, Boolean throwWhenNotEmpty) [/vmr/src/installer/src/redist/redist.csproj]
/vmr/src/installer/src/redist/targets/Crossgen.targets(181,5): error MSB4018:    at Microsoft.DotNet.Build.Tasks.Crossgen.Execute() in /_/src/installer/src/core-sdk-tasks/Crossgen.cs:line 81 [/vmr/src/installer/src/redist/redist.csproj]
/vmr/src/installer/src/redist/targets/Crossgen.targets(181,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskExecutionHost.Microsoft.Build.BackEnd.ITaskExecutionHost.Execute() [/vmr/src/installer/src/redist/redist.csproj]
/vmr/src/installer/src/redist/targets/Crossgen.targets(181,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskBuilder.ExecuteInstantiatedTask(ITaskExecutionHost taskExecutionHost, TaskLoggingContext taskLoggingContext, TaskHost taskHost, ItemBucket bucket, TaskExecutionMode howToExecuteTask) [/vmr/src/installer/src/redist/redist.csproj]

@EgorBo
Member

EgorBo commented Jan 31, 2024

Long story short: this was handled offline, and it turned out the issue was introduced in dotnet/runtime#96969; the fix (revert) is dotnet/runtime#97679 (although it's probably reverted only in the release/9.0p1 branch?)

@oleksandr-didyk
Contributor

although, it's probably reverted only from release/9.0p1 branch?

Yes, and the latest VMR builds for release/9.0 are green (source).

main is still failing in the same Ubuntu leg as before with the exception linked above. Since the original issue was about preview-1 and the exceptions seem to be different, should we open a different one for tracking the failure in main?

@mthalman
Member Author

mthalman commented Feb 1, 2024

although, it's probably reverted only from release/9.0p1 branch?

Yes, and the latest VMR builds for release/9.0 are green (source).

main is still failing in the same Ubuntu leg as before with the exception linked above. Since the original issue was about preview-1 and the exceptions seem to be different, should we open a different one for tracking the failure in main?

It's not a separate issue. The fix wasn't applied to main. See dotnet/runtime#97679 (comment)

@mthalman
Member Author

mthalman commented Feb 1, 2024

Should be fixed by dotnet/runtime#97817

@MichaelSimons MichaelSimons added area-upstream-fix Needs a change in a contributing repo and removed untriaged labels Feb 1, 2024
@mthalman
Member Author

I'm not sure that the changes in dotnet/runtime#97817 fixed the issue. Either that, or there's another related issue. The change finally flowed into the VMR on Friday, and the build right after it came in failed in the Arm64 build leg when calling crossgen2. This time it was during the _CreateR2RImages target. Related build (internal link). The build after it passed, however. But we previously saw inconsistent results for this issue as well. I'm running a full build again to get another data point.

@mthalman
Member Author

The new build passed as well. I'm going to close this and hope things are ok. I'll reopen if it pops up again.

@mthalman
Member Author

@janvorli - Can you please investigate? A fix is needed for Preview 2.

@janvorli
Member

@mthalman can you please share repro instructions with me? I am not familiar with the source build, and while I have tried to mimic what I've seen in the CI log, I am not sure I did it right.
I am building on a physical Arm64 Linux device with Ubuntu 22.04, so I am not using Docker. The commands I've tried were:

./prep.sh
BUILD_SOURCEVERSION=220c455bc727ab0a8bdd9feb61f474a605d30339 ./build.sh --ci --clean-while-building --prepareMachine --source-only

(I have used the current main for build, so I've changed the 220c455bc727ab0a8bdd9feb61f474a605d30339 to the commit hash of my checkout)

@mthalman
Member Author

It should be enough to just run these commands:

./prep.sh
./build.sh --sb

Depending on how much disk space you have available, you may need to pass --clean-while-building, which cleans up each repo's outputs after it has finished building successfully. That cleans up binaries but leaves logs.

@janvorli
Member

@mthalman my build keeps failing due to this:
CSC : error CS1566: Error reading resource '_framework/blazor.web.js' -- 'Could not find a part of the path '/home/janvorli/git/dotnet/src/aspnetcore/src/Components/Web.JS/dist/Release/blazor.web.js'.' [/home/janvorli/git/dotnet/src/aspnetcore/src/Components/Endpoints/src/Microsoft.AspNetCore.Components.Endpoints.csproj] [/home/janvorli/git/dotnet/repo-projects/aspnetcore.proj]

I've tried several times, even with git clean -xdf in between. Any idea what is wrong?

@mthalman
Member Author

That looks related to not having the latest source. Have you pulled latest from main branch of the VMR? That error was relevant in an older commit but was fixed by dotnet/installer@90ccc0c in dotnet/installer#18641.

Oh, and you'll also need to run it with the --online switch to avoid #4129.

@janvorli
Member

I did a fresh clone today, I didn't have that repo locally. I am on this commit:

commit dc4df39f794e6b9c565d721986b71a056288b812 (HEAD -> main, origin/main, origin/HEAD)
Merge: 55519d6838 85d7a9e37c
Author: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
Date:   Fri Feb 16 13:18:50 2024 +0000

    [Recursive sync] installer / 6d1abb7 → 9b54914

    Updated repositories:
      - installer / 6d1abb7 → 9b54914
        https://github.com/dotnet/installer/compare/6d1abb74140d80b81d4f482a49332d55f89468d7..9b54914494e95df2d6e72803dfe7d2193cdebb22
      - aspire / 8ec92cb → b2107ab
        https://github.com/dotnet/aspire/compare/8ec92cbc5fbcba7a677fb52aaa4b0118f1ed17f4..b2107ab9a2b96b609f16997525f694f587e3018e

    [[ commit created by automation ]]

@janvorli
Member

I've synced to the latest state now again and I am retrying.

@mthalman
Member Author

Interesting. Well, that might be another issue with Arm then. Since this is failing in the build of the aspnetcore repo, you got past the point of the crossgen failure in the runtime repo (runtime builds before aspnetcore). So you're not reproing the issue. Be aware that this is an intermittent issue, so you may need to build several times to hit it. To get a fresh VMR to start over from, I run git clean -fdx and git checkout -- .
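Since the failure is intermittent, one way to hunt for it is to wrap the clean-and-build cycle in a small retry loop. This is only a sketch: the helper name and attempt count below are made up, not part of the VMR tooling.

```shell
# Run a command repeatedly until it fails, reporting which attempt failed.
# Helper name and attempt count are illustrative, not VMR tooling.
retry_until_fail() {
  cmd=$1
  n=$2
  i=1
  while [ "$i" -le "$n" ]; do
    if ! $cmd; then
      echo "failed on attempt $i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "all $n attempts passed"
}

# Hypothetical usage against the VMR (with each attempt preceded by the
# git clean -fdx / git checkout -- . reset described above):
#   retry_until_fail "./build.sh --sb --online" 5
```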

@janvorli
Member

Ok, I'll ignore that error then and just retry a clean build.

@agocke
Member

agocke commented Feb 16, 2024

I don't think this has anything to do with R2R itself. The failure in the linked issue is due to an unhandled exception. The error message is:

      Unhandled exception. System.CommandLine.CommandLineException: Target OS 'centos' is not supported
         at System.CommandLine.Helpers.GetTargetOS(String) in /_/src/runtime/src/coreclr/tools/Common/CommandLineHelpers.cs:line 84
         at System.CommandLine.CliArgument`1.<>c__DisplayClass8_0.<set_CustomParser>b__0(ArgumentResult argumentResult, Object& parsedValue)
         at System.CommandLine.Parsing.ArgumentResult.ValidateAndConvert(Boolean)
         at System.CommandLine.Parsing.CommandResult.ValidateOptions(Boolean)
         at System.CommandLine.Parsing.CommandResult.Validate(Boolean)
         at System.CommandLine.Parsing.ParseOperation.Validate()
         at System.CommandLine.Parsing.ParseOperation.Parse()
         at System.CommandLine.Parsing.CliParser.Parse(CliCommand, IReadOnlyList`1, String , CliConfiguration )
         at System.CommandLine.CliConfiguration.Invoke(String[])
         at ILCompiler.Program.Main(String[]) in /_/src/runtime/src/coreclr/tools/aot/ILCompiler/Program.cs:line 753

@janvorli
Member

@agocke the Ubuntu run seems to be failing with 139 during crossgen2 run though.

@agocke
Member

agocke commented Feb 16, 2024

Ah, you're right; I was looking at a different log.

@mthalman
Member Author

I don't think this has anything to do with R2R itself. The failure in the linked issue is due to an unhandled exception. The error message is:

      Unhandled exception. System.CommandLine.CommandLineException: Target OS 'centos' is not supported
         at System.CommandLine.Helpers.GetTargetOS(String) in /_/src/runtime/src/coreclr/tools/Common/CommandLineHelpers.cs:line 84
         at System.CommandLine.CliArgument`1.<>c__DisplayClass8_0.<set_CustomParser>b__0(ArgumentResult argumentResult, Object& parsedValue)
         at System.CommandLine.Parsing.ArgumentResult.ValidateAndConvert(Boolean)
         at System.CommandLine.Parsing.CommandResult.ValidateOptions(Boolean)
         at System.CommandLine.Parsing.CommandResult.Validate(Boolean)
         at System.CommandLine.Parsing.ParseOperation.Validate()
         at System.CommandLine.Parsing.ParseOperation.Parse()
         at System.CommandLine.Parsing.CliParser.Parse(CliCommand, IReadOnlyList`1, String , CliConfiguration )
         at System.CommandLine.CliConfiguration.Invoke(String[])
         at ILCompiler.Program.Main(String[]) in /_/src/runtime/src/coreclr/tools/aot/ILCompiler/Program.cs:line 753

That was a separate issue (identified in #4111 (comment)), which has since been fixed.

@janvorli
Member

@mthalman mystery solved. It is not a new issue. The thing is that crossgen doesn't use the current build of runtime, but rather an older version (it uses the dotnet sdk in the .dotnet folder). In this case, it used version 9.0.0-preview.2.24080.1 built from commit d40c654c274fe4f4afe66328f0599130f3eb2ea6 that predates the fix in main by two days.
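A quick way to confirm which bootstrap SDK the crossgen step would actually use is to query the SDK under the VMR's .dotnet folder (e.g. ./.dotnet/dotnet --version, which here printed 9.0.0-preview.2.24080.1). The 5-digit field in that version is a date-based build number that increases monotonically, so extracting it lets you compare two bootstrap SDKs numerically. The helper below is illustrative only, not part of the VMR tooling.

```shell
# Extract the 5-digit date-based build number from an SDK version string
# like 9.0.0-preview.2.24080.1 (the bootstrap SDK version seen above).
# Illustrative helper only, not part of the VMR tooling.
sdk_build_number() {
  echo "$1" | sed -n 's/.*\.\([0-9]\{5\}\)\.[0-9]*$/\1/p'
}

sdk_build_number "9.0.0-preview.2.24080.1"   # prints 24080
```

Comparing that number against the build number of an SDK known to contain the fix tells you whether the bootstrap predates it.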

@mthalman
Member Author

Ok, I'll update the VMR to use a more recent SDK and check the results.

@ellahathaway
Member

This is continuing to be an issue. The most recent related error that I've seen is in runtime:

/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(357,5): error NETSDK1096: Optimizing assemblies for performance failed. You can either exclude the failing assemblies from being optimized, or set the PublishReadyToRun property to false. [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]

Link to build (internal link)

@mthalman
Member Author

mthalman commented Mar 4, 2024

The bootstrap work to resolve this is still pending.

@mthalman
Member Author

mthalman commented Mar 8, 2024

This should be fixed by dotnet/installer#18763

@sec

sec commented Mar 14, 2024

Hi,
I also started to hit this issue, just under FreeBSD. Could I ask for some info on what the issue was and/or which commit fixed it?

@mthalman
Member Author

Reopening this because it's not yet resolved for source build in the VMR. We're still waiting for a bootstrap of the VMR onto an SDK build which contains the fix for this issue. But that's currently blocked on #4206.

@mthalman mthalman reopened this Mar 22, 2024
@mthalman
Member Author

Hi, I also started to hit this issue, just under FreeBSD, could I ask for some info what was the issue and/or which commit fixed that?

@sec - This is the fix: dotnet/runtime#97817. But it's not enough to have the fix in the source that you're currently building. You need to build with an SDK that has this fix. That's what we're currently working towards for the VMR, to get it bootstrapped with an SDK that has this fix in it. That will allow builds to work using that SDK.

@sec

sec commented Mar 22, 2024

@mthalman Thanks for the info. If I understand correctly, I should wait for your work to be done; then I can cross-compile the SDK for FreeBSD and use it as a bootstrap for preview-2, which should be fine. BUT, looking at this, it's a bug in libunwind, which under FreeBSD is taken from ports. Looking there, I've already found libunwind/libunwind#715; that should also be fixed, otherwise I will still hit this in the future :)

@mthalman
Member Author

mthalman commented Apr 1, 2024

The fix was flowed into the VMR with dotnet/installer#19145

@mthalman mthalman closed this as completed Apr 1, 2024

10 participants