Update mono to support shuffle for constant inputs #105299

tannergooding · 2024-07-23T07:00:36Z

Mono wasn't accelerating shuffle for anything other than byte/sbyte on x64. This was in part because doing so required detecting inputs as constant which didn't exist when that support was first brought online.

However, I had since added basic support for constants last year in #81902. This PR adds support that mirrors most of the RyuJIT Shuffle importation logic and so we can now do the right thing. As a side effect, the general support for generating such constants for Create calls when all inputs are constants was also expanded which should have some indirect benefits in other places.

Shuffle having not being accelerated is very impactful to the entire BCL and I would expect this to have a broad impact across many core benchmarks including HexConverter, XxHash, TensorPrimitives, ReverseEndianness, Base64Decoder, Guid, Matrix4x4, Quaternion, IndexOfAny, ProbablisticMap, various SpanHelpers APIs and more.

dotnet-policy-service · 2024-07-23T07:01:11Z

Tagging subscribers to this area: @lambdageek, @steveisok
See info in area-owners.md if you want to be subscribed.

src/mono/mono/mini/simd-intrinsics.c

tannergooding · 2024-07-23T14:55:58Z

No diffs for x64 is expected above since the x64 changes are MonoLLVM only.

For arm64 it should be both MonoJIT and MonoLLVM

EgorBo · 2024-07-23T15:27:32Z

@EgorBot -arm64 -mono --envvars "MONO_VERBOSE_METHOD:ShuffleBench"

using System.Numerics;
using BenchmarkDotNet.Attributes;
using System.Runtime.Intrinsics;

public class StrengthReduction
{
    Vector128<int> srcVec = Vector128.Create(1, 2, 3, 4);
    Vector128<int> dstVec;

    [Benchmark]
    public void ShuffleBench() => 
        dstVec = Vector128.Shuffle(srcVec, Vector128.Create(2, 2, 2, 2));

    [Benchmark]
    public Quaternion ConcatenateBenchmark() => 
        Quaternion.Concatenate(Quaternion.Identity, Quaternion.Identity);
}

EgorBot · 2024-07-23T15:56:23Z

Benchmark results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-DMCBWN : .NET 9.0.0 (42.42.42.42424) using MonoVM, Arm64 ArmBase
  Job-EBPPKC : .NET 9.0.0 (42.42.42.42424) using MonoVM, Arm64 ArmBase
EnvironmentVariables=MONO_VERBOSE_METHOD=ShuffleBench

Method	Toolchain	Mean	Error	Ratio
ShuffleBench	Main	17.3048 ns	0.0015 ns	1.00
ShuffleBench	PR	0.2047 ns	0.0075 ns	0.01

ConcatenateBenchmark	Main	120.1574 ns	0.0091 ns	1.00
ConcatenateBenchmark	PR	NA	NA	?
Benchmarks with issues:
StrengthReduction.ConcatenateBenchmark: Job-EBPPKC(EnvironmentVariables=MONO_VERBOSE_METHOD=ShuffleBench, Toolchain=PR)

BDN_Artifacts.zip

tannergooding · 2024-07-23T16:15:31Z

Need to fix a MonoLLVM assert, but...

Before:

il_seq_point intr il: 0x0
 move R33 <- R32
 move R34 <- R32
 loadx_membase R35 <- [R32 + 0x10] [System.Runtime.Intrinsics.Vector128`1<int>]
 iconst R36 <- [2]
 iconst R37 <- [2]
 iconst R38 <- [2]
 iconst R39 <- [2]
 xzero R40 <- [System.Runtime.Intrinsics.Vector128`1<int>]
 insert_i4 R41 <- R40 R36
 insert_i4 R42 <- R41 R37
 insert_i4 R43 <- R42 R38
 insert_i4 R44 <- R43 R39
 il_seq_point il: 0x10, nonempty-stack
 outarg_vtretaddr R46 <- R45
 move R47 <- R46
 outarg_vt R35
 outarg_vt R44
 xcall R45 <- [System.Runtime.Intrinsics.Vector128`1<int> System.Runtime.Intrinsics.Vector128:Shuffle (System.Runtime.Intrinsics.Vector128`1<int>,System.Runtime.Intrinsics.Vector128`1<int>)]
 il_seq_point il: 0x15, nonempty-stack
 compare_imm R32 [0]
 cond_exc_eq NullReferenceException
 storex_membase [R32 + 0x20] <- R45 [System.Runtime.Intrinsics.Vector128`1<int>]
 il_seq_point il: 0x1a
 nop

After:

il_seq_point intr il: 0x0
 move R33 <- R32
 move R34 <- R32
 loadx_membase R35 <- [R34 + 0x10] [System.Runtime.Intrinsics.Vector128`1<int>]
 iconst R36 <- [2]
 iconst R37 <- [2]
 iconst R38 <- [2]
 iconst R39 <- [2]
 xzero R40 <- [System.Runtime.Intrinsics.Vector128`1<int>]
 xconst R41 <-
 il_seq_point il: 0x10, nonempty-stack
 xones R42 <-
 xop_ovr_x_x_x R43 <- R35 R42
 il_seq_point il: 0x15, nonempty-stack
 compare_imm R33 [0]
 cond_exc_eq NullReferenceException
 storex_membase [R33 + 0x20] <- R43 [System.Runtime.Intrinsics.Vector128`1<int>]
 il_seq_point il: 0x1a
 br [B1]

…tch the necessary parameter count

tannergooding · 2024-07-23T16:44:14Z

src/mono/mono/mini/simd-intrinsics.c

+		return ins;
+	}
+
+	if (is_xconst (ins) && is_const (element)) {


@matouskozak or @fanyang-mono, is there an easy way to remove MonoInst * that are known to be dead or is it perhaps handled elsewhere?

In a case like this, element will be dead because we're just folding it directly into the constant. In the case that ins is xzero or xones we will likewise be replacing it with an xconst that includes the underlying constant from element.

So in the example given above, the following IR is "dead":

iconst R36 <- [2] iconst R37 <- [2] iconst R38 <- [2] iconst R39 <- [2] xzero R40 <- [System.Runtime.Intrinsics.Vector128`1<int>]

I haven't tried it but I think MONO_REMOVE_INS (or MONO_DELETE_INS) could work for this.

…lution can function

tannergooding · 2024-07-24T19:14:27Z

This should now be ready for review.

-- Please feel free to call out anywhere I messed up the styling. Tried to follow the conventions, but it's entirely possible I made some mistakes.

EgorBo · 2024-07-24T19:18:05Z

@EgorBot -arm64 -mono --envvars "MONO_VERBOSE_METHOD:ShuffleBench"

using System.Numerics;
using BenchmarkDotNet.Attributes;
using System.Runtime.Intrinsics;

public class StrengthReduction
{
    Vector128<int> srcVec = Vector128.Create(1, 2, 3, 4);
    Vector128<int> dstVec;

    [Benchmark]
    public void ShuffleBench() => 
        dstVec = Vector128.Shuffle(srcVec, Vector128.Create(2, 2, 2, 2));

    [Benchmark]
    public Quaternion ConcatenateBenchmark() => 
        Quaternion.Concatenate(Quaternion.Identity, Quaternion.Identity);
}

EgorBot · 2024-07-24T19:48:19Z

Benchmark results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-OKJKWC : .NET 9.0.0 (42.42.42.42424) using MonoVM, Arm64 ArmBase
  Job-CZGEXJ : .NET 9.0.0 (42.42.42.42424) using MonoVM, Arm64 ArmBase
EnvironmentVariables=MONO_VERBOSE_METHOD=ShuffleBench

Method	Toolchain	Mean	Error	Ratio
ShuffleBench	Main	17.3735 ns	0.0047 ns	1.00
ShuffleBench	PR	0.2321 ns	0.0052 ns	0.01

ConcatenateBenchmark	Main	118.2752 ns	0.0251 ns	1.00
ConcatenateBenchmark	PR	7.3540 ns	0.0022 ns	0.06

BDN_Artifacts.zip

tannergooding · 2024-07-24T19:57:22Z

Both of those numbers for MonoJIT are very good improvements:

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-OKJKWC : .NET 9.0.0 (42.42.42.42424) using MonoVM, Arm64 ArmBase
  Job-CZGEXJ : .NET 9.0.0 (42.42.42.42424) using MonoVM, Arm64 ArmBase
EnvironmentVariables=MONO_VERBOSE_METHOD=ShuffleBench

Method	Toolchain	Mean	Error	Ratio
ShuffleBench	Main	17.3735 ns	0.0047 ns	1.00
ShuffleBench	PR	0.2321 ns	0.0052 ns	0.01

ConcatenateBenchmark	Main	118.2752 ns	0.0251 ns	1.00
ConcatenateBenchmark	PR	7.3540 ns	0.0022 ns	0.06

I had done some local testing for a few other scenarios and seen similar improvements to APIs using Shuffle. The IR from MONO_VERBOSE_METHOD generally looks good too, but I think there's a broader possibility to trim out some MonoInst in a later phase if they're unused. Such trimming I think is better suited for .NET 10, however.

tannergooding · 2024-07-24T20:06:45Z

For ShuffleBench the codegen is as below. There's still a few places that could be optimized further, but I don't think those are as prevalent for .NET 9 and are likely more involved/general-purpose.

Before

   0:	a9ba7bfd 	stp	x29, x30, [sp, #-96]!
   4:	910003fd 	mov	x29, sp
   8:	f9000bba 	str	x26, [x29, #16]
   c:	aa0003fa 	mov	x26, x0
  10:	d29c1000 	mov	x0, #0xe080                	// #57472
  14:	f2b6d620 	movk	x0, #0xb6b1, lsl #16
  18:	f2defa00 	movk	x0, #0xf7d0, lsl #32
  1c:	f9400011 	ldr	x17, [x0]
  20:	b4000051 	cbz	x17, 0x28
  24:	954b13d1 	bl	0x52c4f68
  28:	3dc00740 	ldr	q0, [x26, #16]
  2c:	3d800fa0 	str	q0, [x29, #48]
  30:	d2800044 	mov	x4, #0x2                   	// #2
  34:	d2800043 	mov	x3, #0x2                   	// #2
  38:	d2800042 	mov	x2, #0x2                   	// #2
  3c:	d2800041 	mov	x1, #0x2                   	// #2
  40:	4f00e400 	movi	v0.16b, #0x0
  44:	4e041c80 	mov	v0.s[0], w4
  48:	4e0c1c60 	mov	v0.s[1], w3
  4c:	4e141c40 	mov	v0.s[2], w2
  50:	4e1c1c20 	mov	v0.s[3], w1
  54:	3d8013a0 	str	q0, [x29, #64]
  58:	910083a0 	add	x0, x29, #0x20
  5c:	f9002ba0 	str	x0, [x29, #80]
  60:	f9401ba0 	ldr	x0, [x29, #48]
  64:	f9401fa1 	ldr	x1, [x29, #56]
  68:	f94023a2 	ldr	x2, [x29, #64]
  6c:	f94027a3 	ldr	x3, [x29, #72]
  70:	9400001c 	bl	0xe0
  74:	f9402bbe 	ldr	x30, [x29, #80]
  78:	a90007c0 	stp	x0, x1, [x30]
  7c:	eb1f035f 	cmp	x26, xzr
  80:	10000011 	adr	x17, 0x80
  84:	540000e0 	b.eq	0xa0  // b.none
  88:	3dc00ba0 	ldr	q0, [x29, #32]
  8c:	3d800b40 	str	q0, [x26, #32]
  90:	f9400bba 	ldr	x26, [x29, #16]
  94:	910003bf 	mov	sp, x29
  98:	a8c67bfd 	ldp	x29, x30, [sp], #96
  9c:	d65f03c0 	ret
  a0:	d28027c0 	mov	x0, #0x13e                 	// #318
  a4:	aa1103e1 	mov	x1, x17
  a8:	954b50ac 	bl	0x52d4358

After:

   0:	a9be7bfd 	stp	x29, x30, [sp, #-32]!
   4:	910003fd 	mov	x29, sp
   8:	f9000bba 	str	x26, [x29, #16]
   c:	aa0003fa 	mov	x26, x0
  10:	3dc00740 	ldr	q0, [x26, #16]
  14:	9c0001e1 	ldr	q1, 0x50
  18:	4e010000 	tbl	v0.16b, {v0.16b}, v1.16b
  1c:	eb1f035f 	cmp	x26, xzr
  20:	10000011 	adr	x17, 0x20
  24:	540000c0 	b.eq	0x3c  // b.none
  28:	3d800b40 	str	q0, [x26, #32]
  2c:	f9400bba 	ldr	x26, [x29, #16]
  30:	910003bf 	mov	sp, x29
  34:	a8c27bfd 	ldp	x29, x30, [sp], #32
  38:	d65f03c0 	ret
  3c:	d28027c0 	mov	x0, #0x13e                 	// #318
  40:	aa1103e1 	mov	x1, x17
  44:	94000007 	bl	0x60
  48:	b9006740 	str	w0, [x26, #100]
  4c:	b9806740 	ldrsw	x0, [x26, #100]
  50:	0b0a0908 	add	w8, w8, w10, lsl #2
  54:	0b0a0908 	add	w8, w8, w10, lsl #2
  58:	0b0a0908 	add	w8, w8, w10, lsl #2
  5c:	0b0a0908 	add	w8, w8, w10, lsl #2

fanyang-mono

LGTM! Just some minor format adjustment needed. Thanks for adding this for mono.

src/mono/mono/mini/simd-intrinsics.c

kg · 2024-07-25T18:03:21Z

I'm not sure whether interp does this kind of optimization, but jiterpreter does, just FYI. If it becomes important we could probably move this optimization from the jiterpreter up into the interpreter, but I think the only target where interpreter performance really matters right now is WASM.

* Support mono creating xconst in a few more places * Update mono to support shuffle for constant inputs * Ensure that arm64 also accelerates shuffle for non-constant inputs * Ensure OP_XZERO and OP_XONES are recognized as being constant * Ensure shuffle creates a correct instruction when the fsig doesn't match the necessary parameter count * Ensure that getting the index for floating-point shuffle is possible * Ensure the right class handle is passed down to LLVM so overload resolution can function * Make sure we update the original xconst if we mutate it * Return a new constant and instead of mutating the existing one * Insert relevant xcast nodes * Add some asserts around the ecount * Ensure we get the right element type * Ensure we don't create nodes unnecessarily for create_elementwise * Ensure that create_elementwise still works for other vector sizes * Ensure indentation of switch cases is correct for Mono

…05110) * Remove support for older LLVM versions, and re-order linker flags We have generally tried to support linking multiple versions of LLVM within our git tree. Every new LLVM version moves symbols around between libraries, and as a result, every new version of LLVM requires different linker flags to build. The command line tool `llvm-config` should tell you the exact flags you need, but it is a problem for us when cross-compiling to rely on this, and as a result, we transcribe the result of various llvm-config outputs directly into Mono's CMakeLists.txt. In an effort to support multiple versions of LLVM, flags common between all supported versions were kept in one place, then the version-specific flags appended afterwards. And this has worked fine for years. However: 1. Whilst we only link with `lld`, it is common for contributors and source-build to link with `gold`, `bfd`, or some other GNU-flavoured linker, where library order is essential 2. The list of common libraries to link has remained unchanged for years, but the symbol intra-dependencies may have changed a long time ago, so common symbol order cannot be assumed to remain valid between LLVM versions This has resulted in a long-standing problem for people using e.g. Debian or Ubuntu or GitHub CodeSpaces, rather than always building with one of our dockerfiles representing our "real" build environment, when targeting platforms which use Mono and link LLVM. * Bumping clang and llvm - make docs less ambiguous. (#105401) * Bump main to RC1 (#105338) * Update SDK to preview 6 (#104696) * Update SDK to preview 6 * Update Shared.csproj Fix `error NU1903: Package 'System.Text.Json' 8.0.0 has a known high severity vulnerability` * Fix with existing version. --------- Co-authored-by: Ilona Tomkowicz <32700855+ilonatommy@users.noreply.github.com> Co-authored-by: Viktor Hofer <viktor.hofer@microsoft.com> * Change DefaultMaximumErrorResponseLength to KB from Byte (#105396) * Change DefaultMaximumErrorResponseLength to KB from Byte * Handle overflow * Review feedback * Fix warning for MakeGenericType annotation mismatch (#104921) Fixes warning code when a generic type whose type parameters have DAM annotations is used with MakeGenericType, over a type that doesn't have matching annotations. The code IL2070 used to mention the 'this' argument. Instead it should have been IL2071 which mentions the generic argument as the cause of the mismatch. Similar for MakeGenericMethod with IL2090 and IL2091. * Set GCStressIncompatible on GenericContext tests (#104686) Co-authored-by: Vladimir Sadov <vsadov@microsoft.com> * Add runtime config parameter to force ijwhost to load assemblies in an isolated context (#105337) * Add support for isolated load context in LoadInMemoryAssemblyInContext by passing -1 as loadContext * Have ijwhost check a runtime config parameter to determine if it should run in an isolated load context * Added test for ijwhost isolated load context runtime config option * [RISC-V] Fix passing float and uint arguments in VM (#105021) * Add tests * Fix passing float and uint arguments in VM * Change test lib name so it doesn't clash with managed DLL on Windows * Fix platform analyzer attribute order for MacCatalyst (#105409) We need to make sure the attribute for MacCatalyst comes _after_ the iOS one due to how MacCatalyst is a superset of iOS: https://learn.microsoft.com/en-us/dotnet/standard/analyzers/platform-compat-analyzer#platform-inclusion This caused an error in aspnetcore in the latest dependency flow because the analyzer thought AesGcm is _only_ supported on MacCatalyst: > error CA1416: (NETCORE_ENGINEERING_TELEMETRY=Build) This call site is reachable on all platforms. 'AesGcm.Decrypt(ReadOnlySpan<byte>, ReadOnlySpan<byte>, ReadOnlySpan<byte>, Span<byte>, ReadOnlySpan<byte>)' is only supported on: 'maccatalyst' 13.0 and later. * Use correct `ExceptionArgument` value in `System.IO.Pipelines` (#105418) * Remove zlib from requirements script and instruction files (#105419) * Remove zlib from requirements instructions * Remove zlib from native requirements installation script * Revert "Remove zlib from requirements script and instruction files (#105419)" (#105449) This reverts commit 3ec6286. * Ensure that WaitForPendingFinalizers has seen the expected Full GC count (#105289) * Ensure that WaitForPendingFinalizers has seen the expected Full GC * NativeAOT and some renames * a testcase * make the test not unsafe and make OuterLoop * Use unsigned math when comparing collection ticks * cast the diff to int when comparing gc ticks * Migrate to zlib-ng, part 3: Remove zlib and zlib-intel source code and license mentions (second attempt) (#105371) * Remove zlib/ * Remove zlib-intel/ * Remove third party notice * Remove patches * Remove version txts * Remove cgmanifest.json entries * Remove installer third party notice * Update docs --------- Co-authored-by: Jan Kotas <jkotas@microsoft.com> * [browser] Trigger relink on `EmccMaximumHeapSize` change (#105027) * Edit test + trigger relink. * Remove logging to speed up the test + decrease loop runs to prevent "Browser has been disconnected" error. * Feedback - properties are not bool-only anymore. * Fix: workload needed when heap size set. --------- Co-authored-by: Larry Ewing <lewing@microsoft.com> * Delete erroneous Socket test (#105448) * Pull the python dependency from the EmsdkVersion where possible (#105437) * Use `BinaryPrimitives` more in the ILCompiler (#105404) * JIT: use VNVisitReachingVNs in IsVNNeverNegative (#105197) * Set xunit env var to not print output for passing tests (#105392) * Set xunit env var to not print output for passing tests Fixes #103445 * Update xunit.console.targets * Update xunit.console.targets * Some more automated C# modernization in corelib (#105151) * Fix IDE0056 on corelib (indexing can be simplified) * Fix IDE (null check can be simplified) * Fix IDE0078 (use pattern matching) * Fix IDE0019 (use pattern matching) * Fix IDE0066 (use switch expression) * Fix IDE0250 (struct can be made readonly) * Fix nullability warning and address PR feedback * Address PR feedback and revert a downlevel change * Wrap any `?? throw new`s that go beyond 120 characters * Fix ECMA 355 Partition download links (#105454) On https://github.com/dotnet/runtime/blob/main/docs/project/dotnet-standards.md the Partition with Notes download links using HTTP protocol fail to download in Chrome: >Mixed Content: The site at 'https://github.com/' was loaded over a secure connection, but the file at 'https://download.microsoft.com/download/7/3/3/733AD403-90B2-4064-A81E-01035A7FE13C/MS%20Partition%20I.pdf' was redirected through an insecure connection. This file should be served over HTTPS. See https://blog.chromium.org/2020/02/protecting-users-from-insecure.html for more details. (It might be reasonble for somebody else to followup fixing all domains on all pages with regex `\(http://([.a-z0-9-]+)` replacing with `(https://$1`, but I didn't test each blog site supports HTTPS.) * Add swiftcall signature check for `mono_class_try_get_swift_error_class` (#105408) * Add signature check for swiftcall * Handle null values for swift_error_ptr * Enable NativeAOT runtime tests on MacCatalyst (#102882) This PR updates the CLRTest.Execute.Bash.targets file to set the apple run command for MacCatalyst. The command apple just-run used on Apple mobile is not permitted, and apple test requires the a test runner. Additionally, it is necessary to locate Info.plist in the Contents/ directory and the binary in Contents/MacOS/ within the bundle. --------- Co-authored-by: Ivan Povazan <ivan.povazan@gmail.com> * Resolving an antigen failure (#105260) * Resolving an antigen failure * Fix method accessibility so xunit doesn't complain * Support field access on GetType() of T constrained to be Enum (#105351) Adds trimming support for instance.GetType().GetFields(), where instance is a variable of type `T` that is constrained to System.Enum. This includes a change to have ILLink's TypeProxy track a TypeReference instead of TypeDefinition, which was necessary to allow TypeProxy to represent a generic parameter. Note that this only supports the specific case where `GetType()` is called on a variable of type `T` that is constrained to `Enum`. A variable of type `Enum` is not supported, so the following will still warn: ```csharp static void M(Enum v) { v.GetType().GetFields(); } ``` * Update mono to support shuffle for constant inputs (#105299) * Support mono creating xconst in a few more places * Update mono to support shuffle for constant inputs * Ensure that arm64 also accelerates shuffle for non-constant inputs * Ensure OP_XZERO and OP_XONES are recognized as being constant * Ensure shuffle creates a correct instruction when the fsig doesn't match the necessary parameter count * Ensure that getting the index for floating-point shuffle is possible * Ensure the right class handle is passed down to LLVM so overload resolution can function * Make sure we update the original xconst if we mutate it * Return a new constant and instead of mutating the existing one * Insert relevant xcast nodes * Add some asserts around the ecount * Ensure we get the right element type * Ensure we don't create nodes unnecessarily for create_elementwise * Ensure that create_elementwise still works for other vector sizes * Ensure indentation of switch cases is correct for Mono * Make TooDeepJsonDocument test more consistent across platforms (#105445) * Make TooDeepJsonDocument test more consistent across platforms Run the test on a thread with as consistent a stack size as possible so that we don't inadvertently succeed due to having a really large stack. * Disable test on mono * Update the TypeLib embedding and add comments on API use (#105416) There is an undocumented semantic of Win32 Resource APIs. The missing semantic is that all resource type/name strings are transparently converted to uppercase when calling any of the Win32 Resource APIs. We don't want to apply this undocumented semantic to the reader/writer API so we document it instead. We are avoiding applying the behavior since ReadyToRun scenarios are designed to be a byte for byte copy of the resource, including name as it was written by other tooling. * Update docs for ByRefLike with generics for work in .NET 10 (#103318) Co-authored-by: Jan Kotas <jkotas@microsoft.com> * Fix double printing in StressLog and simplify stresslog macros (#105420) * Add support for nested types in the `corelib.h` parser for rooting descriptors (#105432) * Fixed for assertion failure due to not checking if we are processing Eph samples (#105164) * [PERF] Use correct python executable on windows in venv (#105451) * Fix up Fuzzlyn CI scripts for new hardware intrinsics support (#105470) 1) Strip out the extensions in the seed name when using it for file/directory names, since the list of extensions is quite long 2) Limit the number of unreduced/uncategorized example seeds we show * Try to re-enable DeepEquals_TooDeepJsonDocument_ThrowsInsufficientExecutionStackException test on mono (#105509) * zlib-ng: avoid suppressing WD4242 and WD4244 (#105433) WD4242 and WD4244 are compiler warnings that should not be suppressed because the warn about possible loss of data. WD4242 shows up in zlib-ng/arch/*/slide_hash*.c files and comes from the arguments passed to the slide_hash_chain method. WD4244 happens in Windows when building in Debug configuration, in various zlib-ng/deflate*.c files, and comes from the arguments passed to the check_match method. Fixed by: - Adding asserts to verify the values are below the maximum allowed for their type. - Casting them the proper type before passing them as arguments to their methods. - Removing the WD suppressions, which unfortunately also propagated to other unrelated cmake files. - Fixed a similar loss of data error in an unrelated mono file where the warning suppression was propagated due to this. * Delete outdated comments (#105519) * Arm64/Sve: Add FFR register liveness tracking (#105348) * Add tracking of FFR register somewhat workable code cleanup Remove FFR Add all the GetFfr* wip Work with MskCns() model Use physReg approach Remove commented prototypes working Remove bunch of unnecessary code Remove SpecialImport from GetFFR/SetFFR/LoadFirstFaulting some more code cleanup some fixup * Change condition for PhysReg * jit format * review feedback * unspill for LoadVectorFirstFaulting as well * Use the right opReg * skip spilling tracking * review feedback * Use non-existent REG_FFR * Do not reload from FFR for GetFfr() * review feedback * Make just GrabTemp * fix build and formatting * missed another build failure for arm * Fix throwing exception when calling RunClassConstructor on a generic type with a static constructor (#105513) * Fix throwing exception when calling RunClassConstructor on a generic type with a static constructor #99183 seems to have done away with assuming that a generic type's static constructor was always "initialized". As a result, if you call RunClassConstructor on it, the runtime would throw an exception. Fixes #103891 * Apply suggestions from code review --------- Co-authored-by: Jan Kotas <jkotas@microsoft.com> * [RISC-V][LoongArch64] New passing info for floating-point structs (#103945) * Replace StructFloatFieldInfoFlags with FpStructInRegistersInfo which carries also exact field sizes and offsets * Replace StructFloatFieldInfoFlags with FpStruct::Flags in profiler * Remove FpStructInRegistersInfo::FromOldFlags() * Fix duplicating types in HandleInlineArray * Remove signedness from FpStruct::IntKind because most probably we won't need it * Remove old StructFloatFieldInfoFlags calculating routine * Typo in TARGET_LOONGARCH64 * Remove m_returnedFpFieldOffsets from ArgIterator * Add missing ENREGISTERED_PARAMTYPE_MAXSIZE condition to C# version of FpStruct info calculation * Rename RISCV64PassStructInRegister to match settled casing for RiscV in class names * Update hardcoded flags for float and double in ArgIteratorTemplate::ComputeReturnFlags() This fixes JIT/HardwareIntrinsics/General/Vector* tests. * Fix build on other platforms * Update LoongArch to use FpStructInRegistersInfo * Remove unused old flag masks * LoongArch64 typo Co-authored-by: Qiao Pengcheng <qiaopengcheng@loongson.cn> * Missing FpStruct namespace Co-authored-by: Qiao Pengcheng <qiaopengcheng@loongson.cn> * Missing FpStruct namespace Co-authored-by: Qiao Pengcheng <qiaopengcheng@loongson.cn> * Missing FpStruct namespace Co-authored-by: Qiao Pengcheng <qiaopengcheng@loongson.cn> * Use FpStruct namespace everywhere in JIT * JIT review * Update StructFloatFieldInfoFlags description * Revert to hitherto instruction set order as it's not the point of this PR * Unify get{LoongArch,RiscV}64PassFpStructInRegistersInfo JIT interfaces * Use JIT_TO_EE_TRANSITION instead of _LEAF because MethodTable::GetFpStructInRegistersInfo may throw * Remove FpStruct::IntKind, we should have similar info in ClassLayout in JIT * Change JIT interface to return a struct similar to CORINFO_SWIFT_LOWERING to facilitate code unification in the future * Change JIT to use new Swift-like getFpStructLowering * Cache CORINFO_FPSTRUCT_LOWERING * Update LoongArch classifier to use CORINFO_FPSTRUCT_LOWERING * Update StructFloatInfoFlags doc comment on C# * Move StructFloatFieldInfoFlags and FpStructInRegistersInfo out of the JIT interface * Merge LoongArch and RISC-V AOT calculation of FpStructInRegistersInfo because they were identical. Move it to Common\Internal/Runtime because it's no longer exposed in JIT interface. * Don't zero-initialize CORINFO_FPSTRUCT_LOWERING * Add note for CORINFO_FPSTRUCT_LOWERING::loweredElements type --------- Co-authored-by: Qiao Pengcheng <qiaopengcheng@loongson.cn> * Ensure we don't reuse temps when calling fgMorphArgs on LIR nodes (#105508) * Ensure constant evaluation of shifts on xarch broadcast the operand to the correct size (#105487) * Ensure constant evaluation of shifts on xarch broadcast the operand to the correct size * Ensure we don't try to execute AVX2 code on unsupported platforms * Use ConcurrentDictionary in runtimecounters test (#105520) * Use ConcurrentDictionary in runtimecounters test Fixes #105443 * Fix build break * Fix issue 98506 - Excessive exceptions generated in StackTraceSymbols (#105530) * Fix issue 98506 - Excessive exceptions generated in StackTraceSymbols * Code review feedback * Clean up some usages of LowLevelList<T> (#105407) * Fix ShuffleThunk cache heap (#105480) There was a problem with using heap from the related LoaderAllocator for shuffle thunk cache heap. I have tested it again and it seems that the issue is gone. So I am removing the workaround, making the cache use LoaderAllocator local heap. Close #55697 * [browser] Fix computing destination sub path and publish extension target path in Wasm SDK (#105458) * Bump flags to LLVM 19, not 16 --------- Co-authored-by: Larry Ewing <lewing@microsoft.com> Co-authored-by: Alexander Köplinger <alex.koeplinger@outlook.com> Co-authored-by: Ilona Tomkowicz <32700855+ilonatommy@users.noreply.github.com> Co-authored-by: Carlos Sánchez López <1175054+carlossanlop@users.noreply.github.com> Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com> Co-authored-by: Viktor Hofer <viktor.hofer@microsoft.com> Co-authored-by: Ahmet Ibrahim Aksoy <aaksoy@microsoft.com> Co-authored-by: Sven Boemer <sbomer@gmail.com> Co-authored-by: Steve Pfister <steveisok@users.noreply.github.com> Co-authored-by: Vladimir Sadov <vsadov@microsoft.com> Co-authored-by: Mike Oliphant <oliphant@nostatic.org> Co-authored-by: Tomasz Sowiński <tomeksowi@gmail.com> Co-authored-by: xtqqczze <45661989+xtqqczze@users.noreply.github.com> Co-authored-by: Jan Kotas <jkotas@microsoft.com> Co-authored-by: Stephen Toub <stoub@microsoft.com> Co-authored-by: Paulus Pärssinen <paulus.parssinen@gmail.com> Co-authored-by: Egor Bogatov <egorbo@gmail.com> Co-authored-by: Carl Walsh <darthwalsh@gmail.com> Co-authored-by: Milos Kotlar <kotlarmilos@gmail.com> Co-authored-by: Ivan Povazan <ivan.povazan@gmail.com> Co-authored-by: Tanner Gooding <tagoo@outlook.com> Co-authored-by: Aaron Robinson <arobins@microsoft.com> Co-authored-by: Jeremy Koritzinsky <jekoritz@microsoft.com> Co-authored-by: Mukund Raghav Sharma (Moko) <68247673+mrsharm@users.noreply.github.com> Co-authored-by: Cameron Aavik <99771732+caaavik-msft@users.noreply.github.com> Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com> Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com> Co-authored-by: Qiao Pengcheng <qiaopengcheng@loongson.cn> Co-authored-by: Mike McLaughlin <mikem@microsoft.com> Co-authored-by: Huo Yaoyuan <huoyaoyuan@hotmail.com> Co-authored-by: Jan Vorlicek <janvorli@microsoft.com> Co-authored-by: Marek Fišera <mara@neptuo.com>

tannergooding added 2 commits July 22, 2024 23:23

Support mono creating xconst in a few more places

4820a21

Update mono to support shuffle for constant inputs

fcdbd18

dotnet-issue-labeler bot added the area-Codegen-JIT-mono label Jul 23, 2024

dotnet-policy-service bot assigned tannergooding Jul 23, 2024

tannergooding mentioned this pull request Jul 23, 2024

[Perf] Linux/x64: 20 Regressions on 6/20/2024 7:04:47 AM #104022

Closed

tannergooding commented Jul 23, 2024

View reviewed changes

src/mono/mono/mini/simd-intrinsics.c Outdated Show resolved Hide resolved

Ensure that arm64 also accelerates shuffle for non-constant inputs

095873b

tannergooding marked this pull request as ready for review July 23, 2024 14:15

tannergooding requested a review from fanyang-mono as a code owner July 23, 2024 14:15

This comment was marked as resolved.

Sign in to view

Ensure OP_XZERO and OP_XONES are recognized as being constant

d87a444

Ensure shuffle creates a correct instruction when the fsig doesn't ma…

71708fe

…tch the necessary parameter count

tannergooding commented Jul 23, 2024

View reviewed changes

tannergooding added 2 commits July 23, 2024 10:28

Ensure that getting the index for floating-point shuffle is possible

f85e513

Ensure the right class handle is passed down to LLVM so overload reso…

2995754

…lution can function

tannergooding force-pushed the mono-shuffle branch from 762d241 to 2995754 Compare July 23, 2024 19:11

Make sure we update the original xconst if we mutate it

b35b1e3

tannergooding force-pushed the mono-shuffle branch from d05673f to b35b1e3 Compare July 23, 2024 20:46

tannergooding added 2 commits July 23, 2024 18:19

Return a new constant and instead of mutating the existing one

1d55132

Insert relevant xcast nodes

3c327b0

tannergooding marked this pull request as draft July 24, 2024 02:31

This was referenced Jul 24, 2024

Failed USB connection via port 54050, error 61, in tvOS arm64 Release AllSubsets_Mono #82637

Open

[browser] RuntimeError: memory access out of bounds #96546

Closed

WasmTestOnChrome timeouts in CI #105363

Open

tannergooding added 4 commits July 24, 2024 08:49

Add some asserts around the ecount

50cec20

Ensure we get the right element type

fe451d7

Ensure we don't create nodes unnecessarily for create_elementwise

08aa477

Ensure that create_elementwise still works for other vector sizes

4a49172

tannergooding marked this pull request as ready for review July 24, 2024 19:13

fanyang-mono approved these changes Jul 25, 2024

View reviewed changes

Ensure indentation of switch cases is correct for Mono

5a8a221

build-analysis bot mentioned this pull request Jul 25, 2024

msbuild crashes with "MSB0001: Internal MSBuild Error: must be valid" dotnet/dnceng#3304

Open

3 tasks

fanyang-mono requested a review from steveisok July 25, 2024 18:02

fanyang-mono merged commit 4fbd498 into dotnet:main Jul 25, 2024
78 of 80 checks passed

tannergooding deleted the mono-shuffle branch July 25, 2024 18:27

matouskozak mentioned this pull request Jul 26, 2024

[mono] Vector shuffle failures in Mono LLVM scenario #105537

Closed

github-actions bot locked and limited conversation to collaborators Aug 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update mono to support shuffle for constant inputs #105299

Update mono to support shuffle for constant inputs #105299

tannergooding commented Jul 23, 2024 •

edited

Loading

dotnet-policy-service bot commented Jul 23, 2024

This comment was marked as resolved.

This comment was marked as resolved.

tannergooding commented Jul 23, 2024

This comment was marked as resolved.

EgorBo commented Jul 23, 2024

EgorBot commented Jul 23, 2024

tannergooding commented Jul 23, 2024

tannergooding Jul 23, 2024

matouskozak Jul 23, 2024 •

edited

Loading

tannergooding commented Jul 24, 2024

EgorBo commented Jul 24, 2024

EgorBot commented Jul 24, 2024

tannergooding commented Jul 24, 2024

tannergooding commented Jul 24, 2024

fanyang-mono left a comment

kg commented Jul 25, 2024

Update mono to support shuffle for constant inputs #105299

Update mono to support shuffle for constant inputs #105299

Conversation

tannergooding commented Jul 23, 2024 • edited Loading

dotnet-policy-service bot commented Jul 23, 2024

This comment was marked as resolved.

This comment was marked as resolved.

tannergooding commented Jul 23, 2024

This comment was marked as resolved.

EgorBo commented Jul 23, 2024

EgorBot commented Jul 23, 2024

tannergooding commented Jul 23, 2024

Before:

After:

tannergooding Jul 23, 2024

Choose a reason for hiding this comment

matouskozak Jul 23, 2024 • edited Loading

Choose a reason for hiding this comment

tannergooding commented Jul 24, 2024

EgorBo commented Jul 24, 2024

EgorBot commented Jul 24, 2024

tannergooding commented Jul 24, 2024

tannergooding commented Jul 24, 2024

Before

After:

fanyang-mono left a comment

Choose a reason for hiding this comment

kg commented Jul 25, 2024

tannergooding commented Jul 23, 2024 •

edited

Loading

matouskozak Jul 23, 2024 •

edited

Loading