Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert small atomic fallbacks to managed #99011

Merged
merged 10 commits into from
Mar 22, 2024

Conversation

MichalPetryka
Copy link
Contributor

@MichalPetryka MichalPetryka commented Feb 27, 2024

Makes the non-intrinsic implementations of Exchange/CompareExchange for small types be implemented as loops using 32b variants instead of calling into native code.

I'm not sure what's the performance difference here as the only platforms that should use those will be RISC-V and Mono (ARM32 is handled with #97792 and LoongArch64 seems trivial to do but I have no way to test it).

I've based the idea for the implementation on the fact that Linux Kernel and Libatomic use such 32b operations for their fallbacks (I took no code from those however as they're both GPL).
The approach also relies on an implementation detail of the GCs with that it'll keep refs backtracked to 4B aligned and in the same object to avoid pinning.

I'm not 100% sure whether doing this in managed makes sense here since it's a lot of code, but it also removed all the C++ paths from 3 runtimes.
cc @jkotas

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 27, 2024
@ghost
Copy link

ghost commented Feb 27, 2024

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details

Makes the non-intrinsic implementations of Exchange/CompareExchange for small types be implemented as loops using 32b variants instead of calling into native code.

I'm not sure what's the performance difference here as the only platforms that should use those will be RISC-V and Mono (ARM32 is handled with #97792 and LoongArch64 seems trivial to do but I have no way to test it).

I've based the idea for the implementation on the fact that Linux Kernel and Libatomic use such 32b operations for their fallbacks (I took no code from those however as they're both GPL).
The approach also relies on an implementation detail of the GCs with that it'll keep refs backtracked to 4B aligned and in the same object to avoid pinning.

I'm not 100% sure whether doing this in managed makes sense here since it's a lot of code, but it also removed all the C++ paths from 3 runtimes.
cc @jkotas

Author: MichalPetryka
Assignees: -
Labels:

area-System.Threading, needs-area-label

Milestone: -

@filipnavara filipnavara added community-contribution Indicates that the PR has been added by a community member and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Feb 27, 2024
/// <returns>The original value of <paramref name="location1"/>.</returns>
/// <exception cref="NullReferenceException">The address of location1 is a null pointer.</exception>
[Intrinsic]
[MethodImpl(MethodImplOptions.AggressiveInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is AggressiveInlining here just a copy&paste? It does not sound like a good idea to aggressively inline all this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this should be similar in size to what native compilers emit for RISC-V and they do inline that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One benefit to inlining this is that with #99130 it lets the JIT fold most of the bit operations here when passed a ref to a static field.

@jkotas
Copy link
Member

jkotas commented Feb 27, 2024

I think it is fine to do this for CoreCLR/NativeAOT. As you have said, it is just a fallback that should be only used during platform bring up. We effectively require JIT to expand these inline for best perf.

I am not sure about Mono. @vargaz @AlekseyTs Thoughts?

@MichalPetryka
Copy link
Contributor Author

It'd be nice if somebody from the Samsung RISC-V team could benchmark this on a RISC-V device to see the performance difference.

@jkotas
Copy link
Member

jkotas commented Feb 27, 2024

It'd be nice if somebody from the Samsung RISC-V team could benchmark this on a RISC-V device to see the performance difference.

cc @gbalykov @HJLeee @wscho77 @clamp03 @JongHeonChoi @t-mustafin @viewizard

@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented Feb 27, 2024

exit code 139 means SIGSEGV Illegal memory access. Deref invalid pointer, overrunning buffer, stack overflow etc. Core dumped.

Seems like Mono crashes with this implementation, could somebody say if the implementation assumptions are invalid there or if there's some bug instead?

MichalPetryka added a commit to MichalPetryka/runtime that referenced this pull request Feb 27, 2024
@jkotas
Copy link
Member

jkotas commented Feb 27, 2024

You may want to temporarily enable the fallback implementation for CoreCLR and see whether it passes all tests.

@MichalPetryka
Copy link
Contributor Author

You may want to temporarily enable the fallback implementation for CoreCLR and see whether it passes all tests.

It's currently tested with ARM32 and it asserts which I've fixed in #99019.
I've ran it locally on X64 beforehand and it passed for me (even with GCStress, which would actually be nice to run here).

@clamp03
Copy link
Member

clamp03 commented Feb 27, 2024

It'd be nice if somebody from the Samsung RISC-V team could benchmark this on a RISC-V device to see the performance difference.

I am out of office. @bartlomiejko Could your team can check the performance difference?

@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented Mar 2, 2024

All failures here seem unrelated, this seems to be only waiting for a review of the assumptions on Mono I think.
EDIT: I've also added some more tests for asserts I've seen locally but that should be fixed with #99019.

@MichalPetryka
Copy link
Contributor Author

Seems like there's still some assert here:

Assert failure(PID 26748 [0x0000687c], Thread: 36716 [0x8f6c]): Assertion failed 'CoercedConstantValue<size_t>(vn) != 0' in 'InterlockedDisasm:Cas(int):ubyte' during 'Do value numbering' (IL size 18; hash 0x0e503796; FullOpts)

    File: H:\Projects\dotnet\runtime\src\coreclr\jit\valuenum.cpp:1630
    Image: H:\Projects\dotnet\runtime\artifacts\bin\coreclr\windows.x64.Checked\CoreRun.exe
ERROR:
Assert failure(PID 26748 [0x0000687c], Thread: 36716 [0x8f6c]): Assertion failed 'CoercedConstantValue<size_t>(vn) != 0' in 'InterlockedDisasm:Cas(int):ubyte' during 'Do value numbering' (IL size 18; hash 0x0e503796; FullOpts)

    File: H:\Projects\dotnet\runtime\src\coreclr\jit\valuenum.cpp:1630
    Image: H:\Projects\dotnet\runtime\artifacts\bin\coreclr\windows.x64.Checked\CoreRun.exe

@MichalPetryka
Copy link
Contributor Author

there's still some assert

Fixed now with #100060.

I am not sure about Mono.

I guess we still need somebody from the Mono team to check this?

Copy link
Contributor

@hamarb123 hamarb123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove Unsafe.AsPointer uses

Co-authored-by: Hamish Arblaster <hamarb123@gmail.com>
Copy link
Member

@lambdageek lambdageek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mono LGTM

@MichalPetryka I would leave the small ops in atomic.h - they are generally useful. Also it might be worthwhile to open a follow-up issue for mono to add intrinsics for these operations in the JIT and interpreter.

@MichalPetryka
Copy link
Contributor Author

I would leave the small ops in atomic.h - they are generally useful.

I could do that but I think they're simple enough that they could be readded when needed.

Also it might be worthwhile to open a follow-up issue for mono to add intrinsics for these operations in the JIT and interpreter.

#93488

@MichalPetryka
Copy link
Contributor Author

@jkotas does the musl arm assert here seem related to the changes here for you?
image

_ASSERTE((GetComponentSize() <= 2) || IsArray());

@mangod9
Copy link
Member

mangod9 commented Mar 21, 2024

@jkotas does the musl arm assert here seem related to the changes here for you? image

_ASSERTE((GetComponentSize() <= 2) || IsArray());

This is a known issue logged here: #86273

@MichalPetryka
Copy link
Contributor Author

This seems ready to merge I think but it'll slightly conflict with #100021 so I'm not sure which one should go in first cc @jkotas

@jkotas
Copy link
Member

jkotas commented Mar 21, 2024

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@jkotas jkotas merged commit da95abd into dotnet:main Mar 22, 2024
174 of 182 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Threading community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.