Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use StoreAligned not Store in WidenAsciiToUtf16 #89892

Closed
wants to merge 3 commits into from

Conversation

Ruihan-Yin
Copy link
Contributor

Description

This PR is to improve the in-loop write logic in WidenAsciiToUtf16, the major change is to replace StoreAligned with Store inside the loop to reduce the penalty caused by split loads.

We are open to adjusting the implementation style for either path. Perf number attached in the comments.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 2, 2023
@ghost
Copy link

ghost commented Aug 2, 2023

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

This PR is to improve the in-loop write logic in WidenAsciiToUtf16, the major change is to replace StoreAligned with Store inside the loop to reduce the penalty caused by split loads.

We are open to adjusting the implementation style for either path. Perf number attached in the comments.

Author: Ruihan-Yin
Assignees: -
Labels:

area-System.Text.Encoding, community-contribution

Milestone: -

@Ruihan-Yin
Copy link
Contributor Author

Perf numbers

Base: main (531ad95)
Diff: main + changes on WidenAsciiToUtf16

Avx512

summary:
better: 5, geomean: 1.099
worse: 1, geomean: 1.022
total diff: 6

Slower diff/base Base Median (ns) Diff Median (ns) Modality
System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "utf-8") 1.02 85.25 87.11
Faster base/diff Base Median (ns) Diff Median (ns) Modality
System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "utf-8") 1.18 176.81 150.47
System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "ascii") 1.14 152.16 133.17
System.Text.Tests.Perf_Encoding.GetChars(size: 16, encName: "ascii") 1.09 17.97 16.54
System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "ascii") 1.06 79.47 74.72
System.Text.Tests.Perf_Encoding.GetChars(size: 16, encName: "utf-8") 1.03 25.79 24.94

AVX

summary:
better: 3, geomean: 1.049
total diff: 3

No Slower results for the provided threshold = 1% and noise filter = 0.5 ns.

Faster base/diff Base Median (ns) Diff Median (ns) Modality
System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "ascii") 1.07 81.68 76.09
System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "utf-8") 1.05 174.54 165.92
System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "ascii") 1.02 148.66 145.48

SSE

summary:
better: 5, geomean: 1.065
total diff: 5

No Slower results for the provided threshold = 1% and noise filter = 0.5 ns.

Faster base/diff Base Median (ns) Diff Median (ns) Modality
System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "ascii") 1.11 108.07 97.21
System.Text.Tests.Perf_Encoding.GetChars(size: 16, encName: "ascii") 1.10 17.74 16.14
System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "ascii") 1.05 183.11 173.62
System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "utf-8") 1.04 219.66 211.76
System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "utf-8") 1.02 119.36 116.59

@neon-sunset
Copy link
Contributor

neon-sunset commented Aug 3, 2023

How does this change affect *-arm64 targets? The PR introduces unconditional .ExtractMostSignificantBits() which is expected to regress its performance.

UPD: @Ruihan-Yin thank you

@Ruihan-Yin
Copy link
Contributor Author

Ruihan-Yin commented Aug 4, 2023

How does this change affect *-arm64 targets? The PR introduces unconditional .ExtractMostSignificantBits() which is expected to regress its performance.

Thanks for pointing out. That was a mistake, changed to VectorContainsNonAsciiChar to make sure Arm64 won't be affected by the use of ExtractMSB.

@Ruihan-Yin
Copy link
Contributor Author

Hi @xtqqczze @neon-sunset, kindly asking if there is any further change needed?

@xtqqczze
Copy link
Contributor

xtqqczze commented Aug 9, 2023

Perhaps we shouldn't assume the argument pUtf16Buffer is naturally aligned (i.e. 2 byte), as System.Text.Encoding.GetChars is a public API.

I'm not sure how to handle this, but System.SpanHelpers.IndexOfNullCharacter(System.Char*) does the following check:

if (((int)searchSpace & 1) != 0)
{
// Input isn't char aligned, we won't be able to align it to a Vector
}

@MichalPetryka
Copy link
Contributor

Perhaps we shouldn't assume the argument pUtf16Buffer is naturally aligned (i.e. 2 byte), as System.Text.Encoding.GetChars is a public API.

AFAIR the guideline is that if the platform the code is running on works with unaligned memory, dotnet APIs should work too.

@xtqqczze
Copy link
Contributor

AFAIR the guideline is that if the platform the code is running on works with unaligned memory, dotnet APIs should work too.

@MichalPetryka We have existing code that does not account for this, e.g. WidenLatin1ToUtf16, see #90319.

@anthonycanino
Copy link
Contributor

Perhaps we shouldn't assume the argument pUtf16Buffer is naturally aligned (i.e. 2 byte), as System.Text.Encoding.GetChars is a public API.

I'm not sure how to handle this, but System.SpanHelpers.IndexOfNullCharacter(System.Char*) does the following check:

if (((int)searchSpace & 1) != 0)
{
// Input isn't char aligned, we won't be able to align it to a Vector
}

@tannergooding based on some of the conversation we had, I am not sure how to proceed.

Perhaps we want to implement a fallback method that does not pin, and does not explicitly use StoreAligned and instead uses Store but skips the initial alignment if the char pointer is not naturally aligned?

@xtqqczze
Copy link
Contributor

Perhaps we shouldn't assume the argument pUtf16Buffer is naturally aligned (i.e. 2 byte)

See also comments at #90319.

@eiriktsarpalis eiriktsarpalis added this to the 9.0.0 milestone Aug 14, 2023
@tannergooding
Copy link
Member

tannergooding commented Aug 14, 2023

Perhaps we want to implement a fallback method that does not pin, and does not explicitly use StoreAligned and instead uses Store but skips the initial alignment if the char pointer is not naturally aligned?

For right now we should keep the pin and try to align, but if its unalignable then we should just continue as-is. It's basically the same code just with a check for "is this alignable at all" and using Store rather than StoreAligned (this is ultimately the same codegen and perf on modern hardware since the underlying data will actually be aligned in most cases).

For the future, we probably want to come to an agreement about how to universally handle this. My guess/vote is that's probably going to involve not pinning and using StoreUnsafe with optimistic alignment of the underlying data (basically optimistically presuming it won't be moved by the GC, which will be the common case but still using ref so that if it is moved, everything still works as expected). There's ultimately a balance between writing safe/readable code and performant code, so we need to find the right spot to land. Ideally we'd just be using Vector128.Create(ROSpan<T>) and the JIT would elide the bounds checks, so we don't have any unsafeness; but that's not possible today.

@eiriktsarpalis
Copy link
Member

Just checking up on the status of this PR, @anthonycanino have you had the chance to address @tannergooding's feedback?

@anthonycanino
Copy link
Contributor

Just checking up on the status of this PR, @anthonycanino have you had the chance to address @tannergooding's feedback?

@tannergooding how do we feel about this change now that we are moving to ISimdVector? Is it better to address the alignment with that change in one PR?

@tannergooding
Copy link
Member

I don't think it's changed from my last feedback.

We really need to account for unalignable data and the easiest way to do that is to pin, check the alignment, align if possible, and then continue processing using Store which works regardless of whether the data is aligned or unaligned.

In general the code pattern for efficiently handling alignment efficiently, for idempotent data, looks something like https://source.dot.net/#System.Numerics.Tensors/System/Numerics/Tensors/TensorPrimitives.netcore.cs,2930

Using ISimdVector then lets you merge the 3 different vectorized code paths down to 1 shared code path. The amount of unrolling and other factors can depend on the exact algorithm, the number of inputs, etc. But this is the general basic shape that works well for both large and small inputs.

@tannergooding tannergooding added the needs-author-action An issue or pull request that requires more info or actions from the author. label Nov 3, 2023
@eiriktsarpalis eiriktsarpalis self-assigned this Nov 15, 2023
@ghost ghost added the no-recent-activity label Nov 29, 2023
@ghost
Copy link

ghost commented Nov 29, 2023

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@ghost
Copy link

ghost commented Dec 13, 2023

This pull request will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the pull request, but please note that it will be locked if it remains inactive for another 30 days.

@ghost ghost closed this Dec 13, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Jan 13, 2024
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.Encoding community-contribution Indicates that the PR has been added by a community member needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants