Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WidenLatin1ToUtf16_MisalignedAddress #90319

Closed
wants to merge 3 commits into from

Conversation

xtqqczze
Copy link
Contributor

@xtqqczze xtqqczze commented Aug 10, 2023

Handle the case where the pUtf16Buffer parameter is misaligned, and fallback to a simple loop.

The current code uses of Sse2.StoreAligned (MOVDQA), which is documented to throw a general-protection exception if the memory operand is misaligned. However, it is possible data could be silently corrupted if the existing logic does not account for misaligned address.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 10, 2023
@ghost
Copy link

ghost commented Aug 10, 2023

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details

Handle the case where the pUtf16Buffer parameter is misaligned.

The current code uses of Sse2.StoreAligned (MOVDQA, which is documented to throw a general-protection exception if the memory operand is misaligned. However it is likely the method will silently corrupt data instead
.

Author: xtqqczze
Assignees: -
Labels:

area-System.Text.Encoding, community-contribution

Milestone: -

@xtqqczze xtqqczze marked this pull request as draft August 10, 2023 15:50
@xtqqczze xtqqczze marked this pull request as ready for review August 10, 2023 15:56
if (((nuint)pUtf16Buffer & 1) != 0)
{
// Input isn't char aligned, we won't be able to vectorize.
WidenLatin1ToUtf16_MisalignedAddress(pLatin1Buffer, pUtf16Buffer, elementCount);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the need of this, I am seeing that we already align data in

if (elementCount >= SizeOfVector128)
{
// First, perform an unaligned 1x 64-bit read from the input buffer and an unaligned
// 1x 128-bit write to the destination buffer.
latin1Vector = Sse2.LoadScalarVector128((ulong*)pLatin1Buffer).AsByte(); // unaligned load
Sse2.Store((byte*)pUtf16Buffer, Sse2.UnpackLow(latin1Vector, zeroVector)); // unaligned write
// Calculate how many elements we wrote in order to get pOutputBuffer to its next alignment
// point, then use that as the base offset going forward. Remember the >> 1 to account for
// that we wrote chars, not bytes. This means we may re-read data in the next iteration of
// the loop, but this is ok.
currentOffset = (SizeOfVector128 >> 1) - (((nuint)pUtf16Buffer >> 1) & (MaskOfAllBitsInVector128 >> 1));
Debug.Assert(0 < currentOffset && currentOffset <= SizeOfVector128 / sizeof(char));
// Calculating the destination address outside the loop results in significant
// perf wins vs. relying on the JIT to fold memory addressing logic into the
// write instructions. See: https://github.com/dotnet/runtime/issues/33002
char* pCurrentWriteAddress = pUtf16Buffer + currentOffset;
// Now run the main 1x 128-bit read + 2x 128-bit write loop.
nuint finalOffsetWhereCanIterateLoop = elementCount - SizeOfVector128;
while (currentOffset <= finalOffsetWhereCanIterateLoop)
, don't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the address that pUtf16Buffer points to is naturally aligned, then this logic should work as expected.

The problem is:

char* pCurrentWriteAddress = pUtf16Buffer + currentOffset;

which does not effectively ensure correct alignment for a misaligned char*.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorBo As an alternative approach, change the following to Sse2.Store and accept the performance hit pre-Nehalem.

Sse2.StoreAligned(pLatin1Buffer + currentOffsetInElements, latin1Vector); // aligned

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just use unaligned load/store here. It may have some small hit on hardware that's more than 10-12 years old, but it simplifies things overall and will be increasingly rare to encounter such hardware over time. Plus, we don't use aligned loads/stores in most of our workloads already, so this won't be any different.

Opportunistically aligning the data is still goodness and will avoid cache line splits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Sse2.Store approach is better since unaligned char* should be rare? (e.g. char[] is always aligned)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consideration would be that Sse2.Store doesn't allow containment on pre-AVX hardware and on pre-AVX hardware movups was frequently slower than movaps even if the data was definitely aligned.

But, given the age of the hardware involved and the logic we use elsewhere, it shouldn't be a huge consideration (and the cost also wasn't huge, typically an extra cycle or so; the split cache line accesses were much worse and closer to 20 cycles for loads).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus, we don't use aligned loads/stores in most of our workloads already, so this won't be any different.

WidenLatin1ToUtf16_Sse2 and NarrowUtf16ToLatin1_Sse2 are the only methods in the repository that use StoreAligned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Sse2.Store approach is better since unaligned char* should be rare? (e.g. char[] is always aligned)

A potential case is the public API System.Text.Encoding.GetChars(System.Byte*, System.Int32, System.Char*, System.Int32).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WidenLatin1ToUtf16_Sse2 and NarrowUtf16ToLatin1_Sse2 are the only methods in the repository that use StoreAligned.

Right, and so preserving the alignment here won't make a big difference to downlevel hardware. We should just normalize to not using StoreAligned so the code can be simpler and robust in the face of unalignable data.

@tarekgh tarekgh added this to the 9.0.0 milestone Aug 16, 2023
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just normalize to not using StoreAligned so the code can be simpler and robust in the face of unalignable data.

I agree with @tannergooding . @xtqqczze please apply the suggested change

@xtqqczze thank you for your contribution!

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Oct 20, 2023
@adamsitnik adamsitnik self-assigned this Oct 22, 2023
@ghost ghost added the no-recent-activity label Nov 5, 2023
@ghost
Copy link

ghost commented Nov 5, 2023

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@ghost
Copy link

ghost commented Nov 20, 2023

This pull request will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the pull request, but please note that it will be locked if it remains inactive for another 30 days.

@ghost ghost closed this Nov 20, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Dec 20, 2023
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.Encoding community-contribution Indicates that the PR has been added by a community member needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants