Light up BitArray APIs with Vector512 code path #91903

Merged (13 commits) on Sep 21, 2023

khushal1996 (Contributor) commented Sep 11, 2023

This PR adds Vector512 support to the existing BitArray library APIs.

The following APIs are upgraded:

  1. AND
  2. OR
  3. NOT
  4. XOR
  5. CopyTo
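
As a rough illustration of what a Vector512 code path for an operation such as AND can look like, here is a minimal sketch. It is not the PR's code: `BitwiseSketch` and `AndInPlace` are hypothetical names, and the span-based `Vector512.Create<T>` and `CopyTo` helpers are used for simplicity, assuming .NET 8 or later.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class BitwiseSketch
{
    // Hypothetical helper (not from the PR): ANDs `right` into `left`, element by element.
    public static void AndInPlace(Span<int> left, ReadOnlySpan<int> right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Both spans must have the same length.");

        int i = 0;

        if (Vector512.IsHardwareAccelerated)
        {
            // 16 ints (512 bits) per iteration while a full vector remains.
            int n = Vector512<int>.Count;
            for (; i <= left.Length - n; i += n)
            {
                Vector512<int> result =
                    Vector512.Create<int>(left.Slice(i, n)) & Vector512.Create<int>(right.Slice(i, n));
                result.CopyTo(left.Slice(i, n));
            }
        }
        else if (Vector256.IsHardwareAccelerated)
        {
            // 8 ints (256 bits) per iteration on hardware without 512-bit support.
            int n = Vector256<int>.Count;
            for (; i <= left.Length - n; i += n)
            {
                Vector256<int> result =
                    Vector256.Create<int>(left.Slice(i, n)) & Vector256.Create<int>(right.Slice(i, n));
                result.CopyTo(left.Slice(i, n));
            }
        }

        // Scalar tail: whatever did not fill a whole vector.
        for (; i < left.Length; i++)
        {
            left[i] &= right[i];
        }
    }
}
```

The widest accelerated vector size is tried first, and the scalar loop handles the remainder, so the result is the same whichever branch runs.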

PERF

AND

| Method | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|
| BitArrayAnd | \Base_repos\ | 4 | 5.755 ns | 0.0587 ns | 0.0521 ns | 5.742 ns | 5.688 ns | 5.877 ns | 1.00 |
| BitArrayAnd | \Git_repos\ | 4 | 3.960 ns | 0.0670 ns | 0.0594 ns | 3.949 ns | 3.889 ns | 4.099 ns | 0.69 |
| BitArrayAnd | \Base_repos\ | 512 | 20.326 ns | 0.2416 ns | 0.2260 ns | 20.249 ns | 20.082 ns | 20.852 ns | 1.00 |
| BitArrayAnd | \Git_repos\ | 512 | 15.694 ns | 0.2445 ns | 0.2287 ns | 15.753 ns | 15.254 ns | 15.945 ns | 0.77 |
| BitArrayAnd | \Base_repos\ | 1024 | 29.006 ns | 0.3448 ns | 0.3057 ns | 28.949 ns | 28.671 ns | 29.767 ns | 1.00 |
| BitArrayAnd | \Git_repos\ | 1024 | 27.184 ns | 0.2091 ns | 0.1746 ns | 27.108 ns | 26.971 ns | 27.492 ns | 0.94 |
| BitArrayAnd | \Base_repos\ | 2048 | 48.374 ns | 0.2138 ns | 0.1895 ns | 48.372 ns | 48.084 ns | 48.813 ns | 1.00 |
| BitArrayAnd | \Git_repos\ | 2048 | 36.306 ns | 0.4263 ns | 0.3988 ns | 36.361 ns | 35.164 ns | 36.745 ns | 0.75 |
| BitArrayAnd | \Base_repos\ | 5096 | 119.520 ns | 0.4770 ns | 0.3983 ns | 119.573 ns | 118.475 ns | 120.006 ns | 1.00 |
| BitArrayAnd | \Git_repos\ | 5096 | 87.259 ns | 0.3279 ns | 0.3067 ns | 87.360 ns | 86.864 ns | 87.705 ns | 0.73 |
| BitArrayAnd | \Base_repos\ | 10192 | 235.570 ns | 0.7442 ns | 0.6961 ns | 235.487 ns | 234.528 ns | 236.940 ns | 1.00 |
| BitArrayAnd | \Git_repos\ | 10192 | 169.757 ns | 0.4938 ns | 0.4619 ns | 169.847 ns | 168.973 ns | 170.546 ns | 0.72 |
| BitArrayAnd | \Base_repos\ | 21000 | 429.673 ns | 6.5540 ns | 6.1306 ns | 431.928 ns | 413.258 ns | 435.284 ns | 1.00 |
| BitArrayAnd | \Git_repos\ | 21000 | 341.525 ns | 1.0404 ns | 0.9732 ns | 341.577 ns | 339.583 ns | 343.168 ns | 0.80 |
| BitArrayAnd | \Base_repos\ | 40768 | 2,161.032 ns | 12.9112 ns | 11.4454 ns | 2,160.517 ns | 2,134.601 ns | 2,176.669 ns | 1.00 |
| BitArrayAnd | \Git_repos\ | 40768 | 1,571.903 ns | 5.9028 ns | 5.5215 ns | 1,571.657 ns | 1,560.521 ns | 1,580.874 ns | 0.73 |
| BitArrayAnd | \Base_repos\ | 45000 | 1,717.522 ns | 5.6070 ns | 4.6821 ns | 1,719.758 ns | 1,708.995 ns | 1,724.478 ns | 1.00 |
| BitArrayAnd | \Git_repos\ | 45000 | 1,636.948 ns | 4.8665 ns | 4.3140 ns | 1,635.762 ns | 1,631.063 ns | 1,643.209 ns | 0.95 |
| BitArrayAnd | \Base_repos\ | 81536 | 3,237.686 ns | 12.3738 ns | 11.5745 ns | 3,236.375 ns | 3,217.934 ns | 3,255.880 ns | 1.00 |
| BitArrayAnd | \Git_repos\ | 81536 | 3,044.604 ns | 10.7379 ns | 9.5189 ns | 3,045.358 ns | 3,029.502 ns | 3,062.342 ns | 0.94 |

OR

| Method | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|
| BitArrayOr | \Base_repos\ | 4 | 3.416 ns | 0.0128 ns | 0.0120 ns | 3.413 ns | 3.399 ns | 3.443 ns | 1.00 |
| BitArrayOr | \Git_repos\ | 4 | 3.410 ns | 0.0131 ns | 0.0109 ns | 3.408 ns | 3.395 ns | 3.427 ns | 1.00 |
| BitArrayOr | \Base_repos\ | 512 | 17.693 ns | 0.0251 ns | 0.0234 ns | 17.692 ns | 17.656 ns | 17.729 ns | 1.00 |
| BitArrayOr | \Git_repos\ | 512 | 14.068 ns | 0.1786 ns | 0.1583 ns | 14.077 ns | 13.632 ns | 14.271 ns | 0.80 |
| BitArrayOr | \Base_repos\ | 1024 | 24.655 ns | 0.0538 ns | 0.0449 ns | 24.661 ns | 24.571 ns | 24.716 ns | 1.00 |
| BitArrayOr | \Git_repos\ | 1024 | 22.728 ns | 0.0317 ns | 0.0281 ns | 22.736 ns | 22.685 ns | 22.772 ns | 0.92 |
| BitArrayOr | \Base_repos\ | 2048 | 42.538 ns | 0.2008 ns | 0.1780 ns | 42.559 ns | 42.143 ns | 42.816 ns | 1.00 |
| BitArrayOr | \Git_repos\ | 2048 | 33.692 ns | 0.0833 ns | 0.0738 ns | 33.704 ns | 33.506 ns | 33.789 ns | 0.79 |
| BitArrayOr | \Base_repos\ | 5096 | 104.173 ns | 0.2304 ns | 0.2042 ns | 104.197 ns | 103.889 ns | 104.588 ns | 1.00 |
| BitArrayOr | \Git_repos\ | 5096 | 77.893 ns | 0.1001 ns | 0.0936 ns | 77.921 ns | 77.720 ns | 78.064 ns | 0.75 |
| BitArrayOr | \Base_repos\ | 10192 | 205.448 ns | 0.5040 ns | 0.4468 ns | 205.458 ns | 204.724 ns | 206.357 ns | 1.00 |
| BitArrayOr | \Git_repos\ | 10192 | 152.242 ns | 0.5260 ns | 0.4663 ns | 152.096 ns | 151.806 ns | 153.459 ns | 0.74 |
| BitArrayOr | \Base_repos\ | 21000 | 421.743 ns | 1.4978 ns | 1.3278 ns | 422.123 ns | 418.260 ns | 423.429 ns | 1.00 |
| BitArrayOr | \Git_repos\ | 21000 | 304.741 ns | 0.5222 ns | 0.4629 ns | 304.811 ns | 303.993 ns | 305.515 ns | 0.72 |
| BitArrayOr | \Base_repos\ | 40768 | 1,416.464 ns | 1.3063 ns | 1.2219 ns | 1,416.352 ns | 1,413.959 ns | 1,417.812 ns | 1.00 |
| BitArrayOr | \Git_repos\ | 40768 | 1,396.072 ns | 2.5532 ns | 2.2633 ns | 1,396.338 ns | 1,392.342 ns | 1,400.270 ns | 0.99 |
| BitArrayOr | \Base_repos\ | 45000 | 1,489.972 ns | 4.8809 ns | 4.3268 ns | 1,489.743 ns | 1,480.979 ns | 1,496.312 ns | 1.00 |
| BitArrayOr | \Git_repos\ | 45000 | 1,456.042 ns | 3.0334 ns | 2.8374 ns | 1,455.710 ns | 1,452.493 ns | 1,461.546 ns | 0.98 |
| BitArrayOr | \Base_repos\ | 81536 | 3,099.383 ns | 6.4334 ns | 6.0178 ns | 3,098.615 ns | 3,091.553 ns | 3,111.546 ns | 1.00 |
| BitArrayOr | \Git_repos\ | 81536 | 2,703.577 ns | 6.2980 ns | 5.5830 ns | 2,702.195 ns | 2,696.802 ns | 2,717.062 ns | 0.87 |

NOT

| Method | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| BitArrayNot | \Base_repos\ | 4 | 1.856 ns | 0.0139 ns | 0.0116 ns | 1.859 ns | 1.827 ns | 1.872 ns | 1.00 | 0.00 |
| BitArrayNot | \Git_repos\ | 4 | 4.761 ns | 0.0326 ns | 0.0305 ns | 4.760 ns | 4.709 ns | 4.813 ns | 2.56 | 0.03 |
| BitArrayNot | \Base_repos\ | 512 | 14.282 ns | 0.0727 ns | 0.0607 ns | 14.282 ns | 14.208 ns | 14.436 ns | 1.00 | 0.00 |
| BitArrayNot | \Git_repos\ | 512 | 12.737 ns | 0.0451 ns | 0.0422 ns | 12.728 ns | 12.653 ns | 12.809 ns | 0.89 | 0.01 |
| BitArrayNot | \Base_repos\ | 1024 | 21.502 ns | 0.0392 ns | 0.0347 ns | 21.497 ns | 21.451 ns | 21.576 ns | 1.00 | 0.00 |
| BitArrayNot | \Git_repos\ | 1024 | 16.260 ns | 0.0499 ns | 0.0417 ns | 16.248 ns | 16.204 ns | 16.338 ns | 0.76 | 0.00 |
| BitArrayNot | \Base_repos\ | 2048 | 41.464 ns | 0.2000 ns | 0.1871 ns | 41.358 ns | 41.252 ns | 41.797 ns | 1.00 | 0.00 |
| BitArrayNot | \Git_repos\ | 2048 | 26.453 ns | 0.0839 ns | 0.0701 ns | 26.441 ns | 26.385 ns | 26.634 ns | 0.64 | 0.00 |
| BitArrayNot | \Base_repos\ | 5096 | 103.785 ns | 0.9264 ns | 0.7233 ns | 104.006 ns | 101.965 ns | 104.568 ns | 1.00 | 0.00 |
| BitArrayNot | \Git_repos\ | 5096 | 74.959 ns | 0.6169 ns | 0.5468 ns | 74.974 ns | 74.251 ns | 76.167 ns | 0.72 | 0.01 |
| BitArrayNot | \Base_repos\ | 10192 | 193.150 ns | 1.7687 ns | 1.5679 ns | 192.848 ns | 190.961 ns | 196.477 ns | 1.00 | 0.00 |
| BitArrayNot | \Git_repos\ | 10192 | 141.069 ns | 0.8077 ns | 0.6745 ns | 140.804 ns | 140.271 ns | 142.497 ns | 0.73 | 0.00 |
| BitArrayNot | \Base_repos\ | 21000 | 381.219 ns | 4.0155 ns | 3.5597 ns | 382.620 ns | 373.034 ns | 385.203 ns | 1.00 | 0.00 |
| BitArrayNot | \Git_repos\ | 21000 | 285.250 ns | 0.6539 ns | 0.5797 ns | 285.243 ns | 284.068 ns | 286.175 ns | 0.75 | 0.01 |
| BitArrayNot | \Base_repos\ | 40768 | 745.946 ns | 10.7711 ns | 10.0753 ns | 741.678 ns | 736.959 ns | 767.180 ns | 1.00 | 0.00 |
| BitArrayNot | \Git_repos\ | 40768 | 555.144 ns | 9.7408 ns | 9.1115 ns | 553.072 ns | 546.462 ns | 573.699 ns | 0.74 | 0.02 |
| BitArrayNot | \Base_repos\ | 45000 | 830.258 ns | 9.7348 ns | 9.1059 ns | 828.995 ns | 815.830 ns | 849.017 ns | 1.00 | 0.00 |
| BitArrayNot | \Git_repos\ | 45000 | 604.994 ns | 11.2423 ns | 8.7772 ns | 608.792 ns | 583.741 ns | 611.298 ns | 0.73 | 0.02 |
| BitArrayNot | \Base_repos\ | 81536 | 1,949.911 ns | 31.4415 ns | 29.4104 ns | 1,935.153 ns | 1,916.576 ns | 1,999.952 ns | 1.00 | 0.00 |
| BitArrayNot | \Git_repos\ | 81536 | 1,613.934 ns | 16.9771 ns | 15.8804 ns | 1,609.429 ns | 1,597.242 ns | 1,644.358 ns | 0.83 | 0.01 |

XOR

| Method | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|
| BitArrayXor | \Base_repos\ | 4 | 3.441 ns | 0.0444 ns | 0.0415 ns | 3.424 ns | 3.388 ns | 3.518 ns | 1.00 |
| BitArrayXor | \Git_repos\ | 4 | 3.403 ns | 0.0199 ns | 0.0186 ns | 3.408 ns | 3.359 ns | 3.430 ns | 0.99 |
| BitArrayXor | \Base_repos\ | 512 | 17.665 ns | 0.0436 ns | 0.0364 ns | 17.656 ns | 17.608 ns | 17.739 ns | 1.00 |
| BitArrayXor | \Git_repos\ | 512 | 14.298 ns | 0.0943 ns | 0.0836 ns | 14.291 ns | 14.176 ns | 14.449 ns | 0.81 |
| BitArrayXor | \Base_repos\ | 1024 | 25.079 ns | 0.0507 ns | 0.0450 ns | 25.070 ns | 25.021 ns | 25.155 ns | 1.00 |
| BitArrayXor | \Git_repos\ | 1024 | 25.393 ns | 0.3087 ns | 0.2736 ns | 25.504 ns | 24.662 ns | 25.712 ns | 1.01 |
| BitArrayXor | \Base_repos\ | 2048 | 48.558 ns | 0.4576 ns | 0.4056 ns | 48.556 ns | 48.091 ns | 49.528 ns | 1.00 |
| BitArrayXor | \Git_repos\ | 2048 | 37.543 ns | 0.1663 ns | 0.1556 ns | 37.549 ns | 37.235 ns | 37.843 ns | 0.77 |
| BitArrayXor | \Base_repos\ | 5096 | 119.227 ns | 0.6169 ns | 0.5469 ns | 119.324 ns | 118.145 ns | 119.979 ns | 1.00 |
| BitArrayXor | \Git_repos\ | 5096 | 87.705 ns | 0.3281 ns | 0.2909 ns | 87.656 ns | 87.304 ns | 88.331 ns | 0.74 |
| BitArrayXor | \Base_repos\ | 10192 | 236.858 ns | 1.1733 ns | 0.9798 ns | 237.090 ns | 235.090 ns | 238.500 ns | 1.00 |
| BitArrayXor | \Git_repos\ | 10192 | 170.274 ns | 0.4845 ns | 0.4046 ns | 170.349 ns | 169.611 ns | 170.684 ns | 0.72 |
| BitArrayXor | \Base_repos\ | 21000 | 435.484 ns | 1.9970 ns | 1.7702 ns | 435.125 ns | 433.164 ns | 439.914 ns | 1.00 |
| BitArrayXor | \Git_repos\ | 21000 | 338.851 ns | 2.7534 ns | 2.5755 ns | 339.787 ns | 334.028 ns | 342.311 ns | 0.78 |
| BitArrayXor | \Base_repos\ | 40768 | 1,622.483 ns | 22.1149 ns | 20.6863 ns | 1,629.022 ns | 1,565.318 ns | 1,654.954 ns | 1.00 |
| BitArrayXor | \Git_repos\ | 40768 | 1,572.564 ns | 7.8358 ns | 6.5433 ns | 1,570.673 ns | 1,564.600 ns | 1,583.716 ns | 0.97 |
| BitArrayXor | \Base_repos\ | 45000 | 1,720.882 ns | 10.9159 ns | 10.2108 ns | 1,718.896 ns | 1,705.590 ns | 1,743.439 ns | 1.00 |
| BitArrayXor | \Git_repos\ | 45000 | 1,635.023 ns | 13.6223 ns | 12.7423 ns | 1,633.632 ns | 1,612.510 ns | 1,661.331 ns | 0.95 |
| BitArrayXor | \Base_repos\ | 81536 | 3,551.303 ns | 27.7492 ns | 25.9566 ns | 3,556.540 ns | 3,511.864 ns | 3,595.681 ns | 1.00 |
| BitArrayXor | \Git_repos\ | 81536 | 3,054.647 ns | 22.1350 ns | 20.7051 ns | 3,057.608 ns | 3,014.403 ns | 3,087.547 ns | 0.86 |

CopyTo(bool[] arr)

| Method | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| BitArrayCopyToBoolArray | \Base_repos\ | 4 | 12.47 ns | 0.064 ns | 0.060 ns | 12.46 ns | 12.38 ns | 12.58 ns | 1.00 | 0.00 |
| BitArrayCopyToBoolArray | \Git_repos\ | 4 | 17.91 ns | 0.185 ns | 0.173 ns | 17.88 ns | 17.67 ns | 18.23 ns | 1.44 | 0.02 |
| BitArrayCopyToBoolArray | \Base_repos\ | 512 | 161.35 ns | 1.552 ns | 1.376 ns | 160.72 ns | 159.72 ns | 163.86 ns | 1.00 | 0.00 |
| BitArrayCopyToBoolArray | \Git_repos\ | 512 | 138.78 ns | 1.773 ns | 1.481 ns | 139.24 ns | 135.69 ns | 140.40 ns | 0.86 | 0.01 |
| BitArrayCopyToBoolArray | \Base_repos\ | 1024 | 292.61 ns | 2.186 ns | 2.044 ns | 292.10 ns | 290.52 ns | 297.04 ns | 1.00 | 0.00 |
| BitArrayCopyToBoolArray | \Git_repos\ | 1024 | 252.47 ns | 0.971 ns | 0.908 ns | 252.21 ns | 251.42 ns | 254.13 ns | 0.86 | 0.01 |
| BitArrayCopyToBoolArray | \Base_repos\ | 2048 | 558.76 ns | 2.581 ns | 2.155 ns | 558.27 ns | 554.78 ns | 563.54 ns | 1.00 | 0.00 |
| BitArrayCopyToBoolArray | \Git_repos\ | 2048 | 479.75 ns | 4.603 ns | 4.081 ns | 481.60 ns | 471.01 ns | 483.57 ns | 0.86 | 0.01 |
| BitArrayCopyToBoolArray | \Base_repos\ | 5096 | 1,345.30 ns | 17.294 ns | 16.177 ns | 1,343.58 ns | 1,324.55 ns | 1,374.42 ns | 1.00 | 0.00 |
| BitArrayCopyToBoolArray | \Git_repos\ | 5096 | 1,168.02 ns | 6.042 ns | 5.356 ns | 1,167.66 ns | 1,158.34 ns | 1,178.37 ns | 0.87 | 0.01 |
| BitArrayCopyToBoolArray | \Base_repos\ | 10192 | 2,609.57 ns | 27.355 ns | 25.588 ns | 2,605.42 ns | 2,559.66 ns | 2,643.63 ns | 1.00 | 0.00 |
| BitArrayCopyToBoolArray | \Git_repos\ | 10192 | 2,356.48 ns | 21.120 ns | 18.723 ns | 2,360.62 ns | 2,316.14 ns | 2,385.56 ns | 0.90 | 0.01 |
| BitArrayCopyToBoolArray | \Base_repos\ | 21000 | 5,526.89 ns | 49.638 ns | 44.003 ns | 5,518.23 ns | 5,441.50 ns | 5,619.36 ns | 1.00 | 0.00 |
| BitArrayCopyToBoolArray | \Git_repos\ | 21000 | 4,866.75 ns | 25.872 ns | 22.934 ns | 4,865.47 ns | 4,823.92 ns | 4,912.20 ns | 0.88 | 0.01 |
| BitArrayCopyToBoolArray | \Base_repos\ | 40768 | 10,733.13 ns | 47.056 ns | 44.017 ns | 10,726.90 ns | 10,641.33 ns | 10,800.83 ns | 1.00 | 0.00 |
| BitArrayCopyToBoolArray | \Git_repos\ | 40768 | 9,521.82 ns | 56.416 ns | 52.771 ns | 9,510.39 ns | 9,452.20 ns | 9,615.65 ns | 0.89 | 0.01 |
| BitArrayCopyToBoolArray | \Base_repos\ | 45000 | 11,902.28 ns | 223.652 ns | 239.305 ns | 11,908.61 ns | 11,386.91 ns | 12,372.51 ns | 1.00 | 0.00 |
| BitArrayCopyToBoolArray | \Git_repos\ | 45000 | 10,452.36 ns | 178.697 ns | 167.153 ns | 10,377.74 ns | 10,261.44 ns | 10,731.30 ns | 0.88 | 0.03 |
| BitArrayCopyToBoolArray | \Base_repos\ | 81536 | 21,451.38 ns | 180.884 ns | 169.199 ns | 21,407.75 ns | 21,229.71 ns | 21,783.76 ns | 1.00 | 0.00 |
| BitArrayCopyToBoolArray | \Git_repos\ | 81536 | 18,816.90 ns | 257.678 ns | 241.033 ns | 18,749.22 ns | 18,569.43 ns | 19,377.38 ns | 0.88 | 0.01 |

ghost added the community-contribution label (indicates that the PR has been added by a community member) on Sep 11, 2023

ghost commented Sep 11, 2023

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

gfoidl (Member) left a comment

One point for better codegen, didn't look into the rest at the moment.

@@ -141,7 +137,7 @@ public unsafe BitArray(bool[] values)

if (Vector256.IsHardwareAccelerated)
{
- for (; (i + Vector256ByteCount) <= (uint)values.Length; i += Vector256ByteCount)
+ for (; (i + Vector256<byte>.Count) <= (uint)values.Length; i += (uint)Vector256<byte>.Count)
Member

Suggested change
- for (; (i + Vector256<byte>.Count) <= (uint)values.Length; i += (uint)Vector256<byte>.Count)
+ for (; (i) <= (uint)values.Length - Vector256<byte>.Count; i += (uint)Vector256<byte>.Count)

With the suggestion, i can be used directly from a register, and the same goes for length - count.
With the code as is, the count has to be added to i first and the sum is then compared.

See the difference between M1 and M2 in the (silly) sharplab example.

The same applies in other places.
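
To make the two loop shapes concrete, here is a minimal sketch that runs both over the same array and counts how many full 32-byte blocks each visits. `LoopBoundSketch` and its method names are hypothetical and are not the M1/M2 code from the sharplab example. The sketch only shows that the two conditions select the same iterations; any codegen difference would have to be checked in the JIT output.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class LoopBoundSketch
{
    // Original shape: i + Count is computed before every comparison.
    public static int CountBlocksAddThenCompare(bool[] values)
    {
        int blocks = 0;
        for (uint i = 0; (i + Vector256<byte>.Count) <= (uint)values.Length; i += (uint)Vector256<byte>.Count)
        {
            blocks++;
        }
        return blocks;
    }

    // Suggested shape: i is compared against length - Count, which does not change
    // inside the loop. In C#, uint - int is evaluated as long, so the bound goes
    // negative (instead of wrapping around) when the array is shorter than one vector.
    public static int CountBlocksCompareAgainstBound(bool[] values)
    {
        int blocks = 0;
        for (uint i = 0; i <= (uint)values.Length - Vector256<byte>.Count; i += (uint)Vector256<byte>.Count)
        {
            blocks++;
        }
        return blocks;
    }
}
```

For any array length, both methods return the length divided by `Vector256<byte>.Count` (32), rounded down.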

Contributor Author

I have made the suggested change

Contributor Author

@tannergooding do you want this to be in the scope of this PR? This PR was about upgrading the APIs mentioned in the description. This is surely an optimization opportunity, but I am just checking whether adding it here is fine.

Member

It's simple enough to include here. But generally speaking, it's nice to separate the two considerations, as it helps us more easily track performance differences.

Contributor Author

Okay. They are already added.

@khushal1996 khushal1996 marked this pull request as ready for review September 14, 2023 05:53
khushal1996 (Contributor Author)

@DeepakRajendrakumaran @Ruihan-Yin for the review

tannergooding (Member)

I've asked for a secondary review on this and it should be mergeable after that happens.

@tannergooding tannergooding merged commit 3470c4c into dotnet:main Sep 21, 2023
105 checks passed
@ghost ghost locked as resolved and limited conversation to collaborators Oct 22, 2023
Labels: area-System.Collections, community-contribution

4 participants