Tune scan on A100 #302

gevtushenko (Collaborator)

Description

Partially addresses #238

A100 SXM

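| T | I8 | I16 | I32 | I64 | I128 | F32 | F64 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| %Diff at 2^28 elements | -7.38% | -12.47% | -7.07% | -12.34% | -39.34% | -6.22% | -9.67% |
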

| T{ct} | OffsetT{ct} | Elements{io} | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I8 | I32 | 2^16 | 11.274 us | 1.62% | 11.206 us | 1.24% | -0.068 us | -0.60% | PASS |
| I8 | I32 | 2^20 | 15.281 us | 1.11% | 18.514 us | 38.02% | 3.232 us | 21.15% | FAIL |
| I8 | I32 | 2^24 | 66.328 us | 4.75% | 68.364 us | 1.90% | 2.035 us | 3.07% | FAIL |
| I8 | I32 | 2^28 | 985.373 us | 0.50% | 912.611 us | 0.28% | -72.762 us | -7.38% | FAIL |
| I16 | I32 | 2^16 | 11.454 us | 1.74% | 11.843 us | 1.89% | 0.390 us | 3.40% | FAIL |
| I16 | I32 | 2^20 | 16.281 us | 1.14% | 18.805 us | 29.97% | 2.524 us | 15.50% | FAIL |
| I16 | I32 | 2^24 | 75.730 us | 3.34% | 72.374 us | 2.28% | -3.356 us | -4.43% | FAIL |
| I16 | I32 | 2^28 | 1.096 ms | 0.62% | 959.182 us | 0.36% | -136.606 us | -12.47% | FAIL |
| I32 | I32 | 2^16 | 12.001 us | 1.93% | 11.339 us | 1.52% | -0.662 us | -5.52% | FAIL |
| I32 | I32 | 2^20 | 17.548 us | 1.27% | 17.540 us | 1.71% | -0.007 us | -0.04% | PASS |
| I32 | I32 | 2^24 | 101.450 us | 2.59% | 97.099 us | 2.18% | -4.351 us | -4.29% | FAIL |
| I32 | I32 | 2^28 | 1.461 ms | 0.93% | 1.358 ms | 0.61% | -103.249 us | -7.07% | FAIL |
| I64 | I32 | 2^16 | 12.204 us | 1.42% | 12.424 us | 1.27% | 0.219 us | 1.80% | FAIL |
| I64 | I32 | 2^20 | 23.132 us | 6.37% | 23.322 us | 2.01% | 0.189 us | 0.82% | PASS |
| I64 | I32 | 2^24 | 193.714 us | 3.63% | 174.254 us | 1.29% | -19.459 us | -10.05% | FAIL |
| I64 | I32 | 2^28 | 2.918 ms | 1.05% | 2.558 ms | 0.50% | -360.115 us | -12.34% | FAIL |
| I128 | I32 | 2^16 | 17.732 us | 2.08% | 19.449 us | 0.91% | 1.716 us | 9.68% | FAIL |
| I128 | I32 | 2^20 | 67.733 us | 1.93% | 66.599 us | 2.08% | -1.134 us | -1.67% | PASS |
| I128 | I32 | 2^24 | 901.361 us | 0.40% | 573.754 us | 0.35% | -327.608 us | -36.35% | FAIL |
| I128 | I32 | 2^28 | 14.352 ms | 0.30% | 8.706 ms | 0.18% | -5645.326 us | -39.34% | FAIL |
| F32 | I32 | 2^16 | 11.611 us | 2.22% | 11.010 us | 2.15% | -0.601 us | -5.18% | FAIL |
| F32 | I32 | 2^20 | 17.474 us | 1.39% | 17.918 us | 1.48% | 0.444 us | 2.54% | FAIL |
| F32 | I32 | 2^24 | 101.294 us | 2.91% | 97.482 us | 2.07% | -3.812 us | -3.76% | FAIL |
| F32 | I32 | 2^28 | 1.461 ms | 1.01% | 1.371 ms | 0.59% | -90.847 us | -6.22% | FAIL |
| F64 | I32 | 2^16 | 12.154 us | 1.40% | 11.932 us | 1.62% | -0.223 us | -1.83% | FAIL |
| F64 | I32 | 2^20 | 23.647 us | 1.53% | 23.833 us | 6.12% | 0.185 us | 0.78% | PASS |
| F64 | I32 | 2^24 | 193.145 us | 4.23% | 177.964 us | 1.27% | -15.181 us | -7.86% | FAIL |
| F64 | I32 | 2^28 | 2.911 ms | 1.25% | 2.630 ms | 0.50% | -281.643 us | -9.67% | FAIL |
| C64 | I32 | 2^16 | 14.414 us | 2.30% | 14.516 us | 1.81% | 0.101 us | 0.70% | PASS |
| C64 | I32 | 2^20 | 32.692 us | 1.77% | 32.494 us | 1.72% | -0.197 us | -0.60% | PASS |
| C64 | I32 | 2^24 | 313.287 us | 0.68% | 313.224 us | 0.67% | -0.063 us | -0.02% | PASS |
| C64 | I32 | 2^28 | 4.894 ms | 0.33% | 4.893 ms | 0.35% | -0.713 us | -0.01% | PASS |
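The Diff and %Diff columns follow from Ref Time and Cmp Time: Diff = Cmp Time - Ref Time, and %Diff = Diff / Ref Time x 100, so a negative value means the tuned build is faster. For example, the I8 row at 2^28 elements gives (912.611 - 985.373) / 985.373, or about -7.38%. The helper names below are hypothetical and are not part of CUB or nvbench; this is only a sketch of the arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers (not part of CUB or nvbench) showing how the Diff and
// %Diff columns relate to Ref Time and Cmp Time. Both times must use the same
// unit, e.g. microseconds.

// Absolute difference: a negative result means the Cmp (tuned) build is faster.
double time_diff(double ref_time, double cmp_time) {
  return cmp_time - ref_time;
}

// Relative difference in percent, measured against the reference time.
double percent_diff(double ref_time, double cmp_time) {
  assert(ref_time > 0.0 && "reference time must be positive");
  return 100.0 * (cmp_time - ref_time) / ref_time;
}

// Round to two decimal places, matching the precision used in the tables.
double round2(double value) {
  return std::round(value * 100.0) / 100.0;
}
```

Rows reported in ms (for example I16 at 2^28) are rounded to three decimals, so recomputing Diff from the printed values can differ slightly from the printed Diff, which was computed from unrounded times.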

A100 PCIe

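| T | I8 | I16 | I32 | I64 | I128 | F32 | F64 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| %Diff at 2^28 elements | -8.56% | -13.04% | -4.80% | -7.01% | -39.26% | -3.27% | -4.93% |
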

| T{ct} | OffsetT{ct} | Elements{io} | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I8 | I32 | 2^16 | 10.568 us | 13.61% | 10.430 us | 12.75% | -0.138 us | -1.30% | PASS |
| I8 | I32 | 2^20 | 14.421 us | 1.28% | 16.937 us | 34.15% | 2.517 us | 17.45% | FAIL |
| I8 | I32 | 2^24 | 66.860 us | 3.65% | 68.554 us | 1.82% | 1.694 us | 2.53% | FAIL |
| I8 | I32 | 2^28 | 1.009 ms | 0.50% | 922.707 us | 0.33% | -86.393 us | -8.56% | FAIL |
| I16 | I32 | 2^16 | 10.321 us | 1.95% | 10.345 us | 1.65% | 0.023 us | 0.23% | PASS |
| I16 | I32 | 2^20 | 15.722 us | 1.23% | 16.695 us | 16.36% | 0.974 us | 6.19% | FAIL |
| I16 | I32 | 2^24 | 79.256 us | 3.20% | 75.415 us | 2.42% | -3.841 us | -4.85% | FAIL |
| I16 | I32 | 2^28 | 1.171 ms | 0.91% | 1.018 ms | 0.60% | -152.608 us | -13.04% | FAIL |
| I32 | I32 | 2^16 | 10.494 us | 1.93% | 10.458 us | 1.93% | -0.036 us | -0.35% | PASS |
| I32 | I32 | 2^20 | 17.621 us | 1.61% | 17.774 us | 1.55% | 0.153 us | 0.87% | PASS |
| I32 | I32 | 2^24 | 116.338 us | 1.84% | 112.834 us | 1.23% | -3.504 us | -3.01% | FAIL |
| I32 | I32 | 2^28 | 1.712 ms | 0.73% | 1.629 ms | 0.50% | -82.209 us | -4.80% | FAIL |
| I64 | I32 | 2^16 | 11.498 us | 2.00% | 11.362 us | 1.59% | -0.136 us | -1.18% | PASS |
| I64 | I32 | 2^20 | 24.167 us | 1.30% | 24.570 us | 1.59% | 0.403 us | 1.67% | FAIL |
| I64 | I32 | 2^24 | 221.281 us | 1.40% | 210.492 us | 0.73% | -10.789 us | -4.88% | FAIL |
| I64 | I32 | 2^28 | 3.411 ms | 0.81% | 3.172 ms | 0.50% | -239.059 us | -7.01% | FAIL |
| I128 | I32 | 2^16 | 17.267 us | 2.24% | 18.614 us | 1.27% | 1.347 us | 7.80% | FAIL |
| I128 | I32 | 2^20 | 68.209 us | 1.74% | 68.623 us | 1.81% | 0.414 us | 0.61% | PASS |
| I128 | I32 | 2^24 | 927.940 us | 0.41% | 584.699 us | 0.42% | -343.241 us | -36.99% | FAIL |
| I128 | I32 | 2^28 | 14.797 ms | 0.33% | 8.988 ms | 0.17% | -5809.738 us | -39.26% | FAIL |
| F32 | I32 | 2^16 | 10.120 us | 1.96% | 10.131 us | 2.72% | 0.010 us | 0.10% | PASS |
| F32 | I32 | 2^20 | 17.474 us | 1.35% | 17.849 us | 1.63% | 0.375 us | 2.15% | FAIL |
| F32 | I32 | 2^24 | 116.126 us | 2.10% | 113.764 us | 1.13% | -2.362 us | -2.03% | FAIL |
| F32 | I32 | 2^28 | 1.696 ms | 0.71% | 1.641 ms | 0.50% | -55.495 us | -3.27% | FAIL |
| F64 | I32 | 2^16 | 11.283 us | 1.96% | 11.031 us | 1.52% | -0.252 us | -2.24% | FAIL |
| F64 | I32 | 2^20 | 24.829 us | 1.17% | 25.073 us | 9.31% | 0.243 us | 0.98% | PASS |
| F64 | I32 | 2^24 | 220.856 us | 2.02% | 211.677 us | 0.70% | -9.178 us | -4.16% | FAIL |
| F64 | I32 | 2^28 | 3.355 ms | 0.73% | 3.190 ms | 0.50% | -165.506 us | -4.93% | FAIL |
| C64 | I32 | 2^16 | 13.720 us | 2.14% | 13.400 us | 2.11% | -0.321 us | -2.34% | FAIL |
| C64 | I32 | 2^20 | 33.843 us | 1.58% | 33.702 us | 1.60% | -0.141 us | -0.42% | PASS |
| C64 | I32 | 2^24 | 334.440 us | 0.63% | 334.209 us | 0.67% | -0.231 us | -0.07% | PASS |
| C64 | I32 | 2^28 | 5.239 ms | 0.40% | 5.238 ms | 0.43% | -0.412 us | -0.01% | PASS |
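To put the 2^28 timings in perspective, a time can be converted into a rough effective bandwidth. This sketch rests on an assumption: a scan reads each input element once and writes each output element once, i.e. 2 * N * sizeof(T) bytes. Additional traffic, such as the tile-status words used by CUB's decoupled look-back, is ignored, so the figure is only an approximation. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rough effective bandwidth of a device-wide scan in
// decimal GB/s. ASSUMES 2 * N * sizeof(T) bytes of traffic (one read and one
// write per element) and ignores look-back/tile-status traffic.
double scan_bandwidth_gbps(std::uint64_t num_items,
                           std::uint64_t bytes_per_item,
                           double time_ms) {
  assert(time_ms > 0.0 && "time must be positive");
  const double bytes = 2.0 * static_cast<double>(num_items) *
                       static_cast<double>(bytes_per_item);
  // Convert ms to s, then bytes/s to GB/s (1 GB = 1e9 bytes).
  return bytes / (time_ms * 1e-3) / 1e9;
}
```

Under that assumption, the tuned I32 run at 2^28 elements on A100 PCIe (1.629 ms, 4-byte elements) corresponds to roughly 1318 GB/s, and the tuned I128 run on A100 SXM (8.706 ms, 16-byte elements) to roughly 987 GB/s.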

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

gevtushenko requested review from a team as code owners on August 4, 2023 09:12
gevtushenko requested review from elstehle and ericniebler and removed the request for a team on August 4, 2023 09:12
elstehle (Collaborator) left a comment

Impressively large tile sizes for __[u]int128 😮

gevtushenko merged commit 10e8a1f into NVIDIA:branch/2.2.x on Aug 6, 2023
369 checks passed
Projects
Archived in project
2 participants