gemm acc issue on AVX512 and AVX #15447
Comments
This is a very urgent issue. Please help fix it with high priority, @jianhang-liu.
Could you try #15450? It may be the same problem as #15032 (comment).
This has nothing to do with the scope cache; it looks like an independent MKL issue. I made a separate test for it: https://github.com/tensor-tang/benchmark/tree/master/gemm
It passes on 6148 when tried
However, this should reduce speed, and it still fails on 2620v2.
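Conclusion: for the same MKL version on systems with different instruction sets, MKL's own results differ by some error (possibly above 1e-5). The difference can be aligned as follows:
The former forces the run to use the specified instruction set, which guarantees that the logic matches the result on a system with that instruction set. However, switching instruction sets costs some performance, it only takes effect when memory is aligned, and the number of MKL threads must be the same. Across different MKL versions there is an inherent risk that results cannot be aligned.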
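For run-to-run numeric reproducibility, it should not be necessary to export MKL_CBWR=AVX/COMPATIBLE (which forces AVX or even SSE instructions); it should instead be set to Auto (MKL chooses the instruction set). Is that the recommended setting?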
Yes, broken down further, that is right. Thanks for the addition.
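To make the settings discussed above easier to compare, here is a minimal sketch of a wrapper that pins the MKL reproducibility mode and thread count for a single command. The helper name `run_with_cbwr` is hypothetical (it is not part of MKL or Paddle); `MKL_CBWR` and `MKL_NUM_THREADS` are the MKL environment variables. It does not address the aligned-memory condition mentioned above, which has to be satisfied by the program under test.

```shell
# Hypothetical helper: run a command with MKL's reproducibility settings pinned.
# Usage: run_with_cbwr <MKL_CBWR value> <thread count> <command> [args...]
run_with_cbwr() {
  mode="$1"
  threads="$2"
  shift 2
  # MKL_CBWR selects the code path (e.g. AUTO, COMPATIBLE, AVX);
  # MKL_NUM_THREADS fixes the thread count so that compared runs use the same value.
  env MKL_CBWR="$mode" MKL_NUM_THREADS="$threads" "$@"
}
```

For example, `run_with_cbwr AVX 1 make test ARGS="-R jit_kernel_test -V"` would run the test from this issue on the forced AVX path, and replacing `AVX` with `AUTO` would let MKL choose the instruction set while still requesting run-to-run reproducibility.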
code base:
#15448
how to reproduce
1. cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_TESTING=ON -DWITH_FLUID_ONLY=ON -DWITH_DOC=OFF -DWITH_MKLDNN=OFF -DWITH_CONTRIB=OFF
2. make jit_kernel_test -j
3. make test ARGS="-R jit_kernel_test -V"
This would pass.
But after deleting the line here and repeating steps 2 and 3, it fails like:
It fails on both 2620v2 and 5117.
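Since the results depend on the machine, it may help to record which SIMD extensions each test machine reports alongside the pass/fail result. A minimal sketch, assuming a Linux host where `/proc/cpuinfo` is available; the function name `list_simd_flags` and the list of flags it looks for are illustrative choices, not something taken from this issue:

```shell
# Hypothetical helper: print the SIMD extensions (from a fixed list) that appear
# in cpuinfo-style text read from stdin, de-duplicated and space-separated.
list_simd_flags() {
  # -w keeps only whole-word matches, so e.g. "avx512bw" is not reported as "avx".
  grep -o -w -E 'sse4_2|avx|avx2|avx512f' | sort -u | paste -s -d ' ' -
}
```

On a test machine this could be called as `list_simd_flags < /proc/cpuinfo`.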