
gemm acc issue on AVX512 and AVX #15447

Closed
tensor-tang opened this issue Jan 21, 2019 · 7 comments

tensor-tang commented Jan 21, 2019

code base:
#15448

how to reproduce

  1. cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_TESTING=ON -DWITH_FLUID_ONLY=ON -DWITH_DOC=OFF -DWITH_MKLDNN=OFF -DWITH_CONTRIB=OFF
  2. make jit_kernel_test -j
  3. make test ARGS="-R jit_kernel_test -V"

This would pass.

But after deleting the line here and repeating steps 2 and 3, it fails like:

...
73: /home/tangjian/paddle-tj-docker/paddle/fluid/operators/jit/test.cc:42: Failure
73: The difference between target[i] and refer[i] is 5.340576171875e-05, which exceeds FLAGS_acc, where
73: target[i] evaluates to -55.43963623046875,
73: refer[i] evaluates to -55.439689636230469, and
73: FLAGS_acc evaluates to 1.0000000000000001e-05.
73: /home/tangjian/paddle-tj-docker/paddle/fluid/operators/jit/test.cc:42: Failure
73: The difference between target[i] and refer[i] is 8.392333984375e-05, which exceeds FLAGS_acc, where
73: target[i] evaluates to 93.255752563476562,
73: refer[i] evaluates to 93.255836486816406, and
73: FLAGS_acc evaluates to 1.0000000000000001e-05.
73: /home/tangjian/paddle-tj-docker/paddle/fluid/operators/jit/test.cc:42: Failure
73: The difference between target[i] and refer[i] is 6.103515625e-05, which exceeds FLAGS_acc, where
73: target[i] evaluates to -67.680465698242188,
73: refer[i] evaluates to -67.680526733398438, and
73: FLAGS_acc evaluates to 1.0000000000000001e-05.
...

It failed on both 2620v2 (AVX) and 5117 (AVX512).

@tensor-tang tensor-tang changed the title gemm acc issue on AVX512 gemm acc issue on AVX512 and AVX Jan 21, 2019

tensor-tang commented Jan 21, 2019

This is a very urgent issue. Please help fix it with high priority, @jianhang-liu.
Thanks.


luotao1 commented Jan 21, 2019

Could you try #15450? Maybe it is the same problem as #15032 (comment).

@tensor-tang

This has nothing to do with the scope cache; it should actually be an independent MKL issue.

I made a standalone test for this issue:

https://github.com/tensor-tang/benchmark/tree/master/gemm


tensor-tang commented Jan 21, 2019

It passes on 6148 when trying

export MKL_CBWR=AVX

But this will slow things down.

And it still fails on 2620v2, even with:

export MKL_CBWR=AVX
export KMP_DETERMINISTIC_REDUCTION=yes
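For completeness, a sketch of how the two settings combine with the reproduction steps (the make invocation repeats step 3 above):

```shell
# Force MKL onto AVX code paths and pin the OpenMP reduction order,
# then rerun only the failing jit kernel test.
export MKL_CBWR=AVX
export KMP_DETERMINISTIC_REDUCTION=yes
make test ARGS="-R jit_kernel_test -V"
```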

@tensor-tang
Copy link
Contributor Author

Conclusion

For the same MKL version:

Across systems with different instruction sets, MKL's own results already differ by some margin (possibly more than 1e-5). The differences can be aligned with:

export MKL_CBWR=AVX/COMPATIBLE
export KMP_DETERMINISTIC_REDUCTION=yes

The former forces MKL to run with the specified instruction set, which guarantees results identical to a system with that instruction set, but loses some performance because of the instruction-set switch. It also only takes effect when memory is aligned, and the number of MKL threads must be identical.

For different MKL versions:

There is an inherent risk that results cannot be aligned.

@jianhang-liu

For run-by-run numeric reproducibility, MKL_CBWR=AVX/COMPATIBLE should not be needed (that forces AVX or even SSE instructions); it should be set to AUTO, letting MKL choose the instruction set. That is the recommended setting, right?
For processor-by-processor numeric reproducibility, MKL_CBWR must be set to a specific instruction set (AUTO cannot be used).
Version-by-version numeric reproducibility is not achievable.

@tensor-tang

Yes, broken down further it works out exactly like that. Thanks for the supplement.
