Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable pool&conv_transpose&quantize caching #36695

Merged
merged 15 commits into from
Nov 5, 2021

Conversation

jczaja
Copy link
Contributor

@jczaja jczaja commented Oct 25, 2021

PR types

Bug fixes

PR changes

OPs

Describe

PaddlePaddle is caching oneDNN objects and this PR is to disable this for pool2d and conv_transpose. Reason is that oneDNN is caching some of its objects itself

@paddle-bot-old
Copy link

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@jczaja jczaja added the Intel label Oct 25, 2021
@jczaja jczaja changed the title [WIP] disable pool&conv_transpose caching Disable pool&conv_transpose&quantize caching Oct 25, 2021
@jczaja jczaja requested review from wozna and jakpiase October 25, 2021 13:10
@jczaja
Copy link
Contributor Author

jczaja commented Oct 25, 2021

@wozna Please take a look at changes in quantize.

@jczaja
Copy link
Contributor Author

jczaja commented Oct 26, 2021

@tsocha Please review

tsocha
tsocha previously approved these changes Oct 26, 2021
for (int j = 0; j < o; ++j) {
int in_offset = j * h * w + i * o * h * w;
int out_offset = j * c * h * w + i * h * w;
std::memcpy(&(reordered_filter_data.get())[out_offset],
Copy link
Contributor

@jakpiase jakpiase Oct 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that might not be your code, but shouldn't that reorder be done using oneDNN?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) @Silv3S is working on that.

Copy link
Contributor

@lidanqing-intel lidanqing-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jczaja
Copy link
Contributor Author

jczaja commented Oct 26, 2021

@chenwhql Could you please approve so that PR-CI-APPROVAL pass? PADDLE_ENFORCE is correct but due only part of it is visible in diff, corresponding checker cannot see full statement and report error.

@lidanqing-intel
Copy link
Contributor

@baoachun @Superjomn This PR is part of the improving cache mechanism. CI reported PADDLE_ENFORCE error but actually it is correct but due only part of it is visible in diff, corresponding checker cannot see full statement and report error.
Could you please help approve it? Thanks.

platform::errors::InvalidArgument(
"Got wrong formats for Filter tensor."));
"Got wrong format for Bias tensor."));
Copy link
Contributor

@lidanqing-intel lidanqing-intel Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
Hi, @baoachun
这里,错误信息长度符合要求,要求至少20个字符长度,这里错误信息有30个字符左右长度。这里的问题是PADDLE_ENFORCE 机制不认未更改的行,这里 95行 platform::errors::InvalidArgument(已经写了,只是这个PR没有更改,所以就没检测到,图片里InvalidArgument代码就没有。但是其实代码中是有的,是PADLLE_ENFORCE机制没检测到。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个PADDLE_ENFORCE机制以后会考虑优化吗,以后可能也经常有类似问题。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里是字符串正则匹配检测的,先grep找出修改了的行,然后正则匹配做合规检查,如果全文件正则检查,可以避免这个问题,但是可能会检出和PR不相关的一些问题,不太友好。。。目前这一点没有想到好的办法处理,先找我approve吧

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,非常感谢!

wozna
wozna previously approved these changes Oct 28, 2021
Copy link
Contributor

@wozna wozna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

jakpiase
jakpiase previously approved these changes Oct 28, 2021
Copy link
Contributor

@jakpiase jakpiase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lidanqing-intel lidanqing-intel changed the title Disable pool&conv_transpose&quantize caching [Approval Needed] Disable pool&conv_transpose&quantize caching Nov 1, 2021
@paddle-bot-old
Copy link

paddle-bot-old bot commented Nov 2, 2021

Sorry to inform you that b27d1b8's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

chenwhql
chenwhql previously approved these changes Nov 2, 2021
Copy link
Contributor

@chenwhql chenwhql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for PADDLE_ENFORCE

@lidanqing-intel lidanqing-intel changed the title [Approval Needed] Disable pool&conv_transpose&quantize caching [Approval-CI Passed] Disable pool&conv_transpose&quantize caching Nov 2, 2021
@lidanqing-intel lidanqing-intel changed the title [Approval-CI Passed] Disable pool&conv_transpose&quantize caching Disable pool&conv_transpose&quantize caching Nov 2, 2021
@lidanqing-intel
Copy link
Contributor

@jczaja This PR is approved by Baidu. But could you please watch PR-CI-iScan-C to pass ? Cause only this one keep failing.

@lidanqing-intel lidanqing-intel changed the title Disable pool&conv_transpose&quantize caching [Approved] Disable pool&conv_transpose&quantize caching Nov 2, 2021
@jczaja
Copy link
Contributor Author

jczaja commented Nov 2, 2021

@baoachun Please help with failing CI (it did pass few reruns ago). Error is: "git code download failed ". I have try couple of reruns but without success.

@baoachun
Copy link
Contributor

baoachun commented Nov 3, 2021

@baoachun Please help with failing CI (it did pass few reruns ago). Error is: "git code download failed ". I have try couple of reruns but without success.

My colleague is solving the problem, please wait.

@baoachun
Copy link
Contributor

baoachun commented Nov 4, 2021

Hi @jczaja, my colleague said that you may need to pull the latest code of the develop branch.

@jczaja
Copy link
Contributor Author

jczaja commented Nov 4, 2021

@baoachun I have rebased this PR but still PR-CI-iScan-C is failing (I have done one rerurn). Please help

@lidanqing-intel lidanqing-intel changed the title [Approved] Disable pool&conv_transpose&quantize caching [Waiting some CIs exemption] Disable pool&conv_transpose&quantize caching Nov 4, 2021
@lidanqing-intel lidanqing-intel changed the title [Waiting some CIs exemption] Disable pool&conv_transpose&quantize caching Disable pool&conv_transpose&quantize caching Nov 5, 2021
@jczaja jczaja merged commit db6c00c into PaddlePaddle:develop Nov 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants