
Optimize memcpy operation in Eigh #42853

Merged: 3 commits, merged on May 30, 2022


@JamesLim-sy (Contributor) commented on May 18, 2022

PR types

Performance optimization

PR changes

OPs

Describe

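  • Optimization goal:
    The core computation of Eigh runs inside a for loop, and the status value produced by each iteration had to be copied DtoH. After this optimization, those per-iteration copies are merged into a single copy, reducing the copy overhead.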

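  • Result:
    [Image: profiling statistics]
    Most of Eigh's overhead comes from its cuSOLVER calls. According to the statistics in the figure above (1000 test runs, each looping over 256 cuSOLVER calls), every cuSOLVER call launches 6 kinds of internal compute kernels, 3 kinds of DtoH copies, and 3 kinds of HtoD copies. The previous implementation added one extra DtoH copy after every cuSOLVER call; this PR eliminates it.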
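The change described above can be sketched in plain C++ as a minimal, hypothetical model; it is not Paddle's implementation. `MockStream` is an assumed stand-in for a CUDA stream that queues asynchronous copies and counts them, and `FakeSolve` is an assumed stand-in for one cuSOLVER call writing a status code (`info`) into device memory. The per-iteration version issues one DtoH copy (plus a sync) per batch element; the batched version runs every solve first, then issues a single copy of `sizeof(int) * batch_size` bytes and synchronizes once before the host inspects the results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stand-in for a CUDA stream: async copies are queued and only
// land in the destination when Synchronize() runs. Every enqueued copy is
// counted so the two strategies below can be compared.
class MockStream {
 public:
  void MemcpyAsync(void* dst, const void* src, std::size_t bytes) {
    ++copy_count_;
    pending_.push_back([dst, src, bytes] { std::memcpy(dst, src, bytes); });
  }
  void Synchronize() {
    for (auto& op : pending_) op();
    pending_.clear();
  }
  int copy_count() const { return copy_count_; }

 private:
  std::vector<std::function<void()>> pending_;
  int copy_count_ = 0;
};

// Hypothetical stand-in for one cuSOLVER call: writes a status code into the
// "device" info slot (0 = success). Batch element 3 fails on purpose.
void FakeSolve(int i, int* device_info) { *device_info = (i == 3) ? 1 : 0; }

// Before: one DtoH copy plus a sync per iteration.
// Returns the index of the first failed element, or -1 if all succeeded.
int CheckPerIteration(int batch_size, std::vector<int>& device_info,
                      MockStream& stream) {
  int first_failure = -1;
  for (int i = 0; i < batch_size; ++i) {
    FakeSolve(i, &device_info[i]);
    int host_info = 0;
    stream.MemcpyAsync(&host_info, &device_info[i], sizeof(int));
    stream.Synchronize();
    if (host_info != 0 && first_failure < 0) first_failure = i;
  }
  return first_failure;
}

// After: run every solve, then copy all batch_size status codes in a single
// DtoH transfer and synchronize once before the host reads them.
int CheckBatched(int batch_size, std::vector<int>& device_info,
                 MockStream& stream) {
  assert(device_info.size() >= static_cast<std::size_t>(batch_size));
  for (int i = 0; i < batch_size; ++i) FakeSolve(i, &device_info[i]);
  std::vector<int> host_info(batch_size, 0);
  stream.MemcpyAsync(host_info.data(), device_info.data(),
                     sizeof(int) * batch_size);
  stream.Synchronize();  // host_info is only valid after this point
  for (int i = 0; i < batch_size; ++i) {
    if (host_info[i] != 0) return i;
  }
  return -1;
}
```

With `batch_size = 256` (the loop count quoted in the profile), the per-iteration version enqueues 256 copies while the batched version enqueues one, and both report the same first failing element.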

@paddle-bot-old

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@JamesLim-sy requested a review from Xreki on May 22, 2022 02:08
Diff excerpt under review (the beginning of the copy call is not shown):

```cpp
    info,
    sizeof(int) * batch_size,
    dev_ctx.stream());
for (auto i = 0; i < batch_size; ++i) {
```
A contributor commented:

This is a DtoH copy, so it needs an explicit sync afterward; otherwise the CPU may read dirty (stale) data.

The PR author replied:

Updated as suggested.
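The reviewer's point can be illustrated with a similar hypothetical model. `MockStream` is an assumption standing in for a CUDA stream: it defers each asynchronous copy until `Synchronize()` is called, so a host read issued before the sync still sees the buffer's previous contents. On a real device, an asynchronous DtoH copy is not guaranteed to have completed until the stream is synchronized; the model makes that hazard deterministic for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stream model: copies are deferred until Synchronize().
class MockStream {
 public:
  void MemcpyAsync(void* dst, const void* src, std::size_t bytes) {
    assert(dst != nullptr && src != nullptr);
    pending_.push_back([dst, src, bytes] { std::memcpy(dst, src, bytes); });
  }
  void Synchronize() {
    for (auto& op : pending_) op();
    pending_.clear();
  }

 private:
  std::vector<std::function<void()>> pending_;
};

struct ReadResult {
  int before_sync;  // host value read without waiting for the copy
  int after_sync;   // host value read after Synchronize()
};

// Copies one status code from "device" to host and reads it twice:
// once too early, and once after the explicit sync the reviewer asked for.
ReadResult ReadInfo(int device_info) {
  MockStream stream;
  int host_info = -1;  // placeholder the host sees if it reads too early
  stream.MemcpyAsync(&host_info, &device_info, sizeof(int));
  ReadResult r{};
  r.before_sync = host_info;  // stale: the copy has not landed yet
  stream.Synchronize();
  r.after_sync = host_info;  // now holds the copied status code
  return r;
}
```

For example, `ReadInfo(7)` yields `before_sync == -1` (the stale placeholder) and `after_sync == 7`.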

@Xreki (Contributor) left a comment:

LGTM

@JamesLim-sy merged commit 806073d into PaddlePaddle:develop on May 30, 2022