Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clear local scope every setp #37569

Merged
merged 3 commits into from
Nov 26, 2021
Merged

clear local scope every setp #37569

merged 3 commits into from
Nov 26, 2021

Conversation

wanghuancoder
Copy link
Contributor

PR types

Bug fixes

PR changes

Others

Describe

PR #37474 在解决test_transformer单测时,在GC中clear了 LoDTensorArray。这导致test_slice_op单测失败:

  • test_transformer单测遇到的问题是,第一轮运行完后,某LoDTensorArray Tensors中的Holder被GC回收了,而LoDTensorArray的vector中仍保留着原有的几个没有Holder的Tensor。而在第二轮时,算子直接在原有LoDTensorArray基础上push_back了几个新的Tensor,此时,前几个Tensor的Holder是为空的。进而导致了后续运行中出现错误。
  • 而在GC中clear了LoDTensorArray后,test_slice_op单测中发生如下场景:slice的输入为一个LoDTensorArray,同时这个输入也是slice_grad的输入,但slice_grad对这个LoDTensorArray的要求是NoNeedBuffer的。slice_grad只需要这个LoDTensorArray的size。因此在slice运行结束后就GC了LoDTensorArray,并clear了vector。而到slice_grad运行时,则无法拿到LoDTensorArray的size。

在原Executor是通过在step结束时,调用DeleteScope会清理掉所有的Variable,因此不会有该问题。
原PE中,也是通过设置num_iteration_per_drop_scope清理scope时,我猜测,在test_transformer这种场景下num_iteration_per_drop_scope的值必须为1。

目前本PR的做法是,在每个Step结束,如果local scope中的Var是LoDTensorArray类型,则调用clear。
需要做GC检查吗?

  • 我认为不需要做GC,如果这个阶段仍有未回收的Holder,那么可以认为是GC的bug,应该改善GC。

@paddle-bot-old
Copy link

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

// return Fetch Tensors
auto* fetch_var = global_scope_->Var(interpreter::kFetchVarName);
return std::move(*fetch_var->GetMutable<framework::FetchList>());
}

void InterpreterCore::ClearLocalScope() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to ClearLoDTensorArrayInLocalScope and add some comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thx!

Copy link
Contributor

@zhiqiu zhiqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wanghuancoder wanghuancoder merged commit 641038d into PaddlePaddle:develop Nov 26, 2021
Zjq9409 pushed a commit to Zjq9409/Paddle that referenced this pull request Dec 10, 2021
* clear local scope every setp, test=develop

* refine,test=develop

* refine, test=develop
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants