Add some passes which can be applied to Program #34730

Merged: 14 commits merged into PaddlePaddle:develop on Aug 17, 2021

Conversation

@sneaxiy (Collaborator) commented on Aug 9, 2021:

PR types

New features

PR changes

Others

Describe

Add some passes which can be applied to Program (see the usage sketch after this list):

  • sync_batch_norm_pass
  • fuse_relu_depthwise_conv_pass
  • fuse_bn_act_pass
  • fuse_bn_add_act_pass
  • fusion_group_pass
  • fuse_elewise_add_act_pass
  • fuse_adam_op_pass
  • fuse_sgd_op_pass
  • fuse_momentum_op_pass
  • runtime_context_cache_pass
  • inplace_addto_op_pass
  • buffer_shared_inplace_pass

TODO: add fuse_allreduce_op_pass.
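For reference, a minimal usage sketch is below. It assumes the entry point added by this PR is exposed as paddle.fluid.ir.apply_build_strategy taking (main_program, startup_program, build_strategy, pass_attrs); the module path, name, and signature are assumptions inferred from the diff, not a confirmed API.

import paddle
# Assumed entry point; module path and signature are inferred from this PR's diff.
from paddle.fluid.ir import apply_build_strategy

paddle.enable_static()

main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog, startup_prog):
    x = paddle.static.data(name="x", shape=[None, 32], dtype="float32")
    hidden = paddle.static.nn.fc(x, size=64)
    loss = paddle.mean(paddle.nn.functional.relu(hidden))
    paddle.optimizer.Adam().minimize(loss)

build_strategy = paddle.static.BuildStrategy()
build_strategy.fuse_elewise_add_act_ops = True  # fuse_elewise_add_act_pass
build_strategy.fuse_bn_act_ops = True           # fuse_bn_act_pass
build_strategy.enable_inplace = True            # buffer_shared_inplace_pass

# Rewrites main_prog and startup_prog in place according to the enabled passes.
apply_build_strategy(main_prog, startup_prog, build_strategy,
                     {"use_cuda": paddle.is_compiled_with_cuda()})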

@paddle-bot-old (bot) commented on Aug 9, 2021:

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@chenwhql (Contributor) left a comment:

LGTM for framework.py & block_desc.h

@@ -238,5 +238,37 @@ BlockDesc *BlockDesc::ForwardBlock() const {
  return prog_->MutableBlock(static_cast<size_t>(desc_->forward_block_idx()));
}

void BlockDesc::MoveFrom(BlockDesc *block) {
  PADDLE_ENFORCE_NOT_NULL(
      block, platform::errors::InvalidArgument("block must be provided"));
Contributor commented:

We recommend capitalizing the first letter of the error-message sentence and ending it with a period, e.g. "Block must be provided." rather than "block must be provided". The same applies to the other cases.

Author (Collaborator) replied:

Done.

outputs[i]->ShareBufferWith(*inputs[i]);
VLOG(10) << "Share tensor buffer " << (*input_args)[i] << " -> "
         << (*output_args)[i];
if (!share_dims.empty() && share_dims[i]) {
Contributor commented:

Better to check that share_dims.size() == n.

Author (Collaborator) replied:

It is already checked here.

update_attr(attrs, attr_types, "nranks", 1, "size_t")
update_attr(attrs, attr_types, "use_cuda", False, "bool")
# TODO(zjl): how to skip fetch variables ?
update_attr(attrs, attr_types, "mem_opt_skip_vars",
Author (Collaborator) commented:

No. Currently I only skip the variables that satisfy var.is_data == True.
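For illustration, a hedged sketch of how that skip list could be built; the helper name collect_mem_opt_skip_vars is hypothetical, and only the var.is_data == True filter comes from the reply above:

# Hypothetical helper; only the var.is_data filter is from the discussion above.
def collect_mem_opt_skip_vars(program):
    # Skip only data (feed) variables for now; skipping fetch variables
    # is the open TODO mentioned in the snippet.
    return [
        var.name for var in program.list_vars()
        if getattr(var, "is_data", False)
    ]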

const auto &startups =
    graph.Get<details::ProgramDescs>(details::kStartupProgramDescs);
VLOG(10) << "Merge startup programs";
MergePrograms(startup_program, startups, /*append=*/true);
Contributor commented:

I wonder why we need to merge here. If startup_program is the program the passes were applied to, why would it lose any vars or ops?

Author (Collaborator) replied:

Normally, startup_program would not lose any vars or ops. We only need to merge the newly created kStartupProgramDescs into startup_program. Some passes may produce kStartupProgramDescs (a list of ProgramDescs; see here) and run them once when using ParallelExecutor (see here). So when we apply passes to a ProgramDesc directly, startup_program should contain these kStartupProgramDescs, so that the newly created ops run, and run only once.
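As a toy illustration of that merge (plain Python, not Paddle API): model each program as a list of op names, and append the ops of every pass-created startup ProgramDesc to the real startup program so they run exactly once.

# Toy model of MergePrograms(startup_program, startups, /*append=*/true).
def merge_programs(startup_ops, created_startup_programs, append=True):
    for prog in created_startup_programs:
        if append:
            startup_ops.extend(prog)  # run the pass-created ops after the originals
        else:
            startup_ops[:0] = prog    # or before them
    return startup_ops

startup = ["fill_constant@w", "fill_constant@b"]
from_passes = [["coalesce_tensor@fused"]]  # hypothetical output of a fuse pass
print(merge_programs(startup, from_passes))
# ['fill_constant@w', 'fill_constant@b', 'coalesce_tensor@fused']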

Comment on lines 379 to 381
if (is_first_var_valid == is_second_var_valid) {
  continue;
}
Contributor commented:

Is it OK when is_first_var_valid and is_second_var_valid are both true?

Author (Collaborator) replied:

Yes. I agree with you.

@sneaxiy (Collaborator, Author) left a comment:

Thanks for your suggestions, @zhiqiu. I agree that the case where is_first_var_valid and is_second_var_valid are both true is OK. I have added the logic. Please review again.

Comment on lines 171 to 178

scope1 = paddle.static.Scope()
with paddle.static.scope_guard(scope1):
    self.executor.run(startup1)

scope2 = paddle.static.Scope()
with paddle.static.scope_guard(scope2):
    self.executor.run(startup2)
Contributor commented:

Suggested change (seed the RNG before each startup run so both scopes receive identical random initialization):

paddle.seed(2021)
scope1 = paddle.static.Scope()
with paddle.static.scope_guard(scope1):
    self.executor.run(startup1)

paddle.seed(2021)
scope2 = paddle.static.Scope()
with paddle.static.scope_guard(scope2):
    self.executor.run(startup2)

Author (Collaborator) replied:

Done. Thanks for your suggestions!

@zhiqiu previously approved these changes on Aug 15, 2021.

@zhiqiu (Contributor) left a comment:

LGTM

PADDLE_ENFORCE_EQ(block.ID(), 0, platform::errors::Unimplemented(
                      "Inplace can only perform in block 0."));
// only take block 0 gc_vars
const auto all_gc_vars =
Member commented:

Readability: suggest renaming all_gc_vars to op_gc_vars (or a similar name). That makes it easier to see why its size equals all_ops.size(), and it tells the reader that op_gc_vars[i] means the gc_vars of op[i].

Author (Collaborator) replied:

Done.

for (const auto &pair : var_pair) {
  const auto &input_slot = pair.first;
  const auto &output_slot = pair.second;
  auto input_var = GetFirstVarName(op, input_slot, true);
Member commented:

It is hard to read the meaning of the boolean without looking at the code of GetFirstVarName. Two suggestions:

  1. Pass it as /* is_input = */ true.
  2. Change GetFirstVarName to GetNameMapFirstVar, then pass it op.Inputs() instead of op.

Author (Collaborator) replied:

Done.

@lanxianghit (Contributor) left a comment:

API.spec changed, but no Python API changed; approving this.

@jzhang533 (Contributor) left a comment:

lgtm

@Xreki (Contributor) left a comment:

LGTM for op benchmark CI

@zhhsplendid (Member) left a comment:

LGTM

@sneaxiy sneaxiy merged commit 8046e33 into PaddlePaddle:develop Aug 17, 2021
@sneaxiy sneaxiy deleted the program_pass_dev branch August 17, 2021 03:31