
complete data layout transform #7440

Merged
17 commits merged into PaddlePaddle:develop on Jan 19, 2018

Conversation

jacquesqiao (Member) commented on Jan 11, 2018:

fix: #7436

jacquesqiao changed the title from "[WIP] init complete data layout transform" to "init complete data layout transform" on Jan 17, 2018
jacquesqiao changed the title from "init complete data layout transform" to "complete data layout transform" on Jan 17, 2018
inline bool TransFromNeeded(const OpKernelType& l, const OpKernelType& r) {
  return (!platform::places_are_same_class(l.place_, r.place_)) ||
         (l.data_type_ != r.data_type_) || (l.data_layout_ != r.data_layout_);
}
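
A hedged, self-contained sketch of what this guard decides, using stand-in types rather than the real Paddle classes (TransFromNeededSketch and the enums below are illustrative only): a transform is needed whenever the place class, data type, or layout recorded for a variable differs from what the op's kernel expects.

enum class PlaceClass { kCPU, kCUDA };
enum class DataType { kFP32, kFP64 };
enum class DataLayout { kNHWC, kNCHW };

struct OpKernelType {
  PlaceClass place_;
  DataType data_type_;
  DataLayout data_layout_;
};

bool TransFromNeededSketch(const OpKernelType& l, const OpKernelType& r) {
  return l.place_ != r.place_ || l.data_type_ != r.data_type_ ||
         l.data_layout_ != r.data_layout_;
}

int main() {
  OpKernelType for_var{PlaceClass::kCPU, DataType::kFP32, DataLayout::kNHWC};
  OpKernelType expected{PlaceClass::kCPU, DataType::kFP32, DataLayout::kNCHW};
  // The layouts differ, so a data transform would run before the kernel call.
  return TransFromNeededSketch(for_var, expected) ? 0 : 1;
}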
dzhwinter (Contributor) commented on Jan 18, 2018:

I believe we are going about this the wrong way.

Let's start with a simple convolution case:

data -> conv2d -> batch_norm -> fc -> softmax

We may have quite a few different layout pipeline choices:

NHWC->NCHW -> NCHW -> NCHW
NHWC->NHWC -> NCHW -> NCHW
NHWC->NHWC -> NHWC -> NCHW
....
NCHW->NCHW -> NCHW -> NCHW
NCHW->NHWC -> NCHW -> NCHW
....

All of them are legal, but only one of them is the best. In cuDNN, NHWC is 20% faster than the other formats. So the answer is to select the NHWC pipeline from the options above: no matter what the input data format is, just transform the data into NHWC and then run it through the data pipeline.

In the case above, I want to make clear that we do not run on mixed devices, although we can run on different device types. This means the pipeline's layout can be determined once we have the device information, so we only need to transform data before or after the data pipeline. Here "data pipeline" refers to conv2d -> batch_norm -> fc -> softmax, without the data input and output.
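
A minimal, self-contained sketch of this argument, with stand-in types (ChoosePipelineLayout is a hypothetical policy, not a Paddle API): fix one layout from the device information, transform once at the boundary, and run every op in that layout.

#include <iostream>
#include <string>
#include <vector>

// Stand-in for paddle::framework::DataLayout.
enum class DataLayout { kNHWC, kNCHW };

// Hypothetical policy: once the device is known, pick one layout for the
// whole pipeline (e.g. NHWC when the cuDNN path is used).
DataLayout ChoosePipelineLayout(bool use_cudnn) {
  return use_cudnn ? DataLayout::kNHWC : DataLayout::kNCHW;
}

int main() {
  const std::vector<std::string> pipeline = {"conv2d", "batch_norm", "fc",
                                             "softmax"};
  DataLayout layout = ChoosePipelineLayout(/*use_cudnn=*/true);
  // Transform the input once on the way in; every op then runs in the same
  // layout, so no per-op transform is needed inside the pipeline.
  std::cout << "transform input to "
            << (layout == DataLayout::kNHWC ? "NHWC" : "NCHW") << "\n";
  for (const auto& op : pipeline) {
    std::cout << op << " (no layout transform)\n";
  }
  return 0;
}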

dzhwinter (Contributor):

To be clear, we support multiple devices, but not mixed devices.

jacquesqiao (Member, Author) commented on Jan 18, 2018:

Yes, I agree with you: we have a data processing chain, and there should be a best practice for it. On the other hand, the current data transform just performs transforms according to expected_kernel_type and actual_kernel_type_for_var; it does not consider which expected_kernel_type should be used. I think finding the best expected_kernel_type should be implemented in GetExpectedKernelType, by the user or by the framework, so it does not conflict with the current implementation.
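
A minimal sketch of where that decision could live, with stand-in types (Conv2DOpSketch is illustrative, not the real OperatorWithKernel interface): the op decides which kernel type it wants, and DataTransform only carries out whatever conversion that decision implies.

// Stand-ins, not the real Paddle types.
enum class DataLayout { kNHWC, kNCHW, kAnyLayout };
struct OpKernelType { DataLayout data_layout_; };

struct Conv2DOpSketch {
  bool use_cudnn = true;
  // Analogue of GetExpectedKernelType: the op (or the framework) picks the
  // best layout here; the data transform machinery never has to guess.
  OpKernelType GetExpectedKernelType() const {
    return {use_cudnn ? DataLayout::kNHWC : DataLayout::kNCHW};
  }
};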

// out_ptr stores the result of a data transform function.
// When one transform is done, in_ptr should be deleted and out_ptr
// should be assigned to in_ptr for the next data transformation.
Tensor* out_ptr = new Tensor();
dzhwinter (Contributor):

Using new and delete inside DataTransform looks awful. Furthermore, the memory optimizer cannot track these allocations.
How about creating three Tensors (one for each transform channel) in the scope?

jacquesqiao (Member, Author) commented on Jan 18, 2018:

Thanks for this comment~

This data transform is a runtime process and will not appear in the ProgramDesc, so the memory optimizer can never reach it; it's like inner logic inside an Operator.

I think we should follow two principles here:

  • the inner tensor should not be visible from outside;
  • the inner tensor should be freed as soon as possible.

I will find a better way to do this.
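
One such way, sketched here with a stand-in Tensor type (the real code lands below as PassTensorData): keep stack-allocated tensors and hand the output of one transform over as the input of the next, so there is no new/delete and each temporary buffer is dropped as soon as it is no longer needed.

#include <utility>
#include <vector>

// Stand-in for paddle::framework::Tensor; the real class shares the
// underlying buffer via ShareDataWith rather than copying.
struct Tensor {
  std::vector<float> buf;
};

// Sketch: move the result of the previous transform into the input slot of
// the next one and reset the source, releasing its buffer early. Nothing
// here is visible outside the function, satisfying both principles above.
void PassTensorDataSketch(Tensor* from, Tensor* to) {
  *to = std::move(*from);
  *from = Tensor();
}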

    kernel_type_for_var.data_layout_)) {
  TransDataLayout(kernel_type_for_var, expected_kernel_type, *in_ptr,
                  out_ptr);
  free_tmp_tensor(&input_tensor, in_ptr);
dzhwinter (Contributor):

free_tmp_tensor looks awful too...

jacquesqiao (Member, Author):

Updated.

dzhwinter (Contributor) left a review:

Basically LGTM. Please fix the flaws in the next PR.

}
PADDLE_ENFORCE_NOT_NULL(out, "out should not be null");

PADDLE_ENFORCE(transformed, "no transform is done, please check!");
dzhwinter (Contributor):

The message should follow the convention: start with an uppercase letter and use written English, e.g. "No transform is applied".

Tensor in;
in.ShareDataWith(input_tensor);
Tensor out;

dzhwinter (Contributor):

Where is the data type transform?

jacquesqiao (Member, Author):

In the next PR.
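
For reference, a hedged sketch of what that next-PR piece might look like; CastBuffer is illustrative only, not a Paddle API, and the real implementation would operate on Tensor buffers rather than std::vector.

#include <cstddef>
#include <vector>

// Illustrative element-wise cast, the data-type analogue of the layout
// transform above: read InT, write OutT, same shape.
template <typename InT, typename OutT>
std::vector<OutT> CastBuffer(const std::vector<InT>& in) {
  std::vector<OutT> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = static_cast<OutT>(in[i]);
  }
  return out;
}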


namespace paddle {
namespace framework {

static void PassTensorData(Tensor* from, Tensor* to) {
dzhwinter (Contributor):

Remove the static keyword.

jacquesqiao merged commit 0071b5f into PaddlePaddle:develop on Jan 19, 2018