
Cudnn conv op #4195

Merged
merged 25 commits on Oct 12, 2017
Conversation

typhoonzero
Contributor

@typhoonzero commented Sep 19, 2017

Fix #4194
Fix #4187

The TODO list is all done, please review, thanks~

@typhoonzero changed the title from Cudnn conv op to [WIP] Cudnn conv op on Sep 19, 2017
@typhoonzero changed the title from [WIP] Cudnn conv op to Cudnn conv op on Sep 25, 2017
@qingqing01 requested review from jacquesqiao and reyoung and removed the request for jacquesqiao on September 25, 2017 10:03
@chengduoZH
Contributor

fix #3691

@chengduoZH closed this Sep 25, 2017
@chengduoZH reopened this Sep 25, 2017
@@ -19,6 +19,7 @@ limitations under the License. */
#include <stdexcept>
#include <string>
#include <vector>
#include "paddle/gserver/dataproviders/MultiDataProvider.h"
Contributor

revert this file in this PR.

@@ -0,0 +1,110 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Contributor

File naming: the conv implementations could all start with conv, so that all conv-related code is kept together.

return output_size;
}

class CudnnConvOp : public framework::OperatorWithKernel {
Contributor

The Op definition could probably be shared with ConvGemm.

}
};

class CudnnConvGradOp : public framework::OperatorWithKernel {
Contributor

Same as the forward definition: this can share one with the conv gemm op.

@@ -0,0 +1,260 @@
/* Copyright (c) 2016 PaddlePaddle Authors All Rights Reserve.
Contributor

The cudnn implementation can be compiled with g++, so a plain .cc would work; but registering the GPU kernel still has to happen in a .cu file, so let's keep it this way.

Contributor Author

Right, the cudnn API calls all execute on the host, but the work they launch runs on the device.
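For illustration, a minimal sketch of the GPU-side registration that keeps the file a .cu; the macro name and GPUPlace type follow the Paddle conventions of that era and are assumptions here, mirroring the REGISTER_OP_CPU_KERNEL call quoted later in this thread:

    // Hypothetical: macro and place names assumed from the framework conventions of the time.
    REGISTER_OP_GPU_KERNEL(conv_cudnn,
                           ops::CudnnConvKernel<paddle::platform::GPUPlace, float>);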

cudnn_output_grad_desc, cudnn_conv_desc,
// dxDesc: Handle to the previously initialized output tensor
// descriptor.
cudnn_input_grad_desc, CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST,
Contributor

Using CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST has the same problem as in the forward pass.

cudnn_workspace = paddle::memory::Alloc(gpu, workspace_size_in_bytes);
// ------------------- cudnn conv backward data ---------------------
// FIXME(typhoonzero): template type T may not be the same as cudnn call.
float alpha = 1.0f, beta = 0.0f;
Contributor

same as above

T* input_grad_data = input_grad->mutable_data<T>(ctx.GetPlace());
PADDLE_ENFORCE(platform::dynload::cudnnConvolutionBackwardData(
h, &alpha, cudnn_filter_desc, filter_data + i * group_offset_filter,
cudnn_output_grad_desc, output_grad_data + i * group_offset_Y,
Contributor

I think the cudnn_ prefix on cudnn_xxx_desc could be dropped; the names would be shorter.

Contributor Author

The cudnn_xxx_desc variables have types like cudnnTensorDescriptor_t, while the xxx_desc names have types like ScopedTensorDescriptor. cudnn_xxx_desc is also used multiple times, so keeping it in a temporary variable is still worthwhile.

using Tensor = framework::Tensor;

// FIXME(typhoonzer): If CudnnConvOp is running on CPU
// reuse the code from gemm_conv2d_op.h.
Contributor

I don't think this needs to be written; the choice between GEMM conv and cuDNN conv on GPU can be made automatically later in Python (or some other way).

Contributor Author

This was added so that if a user configures the cudnn conv op but needs to test execution on CPU, their program does not need any changes. It also seems unnecessary to register something like USE_GPU_ONLY_OP in pybind.

const std::vector<int>& dims) {
// the format is not used now, but it maybe useful feature
const std::vector<int>& dims,
const int groups = 1) {
Contributor

ScopedTensorDescriptor is not only usable by the conv layer; Pooling/BatchNorm can use it too. So shouldn't groups be handled in the conv op instead of in this interface?

Contributor

I still think this groups handling should be moved into cudnn_conv_op.

void InferShape(const framework::InferShapeContext &ctx) const override {
auto in = ctx.Input<Tensor>("Input");
auto filter = ctx.Input<Tensor>("Filter");
auto out = ctx.Output<framework::Tensor>("Output");
Contributor

Data fetched from ctx needs null checks, e.g.:

    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Input"),
                            "Input(Input) of CudnnConvOp should not be null.");
    PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Output"),
                            "Output(Output) of CudnnConvOp should not be null.");

void InferShape(const framework::InferShapeContext &ctx) const override {
auto in = ctx.Input<Tensor>("Input");
auto filter = ctx.Input<Tensor>("Filter");
auto out = ctx.Output<framework::Tensor>("Output");
Contributor

auto out = ctx.Output<framework::Tensor>("Output"); => auto out = ctx.Output<Tensor>("Output");

auto d_in = ctx.Output<framework::Tensor>(framework::GradVarName("Input"));
auto d_filter =
ctx.Output<framework::Tensor>(framework::GradVarName("Filter"));
if (d_in) d_in->Resize(in->dims());
Contributor

  1. Same as above: null-check the data fetched from ctx.
  2. Also:
auto d_in = ctx.Output<framework::Tensor>(framework::GradVarName("Input"));
auto d_filter =  ctx.Output<framework::Tensor>(framework::GradVarName("Filter"));

=>

auto d_in = ctx.Output<Tensor>(framework::GradVarName("Input"));
auto d_filter =  ctx.Output<Tensor>(framework::GradVarName("Filter"));

AddAttr<std::vector<int>>("strides", "").SetDefault(std::vector<int>{});
AddAttr<std::vector<int>>("paddings", "paddings of convolution operator.")
.SetDefault(std::vector<int>{});
// FIXME(typhoonzero): cudnn doesn't support "group" Attributes.
Contributor

dilations should default to {1, 1}, strides to {1, 1}, and paddings to {0, 0}.
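A minimal sketch of what the suggested defaults could look like, following the AddAttr/SetDefault pattern already used in this diff (the description strings are placeholder wording, not the PR's final text):

    // Sketch only: attribute descriptions are placeholders.
    AddAttr<std::vector<int>>("strides", "strides of convolution operator.")
        .SetDefault(std::vector<int>{1, 1});
    AddAttr<std::vector<int>>("paddings", "paddings of convolution operator.")
        .SetDefault(std::vector<int>{0, 0});
    AddAttr<std::vector<int>>("dilations", "dilations of convolution operator.")
        .SetDefault(std::vector<int>{1, 1});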

@@ -0,0 +1,110 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Contributor

conv => conv2d, same below.

Contributor Author

Renamed to conv_cudnn_op.x. Should we consider merging it with the 3d version later?

Contributor

I think the 2d and 3d versions can be merged.

@typhoonzero
Contributor Author

typhoonzero commented Sep 29, 2017

@qingqing01 I've encountered a weird problem: I moved the dims update with groups to conv_cudnn_op.cu instead of keeping it in cudnn_helper.h:

// TensorDimWithGroups convert 4d Tensor dims to dims split by groups.
std::vector<int> TensorDimWithGroups(const framework::DDim& dims,
                                     const int groups) {
  // Update tensor descriptor dims setting if groups > 1
  // FIXME(typhoonzero): Assume using NCHW order
  std::vector<int> ret = Dims2Vector(dims);
  if (groups > 1) {
    ret[1] = ret[1] / groups;
  }
  return ret;
}

// FilterDimWithGroups convert filter dims to dims split by groups.
std::vector<int> FilterDimWithGroups(const framework::DDim& dims,
                                     const int groups) {
  // filter layout: MCHW
  std::vector<int> ret = Dims2Vector(dims);
  if (groups > 1) {
    ret[0] = ret[0] / groups;
  }
  return ret;
}

and call filter_desc.descriptor<T>(layout, FilterDimWithGroups(filter->dims(), groups));

Then running the gradient check fails like:

AssertionError: Gradient Check On GPUPlace(0) Variable Filter@GRAD max gradient diff 0.148416 over limit 0.050000, the first error element is 0

but putting the dims modification in cudnn_helper.h works fine. The updated dims are identical for both methods.

I tested this: the inline functions have nothing to do with the error.

REGISTER_OP(conv_cudnn, ops::CudnnConvOp, ops::CudnnConvOpMaker,
conv_cudnn_grad, ops::CudnnConvGradOp);
REGISTER_OP_CPU_KERNEL(conv_cudnn,
ops::CudnnConvKernel<paddle::platform::CPUPlace, float>);
Contributor

I think this is unnecessary; you can directly register GemmConv2DKernel, like this:

REGISTER_OP_CPU_KERNEL(conv_cudnn,
                       ops::GemmConv2DKernel<paddle::platform::CPUPlace, float>);

And

REGISTER_OP(conv_cudnn, ops::Conv2DOp, ops::CudnnConvOpMaker,
            conv_cudnn_grad, ops::Conv2DOpGrad);

Contributor Author

Good point, will try.

void ConvInferShape(framework::InferShapeContextBase* ctx);
void ConvGradInferShape(framework::InferShapeContextBase* ctx);

template <typename Place, typename T>
Contributor

I don't think it is appropriate to pull these functions out on their own.

Contributor Author

Done.

@typhoonzero
Contributor Author

Need to update this PR to get the cudnn handle after #4593 is merged.

T alpha = 1.0f, beta = 0.0f;
if (input_grad) {
for (int i = 0; i < groups; i++) {
T* input_grad_data = input_grad->mutable_data<T>(ctx.GetPlace());
Contributor

@chengduoZH Oct 9, 2017

Line 238 should be moved out of the loop.
And input_grad should be cleared, like this: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/lookup_table_op.h#L60
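A framework-agnostic sketch of the suggested structure (the names below are hypothetical, not Paddle API): obtain the gradient buffer once before the group loop and zero it before the per-group calls write into it.

    #include <cstddef>
    #include <cstring>

    // Hypothetical illustration: the gradient buffer is fetched once, cleared once,
    // and each group then writes into its own slice of the buffer.
    void BackwardDataByGroups(float* input_grad_data, std::size_t total_elems,
                              int groups, std::size_t group_offset_in) {
      std::memset(input_grad_data, 0, total_elems * sizeof(float));  // clear before accumulation
      for (int g = 0; g < groups; ++g) {
        float* slice = input_grad_data + g * group_offset_in;
        // ... the per-group backward-data call would write into `slice`
        (void)slice;
      }
    }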

Contributor Author

Thanks! Done.

// ------------------- cudnn conv backward filter ---------------------
if (filter_grad) {
for (int i = 0; i < groups; i++) {
T* filter_grad_data = filter_grad->mutable_data<T>(ctx.GetPlace());
Contributor

Same as above.

Contributor

@qingqing01 left a comment

Additionally, we should later add a unit test that checks different implementations produce consistent results. This is needed not only for conv but for some other ops as well; it can be done in a follow-up~

@@ -24,6 +24,38 @@ namespace operators {

using Tensor = framework::Tensor;

// Base convolution operator definations for other conv
// like operators to reuse the implementation.
inline int outputSize(int input_size, int filter_size, int padding,
Contributor

outputSize -> OutputSize
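For reference, a standalone sketch of the renamed helper, assuming the standard convolution output-size formula; the trailing parameter and the body are assumptions, since the diff hunk above is truncated.

    // Sketch: standard conv output-size formula (stride parameter assumed).
    inline int OutputSize(int input_size, int filter_size, int padding, int stride) {
      return (input_size - filter_size + 2 * padding) / stride + 1;
    }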

const std::vector<int>& dims) {
// the format is not used now, but it maybe useful feature
const std::vector<int>& dims,
const int groups = 1) {
Contributor

I still think this groups handling should be moved into cudnn_conv_op.


namespace paddle {
namespace operators {} // namespace operators
} // namespace paddle
Contributor

paddle/operators/conv_cudnn_op.h can probably be removed; conv_cudnn.cc/cu can just use paddle/operators/conv2d_op.h directly.

Contributor Author

> I still think this groups handling should be moved into cudnn_conv_op.

As mentioned in an earlier comment, moving the groups handling out hits a weird bug that I will follow up on. Can it stay here for now? Other ops that use this interface without the group argument behave exactly as before.

I will create an issue after this is merged and fix the problem in a follow-up PR.

@@ -14,6 +14,7 @@ limitations under the License. */

#pragma once

#include <iostream>
Contributor

remove this line.

CudnnConvOpMaker(framework::OpProto* proto,
framework::OpAttrChecker* op_checker)
: Conv2DOpMaker(proto, op_checker) {
AddAttr<std::vector<int>>("dilations", "paddings of convolution operator.")
Contributor

The comment is not correct for attr dilations.

: Conv2DOpMaker(proto, op_checker) {
AddAttr<std::vector<int>>("dilations", "paddings of convolution operator.")
.SetDefault(std::vector<int>{1, 1});
AddAttr<int>("workspace_size_MB", "workspace size for cudnn, in MB.")
Contributor

need more detailed comments.
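One possible, more detailed wording for the attribute documentation (hypothetical; both the text and the default value shown are only illustrative):

    // Hypothetical wording; the PR's final comment and default may differ.
    AddAttr<int>("workspace_size_MB",
                 "Workspace size for cuDNN, in MB. cuDNN may use this scratch GPU "
                 "memory to select faster convolution algorithms; a larger limit "
                 "can improve speed at the cost of memory.")
        .SetDefault(4096);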

distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
Contributor

Keep the license header consistent with conv_cudnn_op.cc and remove the indentation.

Contributor Author

Kept them consistent the other way: added the indentation to conv_cudnn_op.cc.


int group_offset_X = input_channels / groups * input_height * input_width;
int group_offset_Y =
output_channels / groups * output_height * output_width;
Contributor

group_offset_X -> group_offset_in
group_offset_Y -> group_offset_out

Keep the naming consistent; do not mix X with input or Y with output.
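Applying the suggested names to the quoted lines above (a direct rewrite, no behavior change):

    int group_offset_in = input_channels / groups * input_height * input_width;
    int group_offset_out = output_channels / groups * output_height * output_width;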

Contributor Author

All Done.

Contributor

@qingqing01 left a comment

Approve this PR, some remaining issues can be updated later.

@typhoonzero merged commit a3ccbdb into PaddlePaddle:develop on Oct 12, 2017
@typhoonzero deleted the cudnn_conv_op branch on December 22, 2017 05:43