
[Pten]Refactor the Elementwise_add Kernel #37043

Merged Nov 12, 2021 (12 commits)

Conversation

@YuanRisheng (Contributor) commented Nov 8, 2021

PR types: Others

PR changes: OPs

Describe

Background

The elementwise family of kernels needs to be migrated to the new Tensor compute library, and elementwise_add is the first kernel of that family to be migrated. This PR migrates the elementwise_add compute logic together with the basic components that most elementwise kernels depend on. It touches a fairly large number of files and lays the foundation for the subsequent elementwise migrations.

Main migration steps

1. Migration of the basic elementwise components

[figure: diagram of the basic elementwise component migration]

2. Migration of the elementwise_add kernel

[figure: diagram of the elementwise_add kernel migration]

@paddle-bot-old (bot) commented Nov 8, 2021

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

int axis = PackTensorsIntoVector<T>(ctx, &ins, &outs);
LaunchElementwiseCudaKernel<ElementwiseType::kBinary, T, T>(
cuda_ctx, ins, &outs, axis, AddFunctor<T>());
auto* x = ctx.Input<framework::LoDTensor>("X");
Contributor: Can this function be removed now? The kernel in paddle/fluid/operators/elementwise/elementwise_add_op.h below could be reused directly, by just registering an extra copy for CUDA at registration time.

Contributor Author: done

@@ -20,6 +20,13 @@ limitations under the License. */
#include "paddle/fluid/operators/math/blas.h"
#include "paddle/fluid/operators/math/math_function.h"

#include "paddle/fluid/framework/pten_utils.h"

// only can include the headers in paddle/top/api dirs
Contributor: The directory in this comment needs to be updated; it should now be paddle/pten/include.

Contributor Author: done

@@ -29,6 +29,11 @@ limitations under the License. */
#include "paddle/fluid/platform/gpu_info.h"
#include "paddle/fluid/platform/transform.h"

// only can include the headers in paddle/top/api dirs
Contributor: Same as above.

Contributor Author: done

(*post) *= x_dims[i];
}
pten::general::get_mid_dims(x_dims, y_dims, axis, pre, n, post,
is_run_common_broadcast);
}

inline int GetElementwiseIndex(const int *x_dims_array, const int max_dim,
Contributor: Is it necessary to keep this wrapper shell?

Contributor Author: done

}
std::vector<const pten::DenseTensor *> pt_inputs;
std::vector<pten::DenseTensor *> pt_outputs;
// *_tmp for cache DenseTensor
Contributor: Could a TODO comment be added explaining that this is written this way because DenseTensor does not yet support copy construction, and that it will be optimized later?

Contributor Author: done

@@ -3,3 +3,5 @@ cc_library(linalg_cpu SRCS linalg.cc DEPS dense_tensor kernel_context kernel_fac
cc_library(creation_cpu SRCS creation.cc DEPS dense_tensor kernel_context kernel_factory eigen_function)
cc_library(utils_cpu SRCS utils.cc DEPS dense_tensor kernel_context kernel_factory memory convert_utils)
cc_library(manipulation_cpu SRCS manipulation.cc DEPS dense_tensor kernel_context kernel_factory utils_cpu unary)
cc_library(nn_cpu SRCS nn.cc DEPS dense_tensor kernel_context kernel_factory blas eigen_function)
add_subdirectory(funcs)
Contributor: The original functions directory could be renamed to funcs; wouldn't it be better not to create a new funcs subdirectory under every directory?

Contributor Author: done

@@ -4,10 +4,12 @@ if(WITH_GPU)
nv_library(creation_cuda SRCS creation.cu DEPS eigen_function dense_tensor kernel_context kernel_factory)
nv_library(utils_cuda SRCS utils.cu DEPS dense_tensor kernel_context kernel_factory memory convert_utils)
nv_library(manipulation_cuda SRCS manipulation.cu DEPS dense_tensor kernel_context kernel_factory utils_cuda unary)
elseif(WITH_ROCM)
nv_library(nn_cuda SRCS nn.cu DEPS dense_tensor kernel_context kernel_factory eigen_function)
elseif(WITH_ROCM)
Contributor: The indentation here is a bit off.

Contributor Author: done

int64_t post_;
};

// #if defined(__NVCC__) || defined(__HIPCC__)
Contributor: What is going on with these commented-out lines?

Contributor Author: done

namespace general {

using DDim = paddle::framework::DDim;
using CPUContext = paddle::platform::CPUDeviceContext;
Contributor: This uses the CPU context; why is it placed under general?

Contributor Author: These are methods shared by the CPU and GPU code paths, which is why they live under general.


#pragma once

#include "paddle/pten/kernels/cuda/funcs/elementwise/elementwise_broadcast.cu.h"
Contributor: Same as above. Alternatively, how about adding cpu and cuda subdirectories under functions, rather than creating new subdirectories inside the kernel-level device directories? This part will need further adjustment later anyway. Also, if these files all live under the cuda directory, can the cu part of the file suffix be dropped?

Contributor Author: done

namespace paddle {
namespace experimental {

Tensor elementwise_add(const Tensor& x, const Tensor& y, int axis) {
Contributor: The API does not seem to have this axis parameter.

Contributor Author: done

@MingMingShangTian (Contributor) left a comment

LGTM

@chenwhql (Contributor) left a comment

LGTM

@@ -2020,7 +1856,8 @@ void FusedElemwiseAndActComputeWithBroadcast(
axis = (y_dim.size() == 0) ? x_dim.size() : axis;

int pre, n, post, is_run_common_broadcast;
get_mid_dims(x_dim, y_dim, axis, &pre, &n, &post, &is_run_common_broadcast);
pten::general::get_mid_dims(x_dim, y_dim, axis, &pre, &n, &post,
@chenwhql (Contributor) commented Nov 11, 2021

Function naming should be unified according to the code style, using CamelCase. The naming here was already nonstandard, so it can be changed in a follow-up PR.

@Avin0323 (Contributor) left a comment

LGTM for PR-CI-OP-benchmark

@chenwhql chenwhql merged commit c131034 into PaddlePaddle:develop Nov 12, 2021
@YuanRisheng YuanRisheng deleted the elementwise_add_refactor branch November 19, 2021 02:41