cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU #39437

Merged: 38 commits, Mar 7, 2022
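In short, this PR adds a fused_gemm_epilogue op that computes act(X·W + bias) in a single cuBlasLt call (act being ReLU, GeLU, or identity), plus a FuseGemmEpiloguePass that rewrites matching matmul_v2 + elementwise_add (+ activation) subgraphs in both the forward and backward graphs. The pass is opted into through a new BuildStrategy flag, shown in the build_strategy.h diff below. A minimal C++ sketch of flipping that flag (illustrative only; MakeStrategy is my own name, and most users would instead set the equivalent pybind-exposed attribute from Python):

```cpp
#include "paddle/fluid/framework/details/build_strategy.h"

// Illustrative sketch: enable the new flag so ParallelExecutorPassBuilder
// appends fuse_gemm_epilogue_pass (gated on CUDA >= 11.6, see the
// build_strategy.cc diff below).
paddle::framework::details::BuildStrategy MakeStrategy() {
  paddle::framework::details::BuildStrategy strategy;
  strategy.fuse_gemm_epilogue_ = true;  // defaults to false
  return strategy;
}
```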
Commits
4c7ee94
Added cuBlasLtHandle_t to device context.
mingxu1067 Jan 14, 2022
a82c0a8
Added fused_gemm_epilogue op.
mingxu1067 Jan 14, 2022
26e6411
Added UT to fused_gemm_epilogue op.
mingxu1067 Jan 14, 2022
41b701a
Added LinearAct Pattern
mingxu1067 Jan 17, 2022
6349809
Added FuseGemmEpiloguePass
mingxu1067 Jan 17, 2022
a0c0f48
Added pybind to BuildStrategy.fuse_gemm_epilogue_.
mingxu1067 Jan 17, 2022
cb1f790
Added UT for fuse_gemm_epilogue_pass.
mingxu1067 Jan 19, 2022
f001541
GeLU support and EpilogueSingleton
mingxu1067 Jan 19, 2022
51e6a36
Rename cublaslt_epilogue_op to gemm_epilogue_op.*.
mingxu1067 Jan 19, 2022
2c24ad7
Added both train and infer pattern to LinearAct.
mingxu1067 Jan 19, 2022
6919ce7
Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.
mingxu1067 Jan 19, 2022
a65ab08
Added identity activation support to gemm_epilogue_op.
mingxu1067 Jan 19, 2022
1b7541b
Added Linear Fusion (matmul_v2 + ele_add)
mingxu1067 Jan 19, 2022
ac1a8ca
Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*
mingxu1067 Jan 19, 2022
9cdf442
Add fused_gemm_epilogue_grad op.
mingxu1067 Jan 21, 2022
fbda512
Add UTs to fused_gemm_epilogue_grad_op.
mingxu1067 Jan 21, 2022
64a43ea
Change attribute name in fused_gemm_epilogue_grad_op for clarity.
mingxu1067 Jan 21, 2022
0369fb4
Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op.
mingxu1067 Jan 21, 2022
88c9ecb
Added ElementwiseAdd+Matmul+Act graph pattern detection.
mingxu1067 Jan 25, 2022
009eea2
Fuse backward of Linear(Act(x))
mingxu1067 Jan 26, 2022
a8076a9
Added UTs to backward fusion of Linear(Act(x)).
mingxu1067 Jan 26, 2022
1268d48
Complete documentation of arguments to fused_gemm_epilogue_op.
mingxu1067 Jan 28, 2022
dbed64f
Made arguments of some functions pass by reference.
mingxu1067 Feb 8, 2022
d8a862e
Modify code with review comments.
mingxu1067 Feb 8, 2022
54a8588
Made 'const' code style consistent
mingxu1067 Feb 8, 2022
06f4240
Fixed random seed of python UTs.
mingxu1067 Feb 8, 2022
fba452e
Merge branch 'develop'
mingxu1067 Feb 10, 2022
fe8a560
Set compiling constraints for cuBlasLt
mingxu1067 Feb 10, 2022
dcdab08
Code Review from Paddle
mingxu1067 Feb 18, 2022
02c007f
Remove EpilogueSingleton
mingxu1067 Feb 22, 2022
84fd06a
Fix a logical error and enhance UTs.
mingxu1067 Feb 22, 2022
30b20da
Fix Linear and GeLU fusion issues.
mingxu1067 Feb 23, 2022
1510a96
Removed fused_gemm_epilogue_op.h.
mingxu1067 Feb 23, 2022
a421be8
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
mingxu1067 Feb 23, 2022
2768d2a
Rename namespace pten to phi.
mingxu1067 Feb 23, 2022
3a27015
Rename arguments in fused_gemm_epilogue_op
mingxu1067 Mar 1, 2022
2f23475
Change EpiloguePassActivationCache to a local variable.
mingxu1067 Mar 1, 2022
5c47882
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
mingxu1067 Mar 1, 2022
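Taken together, the commits implement the op, its gradient, and the fusion pass. Before the file changes, it helps to pin down what the fused op computes per element. A scalar C++ reference of the forward semantics, reconstructed from the commits above (a sketch under my own naming; the tanh-approximation GeLU is an assumption for illustration, not necessarily the kernel's exact formula):

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// Reference for the per-element epilogue applied after the GEMM:
// out = act(gemm_out + bias), with act in {relu, gelu, identity}.
float EpilogueRef(float gemm_out, float bias, const std::string& act) {
  float z = gemm_out + bias;
  if (act == "relu") return std::max(z, 0.0f);
  if (act == "gelu") {
    // tanh approximation of GeLU (assumed here for illustration)
    return 0.5f * z *
           (1.0f + std::tanh(0.7978845608f * (z + 0.044715f * z * z * z)));
  }
  return z;  // identity, per the "identity activation support" commit
}
```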
10 changes: 5 additions & 5 deletions cmake/operators.cmake
@@ -293,11 +293,11 @@ function(op_library TARGET)
# Define operators that don't need pybind here.
  foreach(manual_pybind_op "compare_all_op" "compare_op" "logical_op" "bitwise_op" "nccl_op"
                           "tensor_array_read_write_op" "tensorrt_engine_op" "conv_fusion_op")
    if ("${TARGET}" STREQUAL "${manual_pybind_op}")
      set(pybind_flag 1)
    endif()
  endforeach()

# The registration of USE_OP, please refer to paddle/fluid/framework/op_registry.h.
# Note that it's enough to just adding one operator to pybind in a *_op.cc file.
2 changes: 1 addition & 1 deletion paddle/fluid/framework/details/CMakeLists.txt
@@ -139,7 +139,7 @@ set(IR_PASS_DEPS graph_viz_pass multi_devices_graph_pass
coalesce_grad_tensor_pass fuse_all_reduce_op_pass backward_optimizer_op_deps_pass
fuse_adam_op_pass fuse_sgd_op_pass fuse_momentum_op_pass
sync_batch_norm_pass runtime_context_cache_pass graph_to_program_pass
fix_op_run_order_pass)
fix_op_run_order_pass fuse_gemm_epilogue_pass)

if (WITH_CINN)
set(IR_PASS_DEPS ${IR_PASS_DEPS} build_cinn_pass)
9 changes: 9 additions & 0 deletions paddle/fluid/framework/details/build_strategy.cc
@@ -1,4 +1,5 @@
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Copyright (c) 2022 NVIDIA Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -175,6 +176,11 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
!defined(_WIN32) && !defined(__APPLE__)
AppendPassWithCheck(strategy_.enable_auto_fusion_, "fusion_group_pass");
#endif

#if (defined(PADDLE_WITH_CUDA) && CUDA_VERSION >= 11060)
AppendPassWithCheck(strategy_.fuse_gemm_epilogue_,
"fuse_gemm_epilogue_pass");
#endif
AppendPassWithCheck(strategy_.fuse_elewise_add_act_ops_,
"fuse_elewise_add_act_pass");
// for single card training, fuse_all_reduce_ops is unnecessary.
@@ -507,3 +513,6 @@ USE_PASS(mkldnn_placement_pass);
!defined(_WIN32) && !defined(__APPLE__)
USE_PASS(fusion_group_pass);
#endif
#if (defined(PADDLE_WITH_CUDA) && CUDA_VERSION >= 11060)
USE_PASS(fuse_gemm_epilogue_pass);
#endif
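The CUDA >= 11.6 guard above reflects the cuBlasLt feature set the fused kernel relies on. For orientation, a minimal sketch of how a bias + ReLU epilogue is attached to a cublasLt matmul descriptor (illustrative, not code from this PR; error checking elided):

```cpp
#include <cublasLt.h>

// Attach a bias + ReLU epilogue to a cublasLt matmul descriptor, so the
// GEMM, bias add, and activation run in a single kernel launch.
void SetBiasReluEpilogue(cublasLtMatmulDesc_t op_desc, const void* bias) {
  cublasLtEpilogue_t epilogue = CUBLASLT_EPILOGUE_RELU_BIAS;
  cublasLtMatmulDescSetAttribute(op_desc, CUBLASLT_MATMUL_DESC_EPILOGUE,
                                 &epilogue, sizeof(epilogue));
  cublasLtMatmulDescSetAttribute(op_desc, CUBLASLT_MATMUL_DESC_BIAS_POINTER,
                                 &bias, sizeof(bias));
  // GeLU variants (e.g. CUBLASLT_EPILOGUE_GELU_AUX_BIAS, which also stores
  // the pre-activation values needed for backward) follow the same scheme.
}
```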
3 changes: 3 additions & 0 deletions paddle/fluid/framework/details/build_strategy.h
@@ -1,4 +1,5 @@
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
// Copyright (c) 2022 NVIDIA Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -124,6 +125,8 @@ struct BuildStrategy {
paddle::optional<bool> fuse_broadcast_ops_{paddle::none};
// replace batch_norm with sync_batch_norm.
bool sync_batch_norm_{false};
// Fuse GEMM+Epilogue via cublasLt epilogue.
bool fuse_gemm_epilogue_{false};

// mkldnn_enabled_op_types specify the operator type list to
// use MKLDNN acceleration. It is null in default, means
1 change: 1 addition & 0 deletions paddle/fluid/framework/ir/CMakeLists.txt
@@ -158,6 +158,7 @@ endif()
cc_library(fuse_bn_act_pass SRCS fuse_bn_act_pass.cc DEPS pass graph_pattern_detector )
cc_library(fuse_bn_add_act_pass SRCS fuse_bn_add_act_pass.cc DEPS pass graph_pattern_detector )
cc_library(fuse_elewise_add_act_pass SRCS fuse_elewise_add_act_pass.cc DEPS pass graph_pattern_detector )
cc_library(fuse_gemm_epilogue_pass SRCS fuse_gemm_epilogue_pass.cc DEPS pass graph_pattern_detector )
cc_library(fuse_relu_depthwise_conv_pass SRCS fuse_relu_depthwise_conv_pass.cc DEPS pass graph_pattern_detector )

set(GLOB_PASS_LIB ${PASS_LIBRARY} CACHE INTERNAL "Global PASS library")
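The new pass depends on graph_pattern_detector because the LinearAct pattern and its backward counterpart are declared with Paddle's PDPattern builder. A heavily simplified sketch of what a matmul_v2 → elementwise_add → relu pattern declaration looks like (my own node names, not the PR's actual code; the real pattern also covers GeLU and the separate train/infer variants):

```cpp
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"

namespace ir = paddle::framework::ir;

// Simplified LinearAct-style pattern: matmul_v2 -> elementwise_add -> relu.
void BuildLinearReluPattern(ir::PDPattern* pattern) {
  auto* matmul = pattern->NewNode("matmul")->assert_is_op("matmul_v2");
  auto* matmul_out = pattern->NewNode("matmul_out")
                         ->assert_is_op_output("matmul_v2")
                         ->assert_is_op_input("elementwise_add")
                         ->AsIntermediate();
  auto* add = pattern->NewNode("ele_add")->assert_is_op("elementwise_add");
  auto* add_out = pattern->NewNode("add_out")
                      ->assert_is_op_output("elementwise_add")
                      ->assert_is_op_input("relu")
                      ->AsIntermediate();
  auto* act = pattern->NewNode("act")->assert_is_op("relu");
  auto* act_out =
      pattern->NewNode("act_out")->assert_is_op_output("relu")->AsOutput();

  // Wire ops to their intermediate and output variables.
  matmul->LinksTo({matmul_out});
  add->LinksFrom({matmul_out}).LinksTo({add_out});
  act->LinksFrom({add_out}).LinksTo({act_out});
}
```

Once the detector matches this subgraph, the pass replaces it with a single fused_gemm_epilogue op and rewires the surrounding variables.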