【PaddlePaddle Hackathon】31. Add a new frontend language for Paddle Inference #37162

Merged (28 commits) on Feb 10, 2022

@chenyanlann (Contributor) commented Nov 12, 2021

PR types

New features

PR changes

APIs

Describe

Add Java APIs for Paddle Inference

Task: #35977

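Paddle Inference Java API

The Paddle Inference Java API is built on the C API and JNI, so you need to prepare the C inference library in advance.

Installation (Linux)

1. Download the C inference library

You can download the paddle_inference_c inference library directly or build it from source. To build from source, follow the official documentation; note that -DON_INFER=ON must be enabled when running cmake. The build directory will then contain paddle_inference_c_install_dir.

2. Prepare the inference deployment model

Download the resnet50 model and extract it to obtain a model in Paddle Combined format.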

wget https://paddle-inference-dist.bj.bcebos.com/Paddle-Inference-Demo/resnet50.tgz
tar zxf resnet50.tgz

#### The resulting resnet50 directory structure is as follows
resnet50/
├── inference.pdmodel
├── inference.pdiparams
└── inference.pdiparams.info
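
Because the Java API is later given the model and parameter files as two separate paths, it can help to check that both exist before any native code runs. The following is a minimal sketch under that assumption; `ModelFiles` and `locate` are hypothetical names introduced for illustration and are not part of the Java API in this PR. Only the two file names come from the directory listing above.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; not part of the Paddle Inference Java API. */
final class ModelFiles {
    final Path model;   // e.g. resnet50/inference.pdmodel
    final Path params;  // e.g. resnet50/inference.pdiparams

    private ModelFiles(Path model, Path params) {
        this.model = model;
        this.params = params;
    }

    /** Resolves both files under modelDir and fails early if either is missing. */
    static ModelFiles locate(Path modelDir) {
        Path model = modelDir.resolve("inference.pdmodel");
        Path params = modelDir.resolve("inference.pdiparams");
        for (Path p : new Path[] {model, params}) {
            if (!Files.isRegularFile(p)) {
                throw new IllegalArgumentException("missing model file: " + p);
            }
        }
        return new ModelFiles(model, params);
    }
}
```

A caller could then pass `files.model.toString()` and `files.params.toString()` wherever the two paths are needed.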
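3. Prepare the inference working directory
git clone github.com/paddlepaddle/paddle/paddle/fluid/inference/javaapi
4. Build the shared library and the jar package
Run the following in the javaapi directory:

./build.sh {C inference library directory} {JNI header directory} {JNI platform-specific header directory}

Using the author's directory layout as an example: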
./build.sh /root/paddle_c/paddle_inference_c_2.2/paddle_inference_c/ /usr/lib/jvm/java-8-openjdk-amd64/include /usr/lib/jvm/java-8-openjdk-amd64/include/linux

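When it finishes, JavaInference.jar and libpaddle_inference.so are generated in the current directory.
5. Run the unit test to verify
Run the following in the javaapi directory:

./test.sh {C inference library directory} {path to the .pdmodel file} {path to the .pdiparams file}

Using the author's directory layout as an example: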
./test.sh "/root/paddle_c/paddle_inference_c_2.2/paddle_inference_c"  "/root/paddle_c/resnet50/inference.pdmodel" "/root/paddle_c/resnet50/inference.pdiparams"

Using Paddle inference in Java

First, create the inference configuration

Config config = new Config();
config.setCppModel(model_file, params_file);

Create the predictor

Predictor predictor = Predictor.createPaddlePredictor(config);

Get the input Tensor

String inNames = predictor.getInputNameById(0);
Tensor inHandle = predictor.getInputHandle(inNames);

Set the input data (assuming there is only one input)

inHandle.Reshape(4, new int[]{1, 3, 224, 224});
float[] inData = new float[1*3*224*224];
inHandle.CopyFromCpu(inData);

Run inference

predictor.Run();

Get the output Tensor

String outNames = predictor.getOutputNameById(0);
Tensor outHandle = predictor.getOutputHandle(outNames);
float[] outData = new float[outHandle.GetSize()];
outHandle.CopyToCpu(outData);
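
The input buffer above is sized by hand as 1*3*224*224, and the output is left as a raw float array. As a minimal sketch that does not depend on the Paddle classes, the helper below computes a buffer length from a shape and picks the highest-scoring index from an output array. `TensorMath`, `numElements`, and `argMax` are hypothetical names introduced for illustration; they are not part of the Java API in this PR.

```java
/** Hypothetical helpers; not part of the Paddle Inference Java API. */
final class TensorMath {
    private TensorMath() {}

    /** Returns the number of elements implied by a shape such as {1, 3, 224, 224}. */
    static int numElements(int[] shape) {
        long n = 1;
        for (int dim : shape) {
            if (dim < 0) {
                throw new IllegalArgumentException("negative dimension: " + dim);
            }
            n *= dim;
            if (n > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("shape too large for a Java array");
            }
        }
        return (int) n;
    }

    /** Returns the index of the largest value; the first index wins on ties. */
    static int argMax(float[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty array");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

With these, `new float[TensorMath.numElements(shape)]` could replace the hand-written product, and `TensorMath.argMax(outData)` would give the top-1 class index for a classifier such as resnet50.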

@chenyanlann
Contributor Author

@TCChenlong

@TCChenlong
Contributor

It is still under review.

@DannyIsFunny
Contributor

Intermediate build artifacts need to be removed from the PR: JavaInference.jar, libpaddle_inference.so, Config.class, and so on.

@DannyIsFunny
Contributor

PaddlePaddle's build system is built on CMake. We suggest replacing the gcc build steps in build.sh with a CMakeLists.txt, for example:

# CMakeLists.txt
add_library(paddle_inference SHARED com_baidu_paddle_inference_Predictor.cpp com_baidu_paddle_inference_Config.cpp com_baidu_paddle_inference_Tensor.cpp)
target_link_libraries(paddle_inference paddle_inference_c)

@DannyIsFunny
Contributor

Could @winter-wang please review the Java API?

@DannyIsFunny
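Contributor

The Java API shared library produced at present still depends dynamically on the C API shared library. Could a standalone Java API shared library be produced directly?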

@winter-wang (Contributor) left a comment

Across all the interfaces used on the Java side, I do not see any call to PD_TensorDestroy or PD_PredictorDestroy. How are the underlying Tensor and Predictor objects reclaimed when the Java interface is used?
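
One common way to address this kind of question on the Java side is to make each wrapper `AutoCloseable` and release the native handle in `close()`. The sketch below shows only that pattern. `NativeHandle` and its `destroyer` callback are hypothetical stand-ins for a class that would call a JNI method wrapping `PD_PredictorDestroy` or `PD_TensorDestroy`; this is not necessarily how the PR resolved the issue.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/** Hypothetical wrapper that owns a native pointer and frees it exactly once. */
final class NativeHandle implements AutoCloseable {
    private final AtomicLong pointer;
    private final LongConsumer destroyer; // stand-in for a JNI destroy call

    NativeHandle(long pointer, LongConsumer destroyer) {
        if (pointer == 0L) {
            throw new IllegalArgumentException("null native pointer");
        }
        this.pointer = new AtomicLong(pointer);
        this.destroyer = destroyer;
    }

    /** Returns the live pointer, or throws if the handle was already closed. */
    long get() {
        long p = pointer.get();
        if (p == 0L) {
            throw new IllegalStateException("handle already closed");
        }
        return p;
    }

    /** Releases the native object; calling close() again is a no-op. */
    @Override
    public void close() {
        long p = pointer.getAndSet(0L);
        if (p != 0L) {
            destroyer.accept(p);
        }
    }
}
```

Used in a try-with-resources statement, such a wrapper releases the native object deterministically instead of relying on garbage collection.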

const char* params_file = PD_ConfigGetProgFile(
reinterpret_cast<PD_Config*>(cppPaddleConfigPointer));
jstring paramsFile = env->NewStringUTF(params_file);
free(const_cast<char*>(params_file));
Contributor

This must not be freed explicitly. The memory that params_file points to belongs to the underlying std::string.


JNIEXPORT jstring JNICALL Java_com_baidu_paddle_inference_Config_paramsFile(
JNIEnv* env, jobject obj, jlong cppPaddleConfigPointer) {
const char* params_file = PD_ConfigGetProgFile(
Contributor

The wrong C API function is called here. PD_ConfigGetParamsFile should be called instead.

const char* prog_file = PD_ConfigGetProgFile(
reinterpret_cast<PD_Config*>(cppPaddleConfigPointer));
jstring progFile = env->NewStringUTF(prog_file);
free(const_cast<char*>(prog_file));
Contributor

This must not be freed explicitly. The memory behind prog_file belongs to the underlying std::string.

const char* model_dir = PD_ConfigGetModelDir(
reinterpret_cast<PD_Config*>(cppPaddleConfigPointer));
jstring modelDir = env->NewStringUTF(model_dir);
free(const_cast<char*>(model_dir));
Contributor

This must not be freed explicitly. The memory behind model_dir belongs to the underlying std::string.

const char* prog_file = env->GetStringUTFChars(progFile, 0);
PD_ConfigSetProgFile(reinterpret_cast<PD_Config*>(cppPaddleConfigPointer),
prog_file);
free(const_cast<char*>(prog_file));
Contributor

Memory obtained from GetStringUTFChars cannot be released with free, can it?

Contributor

The same applies in other places.

@paddle-bot-old

Sorry to inform you that 5cf5259's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@CLAassistant

CLAassistant commented Nov 26, 2021

CLA assistant check
All committers have signed the CLA.

@chenyanlann
Contributor Author

@winter-wang Thanks, fixed.

@chenyanlann
Contributor Author

@DannyIsFunny Thanks. The intermediate build artifacts have been removed and a CMakeLists.txt has been added.

@paddle-bot-old

paddle-bot-old bot commented Dec 7, 2021

Sorry to inform you that 34ec990's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

winter-wang previously approved these changes Dec 10, 2021

@winter-wang (Contributor) left a comment

LGTM

@Shixiaowei02
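Contributor

Shixiaowei02 commented Dec 13, 2021

@chenyanlann Hello! First of all, thank you for contributing code to PaddlePaddle!

As the final step of the requested changes, please move the relevant code into an experimental namespace or directory, and contribute Java API documentation and examples, using the existing C++ part as a reference.

The user-facing API is the most important code in the framework. We will start a review and then formally merge it. Thank you!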

@Superjomn (Contributor) left a comment

LGTM

@winter-wang (Contributor) left a comment

LGTM

@raindrops2sea (Collaborator) left a comment

LGTM

@Superjomn Superjomn merged commit 238f3c8 into PaddlePaddle:develop Feb 10, 2022
Shixiaowei02 added a commit that referenced this pull request Feb 16, 2022
* 【Pten】Adjust the Empyt dev_api (#39143)

* adjust the Empyt dev_api

* fix merge conflict

* fix sparse_utils_kernel

* Fix code conflict of empty dev_api (#39430)

* fix code conflict

* clear cache

* just try

* [PluggableDevice] custom kernel supports multi cpp_dtype registering (#39385)

* [PTen] Add standard kernel suffix set (#39404)

* add standard_suffix_set_and_remove_reshape_with_xshape

* revert reshape change

* polish reduce name

* [pten] update isnan registration (#39419)

* update isnan registration

* fix compile

* [bf16] add bf16 kernel: dropout & reshape & slice (#39395)

* add dropout

* add reshape

* add slice

* refien slice unittest

* refine slice unittest

* add cpu bf16 kernel

* [bf16] add bf16 kernel: squeeze & unsqueeze & stack (#39402)

* add squeeze unsqueeze stack

* add unittest

* add cpu kernel

* Modify the unsqueeze dimension of input data in conv1d NCL And NLC format (#38425)

* optimize conv1d forward

* add conv opt

* Optimize memory copy

* delete share data with

* set num_filters=512

* add nlc optimize

* Optimize num_filter=512 data on A100 and V100

* Fix the workspace_size size setting of filter

* 【Pten】Refactor C++ API code-gen (#39408)

* refactor C++ API code-gen

* fix windows problem of C++ API

* Refactored Python-C Attributes Parsing Functions (#39328)

* Add _get_parameter method to Lamb optimizer (#39416)

* add _get_parameter func to lamb

* remove duplicate code

* mkldnn layout issue fix (#39422)

* mkldnn conv fix

* definetion

* fix compile error on jetson (#39441)

* move Masked select to pten (#39193)

* move masked select cpu kernel

* add masked selected gpu kernel; test=develop

* fix bugs; test=develop

* bug fix; test=develop

* bug fix; test=develop

* add namespace to set mask array; test=develop

* fix bug; test=develop

* fix bugs; test=develop

* fix ddim bug; test=develop

* fix npu op bug; test=develop

* fix xpu dependecy bug; test=develop

* move kernel args to sig.cc; test=develop

* 【PaddlePaddle Hackathon】31. Add Java frontend for Paddle Inference  (#37162)

* fix check error of ResetHolder (#39439)

* Added python-c code generation for final state Eager Dygraph (#39233)

* Removed debug info

* Added automatic code generation for final state Eager Dygraph

* Modified backward yaml

* Added EagerUtils helper functions for final state CodeGen

* Adjusted CMakeFiles to support compilation for final state auto generated codes

* Added python-c code generation for final state Eager Dygraph

* Fixed minor issue

* Fixed yaml.load() method failure

* Fixed minor issues

* Refactored Python-C Attributes Parsing Functions

* Fixed minor issue with Python-C AddFunctions

* Fixed issues from merge

* Fixed merge issues

* change dtype of pooling mask to 'int32' for Paddle2ONNX (#39314)

* change dtype of pooling mask to 'int32' for Paddle2ONNX

* empty commit to rerun ci

* fix format

* share MemOptVarInfos of external variables into cinn_launch subgraph (#39209)

* add a graph pass to share MemOptVarInfos of external variables into subgraph

* update pass name

* fix compile failed

* add share_mem_opt_info_to_subgraph_pass test

* share_mem_opt_info_to_subgraph_pass_test pass

* modify some codes for better style and more robust

* update cmake

* [NPU] add reduce_min (#39019)

[NPU] add reduce_min

* [MLU] add mlu kernel for accuracy op (#39337)

* [MLU] add mlu kernel for accuracy op

* fix license format

* fix error message

* [Dy2St]Handle `a, b = paddle.shape(x)` in Static Analysis (#39245)

* refine Assign

* add UT

* 【Pten】Auto-Generate InterMeta register (#39436)

* fix code conflict

* generate inter_meta register

* clear cache

* just try

* add sign c++ api

* polish some code

* Support different dtypes of inputs for elementwise ops (#38859)

* improve backward performance

* support different dtypes for elementwise ops

* Add profiler node tree implementation (#39316)

* add event node implementation

* modify profiler.stop interface

* fix according to review

* fix file mode

* modify class method name in event_node.cc

* modify LLONG_MAX to ULLONG_MAX

* fix ci error

* fix ci error

* add print pten kernel tool (#39371)

* test=document_fix;add print pten kernel tool

* test=document_fix

* test=document_fix

* test=document_fix

* test=document_fix

* add print_pten_kernels tool

* add print_pten_kernels tool

* fix windows complie

* notest,test=rocm_ci

* add merge tool

* add comments

* [new-exec] set type of op-kernel op by place (#39458)

* Add log for executor (#39459)

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173.

* add log for Executor

Co-authored-by: liutiexing <liutiexing@google.com>

* [Paddle Inference] support ernie quant model with interleaved (#39424)

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* 统一 ps 开发 - python (#39431)

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unitest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .

* cpu-async-ps minimize test ok & gpu minimize test ok

Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>

* [PTen] Move grad GetExpectedPtenKernelArgs into pten (#39418)

* move grad get expected pten kernel args

* fix reduce sum error

* fix element_sub_grad failed

* revert kernel judge change

* fix compilation warning on mac (#39438)

* get build time (#39368)

* fix prelu trt convert (#39389)

* Optimize bilinear interpolation foward (#39243)

* bilinear_fw init

* optimize code

* pre-compute linear_interp input index

* Optimize performance of softmax_bwd when axis!=-1 (#38609)

* Optimize performance of softmax_bwd when axis!=-1

* fix

* fix

* fix

* fix

* [PTen] Remove pten core's dependency on fluid xxx_info.h (#39401)

* ermove xxx_info include

* fix namespace error

* resolve conflict

* skip xpu context in registry

* fix macro error

* resolve conflict

* resolve conflict

* revert xpu convert

* remove trans to fluid place

* remove useless headers

* [Pten] move operators/math/math_function_* to pten/kernels/func (#39300)

* move operators/math/math_function_* to pten/kernels/func
* namespace from `paddle::operators::math` to `pten::funcs`

* [MLU] add pool2d and pool2d_grad mlu kernel (#39453)

* [MLU]support c_gen_cncl_id_op run on MLU device (#39336)

Co-authored-by: zhangna <zhangna@cambricon.com>

* [bf16] add bf16 kernel: transpose & unbind (#39457)

* add transpose unbind

* add unittest

* refine transpose unittest

* uniform_random op for mlu (#39450)

* [MLU] add pool2d pytest (#39454)

* Added shape (U)INT8/BF16/FP32 oneDNN kernel (#36033)

* added shape oneDNN kernel

* removed unnecessary import from test

* added skipping tests for GPU

* refactoring

* refactored shape kernel

* added tests in new framework

* removed one line

* minor change

* added newline at EOF

* added formatting

* added attributes as extra

* move memcpy.h into cc file (#39469)

* Add TensorRT inspector into Paddle-TRT (#38362)

* Fix add profiler node tree implementation cmake error (#39474)

* add event node implementation

* modify profiler.stop interface

* fix according to review

* fix file mode

* modify class method name in event_node.cc

* modify LLONG_MAX to ULLONG_MAX

* fix ci error

* fix ci error

* fix dependency error

* unify naming style (#39481)

* [Pten] Generate Wrapped InferMeta by Yaml (#39482)

* generate wrapped_infer_meta

* add test for wrapped_infer_meta

* Update test_meta_fn_utils.cc

* change the dir of generated file

Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: Chen Weihang <chenwhpro@163.com>

* Adjusted python-level trace_op to accomodate final state Eager Dygraph (#39319)

* Removed debug info

* Added automatic code generation for final state Eager Dygraph

* Modified backward yaml

* Added EagerUtils helper functions for final state CodeGen

* Adjusted CMakeFiles to support compilation for final state auto generated codes

* Added python-c code generation for final state Eager Dygraph

* Fixed minor issue

* Fixed yaml.load() method failure

* Fixed minor issues

* Refactored Python-C Attributes Parsing Functions

* Fixed minor issue with Python-C AddFunctions

* Adjusted python-level trace_op to accomodate final state Eager Dygraph

* Added Logs for final state Eager Dygraph

* Fixed merge issues

* Fixed minor issue

* Fixed get_tensor method for EagerTensor (#39414)

* Enabled Eager OpTest #1

* Enabled Eager OpTest #1

* Fixed get_tensor method for EagerTensor

* [Approver Update] update check approver of qili93, test=document_fix (#39483)

* [MLU] add mlu kernel for c_broadcast op (#39470)

* update xpu test build script and fix get_test_cover_info, *test=kunlun (#39235)

* fix gather_nd, *test=kunlun (#39283)

* [pten] add split kernel (#39060)

* add split kernel

* add split kernel signature

* fix split bug

* modify MakePtenScalarArrayFromVarList

* modify MakePtenScalarArrayFromVarList

* fix split windows register error

* add test case for split kernel

* replace raw split kernel with pten kernel

* fix makeScalar/ScalarArray bug

* remove debug log

* remove int64_t type in buildPtcontext

* update by code review

* fix split dev test failed

* change DenseTensorMeta to MetaTensor

* change split api code from auto gen to manual

* split cuda kernel support bfloat16 type

* fix conflict

* rm raw split kernel

* merge develop branch

* change to pten::errors

* new may of test cases, *test=kunlun (#39444)

* new may of test cases, *test=kunlun

* new may of test cases, *test=kunlun

* new may of test cases, *test=kunlun

* [PTen] Add HasAttr for ArgumentMappingContext (#39464)

* add has_attr for arg map context

* skip useless attr now

* skip attr if not exists

* fix typo

* [ROCm] fix missing dcu kernel in operator.cmake, test=develop (#39480)

Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
Co-authored-by: Aganlengzi <aganlengzi@gmail.com>
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>
Co-authored-by: crystal <62974595+Zjq9409@users.noreply.github.com>
Co-authored-by: Zhanlue Yang <jim19930609@gmail.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: wenbin <wang3323032@qq.com>
Co-authored-by: Wilber <jiweibo@baidu.com>
Co-authored-by: hong <43953930+phlrain@users.noreply.github.com>
Co-authored-by: chenyanlann <62465397+chenyanlann@users.noreply.github.com>
Co-authored-by: Wei Shengyu <weisy11@163.com>
Co-authored-by: TeFeng Chen <ctfeng66@163.com>
Co-authored-by: furnace <34057289+windstamp@users.noreply.github.com>
Co-authored-by: fwenguang <95677191+fwenguang@users.noreply.github.com>
Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com>
Co-authored-by: Zhang Ting <zhangting_2017@163.com>
Co-authored-by: chenjian <chenjian26@baidu.com>
Co-authored-by: Shang Zhizhou <shangzhizhou@baidu.com>
Co-authored-by: liutiexing <74819124+liutiexing@users.noreply.github.com>
Co-authored-by: liutiexing <liutiexing@google.com>
Co-authored-by: Wangzheee <634486483@qq.com>
Co-authored-by: ziyoujiyi <73728031+ziyoujiyi@users.noreply.github.com>
Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>
Co-authored-by: zhangchunle <clzhang_cauc@163.com>
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
Co-authored-by: Lijunhui <1578034415@qq.com>
Co-authored-by: Zhang Zheng <32410583+ZzSean@users.noreply.github.com>
Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>
Co-authored-by: zn <96479180+kangna-qi@users.noreply.github.com>
Co-authored-by: zhangna <zhangna@cambricon.com>
Co-authored-by: joeqiao12 <45232181+joeqiao12@users.noreply.github.com>
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: Chen Weihang <chenwhpro@163.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: maxhuiy <1508399706@qq.com>
Co-authored-by: TTerror <tangzhiyi11@users.noreply.github.com>
Co-authored-by: chentianyu03 <chentianyu03@baidu.com>
Co-authored-by: helen88 <z8hanghuan@126.com>
winter-wang pushed a commit to winter-wang/Paddle that referenced this pull request Feb 16, 2022