
Add viterbi decode #35778

Merged (74 commits) into PaddlePaddle:develop on Oct 21, 2021

Conversation

@joey12300 (Contributor) commented on Sep 15, 2021

PR types

New features

PR changes

OPs

Describe

Add the viterbi_decode op kernel and API.

API description

Example

import paddle
paddle.seed(102)
batch_size, seq_len, num_tags = 2, 4, 3
emission = paddle.rand((batch_size, seq_len, num_tags), dtype='float32')
length = paddle.randint(1, seq_len + 1, [batch_size])
tags = paddle.randint(0, num_tags, [batch_size, seq_len])
transition = paddle.rand((num_tags, num_tags), dtype='float32')
scores, path = paddle.text.ops.crf_decode(emission, transition, length, False)
# scores: Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True, [3.37089300, 1.56825531])
# path: Tensor(shape=[2, 3], dtype=int64, place=CUDAPlace(0), stop_gradient=True, [[1, 0, 0], [1, 1, 0]])
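The returned scores are per-sample best-path scores and path holds the corresponding tag indices. As a rough illustration of what the op computes, here is a minimal pure-Python sketch for a single sequence (batching, lengths, and bos/eos transition handling are omitted; names are illustrative, not Paddle's kernel code):

```python
def viterbi_decode(emission, transition):
    """emission: [seq_len][num_tags] scores; transition: [num_tags][num_tags].
    Returns (score of the best tag path, best tag path)."""
    num_tags = len(emission[0])
    alpha = list(emission[0])   # best score of any path ending in each tag
    history = []                # per-step argmax backpointers
    for t in range(1, len(emission)):
        new_alpha, backptr = [], []
        for j in range(num_tags):
            # best previous tag i for current tag j
            cand = [alpha[i] + transition[i][j] for i in range(num_tags)]
            best_i = max(range(num_tags), key=lambda i: cand[i])
            backptr.append(best_i)
            new_alpha.append(cand[best_i] + emission[t][j])
        history.append(backptr)
        alpha = new_alpha
    # backtrack from the best final tag
    last = max(range(num_tags), key=lambda j: alpha[j])
    path = [last]
    for backptr in reversed(history):
        path.append(backptr[path[-1]])
    path.reverse()
    return alpha[last], path
```

This mirrors the score/path pair above: one scalar score and one tag sequence per sample.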

@paddle-bot-old (bot) commented on Sep 15, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@paddle-bot-old (bot) commented:

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

void Make() override {
AddInput(
"Input",
"The unary emission tensor. The shape of Input MUST be ( batch_size,"
Reviewer (Contributor): MUST -> must

Author reply: Done.

"The unary emission tensor. The shape of Input MUST be ( batch_size,"
"sequence_length, num_tags). ");
AddInput("Transition",
"The transition matrix. The shape of Transition MUST be ( "
Reviewer (Contributor): Same as above; please fix all of these.

Author reply: Done.

REGISTER_OP_CUDA_KERNEL(
viterbi_decode,
ops::ViterbiDecodeKernel<platform::CUDADeviceContext, float>,
ops::ViterbiDecodeKernel<platform::CUDADeviceContext, double>);
Reviewer (Contributor): Please investigate whether fp16 can be supported here, to prepare for later optimization. If the composite API does not support fp16, it is fine to leave it unsupported for now.

PADDLE_ENFORCE_EQ(
in_dims[2], transition_dims[0],
platform::errors::InvalidArgument(
"The number of tags of Input and Transition should be equal."));
Reviewer (Contributor): Could the error message include the actual values, e.g. the current number of tags?

Author reply: Done.

tensor with shape of [batch_size, sequence_length, num_tags]. The data type is float32 or float64.
transition_params (Tensor): The input tensor of transition matrix. This is a 2-D
tensor with shape of [num_tags, num_tags]. The data type is float32 or float64.
sequence_length (Tensor): The input tensor of real length of each sequence. This is a 1-D
Reviewer @wawltor (Contributor), Oct 18, 2021: The input tensor of real length -> The input tensor of length

Author reply: Done.

and the data type is float32 or float64.
paths(Tensor): The output tensor containing the highest scoring tag indices. The shape is [batch_size, sequence_length]
and the data type is int64.

Reviewer (Contributor): Doesn't this nn.Layer API break the docs convention? forward has no docstring, and the class docstring should not need a Returns section.

Author reply: See paddle.nn.LSTM, which puts all of its documentation above __init__; so this should conform to the convention.

// create int tensor buffer
int buffer_size = batch_size * seq_len + batch_size * n_labels * seq_len +
9 * batch_size + 10;
LoDTensor int_buffer;
Reviewer @wawltor (Contributor), Oct 18, 2021: Shouldn't the buffer_size computation and these magic numbers be commented? Please also explain why a buffer is used.

Author reply: Explanation added.

int_buffer.mutable_data<int64_t>(ctx.GetPlace());
TensorBuffer int_tensor_buffer(int_buffer);
// create float tensor buffer
buffer_size = seq_len * batch_size * n_labels + 5 * batch_size * n_labels +
Reviewer (Contributor): Same as above.

Author reply: The meaning of the hyperparameters has been explained.
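The pattern under discussion is a one-shot arena: allocate one flat buffer whose size is the sum of all intermediate tensors' element counts, then carve contiguous views out of it, instead of making many small device allocations inside the decode loop. An illustrative NumPy sketch of that pattern (not Paddle code; shapes and names are hypothetical):

```python
import numpy as np

class TensorBuffer:
    """Hand out contiguous sub-tensors of one preallocated flat buffer."""
    def __init__(self, size, dtype):
        self.buf = np.empty(size, dtype=dtype)
        self.offset = 0

    def get(self, shape):
        # Carve the next contiguous sub-tensor out of the flat buffer.
        n = int(np.prod(shape))
        view = self.buf[self.offset:self.offset + n].reshape(shape)
        self.offset += n
        return view

batch_size, seq_len, n_labels = 2, 4, 3
# Mirrors the "batch_size * seq_len + batch_size * n_labels * seq_len + ..."
# style sizing: element counts of all intermediates summed into one allocation.
int_buffer = TensorBuffer(batch_size * seq_len + batch_size * n_labels * seq_len,
                          np.int64)
path = int_buffer.get((batch_size, seq_len))               # decoded tag ids
historys = int_buffer.get((seq_len, batch_size, n_labels)) # argmax backpointers
```

All sub-tensors alias the single allocation, which is exactly why the constants in buffer_size needed comments.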

1, input.numel(), 1, input.data<int64_t>(), nullptr,
out_data.data<int64_t>());
Tensor max_value_tensor;
framework::TensorCopy(out_data, platform::CPUPlace(), &max_value_tensor);
Reviewer (Contributor): Just to confirm: doesn't max_value_tensor need memory allocated before TensorCopy?

Author reply: No; TensorCopy calls mutable_data internally to allocate the device memory.

Tensor out_data;
out_data.Resize(framework::make_ddim({1}));
out_data.mutable_data<T>(platform::CUDAPlace());
ArgmaxCUDAKernel<T, T, 32><<<1, 32, 0, dev_ctx.stream()>>>(
Reviewer (Contributor): Why are the grid and block sizes 1 and 32 here?

Author reply: Now set via ComputeBlockSize.

const T* in_data = input.data<T>();
IndType* out_idx_data = out_idx->data<IndType>();
T* out_data = out->data<T>();
CUDA_ARGMAX(128);
Reviewer (Contributor): Why is block_dim 128? Shouldn't block_dim be chosen based on the current device?

Author reply: Now set via ComputeBlockSize.
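The fix replaces hard-coded launch dimensions with a ComputeBlockSize-style helper. A plausible sketch of such a helper (hypothetical; Paddle's actual implementation may differ) rounds the reduction width up to the next power of two, capped by a device block-dim limit:

```python
def compute_block_size(col, max_block_dim=512):
    """Pick a CUDA block size for a reduction over `col` elements: the
    smallest power of two >= col, capped at the device's block-dim limit.
    Hypothetical sketch of a ComputeBlockSize-style helper."""
    block = 8  # small floor so tiny reductions still launch a full block
    while block < col and block < max_block_dim:
        block *= 2
    return block
```

A power-of-two block size keeps the tree-style argmax reduction simple, while the cap respects per-device limits instead of a fixed 32 or 128.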

}
SubInt(dev_ctx, left_length, one, &left_length);
Argmax<DeviceContext, T, int64_t> argmax;
for (int64_t i = 1; i < max_seq_len; ++i) {
Reviewer (Contributor): Is the case max_seq_len = 1 handled?

Author reply: Before the path backtracking below, last_ids is set first, so a path exists even when max_seq_len = 1. Testing confirms the op handles the max_seq_len = 1 case.
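The author's point can be seen directly: with max_seq_len == 1 the recurrence loop `for (int64_t i = 1; i < max_seq_len; ++i)` never executes, so decoding must degenerate to an argmax over the single emission step. A tiny illustrative sketch of that degenerate case (hypothetical names, not the kernel):

```python
def decode_len1(emission_step0):
    """Decode a length-1 sequence: no transitions apply, so the best path
    is just the argmax of the single step's emission scores."""
    last_id = max(range(len(emission_step0)), key=lambda j: emission_step0[j])
    return emission_step0[last_id], [last_id]
```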

auto alpha_argmax_temp = alpha_argmax_unbind[i - 1];
alpha_argmax_temp.Resize({batch_size, n_labels});
argmax(ctx, alpha_trn_sum, &alpha_argmax_temp, &alpha_max, 1);
historys.push_back(alpha_argmax_temp);
Reviewer (Contributor): Try using emplace_back here.

Author reply: Replaced.

&batch_path[actual_len - last_ids_index]);
ARange<DeviceContext> arange;
arange(dev_ctx, batch_offset.data<int64_t>(), batch_size, n_labels);
Gather<DeviceContext, int64_t, int64_t> gather;
Reviewer (Contributor): This logic is fairly complex; please link the Python implementation in PaddleNLP so future readers can follow this code.

Author reply: Done.

struct ARange<platform::CUDADeviceContext> {
void operator()(const platform::CUDADeviceContext& dev_ctx, int64_t* data,
int end, int64_t scale) {
ARangeKernel<<<1, 128, 0, dev_ctx.stream()>>>(data, end, scale);
Reviewer (Contributor): As above, hard-coding 128 here does not look right.

Author reply: Now set via ComputeBlockSize.
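For context, the scaled ARange produces batch_offset[i] = i * scale (with scale = n_labels here), turning per-sample tag ids into flat offsets for the subsequent Gather step. A sketch of the computation (names hypothetical):

```python
def arange_scaled(end, scale):
    """batch_offset[i] = i * scale. With scale == n_labels, sample i's tag t
    maps to flat index batch_offset[i] + t for the Gather step."""
    return [i * scale for i in range(end)]
```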

Shape:
potentials (Tensor): The input tensor of unary emission. This is a 3-D
tensor with shape of [batch_size, sequence_length, num_tags]. The data type is float32 or float64.
length (Tensor): The input tensor of real length of each sequence. This is a 1-D
Reviewer (Contributor): Drop "real".

Author reply: Removed.

the last row and the last column of transitions will be considered as start tag, and the penultimate row and
the penultimate column of transitions will be considered as stop tag. Otherwise, all the rows and columns will be
considered as the real tag. Defaults to ``True``.
name (str|None) – A name for this layer(optional). If set None, the layer will be named automatically.
Reviewer (Contributor): name(str|None) -> name(str, optional), default value is None

Author reply: Done.

return scores, path


class ViterbiDecoder(Layer):
Reviewer (Contributor): ViterbiDecoder and crf_decode appear to call the same function; could a unified name be used?

Author reply: crf_decode has been renamed to viterbi_decode.

def crf_decode(potentials,
transition_params,
lengths,
include_start_end_tag=True,
Reviewer (Contributor): Parameter names conventionally pair start with stop and begin with end. For ranges, [start, stop) is recommended, consistent with Python and NumPy naming; for sentence boundary symbols, "begin of sentence" and "end of sentence" (bos/eos) are customary.

Author reply: include_start_end_tag has been renamed to include_bos_eos_tag.

@joey12300 joey12300 changed the title [WIP] Add viterbi decode Add viterbi decode Oct 20, 2021
lengths (Tensor): The input tensor of length of each sequence. This is a 1-D tensor with shape of [batch_size]. The data type is int64.
include_bos_eos_tag (`bool`, optional): If set to True, the last row and the last column of transitions will be considered
as start tag, the penultimate row and the penultimate column of transitions will be considered as stop tag. Defaults to ``True``.
name(str, optional): Default value is None.
Reviewer (Contributor): The description of the name parameter needs to be complete.

Author reply: Changed to:

name (str, optional): The default value is None. Normally there is no need for user to set this property. For more information, please
            refer to :ref:`api_guide_Name`.

tensor with shape of [num_tags, num_tags]. The data type is float32 or float64.
lengths (Tensor): The input tensor of length of each sequence. This is a 1-D tensor with shape of [batch_size]. The data type is int64.
include_bos_eos_tag (`bool`, optional): If set to True, the last row and the last column of transitions will be considered
as start tag, the penultimate row and the penultimate column of transitions will be considered as stop tag. Defaults to ``True``.
Reviewer (Contributor): "penultimate" is a fairly uncommon word; would "second to last" be better?

Author reply: Changed to "second to last".

Example:
.. code-block:: python

import numpy as np
Reviewer (Contributor): numpy isn't actually used in the example code, is it?

Author reply: numpy import removed.

transitions (`Tensor`): The transition matrix. Its dtype is float32 and has a shape of `[num_tags, num_tags]`.
include_bos_eos_tag (`bool`, optional): If set to True, the last row and the last column of transitions will be considered
as start tag, the penultimate row and the penultimate column of transitions will be considered as stop tag. Defaults to ``True``.
name(str, optional): Default value is None.
Reviewer (Contributor): Same as above.

Author reply: Changed to:

name (str, optional): The default value is None. Normally there is no need for user to set this property. For more information, please
            refer to :ref:`api_guide_Name`.

Example:
.. code-block:: python

import numpy as np
Reviewer (Contributor): Same as above.

Author reply (@joey12300, Oct 21, 2021): numpy import removed.

@jzhang533 (Contributor): LGTM

@XiaoguangHu01 (Contributor): LGTM

@wawltor (Contributor): LGTM

@joey12300 joey12300 merged commit 6072aec into PaddlePaddle:develop Oct 21, 2021
joey12300 added a commit to joey12300/Paddle that referenced this pull request Oct 21, 2021
* add viterbi decode cpu kernel

* add viterbi decoder api in paddle.text

* add a data buffer once to avoid create many small pieces of data buffer frequently

* fix viterbi max_seq_length bug

* fix seq_len=1 bug

* fix device context

* move split out of for loop

* remove INVERSE_SUB

* remove 2 GET_CAST_MASK

* remove 1 loop

* remove Functor

* add to_static deploy code

* use MAX_FUNC instead of ELE_MAX

* add MaxFunctor

* impl max_func

* remove MaxFunctor

* remove cast op

* use REGISTER_OP_WITHOUT_GRADIENT

* add viterbi cuda kernel

* add FIX_BLOCKDIM_CASE macro

* add MKL add, mul; add get data mask

* add arange mkl impl

* add CPU Argmax

* add cpu gather

* use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL

* use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP

* use SAME_DIMS_ELEMENT_BINARY_OP

* add SimpleBroadcastBinaryOP

* use int instead of int64_t to accelerate

* optimize SimpleBroadcastBinaryOP

* optimize SimpleBroadcastBinaryOP

* optimize performance in both single thread and multithread situation

* remove useless line

* remove useless code

* add CREATE_TENSOR_BUFFER macro

* add INIT_REQUIRED_TENSOR macro

* add comment

* fix windows ci

* add viterbi unittest

* remove cuda add functor

* remove cuda equal

* remove a template function

* fix windows ci

* fix windows dtype

* remove some template instance

* remove useless header file

* remove some blockdim

* remove transpose impl

* accelerate cpu performance on single thread situation

* viterbi_decode->crf_decode

* rename crf params name

* add viterbi api test

* remove useless import

* add enable_static

* use viterbi decoder

* fix viterbi len=1

* fix  viterbi unittest

* remove useless comments

* reconstruct viterbi decode

* remove ADD,SUB,MUL structure

* fix coverage

* remove CREATE_TENSOR

* add name args

* crf.py->ops.py; with_start_stop_tag->include_start_end_tag

* update crf_decode en docs

* fix viterbi decode en docs

* fix some review comments

* add FIXED_BLOCK_DIM_CASE in cuda

* push_back->emplace_back

* crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag

* paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode

* fix viterbi_decode en docs
XiaoguangHu01 pushed a commit that referenced this pull request Oct 23, 2021
(commit message identical to the list above)
@Cppowboy commented on Nov 3, 2021

Question: is the returned score of shape (batch_size, seq_len, num_tags)?

@joey12300 (Contributor, Author): No; the returned score has shape (batch_size), giving each sample's highest score at its final step.
