
Add viterbi decode #35778

Merged (74 commits) into PaddlePaddle:develop on Oct 21, 2021

Conversation

@joey12300 (Contributor) commented on Sep 15, 2021

PR types

New features

PR changes

OPs

Describe

Add the viterbi_decode op kernel and API.

API description

Example

import paddle
paddle.seed(102)
batch_size, seq_len, num_tags = 2, 4, 3
emission = paddle.rand((batch_size, seq_len, num_tags), dtype='float32')
length = paddle.randint(1, seq_len + 1, [batch_size])
tags = paddle.randint(0, num_tags, [batch_size, seq_len])
transition = paddle.rand((num_tags, num_tags), dtype='float32')
scores, path = paddle.text.ops.crf_decode(emission, transition, length, False)
# scores: Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True, [3.37089300, 1.56825531])
# path: Tensor(shape=[2, 3], dtype=int64, place=CUDAPlace(0), stop_gradient=True, [[1, 0, 0], [1, 1, 0]])
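The returned scores are per-sample best-path scores and path holds the corresponding tag indices. As a rough illustration of what the op computes, here is a minimal pure-Python sketch for a single sequence (batching, lengths, and bos/eos transition handling are omitted; names are illustrative, not Paddle's kernel code):

```python
def viterbi_decode(emission, transition):
    """emission: [seq_len][num_tags] scores; transition: [num_tags][num_tags].
    Returns (score of the best tag path, best tag path)."""
    num_tags = len(emission[0])
    alpha = list(emission[0])   # best score of any path ending in each tag
    history = []                # per-step argmax backpointers
    for t in range(1, len(emission)):
        new_alpha, backptr = [], []
        for j in range(num_tags):
            # best previous tag i for current tag j
            cand = [alpha[i] + transition[i][j] for i in range(num_tags)]
            best_i = max(range(num_tags), key=lambda i: cand[i])
            backptr.append(best_i)
            new_alpha.append(cand[best_i] + emission[t][j])
        history.append(backptr)
        alpha = new_alpha
    # backtrack from the best final tag
    last = max(range(num_tags), key=lambda j: alpha[j])
    path = [last]
    for backptr in reversed(history):
        path.append(backptr[path[-1]])
    path.reverse()
    return alpha[last], path
```

This mirrors the score/path pair above: one scalar score and one tag sequence per sample.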

@paddle-bot-old (bot) commented on Sep 15, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@paddle-bot-old (bot) commented:

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

void Make() override {
AddInput(
"Input",
"The unary emission tensor. The shape of Input MUST be ( batch_size,"
Reviewer (Contributor): MUST -> must

Author reply: Done.

"The unary emission tensor. The shape of Input MUST be ( batch_size,"
"sequence_length, num_tags). ");
AddInput("Transition",
"The transition matrix. The shape of Transition MUST be ( "
Reviewer (Contributor): Same as above; please fix all of these.

Author reply: Done.

REGISTER_OP_CUDA_KERNEL(
viterbi_decode,
ops::ViterbiDecodeKernel<platform::CUDADeviceContext, float>,
ops::ViterbiDecodeKernel<platform::CUDADeviceContext, double>);
Reviewer (Contributor): Please investigate whether fp16 can be supported here, to prepare for later optimization. If the composite API does not support fp16, it is fine to leave it unsupported for now.

PADDLE_ENFORCE_EQ(
in_dims[2], transition_dims[0],
platform::errors::InvalidArgument(
"The number of tags of Input and Transition should be equal."));
Reviewer (Contributor): Could the error message include the actual values, e.g. the current number of tags?

Author reply: Done.

tensor with shape of [batch_size, sequence_length, num_tags]. The data type is float32 or float64.
transition_params (Tensor): The input tensor of transition matrix. This is a 2-D
tensor with shape of [num_tags, num_tags]. The data type is float32 or float64.
sequence_length (Tensor): The input tensor of real length of each sequence. This is a 1-D
Reviewer @wawltor (Contributor), Oct 18, 2021: The input tensor of real length -> The input tensor of length

Author reply: Done.

and the data type is float32 or float64.
paths(Tensor): The output tensor containing the highest scoring tag indices. The shape is [batch_size, sequence_length]
and the data type is int64.

Reviewer (Contributor): Doesn't this nn.Layer API break the docs convention? forward has no docstring, and the class docstring should not need a Returns section.

Author reply: See paddle.nn.LSTM, which puts all of its documentation above __init__; so this should conform to the convention.

// create int tensor buffer
int buffer_size = batch_size * seq_len + batch_size * n_labels * seq_len +
9 * batch_size + 10;
LoDTensor int_buffer;
Reviewer @wawltor (Contributor), Oct 18, 2021: Shouldn't the buffer_size computation and these magic numbers be commented? Please also explain why a buffer is used.

Author reply: Explanation added.

int_buffer.mutable_data<int64_t>(ctx.GetPlace());
TensorBuffer int_tensor_buffer(int_buffer);
// create float tensor buffer
buffer_size = seq_len * batch_size * n_labels + 5 * batch_size * n_labels +
Reviewer (Contributor): Same as above.

Author reply: The meaning of the hyperparameters has been explained.
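The pattern under discussion is a one-shot arena: allocate one flat buffer whose size is the sum of all intermediate tensors' element counts, then carve contiguous views out of it, instead of making many small device allocations inside the decode loop. An illustrative NumPy sketch of that pattern (not Paddle code; shapes and names are hypothetical):

```python
import numpy as np

class TensorBuffer:
    """Hand out contiguous sub-tensors of one preallocated flat buffer."""
    def __init__(self, size, dtype):
        self.buf = np.empty(size, dtype=dtype)
        self.offset = 0

    def get(self, shape):
        # Carve the next contiguous sub-tensor out of the flat buffer.
        n = int(np.prod(shape))
        view = self.buf[self.offset:self.offset + n].reshape(shape)
        self.offset += n
        return view

batch_size, seq_len, n_labels = 2, 4, 3
# Mirrors the "batch_size * seq_len + batch_size * n_labels * seq_len + ..."
# style sizing: element counts of all intermediates summed into one allocation.
int_buffer = TensorBuffer(batch_size * seq_len + batch_size * n_labels * seq_len,
                          np.int64)
path = int_buffer.get((batch_size, seq_len))               # decoded tag ids
historys = int_buffer.get((seq_len, batch_size, n_labels)) # argmax backpointers
```

All sub-tensors alias the single allocation, which is exactly why the constants in buffer_size needed comments.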

1, input.numel(), 1, input.data<int64_t>(), nullptr,
out_data.data<int64_t>());
Tensor max_value_tensor;
framework::TensorCopy(out_data, platform::CPUPlace(), &max_value_tensor);
Reviewer (Contributor): Just to confirm: doesn't max_value_tensor need memory allocated before TensorCopy?

Author reply: No; TensorCopy calls mutable_data internally to allocate the device memory.

Tensor out_data;
out_data.Resize(framework::make_ddim({1}));
out_data.mutable_data<T>(platform::CUDAPlace());
ArgmaxCUDAKernel<T, T, 32><<<1, 32, 0, dev_ctx.stream()>>>(
Reviewer (Contributor): Why are the grid and block sizes 1 and 32 here?

Author reply: Now set via ComputeBlockSize.

const T* in_data = input.data<T>();
IndType* out_idx_data = out_idx->data<IndType>();
T* out_data = out->data<T>();
CUDA_ARGMAX(128);
Reviewer (Contributor): Why is block_dim 128? Shouldn't block_dim be chosen based on the current device?

Author reply: Now set via ComputeBlockSize.
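The fix replaces hard-coded launch dimensions with a ComputeBlockSize-style helper. A plausible sketch of such a helper (hypothetical; Paddle's actual implementation may differ) rounds the reduction width up to the next power of two, capped by a device block-dim limit:

```python
def compute_block_size(col, max_block_dim=512):
    """Pick a CUDA block size for a reduction over `col` elements: the
    smallest power of two >= col, capped at the device's block-dim limit.
    Hypothetical sketch of a ComputeBlockSize-style helper."""
    block = 8  # small floor so tiny reductions still launch a full block
    while block < col and block < max_block_dim:
        block *= 2
    return block
```

A power-of-two block size keeps the tree-style argmax reduction simple, while the cap respects per-device limits instead of a fixed 32 or 128.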

}
SubInt(dev_ctx, left_length, one, &left_length);
Argmax<DeviceContext, T, int64_t> argmax;
for (int64_t i = 1; i < max_seq_len; ++i) {
Reviewer (Contributor): Is the case max_seq_len = 1 handled?

Author reply: Before the path backtracking below, last_ids is set first, so a path exists even when max_seq_len = 1. Testing confirms the op handles the max_seq_len = 1 case.
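The author's point can be seen directly: with max_seq_len == 1 the recurrence loop `for (int64_t i = 1; i < max_seq_len; ++i)` never executes, so decoding must degenerate to an argmax over the single emission step. A tiny illustrative sketch of that degenerate case (hypothetical names, not the kernel):

```python
def decode_len1(emission_step0):
    """Decode a length-1 sequence: no transitions apply, so the best path
    is just the argmax of the single step's emission scores."""
    last_id = max(range(len(emission_step0)), key=lambda j: emission_step0[j])
    return emission_step0[last_id], [last_id]
```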

auto alpha_argmax_temp = alpha_argmax_unbind[i - 1];
alpha_argmax_temp.Resize({batch_size, n_labels});
argmax(ctx, alpha_trn_sum, &alpha_argmax_temp, &alpha_max, 1);
historys.push_back(alpha_argmax_temp);
Reviewer (Contributor): Try using emplace_back here.

Author reply: Replaced.

&batch_path[actual_len - last_ids_index]);
ARange<DeviceContext> arange;
arange(dev_ctx, batch_offset.data<int64_t>(), batch_size, n_labels);
Gather<DeviceContext, int64_t, int64_t> gather;
Reviewer (Contributor): This logic is fairly complex; please link the Python implementation in PaddleNLP so future readers can follow this code.

Author reply: Done.

struct ARange<platform::CUDADeviceContext> {
void operator()(const platform::CUDADeviceContext& dev_ctx, int64_t* data,
int end, int64_t scale) {
ARangeKernel<<<1, 128, 0, dev_ctx.stream()>>>(data, end, scale);
Reviewer (Contributor): As above, hard-coding 128 here does not look right.

Author reply: Now set via ComputeBlockSize.
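For context, the scaled ARange produces batch_offset[i] = i * scale (with scale = n_labels here), turning per-sample tag ids into flat offsets for the subsequent Gather step. A sketch of the computation (names hypothetical):

```python
def arange_scaled(end, scale):
    """batch_offset[i] = i * scale. With scale == n_labels, sample i's tag t
    maps to flat index batch_offset[i] + t for the Gather step."""
    return [i * scale for i in range(end)]
```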

Shape:
potentials (Tensor): The input tensor of unary emission. This is a 3-D
tensor with shape of [batch_size, sequence_length, num_tags]. The data type is float32 or float64.
length (Tensor): The input tensor of real length of each sequence. This is a 1-D
Reviewer (Contributor): Drop "real".

Author reply: Removed.

the last row and the last column of transitions will be considered as start tag, and the penultimate row and
the penultimate column of transitions will be considered as stop tag. Otherwise, all the rows and columns will be
considered as the real tag. Defaults to ``True``.
name (str|None) – A name for this layer(optional). If set None, the layer will be named automatically.
Reviewer (Contributor): name(str|None) -> name(str, optional), default value is None

Author reply: Done.

return scores, path


class ViterbiDecoder(Layer):
Reviewer (Contributor): ViterbiDecoder and crf_decode appear to call the same function; could a unified name be used?

Author reply: crf_decode has been renamed to viterbi_decode.

def crf_decode(potentials,
transition_params,
lengths,
include_start_end_tag=True,
Reviewer (Contributor): Parameter names conventionally pair start with stop and begin with end. For ranges, [start, stop) is recommended, consistent with Python and NumPy naming; for sentence boundary symbols, "begin of sentence" and "end of sentence" (bos/eos) are customary.

Author reply: include_start_end_tag has been renamed to include_bos_eos_tag.

@joey12300 joey12300 changed the title [WIP] Add viterbi decode Add viterbi decode Oct 20, 2021
lengths (Tensor): The input tensor of length of each sequence. This is a 1-D tensor with shape of [batch_size]. The data type is int64.
include_bos_eos_tag (`bool`, optional): If set to True, the last row and the last column of transitions will be considered
as start tag, the penultimate row and the penultimate column of transitions will be considered as stop tag. Defaults to ``True``.
name(str, optional): Default value is None.
Reviewer (Contributor): The description of the name parameter needs to be complete.

Author reply: Changed to:

name (str, optional): The default value is None. Normally there is no need for user to set this property. For more information, please
            refer to :ref:`api_guide_Name`.

tensor with shape of [num_tags, num_tags]. The data type is float32 or float64.
lengths (Tensor): The input tensor of length of each sequence. This is a 1-D tensor with shape of [batch_size]. The data type is int64.
include_bos_eos_tag (`bool`, optional): If set to True, the last row and the last column of transitions will be considered
as start tag, the penultimate row and the penultimate column of transitions will be considered as stop tag. Defaults to ``True``.
Reviewer (Contributor): "penultimate" is a fairly uncommon word; would "second to last" be better?

Author reply: Changed to "second to last".

Example:
.. code-block:: python

import numpy as np
Reviewer (Contributor): numpy isn't actually used in the example code, is it?

Author reply: numpy import removed.

transitions (`Tensor`): The transition matrix. Its dtype is float32 and has a shape of `[num_tags, num_tags]`.
include_bos_eos_tag (`bool`, optional): If set to True, the last row and the last column of transitions will be considered
as start tag, the penultimate row and the penultimate column of transitions will be considered as stop tag. Defaults to ``True``.
name(str, optional): Default value is None.
Reviewer (Contributor): Same as above.

Author reply: Changed to:

name (str, optional): The default value is None. Normally there is no need for user to set this property. For more information, please
            refer to :ref:`api_guide_Name`.

Example:
.. code-block:: python

import numpy as np
Reviewer (Contributor): Same as above.

Author reply (@joey12300, Oct 21, 2021): numpy import removed.

@jzhang533 (Contributor): LGTM

@XiaoguangHu01 (Contributor): LGTM

@wawltor (Contributor): LGTM

@joey12300 joey12300 merged commit 6072aec into PaddlePaddle:develop Oct 21, 2021
joey12300 added a commit to joey12300/Paddle that referenced this pull request Oct 21, 2021
* add viterbi decode cpu kernel

* add viterbi decoder api in paddle.text

* add a data buffer once to avoid create many small pieces of data buffer frequently

* fix viterbi max_seq_length bug

* fix seq_len=1 bug

* fix device context

* move split out of for loop

* remove INVERSE_SUB

* remove 2 GET_CAST_MASK

* remove 1 loop

* remove Functor

* add to_static deploy code

* use MAX_FUNC instead of ELE_MAX

* add MaxFunctor

* impl max_func

* remove MaxFunctor

* remove cast op

* use REGISTER_OP_WITHOUT_GRADIENT

* add viterbi cuda kernel

* add FIX_BLOCKDIM_CASE macro

* add MKL add, mul; add get data mask

* add arange mkl impl

* add CPU Argmax

* add cpu gather

* use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL

* use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP

* use SAME_DIMS_ELEMENT_BINARY_OP

* add SimpleBroadcastBinaryOP

* use int instead of int64_t to accelerate

* optimize SimpleBroadcastBinaryOP

* optimize SimpleBroadcastBinaryOP

* optimize performance in both single thread and multithread situation

* remove useless line

* remove useless code

* add CREATE_TENSOR_BUFFER macro

* add INIT_REQUIRED_TENSOR macro

* add comment

* fix windows ci

* add viterbi unittest

* remove cuda add functor

* remove cuda equal

* remove a template function

* fix windows ci

* fix windows dtype

* remove some template instance

* remove useless header file

* remove some blockdim

* remove transpose impl

* accelerate cpu performance on single thread situation

* viterbi_decode->crf_decode

* rename crf params name

* add viterbi api test

* remove useless import

* add enable_static

* use viterbi decoder

* fix viterbi len=1

* fix  viterbi unittest

* remove useless comments

* reconstruct viterbi decode

* remove ADD,SUB,MUL structure

* fix coverage

* remove CREATE_TENSOR

* add name args

* crf.py->ops.py; with_start_stop_tag->include_start_end_tag

* update crf_decode en docs

* fix viterbi decode en docs

* fix some review comments

* add FIXED_BLOCK_DIM_CASE in cuda

* push_back->emplace_back

* crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag

* paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode

* fix viterbi_decode en docs
XiaoguangHu01 pushed a commit that referenced this pull request Oct 23, 2021
(commit message identical to the list above)
@Cppowboy commented on Nov 3, 2021

Question: is the returned score of shape (batch_size, seq_len, num_tags)?

@joey12300 (Contributor, Author): No; the returned score has shape (batch_size), giving each sample's highest score at its final step.
