Dygraph fix3 (PaddlePaddle#457)
* update readme

* update demo

* + 160G model

* qa model bugfix: models inherit docstrings

* Update README.zh.md

* Update README.en.md

* Update README.zh.md

* reorganize binaries

* Update README.zh.md

* Update README.en.md

* Update README.zh.md

* Update README.en.md
Meiyim committed May 22, 2020
1 parent ef8879f commit fd360a7
Showing 13 changed files with 75 additions and 44 deletions.
Binary file removed .metas/ERNIE_milestone_chn.png
Binary file not shown.
Binary file added .metas/ERNIE_milestone_en.png
File renamed without changes
Binary file removed .metas/dygraph_show.gif
Binary file not shown.
Binary file removed .metas/ernie-head-banner.gif
Binary file not shown.
Binary file removed .metas/ernie.png
Binary file not shown.
41 changes: 23 additions & 18 deletions README.en.md
@@ -1,6 +1,6 @@
English|[简体中文](./README.zh.md)

![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone.png)
![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone_en.png)


**Reminder: this repo has been refactored; for paper reproduction or backward compatibility, please check out the [repro branch](https://github.com/PaddlePaddle/ERNIE/tree/repro)**
@@ -89,23 +89,23 @@ pip install paddle-ernie
or

```shell
git clone -b dygraph https://github.com/PaddlePaddle/ERNIE.git --single-branch
git clone https://github.com/PaddlePaddle/ERNIE.git --depth 1
cd ERNIE
pip install -r requirements.txt
pip setup.py -e .

pip install -e .
```

##### 3. download pretrained models (optional)

| Model | Description |
| :------------------------------------------------- | :----------------------------------------------------------- |
| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | L12H768A12 |
| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | L3H1024A16 |
| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | L12H768A12 |
| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | L24H1024A16 |
| [ERNIE Gen base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | L12H768A12 |
| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| L24H1024A16 |
| Model | Description | Abbreviation |
| :------------------------------------------------- | :----------------------------------------------------------- | :----------- |
| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | L12H768A12 | ernie-1.0 |
| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | L3H1024A16 | ernie-tiny |
| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | L12H768A12 | ernie-2.0-en |
| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | L24H1024A16 | ernie-2.0-large-en |
| [ERNIE Gen Base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | L12H768A12 | ernie-gen-base-en |
| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz) | L24H1024A16 | ernie-gen-large-en |
| [ERNIE Gen Large 160G for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz) | L24H1024A16, + 160G pretraining corpus | ernie-gen-large-160g-en |
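The abbreviations in the last column can be passed to `from_pretrained` in place of a local path. Below is a minimal dygraph sketch of loading a model this way; it assumes the `ErnieModel`/`ErnieTokenizer` APIs that appear elsewhere in this diff, so treat it as illustrative rather than canonical:

```python
import numpy as np
import paddle.fluid.dygraph as FD
from ernie.modeling_ernie import ErnieModel
from ernie.tokenizing_ernie import ErnieTokenizer

with FD.guard():
    model = ErnieModel.from_pretrained('ernie-1.0')    # downloads and caches on first use
    tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
    model.eval()
    ids, sids = tokenizer.encode('hello world')        # token ids and sentence (segment) ids
    ids = FD.to_variable(np.expand_dims(ids, 0))       # add a batch dimension: [1, seq_len]
    sids = FD.to_variable(np.expand_dims(sids, 0))
    pooled, encoded = model(ids, sids)                 # forward returns (pooled, encoded)
```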

##### 4. download datasets

@@ -143,26 +143,31 @@ see [demo](https://ernie-github.cdn.bcebos.com/data-mnli-m.tar.gz) data for MNLI
- try eager execution with the `dygraph` model:

```script
python3 ./demo/finetune_classifier_dygraph.py \
--from_pretrained ernie_1.0 \
--data_dir ./data/xnli
python3 ./ernie_d/demo/finetune_classifier_dygraph.py \
--from_pretrained ernie-1.0 \
--data_dir ./data/xnli
```

- Distributed finetune

`paddle.distributed.launch` is a process manager; we use it to launch one Python process on each available GPU device:

when in distributed training, `max_steps` is used as stopping criteria rather than `epoch` to prevent dead block.
also notice than we shard the train data according to device id to prevent over fitting.
In distributed training, `max_steps` is used as the stopping criterion rather than `epoch`, to prevent inter-process deadlock.
You can calculate `max_steps` as `EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH`.
Also note that we shard the training data by device ID, to prevent the overfitting caused by every process training on the same data.
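For example, a quick sanity check with hypothetical numbers (3 epochs, 100k training examples, per-GPU batch size 32 on 4 GPUs; `TOTAL_BATCH` is assumed here to be the global batch, i.e. per-GPU batch times GPU count):

```python
EPOCH, NUM_TRAIN_EXAMPLES = 3, 100000   # hypothetical values
TOTAL_BATCH = 32 * 4                    # per-GPU batch size * number of GPUs
max_steps = EPOCH * NUM_TRAIN_EXAMPLES // TOTAL_BATCH
print(max_steps)                        # 2343
```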

demo:
(make sure you have at least 2 GPUs;
online model download does not work under `paddle.distributed.launch`,
so run single-card finetuning first to fetch the pretrained model, or download and extract one manually from [here](#section-pretrained-models)):


```script
python3 -m paddle.distributed.launch \
./demo/finetune_classifier_dygraph_distributed.py \
--data_dir data/mnli \
--max_steps 10000 \
--from_pretrained ernie2.0-en
--from_pretrained ernie-2.0-en
```


37 changes: 20 additions & 17 deletions README.zh.md
@@ -1,6 +1,6 @@
[English](./README.en.md)|简体中文

![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone.png)
![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone_zh.png)

ERNIE is a continual-learning, knowledge-enhanced framework for semantic understanding pioneered by Baidu. It combines large-scale pretraining with rich knowledge from multiple sources and, through continual learning, keeps absorbing lexical, structural, and semantic knowledge from massive text corpora, so the model's quality continually improves. ERNIE comprehensively and significantly surpassed the world-leading techniques on 16 public datasets covering sentiment analysis, text matching, natural language inference, lexical analysis, reading comprehension, question answering, and more. On GLUE, the internationally authoritative benchmark for general language understanding, it was the first to break the 90-point barrier and ranked first worldwide. At SemEval 2020, the world's largest semantic evaluation, which concluded this March, ERNIE won 5 world championships; the technology was also reported on the official website of MIT Technology Review, and the related innovations were accepted by the top academic conferences AAAI and IJCAI. ERNIE is deployed at scale in industry, e.g., in search engines, news recommendation, advertising systems, voice interaction, and intelligent customer service.

@@ -87,24 +87,25 @@ pip install paddle-ernie
or

```shell
git clone -b dygraph https://github.com/PaddlePaddle/ERNIE.git --single-branch
git clone https://github.com/PaddlePaddle/ERNIE.git --depth 1
cd ERNIE
pip install -r requirements.txt
pip setup.py -e .
pip install -e .

```

##### 3. Download pretrained models (optional)
##### 3. Download pretrained models (optional) <a name="section-pretrained-models"></a>


| Model | Description |
| :------------------------------------------------- | :----------------------------------------------------------- |
| [ERNIE 1.0 Base Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | L12H768A12 |
| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | L3H1024A16 |
| [ERNIE 2.0 Base English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | base: L12H768A12 |
| [ERNIE 2.0 Large English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | large: L24H1024A16 |
| [ERNIE Gen Base English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | L12H768A12 |
| [ERNIE Gen Large English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz) | L24H1024A16 |
| Model | Details | Abbreviation |
| :------------------------------------------------- | :------------------------------------------------------------------------- | :----------- |
| [ERNIE 1.0 Base Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | Layer:12, Hidden:768, Heads:12 | ernie-1.0 |
| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | Layer:3, Hidden:1024, Heads:16 | ernie-tiny |
| [ERNIE 2.0 Base English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | Layer:12, Hidden:768, Heads:12 | ernie-2.0-en |
| [ERNIE 2.0 Large English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | Layer:24, Hidden:1024, Heads:16 | ernie-2.0-large-en |
| [ERNIE Gen Base English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | Layer:12, Hidden:768, Heads:12 | ernie-gen-base-en |
| [ERNIE Gen Large English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz) | Layer:24, Hidden:1024, Heads:16 | ernie-gen-large-en |
| [ERNIE Gen Large 160G English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz) | Layer:24, Hidden:1024, Heads:16, + extra 160G pretraining corpus | ernie-gen-large-160g-en |

##### 4. Download datasets

@@ -144,19 +145,21 @@ data/xnli
- Finetune with the `dygraph` model:

```script
python3 ./demo/finetune_classifier_dygraph.py \
--from_pretrained ernie_1.0 \
--data_dir ./data/xnli
python3 ./ernie_d/demo/finetune_classifier_dygraph.py \
--from_pretrained ernie-1.0 \
--data_dir ./data/xnli
```

- Distributed finetune

`paddle.distributed.launch` is a process manager; we use it to launch one Python process on each GPU and to set the environment variables required for distributed training:

In distributed training we use `max_steps` as the stopping criterion rather than `epoch`, to avoid inter-process deadlock.
You can compute the required `max_steps` as `EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH`.
Also note that the training set must be sharded across processes, to avoid the overfitting caused by all processes training on the same data.

Example script (make sure you have at least 2 GPU cards):
Example script (make sure you have at least 2 GPU cards; online model download does not work under `paddle.distributed.launch`,
so you may need to run single-card finetuning first to download the pretrained model, or download and extract one manually from [here](#section-pretrained-models)):

```script
python3 -m paddle.distributed.launch \
@@ -227,7 +230,7 @@ sids = np.expand_dims(sids, 0)
result = client(ids, sids)
```

You can also download a pre-built `inference_model` of the ernie-1.0 base model from [here]((https://ernie.bj.bcebos.com/ernie1.0_zh_inference_model.tar.gz).)
You can also download a pre-built `inference_model` of the ernie-1.0 base model from [here](https://ernie.bj.bcebos.com/ernie1.0_zh_inference_model.tar.gz).
This model has not been finetuned; it is typically used for feature-based finetuning of an upper model structure, or as a text feature extractor.
Because this model was produced by the old API, an extra dimension needs to be appended to the input tensors when making client requests:
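A minimal sketch of that extra trailing dimension (assuming `ids`/`sids` are int64 numpy arrays of shape `[batch, seq_len]`, and `client` is the client object built in the snippet above):

```python
import numpy as np

ids = np.expand_dims(ids, -1)    # [batch, seq_len] -> [batch, seq_len, 1]
sids = np.expand_dims(sids, -1)
result = client(ids, sids)
```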

14 changes: 10 additions & 4 deletions demo/finetune_classifier_dygraph.py
@@ -51,11 +51,13 @@
parser.add_argument('--bsz', type=int, default=32, help='batchsize')
parser.add_argument('--epoch', type=int, default=3, help='epoch')
parser.add_argument('--data_dir', type=str, required=True, help='data directory includes train / develop data')
parser.add_argument('--max_steps', type=int, required=True, help='max_train_steps, set this to EPOCH * NUM_SAMPLES / BATCH_SIZE')
parser.add_argument('--warmup_proportion', type=float, default=0.1)
parser.add_argument('--use_lr_decay', action='store_true', help='if set, learning rate will decay to zero at `max_steps`')
parser.add_argument('--warmup_proportion', type=float, default=0.1, help='if use_lr_decay is set, '
'learning rate will rise to `lr` at `warmup_proportion` * `max_steps` and decay to 0 at `max_steps`')
parser.add_argument('--lr', type=float, default=5e-5, help='learning rate')
parser.add_argument('--inference_model_dir', type=str, default=None, help='inference model output directory')
parser.add_argument('--save_dir', type=str, default=None, help='model output directory')
parser.add_argument('--max_steps', type=int, default=None, help='max_train_steps, set this to EPOCH * NUM_SAMPLES / BATCH_SIZE')
parser.add_argument('--wd', type=float, default=0.01, help='weight decay, aka L2 regularizer')


@@ -102,7 +104,11 @@ def map_fn(seg_a, seg_b, label):
with FD.guard(place):
model = ErnieModelForSequenceClassification.from_pretrained(args.from_pretrained, num_labels=3, name='')

opt = AdamW(learning_rate=LinearDecay(args.lr, int(args.warmup_proportion * args.max_steps), args.max_steps), parameter_list=model.parameters(), weight_decay=args.wd)
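# with --use_lr_decay, the LR warms up to `lr` over the first warmup_proportion * max_steps
# steps, then decays linearly to 0 at max_steps; otherwise a constant LR is used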
if args.use_lr_decay:
opt = AdamW(learning_rate=LinearDecay(args.lr, int(args.warmup_proportion * args.max_steps), args.max_steps), parameter_list=model.parameters(), weight_decay=args.wd)
else:
opt = AdamW(args.lr, parameter_list=model.parameters(), weight_decay=args.wd)

g_clip = F.dygraph_grad_clip.GradClipByGlobalNorm(1.0) #experimental
for epoch in range(args.epoch):
for step, d in enumerate(tqdm(train_ds.start(place), desc='training')):
@@ -117,7 +123,7 @@ def map_fn(seg_a, seg_b, label):
acc = []
with FD.base._switch_tracer_mode_guard_(is_train=False):
model.eval()
for step, d in enumerate(tqdm(dev_ds.start(), desc='evaluating %d' % epoch)):
for step, d in enumerate(tqdm(dev_ds.start(place), desc='evaluating %d' % epoch)):
ids, sids, label = d
loss, logits = model(ids, sids, labels=label)
#print('\n'.join(map(str, logits.numpy().tolist())))
4 changes: 2 additions & 2 deletions demo/finetune_mrc_dygraph.py
@@ -44,8 +44,8 @@
from ernie.tokenizing_ernie import ErnieTokenizer, ErnieTinyTokenizer
from ernie.optimization import AdamW, LinearDecay

from ernie.mrc import mrc_reader
from ernie.mrc import mrc_metrics
from demo.mrc import mrc_reader
from demo.mrc import mrc_metrics

log.setLevel(logging.DEBUG)
logging.getLogger().addHandler(log.handlers[0])
7 changes: 7 additions & 0 deletions ernie/file_utils.py
@@ -41,3 +41,10 @@ def _fetch_from_remote(url, force_download=False):
log.debug('%s cached in %s' % (url, cached_dir))
return cached_dir


def add_docstring(doc):
def func(f):
f.__doc__ += ('\n======other docs from superclass ======\n%s' % doc)
return f
return func
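A hypothetical usage sketch of `add_docstring` (illustrative class names; in this commit the decorator is applied to the `forward` methods in `modeling_ernie.py`). Note that the decorated function must already have a docstring, since `+=` on `None` would fail:

```python
from ernie.file_utils import add_docstring

class Base(object):
    def forward(self):
        """Base.forward docs."""

class Child(Base):
    @add_docstring(Base.forward.__doc__)
    def forward(self):
        """Child.forward docs."""

print(Child.forward.__doc__)
# Child.forward docs.
# ======other docs from superclass ======
# Base.forward docs.
```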

15 changes: 12 additions & 3 deletions ernie/modeling_ernie.py
Expand Up @@ -29,7 +29,7 @@
import paddle.fluid as F
import paddle.fluid.layers as L

from ernie.file_utils import _fetch_from_remote
from ernie.file_utils import _fetch_from_remote, add_docstring

log = logging.getLogger(__name__)

@@ -288,6 +288,11 @@ def forward(self, src_ids, sent_ids=None, pos_ids=None, input_mask=None, attn_bi
Mask to avoid performing attention on the padding token indices of the encoder input.
attn_bias(optional, `Variable` of shape `[batch_size, seq_len, seq_len] or False`):
3D version of `input_mask`, if set, overrides `input_mask`; if set not False, will not apply attention mask
past_cache(optional, tuple of two lists: cached key and cached value,
each is a list of `Variable`s of shape `[batch_size, seq_len, hidden_size]`):
cached key/value tensors that will be concatenated to the generated key/value when performing self-attention.
If set, `attn_bias` should not be None.
Returns:
pooled (`Variable` of shape `[batch_size, hidden_size]`):
output logits of pooler classifier
@@ -360,6 +365,7 @@ def __init__(self, cfg, name=None):
prob = cfg.get('classifier_dropout_prob', cfg['hidden_dropout_prob'])
self.dropout = lambda i: L.dropout(i, dropout_prob=prob, dropout_implementation="upscale_in_train",) if self.training else i

@add_docstring(ErnieModel.forward.__doc__)
def forward(self, *args, **kwargs):
"""
Args:
@@ -400,6 +406,7 @@ def __init__(self, cfg, name=None):
prob = cfg.get('classifier_dropout_prob', cfg['hidden_dropout_prob'])
self.dropout = lambda i: L.dropout(i, dropout_prob=prob, dropout_implementation="upscale_in_train",) if self.training else i

@add_docstring(ErnieModel.forward.__doc__)
def forward(self, *args, **kwargs):
"""
Args:
@@ -441,6 +448,7 @@ def __init__(self, cfg, name=None):
prob = cfg.get('classifier_dropout_prob', cfg['hidden_dropout_prob'])
self.dropout = lambda i: L.dropout(i, dropout_prob=prob, dropout_implementation="upscale_in_train",) if self.training else i

@add_docstring(ErnieModel.forward.__doc__)
def forward(self, *args, **kwargs):
"""
Args:
@@ -460,7 +468,7 @@ def forward(self, *args, **kwargs):

start_pos = kwargs.pop('start_pos', None)
end_pos = kwargs.pop('end_pos', None)
pooled, encoded, _ = super(ErnieModelForQuestionAnswering, self).forward(*args, **kwargs)
pooled, encoded = super(ErnieModelForQuestionAnswering, self).forward(*args, **kwargs)
encoded = self.dropout(encoded)
encoded = self.classifier(encoded)
start_logit, end_logits = L.unstack(encoded, axis=-1)
@@ -529,6 +537,7 @@ def __init__(self, cfg, name=None):
is_bias=True,
)

@add_docstring(ErnieModel.forward.__doc__)
def forward(self, *args, **kwargs):
"""
Args:
@@ -550,7 +559,7 @@ def forward(self, *args, **kwargs):
mlm_labels = kwargs.pop('labels')
mlm_pos = kwargs.pop('mlm_pos')
nsp_labels = kwargs.pop('nsp_labels')
pooled, encoded, _ = super(ErnieModelForPretraining, self).forward(*args, **kwargs)
pooled, encoded = super(ErnieModelForPretraining, self).forward(*args, **kwargs)
if len(mlm_labels.shape) == 1:
mlm_labels = L.reshape(mlm_labels, [-1, 1])
if len(nsp_labels.shape) == 1:
1 change: 1 addition & 0 deletions experimental/seq2seq/modeling_ernie_gen.py
@@ -32,6 +32,7 @@ class ErnieModelForGeneration(ErnieModel):
resource_map = {
'ernie-gen-base-en': ErnieModel.bce + 'model-ernie-gen-base-en.1.tar.gz',
'ernie-gen-large-en': ErnieModel.bce + 'model-ernie-gen-large-en.1.tar.gz',
'ernie-gen-large-160g-en': ErnieModel.bce + 'model-ernie-gen-large-160g-en.1.tar.gz',
'ernie-1.0': ErnieModel.bce + 'model-ernie1.0.1.tar.gz',
}
def __init__(self, cfg, name=None):
