diff --git a/.metas/ERNIE_milestone_chn.png b/.metas/ERNIE_milestone_chn.png
deleted file mode 100644
index fc3844a6e00cd..0000000000000
Binary files a/.metas/ERNIE_milestone_chn.png and /dev/null differ
diff --git a/.metas/ERNIE_milestone_en.png b/.metas/ERNIE_milestone_en.png
new file mode 100644
index 0000000000000..160d3015f4fdb
Binary files /dev/null and b/.metas/ERNIE_milestone_en.png differ
diff --git a/.metas/ERNIE_milestone.png b/.metas/ERNIE_milestone_zh.png
similarity index 100%
rename from .metas/ERNIE_milestone.png
rename to .metas/ERNIE_milestone_zh.png
diff --git a/.metas/dygraph_show.gif b/.metas/dygraph_show.gif
deleted file mode 100644
index 31aa75acda5e2..0000000000000
Binary files a/.metas/dygraph_show.gif and /dev/null differ
diff --git a/.metas/ernie-head-banner.gif b/.metas/ernie-head-banner.gif
deleted file mode 100644
index 4b3a645912756..0000000000000
Binary files a/.metas/ernie-head-banner.gif and /dev/null differ
diff --git a/.metas/ernie.png b/.metas/ernie.png
deleted file mode 100644
index 4a1ce443c7a46..0000000000000
Binary files a/.metas/ernie.png and /dev/null differ
diff --git a/README.en.md b/README.en.md
index 0a840d13e10fc..2d8ae732e4daf 100644
--- a/README.en.md
+++ b/README.en.md
@@ -1,6 +1,6 @@
 English|[简体中文](./README.zh.md)
 
-![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone.png)
+![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone_en.png)
 
 **Reminder: this repo has been refactored; for paper reproduction or backward compatibility, please check out the [repro branch](https://github.com/PaddlePaddle/ERNIE/tree/repro)**
 
@@ -89,23 +89,23 @@
 pip install paddle-ernie
 
 or
 
 ```shell
-git clone -b dygraph https://github.com/PaddlePaddle/ERNIE.git --single-branch
+git clone https://github.com/PaddlePaddle/ERNIE.git --depth 1
 cd ERNIE
 pip install -r requirements.txt
-pip setup.py -e .
-
+pip install -e .
 ```
 
 ##### 3. download pretrained models (optional)
 
-| Model                                              | Description                                                  |
-| :------------------------------------------------- | :----------------------------------------------------------- |
-| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | L12H768A12 |
-| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | L3H1024A16 |
-| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | L12H768A12 |
-| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | L24H1024A16 |
-| [ERNIE Gen base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | L12H768A12 |
-| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| L24H1024A16 |
+| Model                                              | Description                           | abbreviation |
+| :------------------------------------------------- | :------------------------------------ | :----------- |
+| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | L12H768A12 | ernie-1.0 |
+| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | L3H1024A16 | ernie-tiny |
+| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | L12H768A12 | ernie-2.0-en |
+| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | L24H1024A16 | ernie-2.0-large-en |
+| [ERNIE Gen Base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | L12H768A12 | ernie-gen-base-en |
+| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| L24H1024A16 | ernie-gen-large-en |
+| [ERNIE Gen Large 160G for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-160g-en.1.tar.gz)| L24H1024A16 + 160G pretraining corpus | ernie-gen-large-160g-en |
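Once a checkpoint is available (passing an abbreviation from the table to `from_pretrained` downloads and caches it automatically), loading the model takes only a few lines. A minimal smoke-test sketch, assuming the dygraph classes used by the demo scripts later in this diff; the exact signatures live in `ernie/modeling_ernie.py` and `ernie/tokenizing_ernie.py`:

```python
# Sketch only: load ERNIE by abbreviation and embed one sentence.
import numpy as np
import paddle.fluid.dygraph as FD
from ernie.modeling_ernie import ErnieModel
from ernie.tokenizing_ernie import ErnieTokenizer

with FD.guard():
    model = ErnieModel.from_pretrained('ernie-1.0')        # abbreviation from the table above
    tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
    ids, sids = tokenizer.encode('hello world')            # token ids / sentence ids
    ids = FD.to_variable(np.expand_dims(ids, 0))           # add a batch dimension
    sids = FD.to_variable(np.expand_dims(sids, 0))
    pooled, encoded = model(ids, sids)                     # [batch, hidden], [batch, seq_len, hidden]
```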
 
 ##### 4. download datasets
 
@@ -143,26 +143,31 @@
 see [demo](https://ernie-github.cdn.bcebos.com/data-mnli-m.tar.gz) data for MNLI
 
 - try eager execution with `dygraph model`:
 
 ```script
-python3 ./demo/finetune_classifier_dygraph.py \
-    --from_pretrained ernie_1.0 \
-    --data_dir ./data/xnli
+python3 ./ernie_d/demo/finetune_classifier_dygraph.py \
+    --from_pretrained ernie-1.0 \
+    --data_dir ./data/xnli
 ```
 
 - Distributed finetune
 
 `paddle.distributed.launch` is a process manager; we use it to launch a python process on each available GPU device:
 
-when in distributed training, `max_steps` is used as stopping criteria rather than `epoch` to prevent dead block.
-also notice than we shard the train data according to device id to prevent over fitting.
+In distributed training, `max_steps` is used as the stopping criterion rather than `epoch`, to prevent deadlock.
+You can calculate `max_steps` as `EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH` (see the sketch below).
+Also note that we shard the training data according to device id, to prevent overfitting.
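For concreteness, here is that formula worked through. Illustrative numbers only: `BSZ_PER_GPU` follows the demo's `--bsz` default, and the MNLI training-set size is quoted from memory, not from the repo:

```python
# Back-of-the-envelope max_steps calculation for a 4-GPU MNLI run.
EPOCH = 3
NUM_TRAIN_EXAMPLES = 392_702          # approximate MNLI training-set size
BSZ_PER_GPU = 32                      # the demo's default --bsz
N_GPUS = 4
TOTAL_BATCH = BSZ_PER_GPU * N_GPUS
max_steps = EPOCH * NUM_TRAIN_EXAMPLES // TOTAL_BATCH
print(max_steps)                      # 9203; the demo below rounds this up to 10000
```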
 
 demo:
+(make sure you have at least 2 GPUs;
+online model download does not work under `paddle.distributed.launch`,
+so you need to run a single-card finetune first to fetch the pretrained model, or download and extract one manually from [here](#section-pretrained-models)):
+
 ```script
 python3 -m paddle.distributed.launch \
 ./demo/finetune_classifier_dygraph_distributed.py \
     --data_dir data/mnli \
     --max_steps 10000 \
-    --from_pretrained ernie2.0-en
+    --from_pretrained ernie-2.0-en
 ```
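The sharding mentioned above happens inside the distributed demo script; conceptually, worker `dev_id` of `nranks` keeps every `nranks`-th example. A plain-Python illustration of that idea (not the repo's actual dataset API):

```python
# Illustration of device-id sharding; the demos use their own dataset API for this.
def shard(examples, nranks, dev_id):
    """Keep the slice of `examples` assigned to worker `dev_id` out of `nranks`."""
    return [ex for i, ex in enumerate(examples) if i % nranks == dev_id]

data = list(range(10))
print(shard(data, nranks=2, dev_id=0))  # [0, 2, 4, 6, 8]
print(shard(data, nranks=2, dev_id=1))  # [1, 3, 5, 7, 9]
```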
diff --git a/README.zh.md b/README.zh.md
index 55d8f2008e18a..866bc436978d9 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -1,6 +1,6 @@
 [English](./README.en.md)|简体中文
 
-![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone.png)
+![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone_zh.png)
 
 ERNIE is Baidu's pioneering continual-learning semantic-understanding framework built on knowledge enhancement. It combines large-scale pre-training with rich multi-source knowledge and, through continual learning, keeps absorbing lexical, structural, and semantic knowledge from massive text corpora, so the model keeps improving. ERNIE significantly outperforms the previous state of the art on 16 public datasets covering sentiment analysis, text matching, natural language inference, lexical analysis, reading comprehension, and question answering; on the authoritative GLUE benchmark for general language understanding it was the first to break 90 points, ranking first worldwide. At SemEval 2020, the largest semantic evaluation campaign, which concluded this March, ERNIE won 5 first places; the technology was covered on the official website of MIT Technology Review, and the innovations behind it were published at the top academic conferences AAAI and IJCAI. ERNIE is deployed at scale in industry, for example in search engines, news recommendation, advertising systems, voice interaction, and intelligent customer service.
 
@@ -87,24 +87,25 @@
 pip install paddle-ernie
 
 or
 
 ```shell
-git clone -b dygraph https://github.com/PaddlePaddle/ERNIE.git --single-branch
+git clone https://github.com/PaddlePaddle/ERNIE.git --depth 1
 cd ERNIE
 pip install -r requirements.txt
-pip setup.py -e .
+pip install -e .
 ```
 
-##### 3. 下载预训练模型(可选)
+##### 3. download pretrained models (optional)
 
-| Model                                              | Description                                                  |
-| :------------------------------------------------- | :----------------------------------------------------------- |
-| [ERNIE 1.0 Base 中文](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | L12H768A12 |
-| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | L3H1024A16 |
-| [ERNIE 2.0 Base 英文](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | base: L12H768A12 |
-| [ERNIE 2.0 Large 英文](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | large: L24H1024A16|
-| [ERNIE Gen base 英文](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | L12H768A12 |
-| [ERNIE Gen Large 英文](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| L24H1024A16 |
+| Model                                              | Details                          | abbreviation |
+| :------------------------------------------------- | :------------------------------- | :----------- |
+| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | Layer:12, Hidden:768, Heads:12 | ernie-1.0 |
+| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | Layer:3, Hidden:1024, Heads:16 | ernie-tiny |
+| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | Layer:12, Hidden:768, Heads:12 | ernie-2.0-en |
+| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | Layer:24, Hidden:1024, Heads:16 | ernie-2.0-large-en |
+| [ERNIE Gen Base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | Layer:12, Hidden:768, Heads:12 | ernie-gen-base-en |
+| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| Layer:24, Hidden:1024, Heads:16 | ernie-gen-large-en |
+| [ERNIE Gen Large 160G for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-160g-en.1.tar.gz)| Layer:24, Hidden:1024, Heads:16, extra 160G pretraining corpus | ernie-gen-large-160g-en |
 
 ##### 4. download datasets
 
@@ -144,9 +145,9 @@
 data/xnli
 
 - finetune with the `dygraph` model:
 
 ```script
-python3 ./demo/finetune_classifier_dygraph.py \
-    --from_pretrained ernie_1.0 \
-    --data_dir ./data/xnli
+python3 ./ernie_d/demo/finetune_classifier_dygraph.py \
+    --from_pretrained ernie-1.0 \
+    --data_dir ./data/xnli
 ```
 
 - Distributed finetune
 
@@ -154,9 +155,11 @@ python3 ./demo/finetune_classifier_dygraph.py \
 `paddle.distributed.launch` is a process manager; we use it to launch one python process per GPU and set the environment variables needed for distributed training:
 
 In distributed training we use `max_steps` as the stopping criterion rather than `epoch`, to avoid deadlock between processes.
+You can calculate the required `max_steps` as `EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH`.
 Also note that the training set has to be sharded across processes, to avoid the overfitting caused by every process training on the same data.
 
-示例脚本(请确保你有两张以上GPU卡):
+Example script (make sure you have at least 2 GPUs; online model download does not work under `paddle.distributed.launch`,
+so you may need to run a single-card finetune first to download the pretrained model, or download and extract one manually from [here](#section-pretrained-models)):
 
 ```script
 python3 -m paddle.distributed.launch \
@@ -227,7 +230,7 @@ sids = np.expand_dims(sids, 0)
 result = client(ids, sids)
 ```
 
-你也可从[此处]((https://ernie.bj.bcebos.com/ernie1.0_zh_inference_model.tar.gz).)下载一个预先制作好的ernie-1.0 base模型的 `inference_model`.
+You can also download a pre-built `inference_model` of the ernie-1.0 base model from [here](https://ernie.bj.bcebos.com/ernie1.0_zh_inference_model.tar.gz).
 This model has not been finetuned and is typically used for feature-based finetuning of an upper-layer model, or as a text feature extractor.
 Because this model was produced by the old API, you need to append an extra dimension to the input tensors when making client requests:
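A hypothetical request against that pre-built `inference_model` could look as follows. `client` is a stand-in for whatever serving client you use (the snippet above calls `client(ids, sids)`); the point is only the extra trailing dimension the text describes:

```python
# Hypothetical client call; the extra trailing unit dimension is required
# because the pre-built inference_model was exported with the old API.
import numpy as np

ids = np.random.randint(1, 100, size=(1, 5), dtype=np.int64)  # [batch, seq_len], made-up token ids
sids = np.zeros_like(ids)                                     # sentence ids
ids = np.expand_dims(ids, -1)                                 # -> [batch, seq_len, 1]
sids = np.expand_dims(sids, -1)
# result = client(ids, sids)                                  # as in the snippet above
```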
diff --git a/demo/finetune_classifier_dygraph.py b/demo/finetune_classifier_dygraph.py
index 41d357f586fff..995d2a8860fee 100644
--- a/demo/finetune_classifier_dygraph.py
+++ b/demo/finetune_classifier_dygraph.py
@@ -51,11 +51,13 @@
     parser.add_argument('--bsz', type=int, default=32, help='batchsize')
     parser.add_argument('--epoch', type=int, default=3, help='epoch')
    parser.add_argument('--data_dir', type=str, required=True, help='data directory includes train / develop data')
-    parser.add_argument('--max_steps', type=int, required=True, help='max_train_steps, set this to EPOCH * NUM_SAMPLES / BATCH_SIZE')
-    parser.add_argument('--warmup_proportion', type=float, default=0.1)
+    parser.add_argument('--use_lr_decay', action='store_true', help='if set, learning rate will decay to zero at `max_steps`')
+    parser.add_argument('--warmup_proportion', type=float, default=0.1, help='if use_lr_decay is set, '
+        'learning rate will rise to `lr` at `warmup_proportion * max_steps` and then decay to 0 at `max_steps`')
     parser.add_argument('--lr', type=float, default=5e-5, help='learning rate')
     parser.add_argument('--inference_model_dir', type=str, default=None, help='inference model output directory')
     parser.add_argument('--save_dir', type=str, default=None, help='model output directory')
+    parser.add_argument('--max_steps', type=int, default=None, help='max_train_steps, set this to EPOCH * NUM_SAMPLES / BATCH_SIZE; required if --use_lr_decay is set')
     parser.add_argument('--wd', type=float, default=0.01, help='weight decay, aka L2 regularizer')
 
@@ -102,7 +104,11 @@ def map_fn(seg_a, seg_b, label):
 
     with FD.guard(place):
         model = ErnieModelForSequenceClassification.from_pretrained(args.from_pretrained, num_labels=3, name='')
-        opt = AdamW(learning_rate=LinearDecay(args.lr, int(args.warmup_proportion * args.max_steps), args.max_steps), parameter_list=model.parameters(), weight_decay=args.wd)
+        if args.use_lr_decay:
+            opt = AdamW(learning_rate=LinearDecay(args.lr, int(args.warmup_proportion * args.max_steps), args.max_steps), parameter_list=model.parameters(), weight_decay=args.wd)
+        else:
+            opt = AdamW(args.lr, parameter_list=model.parameters(), weight_decay=args.wd)
+
         g_clip = F.dygraph_grad_clip.GradClipByGlobalNorm(1.0) #experimental
         for epoch in range(args.epoch):
             for step, d in enumerate(tqdm(train_ds.start(place), desc='training')):
@@ -117,7 +123,7 @@ def map_fn(seg_a, seg_b, label):
             acc = []
             with FD.base._switch_tracer_mode_guard_(is_train=False):
                 model.eval()
-                for step, d in enumerate(tqdm(dev_ds.start(), desc='evaluating %d' % epoch)):
+                for step, d in enumerate(tqdm(dev_ds.start(place), desc='evaluating %d' % epoch)):
                     ids, sids, label = d
                     loss, logits = model(ids, sids, labels=label)
                     #print('\n'.join(map(str, logits.numpy().tolist())))
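The schedule selected by `--use_lr_decay` (linear warmup to `lr`, then linear decay to zero at `max_steps`) is easy to picture. An illustrative re-implementation of the help text above, not the repo's `LinearDecay` class:

```python
# Illustrative warmup-then-linear-decay schedule matching the --warmup_proportion help text.
def lr_at(step, base_lr=5e-5, warmup_proportion=0.1, max_steps=10000):
    warmup_steps = int(warmup_proportion * max_steps)
    if step < warmup_steps:
        return base_lr * step / warmup_steps                                    # linear warmup to base_lr
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))  # linear decay to 0

print(lr_at(500))    # 2.5e-05, halfway through warmup
print(lr_at(1000))   # 5e-05, the peak learning rate
print(lr_at(10000))  # 0.0, fully decayed
```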
diff --git a/demo/finetune_mrc_dygraph.py b/demo/finetune_mrc_dygraph.py
index 445e87df0b8a3..84e00247e8a16 100644
--- a/demo/finetune_mrc_dygraph.py
+++ b/demo/finetune_mrc_dygraph.py
@@ -44,8 +44,8 @@
 from ernie.tokenizing_ernie import ErnieTokenizer, ErnieTinyTokenizer
 from ernie.optimization import AdamW, LinearDecay
 
-from ernie.mrc import mrc_reader
-from ernie.mrc import mrc_metrics
+from demo.mrc import mrc_reader
+from demo.mrc import mrc_metrics
 
 log.setLevel(logging.DEBUG)
 logging.getLogger().addHandler(log.handlers[0])
diff --git a/ernie/file_utils.py b/ernie/file_utils.py
index 7d4dd5b2f518c..1d4fd904d45c6 100644
--- a/ernie/file_utils.py
+++ b/ernie/file_utils.py
@@ -41,3 +41,10 @@ def _fetch_from_remote(url, force_download=False):
     log.debug('%s cached in %s' % (url, cached_dir))
     return cached_dir
 
+
+def add_docstring(doc):
+    def func(f):
+        f.__doc__ += ('\n======other docs from super class ======\n%s' % doc)
+        return f
+    return func
+
diff --git a/ernie/modeling_ernie.py b/ernie/modeling_ernie.py
index 9df7fb5228c74..25fe2fe5ef15d 100644
--- a/ernie/modeling_ernie.py
+++ b/ernie/modeling_ernie.py
@@ -29,7 +29,7 @@
 import paddle.fluid as F
 import paddle.fluid.layers as L
 
-from ernie.file_utils import _fetch_from_remote
+from ernie.file_utils import _fetch_from_remote, add_docstring
 
 log = logging.getLogger(__name__)
 
@@ -288,6 +288,11 @@ def forward(self, src_ids, sent_ids=None, pos_ids=None, input_mask=None, attn_bi
                 Mask to avoid performing attention on the padding token indices of the encoder input.
             attn_bias(optional, `Variable` of shape `[batch_size, seq_len, seq_len]` or False):
                 3D version of `input_mask`; if set, overrides `input_mask`; if set to False, no attention mask will be applied
+            past_cache(optional, tuple of two lists: cached key and cached value,
+                each is a list of `Variable`s of shape `[batch_size, seq_len, hidden_size]`):
+                cached key/value tensors that will be concatenated to the newly generated key/value when performing self attention.
+                If set, `attn_bias` should not be None.
+
         Returns:
             pooled (`Variable` of shape `[batch_size, hidden_size]`):
                 output logits of pooler classifier
@@ -360,6 +365,7 @@ def __init__(self, cfg, name=None):
         prob = cfg.get('classifier_dropout_prob', cfg['hidden_dropout_prob'])
         self.dropout = lambda i: L.dropout(i, dropout_prob=prob, dropout_implementation="upscale_in_train",) if self.training else i
 
+    @add_docstring(ErnieModel.forward.__doc__)
     def forward(self, *args, **kwargs):
         """
         Args:
@@ -400,6 +406,7 @@ def __init__(self, cfg, name=None):
         prob = cfg.get('classifier_dropout_prob', cfg['hidden_dropout_prob'])
         self.dropout = lambda i: L.dropout(i, dropout_prob=prob, dropout_implementation="upscale_in_train",) if self.training else i
 
+    @add_docstring(ErnieModel.forward.__doc__)
     def forward(self, *args, **kwargs):
         """
         Args:
@@ -441,6 +448,7 @@ def __init__(self, cfg, name=None):
         prob = cfg.get('classifier_dropout_prob', cfg['hidden_dropout_prob'])
         self.dropout = lambda i: L.dropout(i, dropout_prob=prob, dropout_implementation="upscale_in_train",) if self.training else i
 
+    @add_docstring(ErnieModel.forward.__doc__)
     def forward(self, *args, **kwargs):
         """
         Args:
@@ -460,7 +468,7 @@ def forward(self, *args, **kwargs):
         start_pos = kwargs.pop('start_pos', None)
         end_pos = kwargs.pop('end_pos', None)
-        pooled, encoded, _ = super(ErnieModelForQuestionAnswering, self).forward(*args, **kwargs)
+        pooled, encoded = super(ErnieModelForQuestionAnswering, self).forward(*args, **kwargs)
         encoded = self.dropout(encoded)
         encoded = self.classifier(encoded)
         start_logit, end_logits = L.unstack(encoded, axis=-1)
@@ -529,6 +537,7 @@ def __init__(self, cfg, name=None):
             is_bias=True, )
 
+    @add_docstring(ErnieModel.forward.__doc__)
     def forward(self, *args, **kwargs):
         """
         Args:
@@ -550,7 +559,7 @@ def forward(self, *args, **kwargs):
         mlm_labels = kwargs.pop('labels')
         mlm_pos = kwargs.pop('mlm_pos')
         nsp_labels = kwargs.pop('nsp_labels')
-        pooled, encoded, _ = super(ErnieModelForPretraining, self).forward(*args, **kwargs)
+        pooled, encoded = super(ErnieModelForPretraining, self).forward(*args, **kwargs)
         if len(mlm_labels.shape) == 1:
             mlm_labels = L.reshape(mlm_labels, [-1, 1])
         if len(nsp_labels.shape) == 1:
diff --git a/experimental/seq2seq/modeling_ernie_gen.py b/experimental/seq2seq/modeling_ernie_gen.py
index cee4f1c557306..06c2a6d57a1d2 100644
--- a/experimental/seq2seq/modeling_ernie_gen.py
+++ b/experimental/seq2seq/modeling_ernie_gen.py
@@ -32,6 +32,7 @@ class ErnieModelForGeneration(ErnieModel):
     resource_map = {
         'ernie-gen-base-en': ErnieModel.bce + 'model-ernie-gen-base-en.1.tar.gz',
         'ernie-gen-large-en': ErnieModel.bce + 'model-ernie-gen-large-en.1.tar.gz',
+        'ernie-gen-large-160g-en': ErnieModel.bce + 'model-ernie-gen-large-160g-en.1.tar.gz',
         'ernie-1.0': ErnieModel.bce + 'model-ernie1.0.1.tar.gz',
     }
     def __init__(self, cfg, name=None):
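To see what the `add_docstring` helper from `ernie/file_utils.py` does once applied as above, here is a self-contained sketch. Note that the decorated function must already carry a docstring, since the helper uses `+=` on `f.__doc__`:

```python
# Standalone sketch of the add_docstring decorator added in ernie/file_utils.py.
def add_docstring(doc):
    def func(f):
        f.__doc__ += ('\n======other docs from super class ======\n%s' % doc)
        return f
    return func

class Base:
    def forward(self):
        """Base.forward: shared argument docs live here."""

class Child(Base):
    @add_docstring(Base.forward.__doc__)
    def forward(self):
        """Child.forward: subclass-specific docs."""

print(Child.forward.__doc__)
# Child.forward: subclass-specific docs.
# ======other docs from super class ======
# Base.forward: shared argument docs live here.
```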