【hydra No.14】rossler #588

Closed
wants to merge 5 commits into from
96 changes: 60 additions & 36 deletions docs/zh/examples/rossler.md
@@ -2,6 +2,30 @@

<a href="https://aistudio.baidu.com/aistudio/projectdetail/6209280?sUid=455441&shared=1&ts=1684495132419" class="md-button md-button--primary" style>Try it on AI Studio</a>

=== "Model training command"

``` sh
# linux
wget https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_training.hdf5
wget https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_valid.hdf5
# windows
# curl https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_training.hdf5 --output rossler_training.hdf5
# curl https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_valid.hdf5 --output rossler_valid.hdf5
python train_enn.py
```

=== "Model evaluation command"

``` sh
# linux
wget https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_training.hdf5
wget https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_valid.hdf5
# windows
# curl https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_training.hdf5 --output rossler_training.hdf5
# curl https://paddle-org.bj.bcebos.com/paddlescience/datasets/transformer_physx/rossler_valid.hdf5 --output rossler_valid.hdf5
python train_enn.py mode=eval EVAL.pretrained_model_path=https://paddle-org.bj.bcebos.com/paddlescience/models/rossler/rossler_pretrained.pdparams
```
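
The evaluation command passes `key=value` overrides on the command line. As a rough illustration of how a dotted key such as `EVAL.pretrained_model_path` addresses a nested configuration entry, here is a minimal standard-library sketch. The helper `apply_overrides` is hypothetical and is not Hydra's implementation (Hydra's override grammar is richer); it only mimics the dotted-path idea, and the file name used as a value is a shortened stand-in for the URL above.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied.

    Hypothetical helper for illustration only; values are kept as strings.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        # Split at the first "=" so values that contain "=" stay intact.
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


base = {"mode": "train", "EVAL": {"pretrained_model_path": None}}
evaluated = apply_overrides(
    base, ["mode=eval", "EVAL.pretrained_model_path=rossler_pretrained.pdparams"]
)
print(evaluated)
```

The original dictionary is left untouched, which mirrors the idea that overrides apply to one run only.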

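## 1. Background

The Rossler system, first proposed by the German scientist Rossler, is a commonly studied chaotic system. It holds an important place in chaos theory research, providing a mathematical description of chaotic phenomena and a way to understand them. Because the system is extremely sensitive to numerical perturbations, it is also a good benchmark for evaluating the accuracy of machine learning (deep learning) models.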
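
To make the sensitivity claim concrete, the sketch below integrates two nearby initial conditions with a classical fourth-order Runge-Kutta step and prints how far apart they end up. It assumes the commonly used form of the Rossler equations, dx/dt = -ωy - z, dy/dt = ωx + αy, dz/dt = β + z(x - γ), together with the parameter values quoted in this document (ω = 1.0, α = 0.165, β = 0.2, γ = 10). The step size, initial state, perturbation size, and function names are illustrative choices, not taken from the example code.

```python
import math

OMEGA, ALPHA, BETA, GAMMA = 1.0, 0.165, 0.2, 10.0


def rossler_rhs(state):
    """Right-hand side of the (assumed) Rossler equations."""
    x, y, z = state
    return (-OMEGA * y - z, OMEGA * x + ALPHA * y, BETA + z * (x - GAMMA))


def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""

    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = rossler_rhs(state)
    k2 = rossler_rhs(shifted(state, k1, dt / 2))
    k3 = rossler_rhs(shifted(state, k2, dt / 2))
    k4 = rossler_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(state, dt, steps):
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


reference = integrate((1.0, 1.0, 1.0), dt=0.01, steps=5000)
perturbed = integrate((1.0 + 1e-8, 1.0, 1.0), dt=0.01, steps=5000)
print("separation after t = 50:", math.dist(reference, perturbed))
```
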
@@ -43,19 +67,19 @@ $$\omega = 1.0, \alpha = 0.165, \beta = 0.2, \gamma = 10$$

First, the parameter variables defined in the code are shown below; the meaning of each parameter is explained where it is used.

``` py linenums="44" title="examples/rossler/train_enn.py"
``` yaml linenums="22" title="examples/rossler/conf/enn.yaml"
--8<--
examples/rossler/train_enn.py:44:59
examples/rossler/conf/enn.yaml:22:34
--8<--
```

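#### 3.2.1 Constraint construction

This example solves the problem with a data-driven approach, so the supervised constraint is built with PaddleScience's built-in `SupervisedConstraint`. Before defining the constraint, the data-loading parameters used by the supervised constraint must be specified, as follows: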

``` py linenums="64" title="examples/rossler/train_enn.py"
``` py linenums="57" title="examples/rossler/train_enn.py"
--8<--
examples/rossler/train_enn.py:64:81
examples/rossler/train_enn.py:55:74
--8<--
```
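
The configuration in this pull request sets `TRAIN_BLOCK_SIZE: 16` for the Embedding model. As a sketch of what a block size means for a time-series dataset, the hypothetical helper below cuts one trajectory into fixed-length windows. The `stride` argument, the trajectory length, and the function name are assumptions for illustration; this is not the `RosslerDataset` API.

```python
def split_into_blocks(series, block_size, stride):
    """Cut `series` into windows of `block_size` items, advancing `stride` items each time."""
    if block_size <= 0 or stride <= 0:
        raise ValueError("block_size and stride must be positive")
    last_start = len(series) - block_size
    return [
        series[start:start + block_size]
        for start in range(0, last_start + 1, stride)
    ]


trajectory = list(range(40))  # stand-in for 40 recorded time steps
blocks = split_into_blocks(trajectory, block_size=16, stride=8)
print(len(blocks), [block[0] for block in blocks])
```

A stride smaller than the block size produces overlapping windows, which yields more training samples from the same trajectory.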

@@ -74,9 +98,9 @@ examples/rossler/train_enn.py:64:81

The supervised constraint is defined as follows:

``` py linenums="83" title="examples/rossler/train_enn.py"
``` py linenums="76" title="examples/rossler/train_enn.py"
--8<--
examples/rossler/train_enn.py:83:91
examples/rossler/train_enn.py:76:86
--8<--
```

@@ -99,37 +123,37 @@ examples/rossler/train_enn.py:83:91

In PaddleScience code, this is expressed as follows:

``` py linenums="96" title="examples/rossler/train_enn.py"
``` py linenums="89" title="examples/rossler/train_enn.py"
--8<--
examples/rossler/train_enn.py:96:100
examples/rossler/train_enn.py:93:99
--8<--
```

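The first two parameters of `RosslerEmbedding` were described above and are not repeated here. The third and fourth parameters of the network are the mean and variance of the training dataset, which are used to normalize the input data. The code that computes the mean and variance is as follows: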

``` py linenums="29" title="examples/rossler/train_enn.py"
``` py linenums="32" title="examples/rossler/train_enn.py"
--8<--
examples/rossler/train_enn.py:29:40
examples/rossler/train_enn.py:32:43
--8<--
```
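
As a minimal sketch of the normalization idea, the snippet below computes per-channel statistics over a handful of made-up state vectors and rescales one sample. The function names and data are hypothetical. Note one assumption in particular: the text above speaks of mean and variance, and whether the model divides by the variance or by its square root is not shown here; this sketch divides by the (population) standard deviation.

```python
import statistics


def channel_stats(samples):
    """Return per-channel (means, stds) for a list of equal-length state vectors."""
    channels = list(zip(*samples))
    means = [statistics.fmean(channel) for channel in channels]
    stds = [statistics.pstdev(channel) for channel in channels]
    return means, stds


def normalize(sample, means, stds):
    """Shift each channel by its mean and scale by its standard deviation (assumed)."""
    return [(value - mean) / std for value, mean, std in zip(sample, means, stds)]


states = [(1.0, 10.0, 0.0), (3.0, 30.0, 4.0), (5.0, 20.0, 8.0)]  # made-up (x, y, z) samples
means, stds = channel_stats(states)
print(normalize(states[0], means, stds))
```
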

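#### 3.2.3 Learning rate and optimizer construction

This example uses the `ExponentialDecay` learning rate schedule with the learning rate set to 0.001. The optimizer is `Adam`, and gradient clipping uses Paddle's built-in `ClipGradByGlobalNorm` method. In PaddleScience code, this is expressed as follows: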

``` py linenums="102" title="examples/rossler/train_enn.py"
``` py linenums="101" title="examples/rossler/train_enn.py"
--8<--
examples/rossler/train_enn.py:102:116
examples/rossler/train_enn.py:101:110
--8<--
```
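
To get a feel for the schedule, the sketch below evaluates a plain exponential decay using the values from `conf/enn.yaml` in this pull request (`learning_rate: 0.001`, `gamma: 0.995`, `epochs: 300`). It assumes the rate is multiplied by `gamma` exactly once per epoch, which is how `by_epoch: true` is read here; the scheduler's actual update rule may differ in detail.

```python
def exponential_decay(base_lr, gamma, epoch):
    """Learning rate after `epoch` decay steps, assuming one step per epoch."""
    return base_lr * gamma ** epoch


for epoch in (0, 100, 200, 300):
    print(epoch, f"{exponential_decay(0.001, 0.995, epoch):.6f}")
```

Under this assumption the learning rate falls to roughly a fifth of its initial value by the end of training.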

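#### 3.2.4 Validator construction

During training, this example evaluates the current model on the validation set at a fixed interval of training epochs, which requires building a validator with `SupervisedValidator`. The code is as follows: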

``` py linenums="118" title="examples/rossler/train_enn.py"
``` py linenums="114" title="examples/rossler/train_enn.py"
--8<--
examples/rossler/train_enn.py:118:145
examples/rossler/train_enn.py:114:133
--8<--
```

@@ -139,39 +163,39 @@ examples/rossler/train_enn.py:118:145

Once the setup above is complete, simply pass the instantiated objects to `ppsci.solver.Solver` in order, then start training and evaluation.

``` py linenums="147" title="examples/rossler/train_enn.py"
``` py linenums="143" title="examples/rossler/train_enn.py"
--8<--
examples/rossler/train_enn.py:147:
examples/rossler/train_enn.py:143:157
--8<--
```

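### 3.3 Transformer model

The previous section described how to build the training and evaluation of the Embedding model; this section describes how to use the trained Embedding model to train the Transformer model. Because the steps for training the Transformer model are largely the same as those for the Embedding model, the parameters the two have in common are not described again in detail. First, the parameter variables defined in the code are shown below; the meaning of each parameter is explained where it is used.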

``` py linenums="54" title="examples/rossler/train_transformer.py"
``` yaml linenums="23" title="examples/rossler/conf/transformer.yaml"
--8<--
examples/rossler/train_transformer.py:54:76
examples/rossler/conf/transformer.yaml:23:33
--8<--
```

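#### 3.3.1 Constraint construction

The Transformer model also solves the problem with a data-driven approach, so the supervised constraint is built with PaddleScience's built-in `SupervisedConstraint`. Before defining the constraint, the data-loading parameters used by the supervised constraint must be specified, as follows: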

``` py linenums="84" title="examples/rossler/train_transformer.py"
``` py linenums="67" title="examples/rossler/train_transformer.py"
--8<--
examples/rossler/train_transformer.py:84:101
examples/rossler/train_transformer.py:65:82
--8<--
```

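The data-loading parameters are essentially the same as those of the Embedding model and are not repeated here. Note that the input data for training the Transformer model is the output of the Embedding model's Encoder module, so the trained Embedding model is passed as a parameter to `RosslerDataset`, which first maps the training data into the embedding space during initialization.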
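
The idea of encoding the data once, when the dataset is constructed, can be sketched as follows. Everything here is hypothetical: the class name, the constructor arguments, and the toy encoder are illustrative stand-ins and do not reproduce the real `RosslerDataset` signature. Only the `"embeds"` key is taken from `conf/transformer.yaml`.

```python
class EmbeddedSeriesDataset:
    """Hypothetical dataset that maps every state to an embedding at construction time."""

    def __init__(self, trajectories, encoder, block_size):
        self.blocks = []
        for trajectory in trajectories:
            # Encode once up front so that training never calls the encoder again.
            embeds = [encoder(state) for state in trajectory]
            for start in range(0, len(embeds) - block_size + 1, block_size):
                self.blocks.append(embeds[start:start + block_size])

    def __len__(self):
        return len(self.blocks)

    def __getitem__(self, index):
        return {"embeds": self.blocks[index]}


def toy_encoder(state):
    """Stand-in for a trained encoder: lifts a 3-value state to 4 values."""
    x, y, z = state
    return (x + y, y + z, z + x, x * y * z)


dataset = EmbeddedSeriesDataset(
    trajectories=[[(1.0, 2.0, 3.0)] * 8], encoder=toy_encoder, block_size=4
)
print(len(dataset), dataset[0]["embeds"][0])
```

Precomputing the embeddings trades memory for speed, since the encoder is frozen while the Transformer is trained.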

The supervised constraint is defined as follows:

``` py linenums="103" title="examples/rossler/train_transformer.py"
``` py linenums="84" title="examples/rossler/train_transformer.py"
--8<--
examples/rossler/train_transformer.py:103:108
examples/rossler/train_transformer.py:84:89
--8<--
```

@@ -186,9 +210,9 @@ examples/rossler/train_transformer.py:103:108

In PaddleScience code, this is expressed as follows:

``` py linenums="113" title="examples/rossler/train_transformer.py"
``` py linenums="95" title="examples/rossler/train_transformer.py"
--8<--
examples/rossler/train_transformer.py:113:121
examples/rossler/train_transformer.py:95:95
--8<--
```

@@ -198,19 +222,19 @@ examples/rossler/train_transformer.py:113:121

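This example uses the `CosineWarmRestarts` learning rate schedule with the learning rate set to 0.001. The optimizer is `Adam`, and gradient clipping uses Paddle's built-in `ClipGradByGlobalNorm` method. In PaddleScience code, this is expressed as follows: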

``` py linenums="123" title="examples/rossler/train_transformer.py"
``` py linenums="97" title="examples/rossler/train_transformer.py"
--8<--
examples/rossler/train_transformer.py:123:137
examples/rossler/train_transformer.py:97:104
--8<--
```
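
Gradient clipping by global norm, as mentioned above, rescales all gradients together whenever their combined L2 norm exceeds a threshold. The sketch below is a plain-Python illustration of that rule (scale every gradient by `clip_norm / max(global_norm, clip_norm)`), not Paddle's implementation; the gradient values and the `clip_norm` of 1.0 are arbitrary.

```python
import math


def clip_by_global_norm(gradients, clip_norm):
    """Scale all gradients by clip_norm / max(global_norm, clip_norm).

    Returns the scaled gradients and the global norm measured before clipping.
    """
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in gradients], global_norm


grads = [[3.0, 4.0], [12.0]]  # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
print(norm, clipped)
```

Because a single scale factor is applied to every tensor, the direction of the overall update is preserved; only its length is capped.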

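#### 3.3.4 Validator construction

During training, the current model is evaluated on the validation set at a fixed interval of training epochs, which requires building a validator with `SupervisedValidator`. In PaddleScience code, this is expressed as follows: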

``` py linenums="139" title="examples/rossler/train_transformer.py"
``` py linenums="107" title="examples/rossler/train_transformer.py"
--8<--
examples/rossler/train_transformer.py:139:165
examples/rossler/train_transformer.py:107:124
--8<--
```
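
With `epochs: 200` and `eval_freq: 50` from `conf/transformer.yaml`, the evaluation points can be listed with a few lines of Python. This sketch assumes epochs are counted from 1 and that evaluation runs whenever the epoch number is a multiple of `eval_freq`; the solver's exact rule is not shown in this diff.

```python
def evaluation_epochs(total_epochs, eval_freq):
    """Epochs (1-based, assumed) at which the validator would run."""
    return [epoch for epoch in range(1, total_epochs + 1) if epoch % eval_freq == 0]


print(evaluation_epochs(200, 50))
```
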

@@ -220,25 +244,25 @@ examples/rossler/train_transformer.py:139:165

This example first defines the code that transforms the Transformer model's output data into the physical state space:

``` py linenums="32" title="examples/rossler/train_transformer.py"
``` py linenums="34" title="examples/rossler/train_transformer.py"
--8<--
examples/rossler/train_transformer.py:32:50
examples/rossler/train_transformer.py:34:52
--8<--
```

``` py linenums="80" title="examples/rossler/train_transformer.py"
``` py linenums="63" title="examples/rossler/train_transformer.py"
--8<--
examples/rossler/train_transformer.py:80:81
examples/rossler/train_transformer.py:63:64
--8<--
```

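As shown, the program first loads the trained Embedding model, then implements the transformation from embedding vectors to the physical state space inside the `__call__` method of `OutputTransform`.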
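
The pattern of a callable object that holds a trained decoder and applies it to model outputs can be sketched as below. The class `DecodeOutput`, its constructor arguments, and the toy decoder are hypothetical; the real `OutputTransform` may take different arguments. The key names `pred_embeds` and `pred_states` are borrowed from the YAML files in this pull request.

```python
class DecodeOutput:
    """Hypothetical stand-in for an output transform that wraps a trained decoder."""

    def __init__(self, decoder, input_key="pred_embeds", output_key="pred_states"):
        self.decoder = decoder
        self.input_key = input_key
        self.output_key = output_key

    def __call__(self, outputs):
        # Map every predicted embedding back to a physical state.
        embeds = outputs[self.input_key]
        return {self.output_key: [self.decoder(embed) for embed in embeds]}


def toy_decoder(embed):
    """Stand-in for a trained decoder: keeps the first three components."""
    return tuple(embed[:3])


transform = DecodeOutput(toy_decoder)
print(transform({"pred_embeds": [(0.1, 0.2, 0.3, 0.4), (1.0, 2.0, 3.0, 4.0)]}))
```

Keeping the decoder inside the callable lets the visualizer work in physical coordinates without knowing anything about the embedding space.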

With the code above defined, the visualizer can be built:

``` py linenums="167" title="examples/rossler/train_transformer.py"
``` py linenums="134" title="examples/rossler/train_transformer.py"
--8<--
examples/rossler/train_transformer.py:167:185
examples/rossler/train_transformer.py:134:152
--8<--
```

@@ -248,9 +272,9 @@ examples/rossler/train_transformer.py:167:185

Once the setup above is complete, simply pass the instantiated objects to `ppsci.solver.Solver` in order, then start training and evaluation.

``` py linenums="187" title="examples/rossler/train_transformer.py"
``` py linenums="154" title="examples/rossler/train_transformer.py"
--8<--
examples/rossler/train_transformer.py:187:
examples/rossler/train_transformer.py:154:172
--8<--
```

54 changes: 54 additions & 0 deletions examples/rossler/conf/enn.yaml
@@ -0,0 +1,54 @@
hydra:
  run:
    # dynamic output directory according to running time and override name
    dir: outputs_rossler_enn/${now:%Y-%m-%d}/${now:%H-%M-%S}/${hydra.job.override_dirname}
  job:
    name: ${mode} # name of logfile
    chdir: false # keep current working directory unchanged
    config:
      override_dirname:
        exclude_keys:
          - TRAIN.checkpoint_path
          - TRAIN.pretrained_model_path
          - EVAL.pretrained_model_path
          - mode
          - output_dir
          - log_freq
  sweep:
    # output directory for multirun
    dir: ${hydra.run.dir}
    subdir: ./

# general settings
mode: train # running mode: train/eval
seed: 6
output_dir: ${hydra:run.dir}
TRAIN_BLOCK_SIZE: 16
VALID_BLOCK_SIZE: 32
TRAIN_FILE_PATH: ./datasets/rossler_training.hdf5
VALID_FILE_PATH: ./datasets/rossler_valid.hdf5

# model settings
MODEL:
  input_keys: ["states"]
  output_keys: ["pred_states", "recover_states"]

# training settings
TRAIN:
  epochs: 300
  batch_size:
    train: 256
    eval: 8
  lr_scheduler:
    epochs: ${TRAIN.epochs}
    learning_rate: 0.001
    gamma: 0.995
    by_epoch: true
  optimizer:
    weight_decay: 1e-8
  pretrained_model_path: null
  checkpoint_path: null

# evaluation settings
EVAL:
  pretrained_model_path: null
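
The `hydra.run.dir` pattern in this file combines a fixed prefix, the current date and time, and a directory name derived from the command-line overrides, with the keys under `exclude_keys` left out. The sketch below imitates that composition with the standard library only. It assumes the remaining overrides are sorted and joined with commas, which is how Hydra's default behavior is understood here; the function `build_run_dir` and the sample overrides are hypothetical.

```python
from datetime import datetime


def build_run_dir(prefix, overrides, exclude_keys, now):
    """Compose prefix/date/time/override_dirname (illustrative, not Hydra's code)."""
    kept = sorted(
        item for item in overrides if item.partition("=")[0] not in exclude_keys
    )
    override_dirname = ",".join(kept)
    return "/".join(
        [prefix, now.strftime("%Y-%m-%d"), now.strftime("%H-%M-%S"), override_dirname]
    )


run_dir = build_run_dir(
    prefix="outputs_rossler_enn",
    overrides=["mode=eval", "TRAIN.epochs=10", "seed=1"],
    exclude_keys={"mode", "output_dir", "log_freq"},
    now=datetime(2023, 10, 1, 8, 30, 0),
)
print(run_dir)
```

Excluding keys such as `mode` keeps training and evaluation runs with otherwise identical settings from being scattered across differently named directories.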
65 changes: 65 additions & 0 deletions examples/rossler/conf/transformer.yaml
@@ -0,0 +1,65 @@
hydra:
  run:
    # dynamic output directory according to running time and override name
    dir: outputs_rossler_transformer/${now:%Y-%m-%d}/${now:%H-%M-%S}/${hydra.job.override_dirname}
  job:
    name: ${mode} # name of logfile
    chdir: false # keep current working directory unchanged
    config:
      override_dirname:
        exclude_keys:
          - TRAIN.checkpoint_path
          - TRAIN.pretrained_model_path
          - EVAL.pretrained_model_path
          - mode
          - output_dir
          - log_freq
  sweep:
    # output directory for multirun
    dir: ${hydra.run.dir}
    subdir: ./

# general settings
mode: train # running mode: train/eval
seed: 42
output_dir: ${hydra:run.dir}
TRAIN_BLOCK_SIZE: 32
VALID_BLOCK_SIZE: 256
TRAIN_FILE_PATH: ./datasets/rossler_training.hdf5
VALID_FILE_PATH: ./datasets/rossler_valid.hdf5

# set working condition
EMBEDDING_MODEL_PATH: ./outputs_rossler_enn/checkpoints/latest
VIS_DATA_NUMS: 16

# model settings
MODEL:
  input_keys: ["embeds"]
  output_keys: ["pred_embeds"]
  num_layers: 4
  num_ctx: 64
  embed_size: 32
  num_heads: 4

# training settings
TRAIN:
  epochs: 200
  batch_size:
    train: 64
    eval: 16
  lr_scheduler:
    epochs: ${TRAIN.epochs}
    learning_rate: 0.001
    T_0: 14
    T_mult: 2
    eta_min: 1.0e-9
  optimizer:
    weight_decay: 1.0e-8
  eval_during_train: true
  eval_freq: 50
  pretrained_model_path: null
  checkpoint_path: null

# evaluation settings
EVAL:
  pretrained_model_path: null
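
The `lr_scheduler` values above (`learning_rate: 0.001`, `T_0: 14`, `T_mult: 2`, `eta_min: 1.0e-9`) describe cosine annealing with warm restarts. The sketch below implements the textbook SGDR formula, eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * T_cur / T_i)), where each cycle is `T_mult` times longer than the previous one. It is an illustration under that assumption, stepping once per epoch, and may not match the `CosineWarmRestarts` scheduler in every detail.

```python
import math


def cosine_warm_restarts(epoch, base_lr, t_0, t_mult, eta_min):
    """Learning rate at `epoch` under cosine annealing with warm restarts (SGDR form)."""
    cycle_length, cycle_start = t_0, 0
    # Walk forward through the cycles until the one containing `epoch` is found.
    while epoch >= cycle_start + cycle_length:
        cycle_start += cycle_length
        cycle_length *= t_mult
    progress = (epoch - cycle_start) / cycle_length
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))


for epoch in (0, 7, 13, 14, 41, 42):
    lr = cosine_warm_restarts(epoch, base_lr=0.001, t_0=14, t_mult=2, eta_min=1.0e-9)
    print(epoch, f"{lr:.3e}")
```

With these settings the cycles start at epochs 0, 14, 42, and 98, so the rate jumps back to its initial value at each of those points.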