
[AMP]split minimize and add unscale_ for GradScaler #35825

Merged 11 commits into PaddlePaddle:develop from zhangbo9674:dev/split-minimize on Sep 22, 2021

Conversation

@zhangbo9674 (Contributor) commented Sep 17, 2021

PR types

New features

PR changes

APIs

Describe

1. Split GradScaler::minimize() into GradScaler::step() + GradScaler::update()

GradScaler::minimize():

    scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

    with paddle.amp.auto_cast():
        output = model(data)
        loss = mse(output, label)

    scaled = scaler.scale(loss)
    scaled.backward()            
    scaler.minimize(optimizer, scaled)
    optimizer.clear_grad()

GradScaler::step() + GradScaler::update():

    scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

    with paddle.amp.auto_cast():
        output = model(data)
        loss = mse(output, label)

    scaled = scaler.scale(loss)
    scaled.backward() 
    scaler.step(optimizer)
    scaler.update()
    optimizer.clear_grad()
  • minimize() and step()+update() are the two ways of applying parameter gradient updates under AMP. In Paddle 2.0 we recommend step()+update() (a fuller runnable sketch follows below).

  • If the optimizer comes from the Paddle 1.x API, only minimize() can be used.
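
For context, here is a fuller, self-contained sketch of the step()+update() flow. The toy model, optimizer, and random data are illustrative assumptions, not part of this PR:

    import paddle

    # Illustrative setup (assumption, not part of this PR): toy linear model + SGD.
    model = paddle.nn.Linear(10, 10)
    mse = paddle.nn.MSELoss()
    optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
    scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

    data = paddle.rand([4, 10])
    label = paddle.rand([4, 10])

    with paddle.amp.auto_cast():
        output = model(data)
        loss = mse(output, label)

    scaled = scaler.scale(loss)   # multiply the loss by the loss scaling ratio
    scaled.backward()             # gradients are computed in scaled form
    scaler.step(optimizer)        # unscale the gradients, then apply the optimizer update
    scaler.update()               # adjust the loss scaling ratio for the next iteration
    optimizer.clear_grad()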

2. Add GradScaler::unscale_(optimizer):

    scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

    with paddle.amp.auto_cast():
        output = model(data)
        loss = mse(output, label)

    scaled = scaler.scale(loss)
    scaled.backward() 
    scaler.unscale_(optimizer)
    scaler.step(optimizer)
    scaler.update()
    optimizer.clear_grad()
  • This API unscales the gradients of the parameters, i.e. multiplies them by 1/(loss scaling ratio).
  • If unscale_ has not been called, minimize() or step() will call it internally; if it has, the unscaling is not repeated (a sketch of one use of unscale_ follows below).
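
One motivating use of unscale_ is inspecting or clipping gradients at their true magnitude before the optimizer step. Below is a minimal sketch, assuming a recent Paddle where param.grad is a Tensor; the manual by-norm clipping and the max_norm threshold are illustrative assumptions, not APIs added by this PR:

    import paddle

    # Illustrative setup (assumption, not part of this PR).
    model = paddle.nn.Linear(10, 10)
    optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
    scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

    with paddle.amp.auto_cast():
        loss = model(paddle.rand([4, 10])).mean()

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)      # gradients now hold their true (unscaled) values

    max_norm = 1.0                  # assumed clipping threshold
    for p in model.parameters():
        if p.grad is not None:
            norm = float(paddle.linalg.norm(p.grad))
            if norm > max_norm:
                p.grad.scale_(max_norm / norm)   # in-place rescale of the gradient

    scaler.step(optimizer)          # step() sees unscale_ was already called and skips it
    scaler.update()
    optimizer.clear_grad()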

3. Docs review:

GradScaler:
[screenshot]
step+update:
[screenshot]
unscale_:
[screenshot]
Chinese documentation PR: PaddlePaddle/docs#3897

@paddle-bot-old commented:
Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@zhangbo9674 zhangbo9674 changed the title Dev/split minimize and unscale [AMP]split minimize and unscale Sep 17, 2021
@zhangbo9674 zhangbo9674 changed the title [AMP]split minimize and unscale [AMP]split minimize and add unscale_ for GradScaler Sep 22, 2021
@zhiqiu (Contributor) left a comment:
LGTM

@zhiqiu zhiqiu merged commit bf6f0e5 into PaddlePaddle:develop Sep 22, 2021
zhangbo9674 added a commit to zhangbo9674/Paddle that referenced this pull request Sep 22, 2021
* split minimize() to step() + update()

* add unscale and step for grad_scaler

* add unittest

* refine code in minimize

* delete step in loss_scaler

* fix example bug

* refine comment

* refine unittest

* add unittest
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 29, 2021
@zhangbo9674 zhangbo9674 deleted the dev/split-minimize branch March 2, 2023 02:56