
Mechanism that converts startup_program initializers to BF16 #32720

Merged
3 commits merged into PaddlePaddle:develop from bf16_init_amp on May 7, 2021

Conversation

wozna (Contributor) commented Apr 30, 2021

PR types

New features

PR changes

OPs

Describe

This PR adds a mechanism to BF16 training that converts the initializers in startup_program to BF16.
The mechanism is enabled only in pure_bf16 mode.

The important thing is that if you want the initializers converted to BF16, you need to pass the startup_program argument to the minimize function when defining the model.
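The conversion idea can be pictured with a minimal sketch. The dict-based "ops" below are hypothetical stand-ins for Paddle's real Program/Op classes, and `cast_initializers_to_bf16` and `keep_fp32_vars` are illustrative names, not the PR's actual API: walk the startup program's initializer ops and switch their FP32 outputs to BF16, except for variables that must stay in FP32.

```python
FP32, BF16 = "float32", "bfloat16"

def cast_initializers_to_bf16(startup_ops, keep_fp32_vars=()):
    """Switch FP32 initializer outputs to BF16, except for variables
    that must stay FP32 (e.g. inputs of ops kept on the FP32 list)."""
    for op in startup_ops:
        if op["dtype"] == FP32 and op["output"] not in keep_fp32_vars:
            op["dtype"] = BF16
    return startup_ops

# Two initializer ops in a toy "startup program"; w1 must stay FP32.
ops = [
    {"type": "fill_constant",   "output": "w0", "dtype": "float32"},
    {"type": "gaussian_random", "output": "w1", "dtype": "float32"},
]
cast_initializers_to_bf16(ops, keep_fp32_vars={"w1"})
# w0 is now bfloat16, w1 stays float32
```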

@paddle-bot-old
Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

@@ -232,7 +232,52 @@ def bf16_guard():
yield


def cast_model_to_bf16(program, amp_lists=None, use_bf16_guard=True):
def correct_post_op(post_ops, keep_fp32_ops):
Contributor

The name of this function doesn't align with what it does. Please rename it.

Comment on lines 223 to 224
if search_all:
idx = -1
Contributor

Please add some comment explaining why you need to go through all ops instead of only looking after current op node.

Contributor

I was rather thinking of an inline comment explaining why you set idx = -1 and in which situations you have to search the way you do now. That would be a useful note for future developers to understand the behavior.
Actually, the comment you wrote is misleading: even with search_all you still go through the op list, just from the beginning. I meant noting why you do it that way, for the future.
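The behavior being discussed can be illustrated with a simplified stand-in (dict-based ops, not the real amp_utils helper): with search_all=False the scan starts after the current op, while search_all=True sets idx = -1 so the whole op list is scanned from the beginning, which matters when an initializer op was appended after the ops that consume its output.

```python
def find_true_post_op(ops, cur_op, var_name, search_all=False):
    # search_all=False: only ops placed after cur_op can be post ops.
    # search_all=True: start from idx = -1, i.e. scan every op, because
    # in a startup program an initializer can sit *after* its consumers.
    idx = -1 if search_all else ops.index(cur_op)
    return [op for op in ops[idx + 1:] if var_name in op["inputs"]]

consumer = {"type": "cast", "inputs": ["w0"]}
init_op  = {"type": "fill_constant", "inputs": []}
ops = [consumer, init_op]  # initializer appended after its consumer

res_forward = find_true_post_op(ops, init_op, "w0")                   # []
res_all     = find_true_post_op(ops, init_op, "w0", search_all=True)  # [consumer]
```

This also shows the assertion pair suggested in review: a forward-only search misses the consumer (empty list), while search_all=True finds it.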

Comment on lines 158 to 160
res = amp.bf16.amp_utils.find_true_post_op(
block.ops, inititializer_op, "X", search_all=True)
assert (res == [op1])
Contributor

You could also add an assert that without search_all=True the result is an empty list.

@wozna wozna changed the title [DO NOT MERGE] Mechanism that converts startup_program initializers to BF16 Mechanism that converts startup_program initializers to BF16 May 5, 2021
@wozna wozna marked this pull request as ready for review May 6, 2021 13:02
arlesniak (Contributor) left a comment

LGTM

jczaja (Contributor) left a comment

LGTM

arlesniak (Contributor)

@luotao1 Could you start your review, please?

@luotao1 luotao1 merged commit ce2bdb0 into PaddlePaddle:develop May 7, 2021
luotao1 (Contributor) commented May 7, 2021

@lidanqing-intel Does this PR cherry-pick to release/2.1?

lidanqing-intel pushed a commit to lidanqing-intel/Paddle that referenced this pull request May 7, 2021
…addle#32720)

* Add casting initializers for bf16 training

* Changes after review

* Correct test and add comment
lanxianghit pushed a commit that referenced this pull request May 7, 2021
…o BF16 (#32720) (#32764)

* Add casting initializers for bf16 training

* Changes after review

* Correct test and add comment

Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>
@wozna wozna deleted the bf16_init_amp branch February 24, 2023 16:07