Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMP] Unify the static amp codes of fp16 and bf16. #52694

Merged
merged 26 commits into from
Apr 14, 2023

Conversation

Xreki
Copy link
Contributor

@Xreki Xreki commented Apr 9, 2023

PR types

Others

PR changes

APIs

Describe

  • 统一静态图AMP FP16、BF16训练代码和接口。因python原生不支持函数重载,故暂时先新增一个内部接口amp_decorate
  • 合并[AMP] Add operators stats collection tools for static program. #52488 ,通过分析Program中的算子、输入输出数据类型,统计Program中算子FP16、BF16、FP32、Other的调用次数。

@paddle-bot
Copy link

paddle-bot bot commented Apr 9, 2023

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot
Copy link

paddle-bot bot commented Apr 9, 2023

❌ The PR is not created using PR's template. You can refer to this Demo.
Please use PR's template, it helps save our maintainers' time so that more developers get helped.

Xreki added a commit to Xreki/Paddle that referenced this pull request Apr 9, 2023
@Xreki Xreki changed the title Unify the static amp codes of fp16 and bf16. [AMP] Unify the static amp codes of fp16 and bf16. Apr 10, 2023
)
self._amp_dtype = core.VarDesc.VarType.BF16
else:
self._amp_dtype = core.VarDesc.VarType.FP16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个应该是未对用户暴露的接口,可以把原来的参数列表调整下,用dtype和level?

# op_stats_list = amp.debugging._get_op_stats_list(main_program)

# op_stats_dict = op_stats_list[0]
# expected_bf16_calls = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

使用AMP后,Program的算子类型在不同环境下,存在一些异常,导致CI出现check不通过。本PR暂时移除该check,后续PR中打开并修复。

Xreki added a commit that referenced this pull request Apr 12, 2023
@Xreki
Copy link
Contributor Author

Xreki commented Apr 14, 2023

关于覆盖率,当前PR存在如下代码没覆盖到:

  • decorator.py中

    • 代码非本PR新增,为支持test_program自动AMP转换,后续支持推理将添加单测
      image

    • amp_base_models.py中统一指定了黑白名单,后续将修改、添加使用默认黑白名单的单测,并添加非法参数的单测
      image

  • fp16_utils.py,因CI机器不是Ampere架构,不支持bfloat16,因此单测不易构造。后续PR会补充和完善。
    image

@Xreki Xreki requested a review from XieYunshen April 14, 2023 03:11
Copy link
Contributor

@XieYunshen XieYunshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for
set_tests_properties(test_model_cast_to_bf16 PROPERTIES TIMEOUT 300)

@Xreki Xreki requested a review from ZzSean April 14, 2023 03:16
Copy link
Contributor

@ZzSean ZzSean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for skipIf

@Xreki Xreki merged commit dfcba7f into PaddlePaddle:develop Apr 14, 2023
@Xreki Xreki deleted the amp/static_bf16_support branch April 14, 2023 03:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants