
[HybridParallel]Support 1f1b for PipelineParallel #34483

Merged: 14 commits merged into PaddlePaddle:develop on Aug 2, 2021

Conversation

@ForFishes (Member) commented Jul 29, 2021

PR types

New features

PR changes

Others

Describe

[HybridParallel]Support 1f1b for PipelineParallel

This PR changes the current pipeline-parallel scheduling to the more memory-efficient 1F1B schedule, similar to Megatron's https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/schedules.py.
The scheduling diagram is as follows:
[scheduling diagram image]

GPT-117M model, V100-32G, PP=8, microbatch=2

| global batch | GPU memory before | GPU memory after |
| --- | --- | --- |
| 128 | OOM | 5876 |
| 512 | OOM | 5882 |
| 1024 | OOM | 5886 |
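
For reference, a schematic sketch of the 1F1B schedule adopted here; `run_1f1b`, `forward_step`, and `backward_step` are hypothetical placeholders, and peer-to-peer send/recv between stages is omitted (the real logic lives in pipeline_parallel.py):

```python
def run_1f1b(stage_id, num_stages, num_microbatches, forward_step, backward_step):
    # Warm-up: earlier stages run more forward micro-batches before their
    # first backward so that downstream stages can be fed.
    num_warmup = min(num_stages - stage_id - 1, num_microbatches)
    num_steady = num_microbatches - num_warmup

    cached = []  # at most num_warmup + 1 activations are alive at any time
    for i in range(num_warmup):
        cached.append(forward_step(i))

    # Steady state: one forward immediately followed by one backward (1F1B).
    for i in range(num_steady):
        cached.append(forward_step(num_warmup + i))
        backward_step(cached.pop(0))

    # Cool-down: drain the remaining backward passes.
    for _ in range(num_warmup):
        backward_step(cached.pop(0))
```

Because each stage keeps at most num_warmup + 1 cached activations, instead of one per micro-batch as in the previous all-forward-then-all-backward schedule, peak GPU memory stays roughly flat as the global batch grows, which is consistent with the table above.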

@paddle-bot-old commented

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@wangxicoding self-requested a review on August 2, 2021, 05:21
@wangxicoding (Contributor) left a comment

LGTM

paddle.autograd.backward(
    self.scaler.scale(self.caches['outputs'][cache_id]))
input_tensor_grad = self._backward_step(input_tensor, output_tensor,
                                        output_tensor_grad)
Contributor:

output_tensor and output_tensor_grad are no longer needed after this point; it seems they could be released manually here.

Member (Author):

I don't think we can manually set them to None to release them. If the host side releases them early, the device may not have started computing yet.

Contributor:

That should be fine: the GPU kernel launch has already captured the addresses, so as long as the memory is not overwritten, by us or by anything else, while the kernels run, it works. Worth a try 🌚
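
For illustration only, a minimal, self-contained sketch of the release pattern being discussed; plain Paddle ops stand in for one micro-batch of the pipeline, and the assumption from this thread is that the already-launched kernels hold the device addresses they need:

```python
import paddle

# Hypothetical toy stand-in for one micro-batch (not code from this PR).
x = paddle.randn([1024, 1024])
x.stop_gradient = False
output_tensor = (x * 2.0).sum()

# Backward kernels are launched asynchronously on the device.
output_tensor.backward()

# Drop the host-side Python references once they are no longer needed so the
# allocator can reclaim the memory for later micro-batches; per the discussion
# above, the launched kernels already hold the device addresses.
output_tensor = None
x = None
```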

paddle.distributed.send(dtype, dst=1, group=group)

def send_meta(self, tensor, group):
    if isinstance(tensor, paddle.Tensor):
Contributor:

A suggestion: pipeline_parallel.py also contains a lot of isinstance(tensor, tuple) logic. It would be cleaner to wrap a single paddle.Tensor into a tuple and handle everything through the tuple code path.

Member (Author):

Indeed! This can be made more elegant when the code is rewritten later.
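
A minimal sketch of the normalization the reviewer suggests; `as_tuple` is a hypothetical helper name, not code from this PR:

```python
import paddle

def as_tuple(tensors):
    # Hypothetical helper: normalize a single paddle.Tensor to a one-element
    # tuple so downstream send/recv logic only has to handle the tuple case.
    if isinstance(tensors, paddle.Tensor):
        return (tensors,)
    return tuple(tensors)
```

send_meta and the matching receive logic could then iterate over `as_tuple(tensor)` unconditionally instead of branching on isinstance in several places.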

@sandyhouse left a comment

LGTM

@ForFishes merged commit 9e0bb91 into PaddlePaddle:develop on Aug 2, 2021
@ForFishes deleted the support_1f1b branch on August 2, 2021, 14:03