
[AutoParallel] Recompute Pass #38920

Merged 5 commits into PaddlePaddle:develop on Jan 18, 2022

Conversation

@zhaoyinglia (Contributor) commented on Jan 13, 2022

PR types

New features

PR changes

Others

Describe

  • Add the AutoParallel recompute pass.
  • This recompute pass modifies the complete program (including forward, backward, and update), which differs from fleet.meta_optimizers.recompute_optimizer (see the sketch after this list).
  • The performance is the same as fleet.meta_optimizers.recompute_optimizer.
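
For orientation, a minimal sketch of how a recompute pass is typically switched on through fleet.DistributedStrategy; the checkpoint variable names below are hypothetical placeholders, not taken from this PR:

import paddle.distributed.fleet as fleet

# Sketch under assumptions: recompute is enabled via the distributed
# strategy, and this PR's pass then rewrites the complete program
# (forward + backward + update) rather than only the backward part.
strategy = fleet.DistributedStrategy()
strategy.recompute = True
strategy.recompute_configs = {
    # hypothetical checkpoint variables marking segment boundaries
    "checkpoints": ["tmp_3", "tmp_6"],
}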

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@JZ-LIANG self-requested a review on January 13, 2022 07:36
def init(self):
    if paddle.is_compiled_with_cuda():
        paddle.set_flags({'FLAGS_cudnn_deterministic': 1})
    self.rtol = 1e-5
Contributor:

Recompute should be able to achieve bitwise precision alignment, right?

Contributor Author:

It can basically achieve bitwise alignment, but in a few steps it occasionally diverges from the 6th decimal place onward, with an error around 1e-6.
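
For illustration, a hedged sketch of the kind of step-wise alignment check implied here; losses_base and losses_rc are hypothetical per-step loss lists from the baseline run and the recompute run:

import numpy as np

def check_alignment(losses_base, losses_rc, rtol=1e-6):
    # Compare per-step losses; under FLAGS_cudnn_deterministic most steps
    # match bitwise, with occasional divergence around the 6th digit.
    for step, (a, b) in enumerate(zip(losses_base, losses_rc)):
        if not np.allclose(a, b, rtol=rtol):
            print(f"step {step}: baseline={a}, recompute={b}")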

optimizer = paddle.fluid.optimizer.AdamOptimizer(
    learning_rate=0.00001,
    beta1=0.9,
    beta2=0.999,
    epsilon=1e-08,
-   grad_clip=clip)
+   grad_clip=None)
Contributor:

Why isn't clip supported?

Contributor Author:

Now supported.
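
For reference, a minimal sketch of passing a gradient clip to the optimizer now that it is supported; the clip_norm value is illustrative, not from this PR:

import paddle

clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)
optimizer = paddle.fluid.optimizer.AdamOptimizer(
    learning_rate=0.00001,
    beta1=0.9,
    beta2=0.999,
    epsilon=1e-08,
    grad_clip=clip)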

JZ-LIANG previously approved these changes on Jan 17, 2022

@JZ-LIANG (Contributor) left a comment:

LGTM

aoyulong previously approved these changes on Jan 17, 2022
@@ -190,7 +193,7 @@ def _get_dist_program(self, rank, dist_context=None, relaunch_phase=False):
# serial forward pass
self._apply_pre_optimization_passed(completed_main_program,
Contributor:
Rename _apply_pre_optimization_passed to _apply_pre_optimization_passes and _apply_post_optimization_passed to _apply_post_optimization_passes.

Contributor Author:

Done.

@@ -26,6 +26,9 @@
from .dist_attribute import OperatorDistributedAttribute, TensorDistributedAttribute
from .process_group import new_process_group, ProcessGroup, _g_process_group_map

# NOTE: If op in SPECIAL_OPS, it will not be resharded.
SPECIAL_OPS = ['check_finite_and_unscale', 'update_loss_scaling']
Contributor:

Global variables should use the _g_xxxx naming convention. Please rename SPECIAL_OPS to _g_special_ops.

Contributor Author:

Done.

    self._ops = ops
    self.var_op_deps = {}

def build_stats(self):
Contributor:

Should build_stats be named build_state?

Contributor Author:

This function is inherited from ProgramStats in backward.py, hence the name.
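
For context, a simplified sketch of what build_stats records, following the ProgramStats pattern in backward.py; this is abridged, not the exact implementation:

def build_stats(self):
    # For every variable name, record which ops consume it as input and
    # which ops produce it as output, keyed by op index.
    for i, op in enumerate(self._ops):
        for name in op.desc.input_arg_names():
            self.var_op_deps.setdefault(
                name, {"var_as_input_ops": [], "var_as_output_ops": []}
            )["var_as_input_ops"].append(i)
        for name in op.desc.output_arg_names():
            self.var_op_deps.setdefault(
                name, {"var_as_input_ops": [], "var_as_output_ops": []}
            )["var_as_output_ops"].append(i)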


    return segments

def modify_forward_desc_for_recompute(self, dist_context):
Contributor:

Please add more comments. For example, what is the purpose of modify_forward_desc_for_recompute?

Contributor Author:

Done.
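
Judging from the same-named helper in backward.py, the purpose can be summarized as below; this is a hedged docstring sketch, not this PR's exact code:

def modify_forward_desc_for_recompute(self, dist_context):
    """
    Insert a `seed` op before each dropout op in the forward program and
    feed the generated seed into that dropout, so that when a checkpoint
    segment is recomputed during the backward pass the dropout mask is
    reproduced exactly and gradients stay numerically aligned.
    """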

@zhaoyinglia dismissed stale reviews from aoyulong and JZ-LIANG via 9f719b8 on January 17, 2022 13:21
@sneaxiy (Collaborator) left a comment:

LGTM.

@JZ-LIANG merged commit 3084573 into PaddlePaddle:develop on Jan 18, 2022