
[Dygraph]Integration sharding stage2 function #38151

Merged: 2 commits merged from integration_stage2_function into PaddlePaddle:develop on Dec 19, 2021

Conversation


@Baibaifan (Contributor) commented on Dec 15, 2021:

PR types

Performance optimization

PR changes

Others

Describe

Integrate the sharding stage2 function:
1. Support group = None
2. Support param_groups for the optimizer

import paddle
from paddle.distributed import fleet
from paddle.distributed.fleet.meta_optimizers.dygraph_optimizer.sharding_optimizer_stage2 import ShardingOptimizerStage2
from paddle.distributed.fleet.meta_parallel.sharding.sharding_stage2 import ShardingStage2

fleet.init(is_collective=True)
group = paddle.distributed.new_group([0, 1])

# wrap model & optimizer (model_class and optimizer are user-defined placeholders)
model = model_class(...)
oss_optimizer = ShardingOptimizerStage2(params=model.parameters(), optim=optimizer, group=group)
model = ShardingStage2(model, oss_optimizer, group=group)

# use optimizer as normal
img, label = data
label.stop_gradient = True
img.stop_gradient = True
out = model(img)

loss = paddle.nn.functional.cross_entropy(input=out, label=label)
loss.backward()
oss_optimizer.step()
oss_optimizer.clear_grad()
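
The snippet above uses an explicit two-rank group and a plain parameter list. Below is a hedged sketch of the two additions listed in the description, reusing the imports and the unwrapped model from the example; the per-group options and the behavior when group is omitted are assumptions based on this PR's description, not confirmed documentation.

# Sketch only: `model` is the unwrapped Layer from the example above.

# param_groups for the inner optimizer: Paddle optimizers accept a list of
# dicts with a "params" key plus per-group options such as weight_decay.
decay_params, no_decay_params = [], []
for name, param in model.named_parameters():
    (no_decay_params if "bias" in name else decay_params).append(param)

inner_optimizer = paddle.optimizer.AdamW(
    learning_rate=1e-3,
    parameters=[
        {"params": decay_params, "weight_decay": 0.01},
        {"params": no_decay_params, "weight_decay": 0.0},
    ])

# group=None: leave the group argument unset and let the stage2 wrappers fall
# back to the global group created by fleet.init(is_collective=True).
oss_optimizer = ShardingOptimizerStage2(params=model.parameters(), optim=inner_optimizer)
model = ShardingStage2(model, oss_optimizer)

The forward/backward/step/clear_grad part of the training loop stays the same as in the example above.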


@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

@Baibaifan force-pushed the integration_stage2_function branch 3 times, most recently from b1bf3cc to 7d5ad2e on December 15, 2021 06:35
@Baibaifan force-pushed the integration_stage2_function branch 3 times, most recently from 04d6d9f to 5d6cc91 on December 17, 2021 06:14
self._rank_buffer_size = {} # {dtype: {rank: numel+alignment}}
self._param2align = {} # {param.name: align}

# Default information
self._optim_defaults = kw
self._optim = optim
self._ori_parameter_list = copy.deepcopy(self._optim._parameter_list)
self._ori_param_groups = copy.deepcopy(self._optim._param_groups)
A reviewer (Member) commented on the snippet above:

deepcopy increases memory usage.

@Baibaifan (Contributor, Author) replied:

Fixed; the deepcopy calls were changed to pass references instead.
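
A minimal sketch of the change being described, with attribute names taken from the snippet above; it illustrates the idea rather than reproducing the exact patch.

# Before: deepcopy duplicates the optimizer's parameter containers and raises memory use.
#   self._ori_parameter_list = copy.deepcopy(self._optim._parameter_list)
#   self._ori_param_groups = copy.deepcopy(self._optim._param_groups)

# After (sketch): keep references to the original containers instead of copies.
self._ori_parameter_list = self._optim._parameter_list
self._ori_param_groups = self._optim._param_groups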

@@ -94,7 +94,7 @@ def __init__(self,
             filter(lambda x: x.trainable and x.dtype == Type.fp16.value,
                    self._local_params))) > 0

-        assert group is not None, "Distributed communication group is must be gived"
+        assert group is not None, "Distributed communication group is must be given"
A reviewer (Member) commented on this diff:

Need to support the global group when group=None.

@Baibaifan (Contributor, Author) replied:

Now supported.
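
A hedged sketch of the fallback being described; _get_global_group is a hypothetical stand-in for whatever Paddle uses internally to expose the default communication group, and the Group attributes used here are assumptions.

# Sketch: fall back to the global group instead of asserting group is not None.
if group is None:
    group = _get_global_group()  # hypothetical helper, not a confirmed Paddle API
self._group = group
self.world_size = group.nranks  # assumed Group attributes
self.rank = group.rank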

@Baibaifan force-pushed the integration_stage2_function branch 3 times, most recently from 2715035 to 0f53247 on December 17, 2021 13:02
@Baibaifan changed the title from "Integration sharding stage2 function" to "[Dygraph]Integration sharding stage2 function" on Dec 18, 2021
ForFishes previously approved these changes on Dec 18, 2021.

@ForFishes (Member) left a comment:

LGTM

@ForFishes (Member) left another comment:

LGTM

@Baibaifan merged commit 327e505 into PaddlePaddle:develop on Dec 19, 2021