
[Auto Parallel] Improve the fine-grained APIs #46552

Merged
merged 18 commits into PaddlePaddle:develop on Oct 12, 2022

Conversation

Contributor

@aoyulong aoyulong commented Sep 27, 2022

PR types

Others

PR changes

APIs

Describe

In an earlier PR, we proposed fine-grained APIs for users who want to control the execution logic themselves. This PR further improves those fine-grained APIs as follows:

  • Improved distributed dataloader
    • Add the distributed dataloader (recommended) and dataloader_from_generator (for legacy support) methods.
    • Align the interface with the serial dataloader as closely as possible.
  • Improved fine-grained APIs: dataloader + prepare + run (training example below; an eval-mode sketch follows this list)
    import paddle
    import paddle.vision.transforms as T
    import paddle.distributed.auto_parallel as auto
    from paddle.vision.datasets import MNIST

    transform = T.Compose([
        T.Transpose(),
        T.Normalize([127.5], [127.5])
    ])
    train_dataset = MNIST(mode='train', transform=transform)
    valid_dataset = MNIST(mode='test', transform=transform)

    model = paddle.vision.models.LeNet()
    loss = paddle.nn.CrossEntropyLoss()
    optimizer = paddle.optimizer.Adam(
        learning_rate=0.001, parameters=model.parameters())
    metrics = paddle.metric.Accuracy(topk=(1, 2))

    engine = auto.Engine(model, loss, optimizer, metrics)

    # Step 1: build the distributed dataloader
    dataloader = engine.dataloader(train_dataset,
                                   epochs=2,
                                   batch_size=64,
                                   mode="train")

    # Step 2: build the distributed program
    engine.prepare(mode="train")

    feed_dict = ...
    fetch_list = ...
    # Step 3: run the distributed program with the distributed dataloader
    for data in dataloader:
        outs = engine.run(data, feed=feed_dict, fetch_list=fetch_list, mode="train")
    • Add the prepare API to explicitly control when the distributed program is partitioned and built.
    • Replace __call__ with run to conform to the serial executor interface.
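
For reference, the same dataloader + prepare + run flow can be reused for evaluation. The sketch below continues from the engine and valid_dataset defined in the example above; mode="eval" and the dropped epochs argument are assumptions extrapolated from the training example rather than details confirmed in this PR, and the legacy dataloader_from_generator path is not shown because its exact arguments are not spelled out here.

    # Evaluation with the same fine-grained flow (sketch, not the confirmed API).
    # Step 1: build a distributed dataloader over the validation dataset
    #         (mode="eval" is assumed by analogy with mode="train" above).
    valid_dataloader = engine.dataloader(valid_dataset,
                                         batch_size=64,
                                         mode="eval")

    # Step 2: build the distributed program for evaluation
    engine.prepare(mode="eval")

    # Step 3: run the distributed program batch by batch
    for data in valid_dataloader:
        outs = engine.run(data, mode="eval")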


paddle-bot bot commented Sep 27, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@aoyulong aoyulong changed the title [Auto Parallel] Improve the distributed loader [Auto Parallel] Improve the fine-grained APIs Oct 12, 2022
Contributor

@JZ-LIANG JZ-LIANG left a comment


LGTM

@JZ-LIANG JZ-LIANG merged commit 686fa07 into PaddlePaddle:develop Oct 12, 2022
zhaoyinglia pushed a commit to zhaoyinglia/Paddle that referenced this pull request Oct 19, 2022
* [Auto Parallel] Support different dataloaders

* [Auto Parallel] Add num_shards config for dataset

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Add the prepare API and replace __call__ with run

* [Auto Parallel] Improve the private implementations of Engine

* [Auto Parallel] Set capacity of dataloader for opt tuning

* [Auto Parallel] [WIP] Change the fine-grained API

* [Auto Parallel] Improve APIs to support different user cases

* [Auto Parallel] Add removed config

* [Auto Parallel] Add imports

* [Auto Parallel] Fix bugs for to_static

* [Auto Parallel] Remove unnecessary imports
XiaoguangHu01 pushed a commit that referenced this pull request Oct 19, 2022
…47145)

* [Auto Parallel] Make Engine class callable (#46416)

* [Auto Parallel] Improve the user-defined fetches and logging

* [Auto Parallel] Make Engine class callable

* [Auto Parallel] Update the data loading of tuner

* Print IPS in auto parallel Engine (#46554)

* [AutoParallel] fix dist_split (#46505)

* [AutoParallel] fix dist_split

* add unittest

* update cmakelist

* [AutoParallel] fix sharding (#46572)

* [AutoParallel] fix process_mesh (#46583)

* [AutoParallel] fix reshard when train with eval (#46605)

* [AutoParallel] fix reshard when train with eval

* fix mppp

* [AutoParallel] fix amp when predict (#46637)

* [Auto Parallel]Update comp cost and completion for gpt auto search (#46387)

* update comp cost and completion for gpt auto search

* add unittest

* [Auto Parallel] Fix bugs caused by the inconsistent outputs of Engine API (#46633)

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Improve the fine-grained APIs (#46552)

* [Auto Parallel] Support different dataloaders

* [Auto Parallel] Add num_shards config for dataset

* [Auto Parallel] Unify the logger and outputs of Engine API

* [Auto Parallel] Fix the bugs of to_static

* [Auto Parallel] Adjust the test_to_static.py

* [Auto Parallel] Add the prepare API and replace __call__ with run

* [Auto Parallel] Improve the private implementations of Engine

* [Auto Parallel] Set capacity of dataloader for opt tuning

* [Auto Parallel] [WIP] Change the fine-grained API

* [Auto Parallel] Improve APIs to support different user cases

* [Auto Parallel] Add removed config

* [Auto Parallel] Add imports

* [Auto Parallel] Fix bugs for to_static

* [Auto Parallel] Remove unnecessary imports

* bugfix (#46921)

* [Auto Parallel] Fix the bug for None labels (#46987)

* [AutoParallel] adapt for gpt-gen (#46771)

* for gpt-gen

* fix reshard

* adapt assign and shape op

* add dist_assign & unittest

* add conditional block unittest

* rename unittest

* [Auto Parallel] Fix the bug of completion (#47056)

* [Auto Parallel] Fix the bug for None labels

* [Auto Parallel] Fix the completion bug

* [AutoParallel] add callbacks (#47014)

* [AutoParallel] add callbacks

* fix unittest

* fix dist_context

* fix engine

* fix cmakelist

* fix unittest's returns

* fix cmakelist

* [Auto Parallel] Add cost interface (#47043)

* add cost interface

* update interface and add unittest

* update unittest

* update interface

* [Auto Parallel]Add parallel tuner (#46189)

* add parallel tuner

* add unittest

* fix unittest

* set timeout of unittest

* set unittest timeout

* fix auto_mode setting

* update unittest

* sync from develop and update unittest

* remove unused import

* update unittest

* update cmakelist

* add unittests

Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>