Optimizer Design #4656

Merged: wangkuiyi merged 5 commits into PaddlePaddle:develop from jacquesqiao:optimizer-on-block on Oct 11, 2017

## Optimizer Design

### The Problem

A PaddlePaddle program, or a block, is a sequence of operators operating on variables. A training program needs to do three kinds of work:

1. the forward pass, which computes intermediate results and the cost(s),
1. the backward pass, which derives gradients from intermediate results and costs, and
1. the optimization pass, which updates model parameters to optimize the cost(s).

These kinds of work rely on three kinds of operators:

1. forward operators,
1. gradient operators, and
1. optimization operators.
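
To make the correspondence concrete, here is a minimal, purely illustrative sketch (the operator names and the block representation below are hypothetical, not PaddlePaddle's actual API) of how a single block could hold all three kinds of operators for the network described in the next section, trained with plain SGD:

```python
# Hypothetical illustration: a block as an ordered list of (op_type, io) pairs.
block = [
    # forward operators
    ("mul", {"X": "images", "Y": "w1", "Out": "tmp"}),
    ("add", {"X": "tmp", "Y": "b1", "Out": "hidden"}),
    ("mse", {"X": "hidden", "Y": "labels", "Out": "cost"}),
    # gradient operators, derived automatically from the forward pass
    ("mse_grad", {"In": "cost@GRAD", "Out": "hidden@GRAD"}),
    ("add_grad", {"In": "hidden@GRAD", "Out": "b1@GRAD"}),
    ("mul_grad", {"In": "hidden@GRAD", "Out": "w1@GRAD"}),
    # optimization operators, one per parameter
    ("sgd", {"Param": "w1", "Grad": "w1@GRAD"}),
    ("sgd", {"Param": "b1", "Grad": "b1@GRAD"}),
]
```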

It's true that users should be able to create all these operators manually by calling some low-level API, but it would be much more convenient if they could describe only the forward pass and let PaddlePaddle create the backward and optimization operators automatically.

In this design, we propose a high-level API that automatically derives the optimization pass and its operators from the forward pass.

### High-level Python API to describe the training process

1. Users write code to describe the network:

```python
images = layer.data("images")
labels = layer.data("labels")
w1 = pd.var("w1")
b1 = pd.var("b1")
hidden = layer.fc(images, w=w1, b=b1)
cost = layer.mse(hidden, labels)
```

The above code snippet will create the forward operators in [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).

2. Users create a certain kind of Optimizer with some arguments.

```python
optimizer = AdagradOptimizer(learning_rate=0.001)
```
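
For reference, Adagrad adapts each parameter's step size by accumulating its squared gradients; the standard update rule (a well-known formula, stated here only for context) is

$$G_t = G_{t-1} + g_t \odot g_t, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t} + \epsilon}\, g_t$$

where $\eta$ is the `learning_rate` passed above, $g_t$ is the current gradient, and $\epsilon$ is a small constant for numerical stability.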

3. Users use the optimizer to `minimize` a certain `cost` by updating the parameters in `parameter_list`.

```python
opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1])
```

The above code snippet will create the gradient and optimization operators in Block. The return value of `minimize()` is the list of optimization operators that will be run by the session.

4. Users use the Session/Executor to run `opt_op_list` as the target to do training.

```python
sess.run(target=opt_op_list, ...)
```
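
Putting the four steps together, a complete training loop might look like the following sketch (the `reader()` data source and the `feed` argument are illustrative assumptions; only the calls shown in the steps above come from this design):

```python
images = layer.data("images")
labels = layer.data("labels")
w1 = pd.var("w1")
b1 = pd.var("b1")
hidden = layer.fc(images, w=w1, b=b1)
cost = layer.mse(hidden, labels)

optimizer = AdagradOptimizer(learning_rate=0.001)
opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1])

# Running the optimization operators as the target pulls in the whole
# dependency chain: forward pass, backward pass, then parameter updates.
for batch in reader():
    sess.run(target=opt_op_list,
             feed={"images": batch.images, "labels": batch.labels})
```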

#### Optimizer Python interface

```python
class Optimizer(object):
    """Optimizer base class."""

    def __init__(self):
        pass

    def create_backward_pass(self, loss, parameter_list=None):
        """Create and add gradient operators to the BlockDesc to compute the
        gradients of `loss` for the parameters in `parameter_list`.

        Args:
            loss: a variable generated by the cost function.
            parameter_list: parameters whose gradients are computed and that
                are updated to optimize the loss.

        Returns:
            A list of (parameter, gradient) pairs.
        """
        return None

    def create_optimization_pass(self, parameters_and_grads):
        """Add optimization operators that use gradients to update variables.

        Args:
            parameters_and_grads: a list of (variable, gradient) pairs to update.

        Returns:
            optimization_op_list: a list of optimization operators that update
                the parameters using their gradients.
        """
        return None

    def minimize(self, loss, parameter_list):
        """Add operations to minimize `loss` by updating `parameter_list`.

        This method combines the interfaces `create_backward_pass()` and
        `create_optimization_pass()` into one.
        """
        params_grads = self.create_backward_pass(loss, parameter_list)
        update_ops = self.create_optimization_pass(params_grads)
        return update_ops
```

Users can inherit from the `Optimizer` base class above to create their own optimizers with specialized update logic, such as `AdagradOptimizer`.
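
As a hypothetical sketch of such a subclass (the global `block`, its `append_op` method, the `adagrad` operator, and the `create_accumulator` helper below are illustrative assumptions, not the actual framework API), an `AdagradOptimizer` might override `create_optimization_pass()` to emit one `adagrad` operator per parameter:

```python
class AdagradOptimizer(Optimizer):
    """Illustrative Adagrad subclass; the operator and helpers used here
    are hypothetical."""

    def __init__(self, learning_rate, epsilon=1.0e-6):
        super(AdagradOptimizer, self).__init__()
        self.learning_rate = learning_rate
        self.epsilon = epsilon

    def create_optimization_pass(self, parameters_and_grads):
        optimization_op_list = []
        for param, grad in parameters_and_grads:
            # Each parameter keeps a per-element accumulator of squared
            # gradients ("moment"); a single adagrad operator updates both
            # the accumulator and the parameter in place.
            moment = create_accumulator(param)  # hypothetical helper
            op = block.append_op(               # hypothetical block API
                type="adagrad",
                inputs={"Param": param, "Grad": grad, "Moment": moment},
                attrs={"learning_rate": self.learning_rate,
                       "epsilon": self.epsilon})
            optimization_op_list.append(op)
        return optimization_op_list
```

The inherited `minimize()` then combines the shared `create_backward_pass()` with this Adagrad-specific optimization pass.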
The pseudo code here is not formatted well.