Optimizer Design #4656
Conversation
doc/design/optimizer.md
Outdated
        op related.
        """
        ...
        return update_op
When a user wants to update twice, the second update_op needs to trace the first update_op and all the related update and backward ops. Maybe we need to write a guide to point this out.
OK, after discussing with @dzhwinter, this can be done in the current design, but it is not the most important thing to consider now.
We have already made the backward interface public; users can call it directly multiple times to create more gradient operators in the graph.
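To make the point concrete, here is a small, self-contained sketch of that behaviour. The `Block` class and the `backward` helper below are hypothetical stand-ins for illustration only, not the actual Paddle API:

```python
# Hypothetical sketch: a public backward() appends gradient operators
# to a block, so calling it twice adds two sets of operators.
class Block:
    def __init__(self):
        self.ops = []

def backward(block, loss):
    """Append gradient operators for `loss` and return the new ops."""
    grad_ops = ["%s_grad" % loss]
    block.ops.extend(grad_ops)
    return grad_ops

block = Block()
backward(block, "loss")  # first set of gradient operators
backward(block, "loss")  # second call creates more gradient operators
print(len(block.ops))    # 2
```

Each call simply appends to the same graph, which is why the user must be aware of the operators created by earlier calls.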
doc/design/optimizer.md
Outdated
@@ -0,0 +1,85 @@
## Optimizer Design
In deeplearning system, `Optimizer` is used to optimize(minimize) loss thow updating a list of parameters.
`thow` is a typo?
fixed
doc/design/optimizer.md
Outdated
### A typical training process:

1. run forward to calculate activation using data and parameter.
I do not think this typical training process fits our current design. Currently, we put every operator into one ProgramDesc; there are no three explicit running stages.
This is a general abstract training process; no matter how complex the training process is, it is composed of these stages. In our design, we also have functions like backward and optimize that put the related operators into the ProgramDesc. Here we just expose the interface through Optimizer as a high-level API.
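A toy sketch of this reply's point (the names `program`, `forward`, `backward`, and `optimize` are illustrative only, not real Paddle symbols): the three conceptual stages all end up as operators appended to one and the same program, with no separate runtime stages.

```python
# Toy model: one flat list of operator names stands in for a ProgramDesc.
# forward/backward/optimize each append operators to the same program.
program = []

def forward():
    program.extend(["mul", "add", "mse"])

def backward():
    program.extend(["mse_grad", "add_grad", "mul_grad"])

def optimize():
    program.extend(["sgd"])

forward()
backward()
optimize()
print(program)  # all seven operators live in one program
```

The "stages" exist only in how the user thinks about the program, not in how it is stored or executed.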
doc/design/optimizer.md
Outdated
    class Optimizer(object):
        def _backward(loss):
backward and update should be public.
done
doc/design/optimizer.md
Outdated
3. User use the optimizer to `minimize` a certain `cost` thow updating parameters in parameter_list.

       opt = optimizer.minimize(cost, parameter_list=[w1, ...])
opt should be a list.
done
doc/design/optimizer.md
Outdated
@@ -0,0 +1,85 @@
## Optimizer Design
In deeplearning system, `Optimizer` is used to optimize(minimize) loss thow updating a list of parameters.
This design doc doesn't explain the challenge. It looks to me that the challenge is:

The Problem

A PaddlePaddle program, or a block, is a sequence of operators operating on variables. A training program needs to do three kinds of work:

- the forward pass, which computes intermediate results and the cost(s),
- the backward pass, which derives gradients from the intermediate results and costs, and
- the optimization pass, which updates model parameters.

These kinds of work rely on three kinds of operators:

- forward operators,
- gradient operators, and
- optimization operators.

It's true that users should be able to create all these operators manually by calling some low-level API, but it would be much more convenient if they could describe only the forward pass and let PaddlePaddle create the backward and optimization operators automatically.

In this design, we propose a high-level API that automatically derives the optimization pass and operators from the forward pass.
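A minimal sketch of how the proposed high-level API would be used. The `Optimizer` class below is a toy stand-in; only the idea of a `minimize()` entry point is taken from the proposal, everything else is invented for illustration:

```python
# The user describes only the forward pass; minimize() derives the
# backward and optimization operators automatically (toy stand-in).
class Optimizer(object):
    def minimize(self, loss, parameter_list):
        backward_ops = ["grad(%s, %s)" % (loss, p) for p in parameter_list]
        optimize_ops = ["update(%s)" % p for p in parameter_list]
        return backward_ops + optimize_ops

opt_ops = Optimizer().minimize("cost", ["w1", "w2"])
print(opt_ops)
# ['grad(cost, w1)', 'grad(cost, w2)', 'update(w1)', 'update(w2)']
```

The key design point is that the user never writes gradient or update operators by hand; they fall out of the forward description.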
done
doc/design/optimizer.md
Outdated
## Optimizer Design
In deeplearning system, `Optimizer` is used to optimize(minimize) loss thow updating a list of parameters.

### A typical training process:
If the proposed section `## The Problem` above is accepted, this paragraph of three bullets can be removed.
done
doc/design/optimizer.md
Outdated
1. User write code to describe the network:
This Python program needs to be properly indented -- to the right of `1.` in the above line.
done
doc/design/optimizer.md
Outdated
    cost = layer.mse(hidden, labels)

the code above will generate forward operators in [block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).
the => The
the code above => The above code snippet
will generate => creates
done
doc/design/optimizer.md
Outdated
the code above will generate forward operators in [block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).

2. User create a Optimizer and set parameter list that it need to update.
Either `The user creates` or `Users create`.
done
doc/design/optimizer.md
Outdated
2. User create a Optimizer and set parameter list that it need to update.
Correct code snippet indentation in the Markdown doc.
done
doc/design/optimizer.md
Outdated
### What does optimizer do:

In PaddlePaddle, we use block of operators to describe computation. From the Python Interface we described above, we can see that `Optimizer` should add some operators to the computation block:
block of operators => blocks of operators
we use => PaddlePaddle uses
done, removed
doc/design/optimizer.md
Outdated
    class Optimizer(object):
        def _backward(loss):
_backward => create_backward_pass
done
doc/design/optimizer.md
Outdated
        ...
        return variables

    def _update(var_list):
_update => create_optimization_pass
done
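With both renames applied (`_backward` => `create_backward_pass`, `_update` => `create_optimization_pass`), the interface would look roughly like the sketch below. Method bodies are placeholders invented for illustration; the real implementations would emit operators into the block:

```python
class Optimizer(object):
    def create_backward_pass(self, loss, parameter_list=None):
        """Create gradient operators; return (parameter, gradient) pairs."""
        params = parameter_list if parameter_list is not None else ["w"]
        return [(p, p + "@GRAD") for p in params]

    def create_optimization_pass(self, params_and_grads):
        """Create one update operator per (parameter, gradient) pair."""
        return ["update(%s, %s)" % (p, g) for p, g in params_and_grads]

    def minimize(self, loss, parameter_list=None):
        pg = self.create_backward_pass(loss, parameter_list)
        return self.create_optimization_pass(pg)

print(Optimizer().minimize("cost", ["w1", "w2"]))
# ['update(w1, w1@GRAD)', 'update(w2, w2@GRAD)']
```

Making both methods public, as suggested earlier in the thread, also lets advanced users call the two passes separately.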
1. User write code to describe the network:
The pseudo code here is not formatted well.
doc/design/optimizer.md
Outdated
    class Optimizer(object):
        def create_backward_pass(loss, parameter_list=None):
Parameter and variable look interchangeable in the Python API. I am not sure whether they refer to the same concept.
doc/design/optimizer.md
Outdated
        This method simply combines calls `create_backward_pass()` and
        `create_optimization_pass()`.
        """
        vars_grads = create_backward_pass(loss)
typo: `create_backward_pass(loss)` => `create_backward_pass(loss, parameter_list)`
done
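The fix above matters: if `minimize()` does not forward `parameter_list`, gradients are created for every parameter instead of the requested subset. A hedged toy demonstration (`ALL_PARAMS` and the function names are invented for illustration):

```python
ALL_PARAMS = ["w1", "w2", "b"]

def create_backward_pass(loss, parameter_list=None):
    # defaults to every parameter in the block
    return parameter_list if parameter_list is not None else ALL_PARAMS

def minimize_buggy(loss, parameter_list=None):
    return create_backward_pass(loss)                  # subset is dropped

def minimize_fixed(loss, parameter_list=None):
    return create_backward_pass(loss, parameter_list)  # subset forwarded

print(minimize_buggy("cost", ["w1"]))  # ['w1', 'w2', 'b']
print(minimize_fixed("cost", ["w1"]))  # ['w1']
```

The buggy version silently updates parameters the user never asked to train, which is why the reviewer flagged it.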
The reason for using a uniform interface for Optimizer: see issue #4679.