optimizer design #3711
Conversation
```python
        This method simply combines the calls to `_backward()` and
        `_update()`.
        """
        backward_net = _backward(loss)
```
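A minimal sketch of how `minimize()` could combine the two private steps. The names `_backward`, `_update`, and `minimize` come from the diff above; the bodies here are stand-ins, not the real Paddle implementation.

```python
from collections import namedtuple

# A stand-in "loss" that only records which variables it depends on.
Loss = namedtuple("Loss", ["inputs"])


class Optimizer:
    def _backward(self, loss):
        # Stand-in: a real implementation would append gradient ops to
        # the network; here we just name the gradient variables.
        return ["%s@GRAD" % var for var in loss.inputs]

    def _update(self, grad_vars):
        # Stand-in for appending update ops that consume the gradients.
        return "sgd_update(%s)" % ", ".join(grad_vars)

    def minimize(self, loss):
        """Combine _backward() and _update() behind one public method."""
        grad_vars = self._backward(loss)
        return self._update(grad_vars)


opt = Optimizer()
update_op = opt.minimize(Loss(inputs=["W", "b"]))
```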
What's the type of `backward_net`: NetOp, Block, or var_list?
And does class Optimizer have any private members? Is `backward_net` a member of `Optimizer`?
This part has some problems, I will update it.
Because we do not want users to know about the `_backward` and `_update` steps, we decided to export only `minimize()` to users.

## Three situations in parameter update

1. One machine, no GPU card / one GPU card.
There are four situations:
- single thread or single GPU
- multi-thread
- multi-GPU
- multi-node

In multi-thread, we only have one copy of the parameters and gradients in memory. But in multi-GPU, we have parameters and gradients on every GPU card.
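The memory-layout difference between the two cases can be sketched like this. `make_scopes` is a hypothetical helper, not a real Paddle API; it only models where parameter storage lives in each situation.

```python
def make_scopes(mode, n_devices, param_names):
    # Hypothetical helper: model parameter storage per execution mode.
    if mode == "multi-thread":
        # One shared copy: every thread sees the same dict.
        shared = {name: 0.0 for name in param_names}
        return [shared] * n_devices
    if mode == "multi-gpu":
        # One independent copy per GPU card.
        return [{name: 0.0 for name in param_names} for _ in range(n_devices)]
    raise ValueError("unknown mode: %s" % mode)


threads = make_scopes("multi-thread", 4, ["W"])
threads[0]["W"] = 1.0   # a write by one thread is visible to all threads

cards = make_scopes("multi-gpu", 4, ["W"])
cards[0]["W"] = 1.0     # card 0 changes; card 1 still holds its own copy
```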
yes, I will update this
```python
    return update_op
```
Because we do not want users to know about the `_backward` and `_update` steps, we decided to export only `minimize()` to users.
We run backward first and then update later. Will we implement a strategy that runs backward and update op by op at the same time?
`backward` takes a loss as input and adds gradient ops for all the related variables, so backward is a whole step. But the optimizer takes a parameter `var_list`, so you can add an update op for one var at a time, and add different update ops to different parameters.
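A small sketch of that last point: update ops can be attached one parameter at a time, with a possibly different rule per parameter. `add_update_ops` and the rule names are illustrative, not the real API.

```python
def add_update_ops(var_list, rule_for):
    # Hypothetical helper: attach a (possibly different) update op to
    # each parameter in var_list, one parameter at a time.
    return [(param, "%s_update(%s)" % (rule_for(param), param))
            for param in var_list]


# Different parameters can get different update rules.
rules = {"W": "momentum", "b": "sgd"}
ops = add_update_ops(["W", "b"], lambda p: rules[p])
```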
I mean, when we run the network, can we compute gradients and update them in parallel?
In Paddle, we use a callback to implement this. Once an Op's backward gradient is calculated, the callback updating function is executed to update that gradient. At the same time, the next Op's gradients are being calculated.
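The callback scheme can be sketched as follows. This version is sequential for clarity (the real scheme overlaps the update with the next gradient computation); `run_backward` and the event names are stand-ins.

```python
events = []


def run_backward(ops, on_grad_ready):
    # The backward pass visits ops in reverse order. As soon as one
    # op's gradient is "computed", the callback fires to apply the
    # update, before the next op's gradient is produced.
    for op in reversed(ops):
        events.append("grad:" + op)   # stand-in for gradient computation
        on_grad_ready(op)             # update callback fires immediately


run_backward(["op1", "op2", "op3"],
             lambda op: events.append("update:" + op))
# Gradient computation and updates interleave per op, rather than all
# gradients first and all updates after.
```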
This can be done by rearranging the order of operators in the NetOp/Block at run time.
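One way to picture that rearrangement: instead of scheduling all backward ops followed by all update ops, interleave each update op right after the backward op that produces its gradient. The helper and op names below are illustrative only.

```python
def interleave(backward_ops, update_for):
    # Hypothetical reordering: place each parameter's update op
    # immediately after the backward op that computes its gradient.
    order = []
    for bwd in backward_ops:
        order.append(bwd)
        order.append(update_for[bwd])
    return order


order = interleave(["W_grad", "b_grad"],
                   {"W_grad": "W_update", "b_grad": "b_update"})
```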
fixed by #4656
No description provided.