Feature request
Add support for the schedule-free optimizers (https://github.com/facebookresearch/schedule_free) and Sophia (https://github.com/Liuhong99/Sophia). These optimizers have very different properties from the existing choices and are often a useful alternative to them.
Motivation
Schedule-free optimizers (especially schedule-free SGD) offer faster convergence than the existing implementations with no additional memory requirements. There is some variability in the final gradients, but this is a reasonable engineering tradeoff in many cases. In the same vein, Sophia has been reported to converge twice as fast as Adam on some language modeling tasks.
Your contribution
Including these directly would add new dependencies, and I'm not sure what the usual approach for that is. I'd be willing to implement both features, but I'd like guidance on how new dependencies should be handled.
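One possible way to avoid hard dependencies, as a rough sketch only: treat the optimizer packages as optional and import them lazily, failing with an install hint when they are missing. The helper names `is_package_available` and `require_optional` are hypothetical (they are not taken from this repository), and the package name `schedulefree` in the usage comment is my assumption about how the schedule_free repo is published.

```python
import importlib
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if the top-level package `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_optional(name: str, pip_name: str, feature: str):
    """Import an optional package on first use, or raise with an install hint.

    `name` is the import name, `pip_name` the name to install (assumed),
    and `feature` a human-readable label used in the error message.
    """
    if not is_package_available(name):
        raise ImportError(
            f"{feature} requires the optional package `{pip_name}`. "
            f"Install it with `pip install {pip_name}`."
        )
    return importlib.import_module(name)


# Hypothetical usage inside the code path that selects an optimizer:
#   schedulefree = require_optional("schedulefree", "schedulefree", "Schedule-free optimizers")
```

With this pattern nothing new is installed by default, and users who never select these optimizers are unaffected. Whether this matches the project's conventions for optional extras is something I'd want a maintainer to confirm.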