New Optimisers #637

Open

MikeInnes opened this issue Feb 22, 2019 · 7 comments · May be fixed by #1481

Comments

@MikeInnes
Member

If we go full steam ahead with more functional-style AD (#628), then we'll need to rework optimisers a little. I think at the core the update step will look very similar, if a bit more functional, something like:

```julia
apply(::Optimiser, x, dx, state = ...) -> (dx', state')
```

Then we have an update function which actually applies the gradient. In general, the x, dx and state can be structs and we'll rebuild x by recursing over it (somewhat like mapleaves, though we won't need to depend on treelike since we can just use reflection here).
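
As a very rough sketch of how the pieces could fit together (everything below other than the `apply` signature is an assumption, and the struct recursion is heavily simplified):

```julia
# Hypothetical sketch: a stateless Descent rule under the proposed interface.
struct Descent
    eta::Float64
end

# apply transforms the gradient and threads optimiser state through;
# plain gradient descent has no state, so we just pass `nothing` along.
apply(o::Descent, x, dx, state = nothing) = (o.eta .* dx, state)

# At the leaves, update applies the transformed gradient to an array.
function update(o, x::AbstractArray, dx, state = nothing)
    dx, state = apply(o, x, dx, state)
    return x .- dx, state
end

# For structs, recurse over the fields via reflection and rebuild
# (simplified: real code would also thread per-leaf state through).
function update(o, x, dx, state = nothing)
    fields = map(fieldnames(typeof(x))) do f
        first(update(o, getfield(x, f), getfield(dx, f)))
    end
    return typeof(x)(fields...), state
end

update(Descent(0.1), [1.0, 2.0], [0.5, 0.5])  # ([0.95, 1.95], nothing)
```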

I think this is all reasonably straightforward; the main wrinkle in my mind is how we enable in-place updates as an optimisation (since it's not ideal to have two copies of ResNet at once). I'm not aware of any great solution to this right now, so we might need to define an ismutable trait and declare it for types we care about.

@staticfloat
Contributor

Can you spell out the difficulties with in-place updates? They aren't obvious to me.

@MikeInnes
Member Author

One difficulty is that mutability is not part of the array API; there isn't an automatic way to discover whether an array is mutable without trying it and checking for an error. So that's where we'll need an ismutable trait that effectively records a database of things we are allowed to mutate, and users with new array types will have to overload Flux.ismutable (which isn't the end of the world, just kind of ugly).
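
A minimal sketch of what that trait could look like (all names here are hypothetical; the fallback and the set of opted-in types are assumptions):

```julia
# Conservative default: unknown array types are treated as immutable.
ismutable(x) = false
ismutable(::Array) = true  # plain Arrays are known-safe to write into

# A user with a custom array type opts in by overloading the trait:
struct MyDeviceArray{T,N}
    data::Array{T,N}
end
ismutable(::MyDeviceArray) = true
```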

The second difficulty is working out the semantic issues around mutability. If you have a ref around an immutable model, presumably "do this update in place" actually means update the ref. What if the model actually has a mix of immutable and mutable arrays?
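
To make the Ref case concrete (a sketch, not settled semantics): the model below is an immutable NamedTuple, so "in place" can only mean rebinding the Ref, even though the `W` array inside it could also be mutated directly.

```julia
model = Ref((W = [1.0 2.0], b = 3.0))  # immutable container, mutable W inside
# "Updating in place" rebinds the Ref's contents rather than mutating them:
model[] = (W = model[].W .- 0.1, b = model[].b - 0.1)
```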

Then we need to figure out how to expose this choice as an API, while also sharing mutating and non-mutating code as much as possible so that plugging in your own types is easy (not having to do everything twice). And we need that both for "leaf types" that get updated themselves and for containers that only update their contents.
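
One hypothetical way to share the two paths, building on the `apply` and `ismutable` sketches above: both run the same functional `apply`, and only the write-back differs.

```julia
function update(o, x::AbstractArray, dx, state = nothing)
    dx, state = apply(o, x, dx, state)
    if ismutable(x)
        x .-= dx               # in place: no second copy of the weights
        return x, state
    else
        return x .- dx, state  # out of place: allocate a fresh result
    end
end
```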

It's all doable, but quite a lot fiddlier than it initially looks, and will probably take some time to work out well.

bors bot added a commit that referenced this issue Sep 11, 2019
669: using Zygote r=MikeInnes a=MikeInnes

Otherwise known as "break all the things". This will be a huge change so I'm beginning to prepare now, even though Zygote is still a couple of months off from being really ready. **Do not try this at home** (yet) – this branch is eventually aimed at beta testers, but isn't even ready for that yet.

The idea is to break as little code as possible, which means supporting the current `Params` API; but I also want to start prototyping the nicer things discussed in #628 and other issues.

Blocking issues:

* [x] Get the tests passing.
* [x] Check tests on GPU.
* [x] Rewrite all the docs.
* [x] Cache invalidation (JuliaLabs/Cassette.jl#6).
* [x] Moving over adjoints (FluxML/Zygote.jl#81).
* [x] General Zygote robustness.

Nice to have:

* [ ] Robust nested AD (may not be a blocker if one can still use Tracker with Flux).
* [x] Zygote support for modules / globals as discussed in #628, along with #637.
* [x] Better train/test mode as in #643.

If you're the kind of person who ignores triangular road signs, you can try this with

```julia
]add Flux#zygote Zygote#master
```

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
Co-authored-by: Elliot Saba <staticfloat@gmail.com>
Co-authored-by: thebhatman <manjunathbhat9920@gmail.com>
@Roger-luo
Contributor

Coming over from Slack: I think it would be quite useful to move the optimizers into a standalone package (they could even share code with other things like Optim, but moving them to a package first is maybe the priority). We currently use the Flux optimizers in Yao, but Flux itself is quite a heavy dependency just for the optimizers.

@ToucheSir
Member

ToucheSir commented Aug 21, 2020

Now that https://github.com/SciML/ArrayInterface.jl#ismutablex exists, at least part of this is in place. WRT avoiding the "two copies of ResNet in memory" problem, however, is there a more convenient API (both in terms of implementation and usage complexity) than the explicit param-passing that JAX, Haiku and co. use?
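
For reference, a quick check of what that trait reports (assuming ArrayInterface's documented behaviour at the time of writing):

```julia
using ArrayInterface

ArrayInterface.ismutable(rand(3))  # true: Arrays support setindex!
ArrayInterface.ismutable(1:3)      # false: ranges are read-only
```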

@Roger-luo
Contributor

I'd like to bump this. We recently needed to implement some of our own gradient-based optimizers for Yao, and I'm wondering if there are people interested in splitting the Optimise module out as a package, perhaps with the new interface?

@ToucheSir
Member

@Roger-luo you may be interested in https://github.com/FluxML/Optimisers.jl (bit of discussion in the issues) and FluxML/FluxML-Community-Call-Minutes#22.

@darsnack
Member

Should we close this issue? We now have the proposed API via Optimisers.jl, which has a mechanism for handling in-place updates within an immutable API.
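
For anyone landing here later, a minimal sketch of that Optimisers.jl interface (per its docs; details may shift as the package evolves):

```julia
using Optimisers

model = (W = rand(2, 2), b = zeros(2))  # a plain (immutable) container
grads = (W = ones(2, 2), b = ones(2))   # a matching gradient tree

# setup walks the model and builds a matching tree of optimiser state.
state = Optimisers.setup(Optimisers.Descent(0.1), model)

# update returns new state and a new model; Optimisers.update! does the
# same but mutates arrays in place where it can, avoiding a second copy.
state, model = Optimisers.update(state, model, grads)
```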
