New Optimisers #637

Open

MikeInnes opened this issue Feb 22, 2019 · 7 comments · May be fixed by #1481

Comments

@MikeInnes
Member

If we go full steam ahead with more functional-style AD (#628), then we'll need to rework optimisers a little. I think at the core the update step will look very similar, if a bit more functional, something like:

```julia
apply(::Optimiser, x, dx, state = ...) -> (dx', state')
```

Then we have an update function which actually applies the gradient. In general, the x, dx and state can be structs and we'll rebuild x by recursing over it (somewhat like mapleaves, though we won't need to depend on treelike since we can just use reflection here).
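
As a very rough sketch of how the pieces could fit together (everything below other than the `apply` signature is an assumption, and the struct recursion is heavily simplified):

```julia
# Hypothetical sketch: a stateless Descent rule under the proposed interface.
struct Descent
    eta::Float64
end

# apply transforms the gradient and threads optimiser state through;
# plain gradient descent has no state, so we just pass `nothing` along.
apply(o::Descent, x, dx, state = nothing) = (o.eta .* dx, state)

# At the leaves, update applies the transformed gradient to an array.
function update(o, x::AbstractArray, dx, state = nothing)
    dx, state = apply(o, x, dx, state)
    return x .- dx, state
end

# For structs, recurse over the fields via reflection and rebuild
# (simplified: real code would also thread per-leaf state through).
function update(o, x, dx, state = nothing)
    fields = map(fieldnames(typeof(x))) do f
        first(update(o, getfield(x, f), getfield(dx, f)))
    end
    return typeof(x)(fields...), state
end

update(Descent(0.1), [1.0, 2.0], [0.5, 0.5])  # ([0.95, 1.95], nothing)
```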

I think this is all reasonably straightforward; the main wrinkle in my mind is how we enable in-place updates as an optimisation (since it's not ideal to have two copies of ResNet at once). I'm not aware of any great solution to this right now, so we might need to define an ismutable trait and declare it for types we care about.

@staticfloat
Contributor

Can you spell out the difficulties with in-place updates? They aren't obvious to me.

@MikeInnes
Member Author

One difficulty is that mutability is not part of the array API; there isn't an automatic way to discover whether an array is mutable without trying it and checking for an error. So that's where we'll need an ismutable trait that effectively records a database of things we are allowed to mutate, and users with new array types will have to overload Flux.ismutable (which isn't the end of the world, just kind of ugly).
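
A minimal sketch of what that trait could look like (all names here are hypothetical; the fallback and the set of opted-in types are assumptions):

```julia
# Conservative default: unknown array types are treated as immutable.
ismutable(x) = false
ismutable(::Array) = true  # plain Arrays are known-safe to write into

# A user with a custom array type opts in by overloading the trait:
struct MyDeviceArray{T,N}
    data::Array{T,N}
end
ismutable(::MyDeviceArray) = true
```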

The second difficulty is working out the semantic issues around mutability. If you have a ref around an immutable model, presumably "do this update in place" actually means update the ref. What if the model actually has a mix of immutable and mutable arrays?
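
To make the Ref case concrete (a sketch, not settled semantics): the model below is an immutable NamedTuple, so "in place" can only mean rebinding the Ref, even though the `W` array inside it could also be mutated directly.

```julia
model = Ref((W = [1.0 2.0], b = 3.0))  # immutable container, mutable W inside
# "Updating in place" rebinds the Ref's contents rather than mutating them:
model[] = (W = model[].W .- 0.1, b = model[].b - 0.1)
```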

Then we need to figure out how to expose this choice as an API, while also sharing mutating and non-mutating code as much as possible so that plugging in your own types is easy (not having to do everything twice). And we need that both for "leaf types" that get updated themselves and for containers that only update their contents.
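
One hypothetical way to share the two paths, building on the `apply` and `ismutable` sketches above: both run the same functional `apply`, and only the write-back differs.

```julia
function update(o, x::AbstractArray, dx, state = nothing)
    dx, state = apply(o, x, dx, state)
    if ismutable(x)
        x .-= dx               # in place: no second copy of the weights
        return x, state
    else
        return x .- dx, state  # out of place: allocate a fresh result
    end
end
```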

It's all doable, but quite a lot fiddlier than it initially looks, and will probably take some time to work out well.

bors bot added a commit that referenced this issue Sep 11, 2019
669: using Zygote r=MikeInnes a=MikeInnes

Otherwise known as "break all the things". This will be a huge change so I'm beginning to prepare now, even though Zygote is still a couple of months off from being really ready. **Do not try this at home** (yet) – this branch is eventually aimed at beta testers, but isn't even ready for that yet.

The idea is to break as little code as possible, which means supporting the current `Params` API; but I also want to start prototyping the nicer things discussed in #628 and other issues.

Blocking issues:

* [x] Get the tests passing.
* [x] Check tests on GPU.
* [x] Rewrite all the docs.
* [x] Cache invalidation (JuliaLabs/Cassette.jl#6).
* [x] Moving over adjoints (FluxML/Zygote.jl#81).
* [x] General Zygote robustness.

Nice to have:

* [ ] Robust nested AD (may not be a blocker if one can still use Tracker with Flux).
* [x] Zygote support for modules / globals as discussed in #628, along with #637.
* [x] Better train/test mode as in #643.

If you're the kind of person who ignores triangular road signs, you can try this with

```julia
]add Flux#zygote Zygote#master
```

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
Co-authored-by: Elliot Saba <staticfloat@gmail.com>
Co-authored-by: thebhatman <manjunathbhat9920@gmail.com>
@Roger-luo
Contributor

Coming over from Slack: I think it would be quite useful to move the optimizers into a standalone package (they could even share code with other things like Optim, but moving them to a package first is maybe the priority). We currently use the Flux optimizers in Yao, but Flux itself is quite a heavy dependency just for the optimizers.

@ToucheSir
Member

ToucheSir commented Aug 21, 2020

Now that https://github.com/SciML/ArrayInterface.jl#ismutablex exists, at least part of this is in place. WRT avoiding the "two copies of ResNet in memory" problem, however, is there a more convenient API (both in terms of implementation and usage complexity) than the explicit param-passing that JAX, Haiku and co. use?
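
For reference, a quick check of what that trait reports (assuming ArrayInterface's documented behaviour at the time of writing):

```julia
using ArrayInterface

ArrayInterface.ismutable(rand(3))  # true: Arrays support setindex!
ArrayInterface.ismutable(1:3)      # false: ranges are read-only
```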

@Roger-luo
Contributor

I'd like to bump this. We recently needed to implement some of our own gradient-based optimizers for Yao, and I'm wondering if there are people interested in splitting the Optimise module out as a package, perhaps with the new interface?

@ToucheSir
Member

@Roger-luo you may be interested in https://github.com/FluxML/Optimisers.jl (bit of discussion in the issues) and FluxML/FluxML-Community-Call-Minutes#22.

@darsnack
Member

Should we close this issue? We now have the proposed API via Optimisers.jl, which has a mechanism for handling in-place updates within an immutable API.
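
For anyone landing here later, a minimal sketch of that Optimisers.jl interface (per its docs; details may shift as the package evolves):

```julia
using Optimisers

model = (W = rand(2, 2), b = zeros(2))  # a plain (immutable) container
grads = (W = ones(2, 2), b = ones(2))   # a matching gradient tree

# setup walks the model and builds a matching tree of optimiser state.
state = Optimisers.setup(Optimisers.Descent(0.1), model)

# update returns new state and a new model; Optimisers.update! does the
# same but mutates arrays in place where it can, avoiding a second copy.
state, model = Optimisers.update(state, model, grads)
```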
