
Berlin discussion follow up #1

Open
mathias-brandewinder opened this issue Sep 30, 2023 · 17 comments

Comments

@mathias-brandewinder
Contributor

Berlin conference hackathon follow up

@dgrimmin
Collaborator

Ready to start creating a project scaffold!

@muehlhaus
Member

typing....

@kevmal

kevmal commented Sep 30, 2023

...

@dgrimmin
Collaborator

Hi all, I just uploaded a bit of a scaffold for building an FsTensor package, and I wrote a summary of our discussions in the Wiki. At this stage it probably makes more sense to keep playing around with scripts, but if the implementation details become more concrete I will iron out the last details to make it a package ready for publishing.

The last 3 days were fun, and I hope next year we have something to show for it :).

@kMutagene
Member

Tagging @WhiteBlackGoose and @Happypig375 because I remember you discussing this topic some time back; maybe you have some thoughts on this?

@dennisgrimminck
Contributor

I remember @pkese and @matthewcrews also having a lot of inspiration :)

@dsyme
Member

dsyme commented Oct 2, 2023

I can recommend the DiffSharp RawTensor implementation. @muehlhaus knows the gig; I went over it with him.

@Happypig375

Happypig375 commented Oct 2, 2023

@kMutagene -

let x: Tensor<int, 2, 3> = tensor [[1; 2; 3]; [4; 5; 6]]
let y = x |> Tensor.map ((+) 1) // tensor [[2; 3; 4]; [5; 6; 7]]
let z = y[3,3] // Error: constant indexing out of bounds

This can further be improved by fsharp/fslang-suggestions#1086

let x = [[1; 2; 3]; [4; 5; 6]] // type inferred as tensor
let y = x |> Tensor.map ((+) 1) // tensor [[2; 3; 4]; [5; 6; 7]]
let z = y[3,3] // Error: constant indexing out of bounds

@matthewcrews

I think one of the important things we need to flesh out is the access patterns we want to support for the various operations. The way we need to access data will determine the API we want to create and the underlying storage format.

My ideal scenario for the base Tensor library is to describe how we need to be able to access data (item lookup, iteration, slicing, subsetting, etc.). We can provide a default implementation that is entirely managed, and providers can supply optimized implementations based on their underlying constraints.

Another critical thing is deciding whether we consider a Tensor a mutable or immutable collection. I can argue either way, but I strongly veer toward immutability, given the ethos of the F# language. Supporting random updates of a Tensor would make the backend much more complex. Can we do it? Yes. Do I want to? No 😂.

Another aspect that I would like to support is surfacing the dimensionality of the Tensor in the type system but without having to hand-code N different Tensor types (Tensor1D, Tensor2D, Tensor3D, ... TensorND). I've taken this approach in the past, and it leads to a significant amount of boilerplate that is difficult to maintain.

The true ideal would be that the F# compiler would be able to check the dimensionality and the length of the dimensions. I do not believe this is feasible, though. I believe it would require something more exotic than what is currently in F#, something like Dependent Types.

I also believe we should start with having the programmer specify the type of Tensor, Dense vs. Sparse. Later, we could possibly abstract this away and allow the library to choose a format based on the shape of the data. I don't think that is a good place to start, though.

This means we would start with a Tensor<'T>, which describes the API for accessing values. DenseTensor<'T> and SparseTensor<'T> would be concrete implementations that actually know how to store data. Personally, I almost exclusively work with sparse data, but I know a large part of the community works with dense.
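
To make that concrete, here is a minimal sketch of what such a split could look like. All names and members (Tensor<'T>, DenseTensor<'T>, SparseTensor<'T>, Enumerate, and so on) are illustrative assumptions, not a concrete proposal:

// Minimal sketch only: an access-pattern API with dense and sparse
// implementations, along the lines described above.
[<AbstractClass>]
type Tensor<'T>(shape: int[]) =
    member _.Shape = shape
    member _.Rank = shape.Length
    /// Item lookup by multi-dimensional index
    abstract Item: int[] -> 'T with get
    /// Iteration over (index, value) pairs; a sparse backend may skip implicit zeros
    abstract Enumerate: unit -> seq<int[] * 'T>

/// Default, fully managed dense implementation backed by a flat array
type DenseTensor<'T>(shape: int[], data: 'T[]) =
    inherit Tensor<'T>(shape)
    // Row-major strides for flattening a multi-dimensional index
    let strides =
        let s = Array.create shape.Length 1
        for i in shape.Length - 2 .. -1 .. 0 do
            s.[i] <- s.[i + 1] * shape.[i + 1]
        s
    override _.Item
        with get (idx: int[]) =
            data.[Array.fold2 (fun acc i st -> acc + i * st) 0 idx strides]
    override _.Enumerate() =
        seq {
            for flat in 0 .. data.Length - 1 do
                let idx = Array.mapi (fun d st -> (flat / st) % shape.[d]) strides
                yield idx, data.[flat] }

/// Sparse implementation storing only explicit entries plus a default value
type SparseTensor<'T>(shape: int[], entries: Map<int list, 'T>, zero: 'T) =
    inherit Tensor<'T>(shape)
    override _.Item
        with get (idx: int[]) =
            entries |> Map.tryFind (List.ofArray idx) |> Option.defaultValue zero
    override _.Enumerate() =
        entries |> Seq.map (fun kv -> Array.ofList kv.Key, kv.Value)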

Feel free to disagree with any of this. I've already bled quite a bit on this problem, so my opinion is informed by scars that may no longer be correct.

@dsyme
Member

dsyme commented Oct 2, 2023

My experience from TorchSharp and DiffSharp is different

  1. In the core library, do not parameterize by T. Do that in a derived library if you like via casting, but the core library and extension model should be raw. Use a "dtype" property instead, ranging over a limited set of types.
  2. Minimize dimensions of extensibility. This is crucial. Each dimension of extensibility creates havoc. Have one "backend" dimension mediated by Backend DLLs that are auto-detected in a solution. The other dimension of extensibility is derived functionality that needs new native dependencies.
  3. Tensors should carry dtype, backend, shape values via abstract properties.
  4. Tensors should support broadcasting per LibTorch semantics. This is really important and can't be captured in the F# type system. Embrace it.
  5. LibTorch native tensors should be a valid backend implementation via TorchSharp. We live in a LibTorch world and all significant models and operations are being built around LibTorch and the related ecosystem. It gives you superb GPU perf. It gives you differentiability built in. Embrace it while also allowing a managed backend. DiffSharp RawTensor does this well enough.
  6. You should have one other native backend with a smaller footprint.
  7. Use Python/LibTorch naming. Don't spend your life fighting this. It's good, standard, massively reduces the learning curve, allows you to port code and samples, allows beginners to participate, allows people to learn relevant skills, and allows reuse of documentation, training, and tests. For TorchSharp/DiffSharp it's the best decision I made, and it's actually very F#.
  8. In the core library, don't try to differentiate scalars, vectors, matrices, etc. by rank in the type system. If people want that on top they can code it up, but nothing in the core system should assume this.

I know this is all counter-cultural but I've been down this rabbit hole more times than I care to count and what we did in DiffSharp RawTensor was the best solution I've come up with at the base level when you balance the factors in this difficult space. All the above are essential, and pushing any more complexity or concerns into the base layer kills you.
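
For anyone who has not looked at DiffSharp, a very rough sketch of the shape of such a raw layer follows. The type and member names here are illustrative only; the real RawTensor API in the DiffSharp repo differs in detail:

// Sketch of a RawTensor-style base layer per the points above: not
// parameterized by 'T, carrying dtype/backend/shape/device as runtime
// values, with broadcasting and shape checks handled dynamically.
type Dtype = Float32 | Float64 | Int32 | Bool
type Device = CPU | GPU of index: int
type Backend = Reference | Torch

[<AbstractClass>]
type RawTensor() =
    abstract Shape: int[]
    abstract Dtype: Dtype
    abstract Device: Device
    abstract Backend: Backend
    /// Elementwise add with LibTorch-style broadcasting checked at runtime,
    /// e.g. shapes [2;3] and [1;3] broadcast to [2;3]
    abstract AddTT: RawTensor -> RawTensor
    /// Move the tensor to another device; dtype and backend stay dynamic
    abstract MoveTo: Device -> RawTensor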

@matthewcrews

matthewcrews commented Oct 2, 2023

  1. In the core library, do not parameterize by T. Do that in a derived library if you like via casting, but the core library and extension model should be raw. Use a "dtype" property instead, ranging over a limited set of types.

If this is the direction we decide to go (which I have no problem with), it limits the utility for my applications. Again, this is absolutely fine, but if we restrict the types that a Tensor can contain to a finite set, it's not useful for me, because I work with things like expressions (LinearExpression, BooleanExpression, etc.). If that is what is best for the community though, that's what should be done. I've long supported my own collection, so I'm no worse off.

If I'm misunderstanding this statement, please let me know 😊. I believe I'm the only person who would receive any value from this so I propose that we drop it as a requirement.

@dsyme
Member

dsyme commented Oct 2, 2023

(DiffSharp RawTensor doesn't actually use Python naming - it's not really user-facing - but both the user-facing TorchSharp and DiffSharp Tensor types do)

@dsyme
Member

dsyme commented Oct 3, 2023

Again, this is absolutely fine, but if we restrict the types that a Tensor can contain to a finite set, it's not useful for me, because I work with things like expressions (LinearExpression, BooleanExpression, etc.). If that is what is best for the community though, that's what should be done. I've long supported my own collection, so I'm no worse off.

Tensor programming is popular for one main reason: the enormous efficiency of batch processing of videos (4D), images (3D), matrices (2D) with GPUs - the batch adding one dimension to each of these, giving 5D, 4D, 3D objects. GPUs work over very limited ranges of datatypes.

There's some value in 3D, 4D, 5D numerically-indexed collections of other things that can't be directly processed by GPUs. This applies particularly when applying the typesafe-indexing techniques you mentioned in your talk in Berlin. Though the contents of the tensors can in theory be indexes into another table.

If I'm misunderstanding this statement, please let me know 😊. I believe I'm the only person who would receive any value from this so I propose that we drop it as a requirement.

It's OK to have a derived library giving a type Tensor<T> and rank distinction Matrix<T>, Vector<T> and just T for scalars. The underlying raw Tensor would be used for the particular unmanaged T types.
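
A minimal sketch (with hypothetical names) of what such a derived, generically typed layer over a raw tensor could look like, assuming a RawTensor type along the lines sketched earlier in this thread, with 'T in practice constrained to the dtypes the raw layer supports:

// Thin, typed wrappers over a dynamically-typed raw tensor.
type Tensor<'T>(raw: RawTensor) =
    member _.Raw = raw
    member _.Shape = raw.Shape

type Vector<'T>(raw: RawTensor) =
    inherit Tensor<'T>(raw)
    member _.Length = raw.Shape.[0]

type Matrix<'T>(raw: RawTensor) =
    inherit Tensor<'T>(raw)
    member _.RowCount = raw.Shape.[0]
    member _.ColumnCount = raw.Shape.[1]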

What I'm really recommending is that as you build this from the ground up, the "backend" dimension of extensibility should be solved first, at the foundational layer, and then hidden - and this is best done by different implementations of dynamically-typed-and-shape-checked-and-device-moved Tensor and its associated operations. This foundation gives you both control (you can go fully managed, and can build Tensor<T> etc. while still getting full GPU perf) and the possibility to participate as a credible player in the LibTorch ecosystem.

The DiffSharp RawTensor code is here, btw. It's not perfect, but it does allow a LibTorch implementation.

The definition order is

The rest is programming, but some notable parts are:

All of this is defined as backend-neutral. The backends are separate

@dsyme
Member

dsyme commented Oct 3, 2023

On mutable vs. immutable - I went back and forth on this for DiffSharp. It's painful: some specific tensors must absolutely be mutable, notably the enormous number of "model" parameters in any serious training, and several "local" tensors in loops accumulating sums or adjoints (in back-prop) etc. Equally, mutability is corrosive, and every single use of mutation on tensors needs extreme justification.

I proposed a system where tensors were immutable by default, with a need to specifically convert unsafely to a mutable tensor "register", so you could at least track and reason about where mutation was happening. We didn't end up checking it in, partly because you quickly end up duplicating a lot of operations, and partly because my collaborator was fundamentally OK with tensors being mutable, coming from PyTorch and all (and having bigger things to worry about). Adding a dynamic flag akin to shape/backend/dtype indicating mutability is probably possible, then building up things on top of that.
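
A self-contained, hedged sketch of that "dynamic mutability flag" idea: a tensor value carries a flag alongside shape/dtype/backend, in-place operations check it at runtime, and an explicit conversion opts into mutation so the call sites are easy to audit. All names here are hypothetical.

type Mutability = Immutable | Mutable

type FlaggedTensor =
    { Data: float[]          // stand-in for the real storage in this sketch
      Shape: int[]
      Mutability: Mutability }

module FlaggedTensorOps =
    /// Explicit, visible opt-in to mutation
    let unsafeAsMutable (t: FlaggedTensor) = { t with Mutability = Mutable }

    /// In-place add, only allowed on tensors explicitly marked mutable
    let addInPlace (src: FlaggedTensor) (dst: FlaggedTensor) =
        match dst.Mutability with
        | Mutable -> Array.iteri (fun i v -> dst.Data.[i] <- dst.Data.[i] + v) src.Data
        | Immutable -> invalidOp "in-place update of an immutable tensor"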

@dsyme
Member

dsyme commented Oct 3, 2023

As an example to help you think about where you might want to go with this, I stripped out the bespoke forward/backward differentiation from DiffSharp, leaving just a fully-fledged "raw" tensor library with a Reference and a LibTorch backend.

branch: https://github.com/dsyme/DiffSharp/tree/dsyme/tensors
comparison to DiffSharp head: DiffSharp/DiffSharp@dev...dsyme:DiffSharp:dsyme/tensors

The parts stripped out are

  • forward/backward/nested differentiation
  • models and model differentiation
  • numerical differentiation
  • optimizer (needs differentiation)

The backends actually don't change at all.

The rest is the same: so this is now a fully-fledged tensor library and API with two backends (one Reference, one LibTorch) just without any differentiation/gradient/optimization support (unless you're using LibTorch backend, in which case you could in theory use the gradients integrated into LibTorch tensors). You could easily add other backends for slimmer C++ tensor libraries that don't provide any gradient capabilities, and the reference backend could be progressed to be a much faster managed implementation.

I hope it's useful to you all. If you take this shape of thing it should use a different name, and of course if you use any of this code it should respect the license etc.

(BTW the diff effectively shows exactly what it means to add DiffSharp-style differentiation to a tensor API - it's impressively minimal, but also impressively subtle, and in particular it requires that each primitive binary and unary tensor operation supported - either for necessity or performance - declare its necessary derivatives.)

(Note, the tests may be independently useful for you all too)

After stripping back a bit more, this is the size of the resulting DLLs:

03/10/2023  17:29           779,776 DiffSharp.Core.dll

and for the backends:

03/10/2023  17:29           122,368 DiffSharp.Backends.Torch.dll
03/10/2023  17:29           669,184 DiffSharp.Backends.Reference.dll

plus of course the vast TorchSharp and LibTorch binaries.

@Lanayx

Lanayx commented Oct 8, 2023

Linking another .NET tensor-related discussion
dotnet/runtime#89639

@smoothdeveloper

And a mention in the .NET 8 RC2 announcement about .NET tensor primitives: https://devblogs.microsoft.com/dotnet/announcing-dotnet-8-rc2/#introducing-tensor-primitives-for-net
