
TensorPrimitives improvements in .NET 9.0 #93286

Open
8 of 15 tasks
Tracked by #97896
stephentoub opened this issue Oct 10, 2023 · 13 comments
Labels
area-System.Numerics.Tensors Epic Groups multiple user stories. Can be grouped under a theme.
Comments

@stephentoub (Member) commented Oct 10, 2023

Regardless of any additional types we may want to add to System.Numerics.Tensors, we would like to expand the set of APIs exposed on the TensorPrimitives static class in a few ways (beyond the work done in .NET 8 in #92219):

  • Vectorize TensorPrimitives operations that are currently scalar only #97193
  • Alignment improvements for ConvertXx, CosineSimilarity, IndexOfMin, IndexOfMax, IndexOfMinMagnitude, IndexOfMaxMagnitude
  • Investigate precision of the vectorized TensorPrimitives implementations. #98861
  • Additional operations defined in BLAS / LAPACK that don't currently have representation on TensorPrimitives
  • Perform a broader scan of ML.NET APIs, seeking more methods that should be on the post-GA backlog - @michaelgsharp
    • We've already covered all of the shared methods, but there are one-off implementations that might be worth productizing into TensorPrimitives
  • Additional operations that would enable completely removing the internal CpuMath class from ML.NET, e.g. Add (with indices), AddScale (with indices), DotProductSparse, MatrixTimesSource, ScaleAdd improvement via AddMultiply or MultiplyAdd overloads, SdcaL1UpdateDense, SdcaL1UpdateSparse, and ZeroMatrixItems (might exist in System.Memory).
  • Double-check the flow of XML docs -> https://github.com/dotnet/dotnet-api-docs -> docs.microsoft.com/
  • Add conceptual docs for TensorPrimitives, maybe near https://github.com/dotnet/docs/blob/main/docs/standard/numerics.md
  • Generic overloads of all relevant methods, constrained to the appropriate numerical types
  • Get benchmarks added into dotnet/performance
    • Collect baseline results from the time between RC2 and GA right before our alignment improvements went in
    • Collect new results from main after all of the alignment improvements
  • Improve performance of Min, Max, MinMagnitude, MaxMagnitude with relation to NaN handling
  • Determine for lengths of 0 if we want to throw or return NaN (we consistently throw today when non-0 is required; ML.NET apparently returns 0?) - @tannergooding
    • We currently throw; if we decide not to throw, this could be changed in a minor release in a non-breaking way.
  • Additional operations from Math{F} that don't currently have representation on TensorPrimitives, e.g. CopySign, Reciprocal{Sqrt}{Estimate}, Sqrt, Ceiling, Floor, Truncate, Log10, Log(x, y) (with y as both span and scalar), Pow(x, y) (with y as both span and scalar), Cbrt, IEEERemainder, Acos, Acosh, Cos, Asin, Asinh, Sin, Atan. This unmerged commit has a sketch, but it's out-of-date with improvements that have been made to the library since, and all of the operations should be vectorized.
  • Refactor the generic TP implementation into multiple source files.
  • Additional operations defined in the numerical interfaces that don't currently have representation on TensorPrimitives, e.g. BitwiseAnd, BitwiseOr, BitwiseXor, Exp10, Exp10M1, Exp2, Exp2M1, ExpM1, Atan2, Atan2Pi, ILogB, Lerp, ScaleB, Round, Log10P1, Log2P1, LogP1, Hypot, RootN, AcosPi, AsinPi, AtanPi, CosPi, SinPi, TanPi
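To make the "generic overloads" item above concrete, a minimal sketch of what such an overload might look like follows. This is purely illustrative and assumes the generic math interfaces from System.Numerics; it is not the final API shape, and the real implementation would vectorize internally rather than use a scalar loop:

```csharp
using System;
using System.Numerics;

public static class TensorPrimitivesSketch
{
    // Hypothetical generic counterpart to the existing
    // TensorPrimitives.Add(ReadOnlySpan<float>, ReadOnlySpan<float>, Span<float>),
    // constrained to the generic math addition interface.
    public static void Add<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y, Span<T> destination)
        where T : IAdditionOperators<T, T, T>
    {
        if (x.Length != y.Length || destination.Length < x.Length)
            throw new ArgumentException("Input spans must have equal length and fit in the destination.");

        for (int i = 0; i < x.Length; i++)
            destination[i] = x[i] + y[i]; // a real implementation would use SIMD here
    }
}
```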

We plan to update the System.Numerics.Tensors package alongside .NET 8 servicing releases. When there are bug fixes and performance improvements only, the patch number part of the version will be incremented. When there are new APIs added, the minor version will be bumped. For guidance on how we bump minor/major package versions, see this example.

@stephentoub stephentoub added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Numerics.Tensors labels Oct 10, 2023
@stephentoub stephentoub added this to the 9.0.0 milestone Oct 10, 2023
@ghost commented Oct 10, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.


@Szer commented Oct 10, 2023

Could you please elaborate on the advantages of having these APIs in a BCL rather than in a specialized NuGet package (like numpy in Python)? This could provide a valuable perspective for further discussion.

@stephentoub (Member, Author) commented Oct 10, 2023

Could you please elaborate on the advantages of having these APIs in a BCL rather than in a specialized NuGet package

It is a nuget package today. It's currently not part of netcoreapp. If it were to be pulled into netcoreapp as well, it would be because we'd be using it from elsewhere in netcoreapp, e.g. using it from APIs like Enumerable.Average, BitArray.And, ManagedWebSocket.ApplyMask, etc., which we very well may do in the future (that has no impact on it continuing to be available as a nuget package).
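For reference, consuming the System.Numerics.Tensors package today looks like this; the APIs shown (Add and CosineSimilarity over float spans) shipped in the 8.0 package:

```csharp
using System;
using System.Numerics.Tensors;

float[] x = { 1f, 2f, 3f };
float[] y = { 4f, 5f, 6f };
float[] result = new float[3];

TensorPrimitives.Add(x, y, result);                       // element-wise add
float similarity = TensorPrimitives.CosineSimilarity(x, y);

Console.WriteLine(string.Join(", ", result));             // 5, 7, 9
```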

@xoofx (Member) commented Oct 13, 2023

Hey @stephentoub,

Would it be possible to expose the low level parts of the API instead of only providing Span versions?

e.g.:

public static Vector128<float> Log2(Vector128<float> value);
public static Vector256<float> Log2(Vector256<float> value);
public static Vector512<float> Log2(Vector512<float> value);
//...etc.

I did that in a prototype for a similar API and it's working great.
One reason to expose these APIs is that you can actually build higher-level functions on top of them (e.g., for tensors, the zoo of activation functions) and then build the Span versions on top of those.

These APIs can then also be used for other kinds of custom Span batching (not related to tensors), where the packing of the vector is different (e.g. 4×float chunked: xxxx, yyyy, zzzz).
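The layering described here, Span-level operations built on top of Vector128-level kernels, could be sketched like this. The helper below is hypothetical (kernels are passed as delegates for clarity; a real implementation would use generic struct operators to avoid indirection), but the vector-loop-plus-scalar-tail pattern is the essential part:

```csharp
using System;
using System.Runtime.Intrinsics;

public static class SpanOnVectorKernel
{
    // Applies a Vector128 kernel across the bulk of the span,
    // falling back to a scalar kernel for the trailing elements.
    public static void Apply(ReadOnlySpan<float> source, Span<float> destination,
                             Func<Vector128<float>, Vector128<float>> vectorKernel,
                             Func<float, float> scalarKernel)
    {
        int i = 0;
        if (Vector128.IsHardwareAccelerated)
        {
            for (; i <= source.Length - Vector128<float>.Count; i += Vector128<float>.Count)
            {
                Vector128<float> v = Vector128.Create(source.Slice(i, Vector128<float>.Count));
                vectorKernel(v).CopyTo(destination.Slice(i));
            }
        }
        for (; i < source.Length; i++)   // scalar tail
            destination[i] = scalarKernel(source[i]);
    }
}
```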

@tannergooding (Member) commented:

Would it be possible to expose the low level parts of the API instead of only providing Span versions?

Yes, but it needs to be its own proposal and cover all 5 vector types (Vector, Vector64/128/256/512), and consider whether it's applicable to Vector2/3/4 as well.

@xoofx (Member) commented Oct 13, 2023

Yes, but it needs to be its own proposal and cover all 5 vector types (Vector, Vector64/128/256/512)

Cool, I will try to write something.

@xoofx (Member) commented Oct 14, 2023

Would it be possible to expose the low level parts of the API instead of only providing Span versions?

Follow-up, created the proposal #93513

@jeffhandley jeffhandley removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Oct 23, 2023
@stephentoub stephentoub self-assigned this Nov 8, 2023
@msedi commented Nov 26, 2023

@stephentoub:

If it were to be pulled into netcoreapp as well, it would be because we'd be using it from elsewhere in netcoreapp

If brought into the BCL, wouldn't it make sense to rename TensorPrimitives to, let's say, ArrayMath, VectorMath, or VectorPrimitives? Tensor seems a bit exaggerated for what it does, namely doing some math on arrays.

@tannergooding (Member) commented:

@msedi that would be a breaking change. Additionally, the intent is to expand it to the full set of BLAS support, so Tensor is a very apt and appropriate name that was already scrutinized, reviewed, and approved by API review.

@msedi commented Nov 27, 2023

@tannergooding: Sure, you're right. I was just under the impression that there could be something more primitive. The tensor is something, let's say, higher level, whereas the vector/array methods are on a lower level. But I'm completely fine with it as long as I know where to find it.

BTW, when looking at the code and the effort behind TensorPrimitives: are there any efforts so that the JIT will some day manage to do the SIMD unfolding for us?

@tannergooding (Member) commented:

the JIT will some day manage to do the SIMD unfolding for us?

The JIT is unlikely to get auto-vectorization in the near future, as such support is complex and quite expensive to do. Additionally, outside of particular domains, such support does not often light up, and it has a measurable impact on real-world apps even less frequently. Especially for small workloads it can often have the opposite effect and slow down your code. In the domains where it does light up, and particularly where it would be beneficial, you are often going to get better perf by writing your own SIMD code directly.

It is therefore my opinion that our efforts would be better spent providing APIs from the BCL that provide this acceleration for you. Such as all the APIs on Span<T>, accelerating LINQ, the new APIs on TensorPrimitives, etc. It may likewise be beneficial to expose some SIMD infrastructure helpers like we've defined internally for TensorPrimitives; that is expose some public form of InvokeSpanSpanIntoSpan and friends, which would allow developers to only worry about providing the inner kernel and to have the rest of the SIMD logic (leading/trailing elements, alignment, unrolling, etc) handled internally. Efforts like ISimdVector<TSelf, T> also fit the bill of making it simpler for devs to write SIMD code.
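A public form of the helper described here could look roughly like the following. The interface name and shape are illustrative guesses at the internal pattern, not the actual internal API; the point is that the caller supplies only the kernel, and the helper owns the SIMD loop and trailing-element handling:

```csharp
using System;
using System.Numerics;

// Illustrative operator interface: one scalar kernel, one Vector<T> kernel.
public interface IBinaryOperator<T>
{
    T Invoke(T x, T y);
    Vector<T> Invoke(Vector<T> x, Vector<T> y);
}

public static class SimdHelpers
{
    // Sketch of an "InvokeSpanSpanIntoSpan"-style helper: vectorized main
    // loop over Vector<T>-sized chunks, scalar loop for the leftovers.
    public static void InvokeSpanSpanIntoSpan<T, TOp>(
        ReadOnlySpan<T> x, ReadOnlySpan<T> y, Span<T> destination, TOp op)
        where T : struct
        where TOp : IBinaryOperator<T>
    {
        if (x.Length != y.Length || destination.Length < x.Length)
            throw new ArgumentException("Input spans must have equal length and fit in the destination.");

        int i = 0;
        for (; i <= x.Length - Vector<T>.Count; i += Vector<T>.Count)
        {
            Vector<T> result = op.Invoke(new Vector<T>(x.Slice(i)), new Vector<T>(y.Slice(i)));
            result.CopyTo(destination.Slice(i));
        }
        for (; i < x.Length; i++)   // trailing elements
            destination[i] = op.Invoke(x[i], y[i]);
    }
}
```

With a helper like this, a developer would only implement, say, an AddOperator struct and get the vectorized loop for free.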

@msedi commented Nov 27, 2023

@tannergooding: Thanks for the info; that makes sense. For our case, we wrote source generators to generate all the array primitives, currently with Vector<T>, but I wanted to benchmark against your implementations. I assume yours is better ;-)

@ericstj ericstj changed the title from "Augment TensorPrimitives for post-.NET 8" to "TensorPrimitives improvements in .NET 9.0" Feb 2, 2024
@ericstj ericstj added the Epic Groups multiple user stories. Can be grouped under a theme. label Feb 9, 2024
@stephentoub stephentoub removed their assignment Apr 13, 2024
@tannergooding tannergooding modified the milestones: 9.0.0, 10.0.0 Aug 15, 2024
@tannergooding (Member) commented:

Remaining work is planned for .NET 10.


7 participants