Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: design description #1

Open
grothesque opened this issue Sep 6, 2024 · 8 comments
Open

Request: design description #1

grothesque opened this issue Sep 6, 2024 · 8 comments

Comments

@grothesque
Copy link

Thanks for sharing this very interesting library! I've been trying to understand its inner workings, but this is difficult since some central aspect seem both very clever (in a positive sense), and are not documented at all. I think that adding a "DESIGN.md" file (even a small one) may greatly increase the usefulness of this project to other people (be it possible contributors or members of the wider numeric Rust community).

For example, would you mind to describe how Span works? I note that it is a zero-sized struct, and only &Span are passed around. But what do these actually point to? And what is the role of RawSpan?

Or let's take Expr / Expression. I first thought that it's a view with a somewhat badly chosen name, but it seems to be much more than that. Are types that implement the trait Expression "expression templates" in C++ speak?

What is the relation of Expr and Span and why are both needed? You write that spans are like Rust slices, but slices are fat pointers, while spans seem to be just pointers that point to... something (that is owned by whom?).

It would be also interesting to have a brief discussion of the design aspects of C++'s mdspan/mdarray that you took over, and those that you didn't/couldn't and the reasons for that.

@fre-hu
Copy link
Owner

fre-hu commented Sep 11, 2024

Thanks for the suggestion and comments! I have added design notes in the latest commit as a start. The layout types are omitted for now since I'm looking into if they can be improved.

@grothesque
Copy link
Author

grothesque commented Sep 11, 2024

Very useful, many thanks.

Here's one thing that is still unclear to me: Would it be possible to modify the example program such that a or b could be created using the array! macro, i.e. with static size? Or in other words, is there some way to cast a static stack-allocated array into a DSpan?

If this is not possible, how would one write a (generic?) function that accepts input created with either expr! or array!?

@grothesque
Copy link
Author

I think I found a solution myself. The following generic function accepts references to arrays as well:

pub fn matmul<SA, SB, SC>(a: &Span<f64, SA>, b: &Span<f64, SB>, c: &mut Span<f64, SC>)
where
    SA: Shape,
    SB: Shape,
    SC: Shape,
{
    for (mut cj, bj) in c.cols_mut().zip(b.cols()) {
        for (ak, bkj) in a.cols().zip(bj) {
            for (cij, aik) in cj.expr_mut().zip(ak) {
                *cij = aik.mul_add(*bkj, *cij);
            }
        }
    }
}

Here is my understanding why the original non-generic matmul function did not accept references to stack-allocated arrays: these arrays have constant sizes, so that there is nowhere an appropriate RawSpan that could be pointed to. Is this correct?

Still, is there a way to make an array!-allocated array work with the original matmul? That would probably require creating a view of it somehow (that would contain a RawSpan instance).

@fre-hu
Copy link
Owner

fre-hu commented Sep 12, 2024

Yes, right. One way is to do a.reshape([2, 2]) or so which will change the shape type to have dynamic dimensions.

@grothesque
Copy link
Author

One piece of information that I feel should be added somewhere is while there are both layouts and mappings. The way I understand it is that while types that implement the Layout trait (e.g. Dense) do not store any data, types that implement the Mapping trait (e.g. DenseMapping) do store data, i.e. the shape and the necessary strides.

So, if my understanding is correct, an Array (stack-allocated, static-sized array) instance does not keep a Mapping struct, while a Grid or Expr does.

@grothesque
Copy link
Author

grothesque commented Sep 13, 2024

Yes, right. One way is to do a.reshape([2, 2]) or so which will change the shape type to have dynamic dimensions.

Thanks, this solution was not obvious to me. But it requires redundantly restating the shape, and panics if it doesn't match. The following seems to be more robust: a.reshape(a.shape().dims()), however it doesn't work with reshape_mut due to the borrow checker complaining.

Perhaps this should be a method, how about to_expr?

Or perhaps there should be even an equivalent impl of Borrow or AsRef? (This might be a bad idea, I don't have much experience with this.) In this way, the call to matmul would look the same no matter whether the arguments are borrowed as DSpan or as Span.

@fre-hu
Copy link
Owner

fre-hu commented Sep 15, 2024

I added more conversions for Array to be consistent with normal arrays. It should work now with &Expr::from(&a). Let's see if it is sufficient.

@grothesque
Copy link
Author

Here is an improved generic version of matmul that uses the type system to ensure that the static sizes (if used) do match:

pub fn matmul_generic<D0: Dim, D1: Dim, D2: Dim>(
    a: &Span<f64, (D0, D1)>,
    b: &Span<f64, (D1, D2)>,
    c: &mut Span<f64, (D0, D2)>,
) {
    for (mut cj, bj) in c.cols_mut().zip(b.cols()) {
        for (ak, bkj) in a.cols().zip(bj) {
            for (cij, aik) in cj.expr_mut().zip(ak) {
                *cij = aik.mul_add(*bkj, *cij);
            }
        }
    }
}

Is this use of mdarray "idiomatic"?

The above function has also the advantage of allowing the compiler to optimize the inner loop for a fixed number of iterations. For example, the following non-generic function

pub fn matmul_opt(a: &DSpan<f64, 2>, b: &DSpan<f64, 2>, c: &mut DSpan<f64, 2>) {
    let a = a.reshape((Const::<4>, Dyn(2)));
    let mut c = c.reshape_mut((Const::<4>, Dyn(2)));
    matmul_generic(&a, &b, &mut c);
}

is compiled to assembly that uses the fact that the inner loop has exactly four iterations. (I verified looking at the assembly generated by cargo rustc --release -- -Ctarget-feature=+fma --emit asm.) However, it only uses the FMA instruction vfmadd213sd and not its packed variant vfmadd213pd - I guess that's because the compiler cannot be sure about sufficient alignment. (I saw that there are ways to allocate Grid arrays with a specific alignment, so there seem to be ways.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants