
Implement repr(packed) for repr(simd) #117116

Merged
4 commits merged into rust-lang:master on Dec 11, 2023

Conversation

calebzulawski
Member

This allows creating vectors with non-power-of-2 lengths that do not have padding. See rust-lang/portable-simd#319
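A minimal sketch of the kind of type this enables (nightly-only; the `f32x3_*` names are illustrative, not from the PR):

```rust
#![feature(repr_simd)]
#![allow(non_camel_case_types, dead_code)]

// With `packed`, a 3-lane vector has no trailing padding: 12 bytes.
#[repr(simd, packed)]
struct f32x3_packed([f32; 3]);

// Without `packed`, the same vector is padded up to the next power of two: 16 bytes.
#[repr(simd)]
struct f32x3_full([f32; 3]);

fn main() {
    assert_eq!(core::mem::size_of::<f32x3_packed>(), 12);
    assert_eq!(core::mem::size_of::<f32x3_full>(), 16);
}
```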

@rustbot
Collaborator

rustbot commented Oct 24, 2023

r? @oli-obk

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 24, 2023
@scottmcm
Member

Is it possible it would make sense for this just to be what repr(simd) means, without needing packed?

Since all the other uses of it are power-of-two anyway, and thus if you make simd always be the GCD, that might be fine?

let align = dl.vector_align(size);

let (abi, align) = if def.repr().packed() && !e_len.is_power_of_two() {
// Non-power-of-two vectors have padding up to the next power-of-two.
Member

IIRC LLVM doesn't guarantee that vectors have no more padding than just up to the next power of two, e.g. padding <2 x i8> to 128 bits.

Member

https://llvm.org/docs/LangRef.html#vector-type seems to say that LLVM guarantees that vectors have no padding in their in-memory representation? Padding <2 x i8> to 128 bits is something that LLVM does inside the vector registers (e.g. on aarch64), but never in memory.

Member

that's only for load/store, LLVM vector types still have big enough alignment that Rust requires padding in many cases.

Member Author

I'm not sure if it's guaranteed, but rustc does make the assumption (somewhere, I forget where)

Member

core::mem::size_of:

More specifically, this is the offset in bytes between successive elements in an array with that item type including alignment padding.

Member Author

Whoops, I meant that rustc assumes non-power-of-two vectors round up to the next power of two.

Member

that's only for load/store, LLVM vector types still have big enough alignment that Rust requires padding in many cases.

I thought LLVM types didn't have any sort of intrinsic alignment? And even if they have a preferred alignment that's too big, can't you just always annotate load/store instructions with the smaller alignment?

Member

that's only for load/store, LLVM vector types still have big enough alignment that Rust requires padding in many cases.

I thought LLVM types didn't have any sort of intrinsic alignment? And even if they have a preferred alignment that's too big, can't you just always annotate load/store instructions with the smaller alignment?

They do have an intrinsic ABI alignment; it's used as the default alignment of load/store/alloca when not explicitly specified, and for stack spilling and a few other things.

Member

programmerjake commented Oct 25, 2023

I'm not sure if it's guaranteed, but rustc does make the assumption (somewhere, I forget where)

just found this (yeah, I know that's not what you meant, but I couldn't resist):
https://lang-team.rust-lang.org/frequently-requested-changes.html#size--stride

Rust assumes that the size of an object is equivalent to the stride of an object
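A small illustration of that size == stride rule (not from the thread; `Padded` is a made-up type):

```rust
// The distance between consecutive array elements equals size_of::<T>(),
// so any trailing padding of T is part of its size.
#[repr(C)]
struct Padded(u32, u8); // 5 bytes of data, rounded up to size 8 (align 4)

fn main() {
    let a = [Padded(0, 0), Padded(0, 0)];
    let stride = &a[1] as *const Padded as usize - &a[0] as *const Padded as usize;
    assert_eq!(stride, core::mem::size_of::<Padded>()); // 8
}
```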

@calebzulawski
Member Author

Is it possible it would make sense for this just to be what repr(simd) means, without needing packed?

Since all the other uses of it are power-of-two anyway, and thus if you make simd always be the GCD, that might be fine?

Maybe eventually, but for now I didn't implement any way to call simd intrinsics on packed-repr vectors; they still need to be loaded into full (padded) vectors. std::simd will need to convert the vectors before calling intrinsics.

@calebzulawski
Member Author

Note that I also change the ABI, so it's not possible to pass these in vector registers (I think). So if target feature calling conventions ever get worked out, there may still be a benefit to the current implementation as well.

@workingjubilee
Member

Note that I also change the ABI, so it's not possible to pass these in vector registers (I think). So if target feature calling conventions ever get worked out, there may still be a benefit to the current implementation as well.

Given that we're working on defining target feature minima specifically so we can do that, I don't think that blocking passing in registers is acceptable.

@calebzulawski
Member Author

Unfortunately I think we simply can't have it both ways. LLVM defines vectors as having padding and we would like them to not have padding. This change doesn't affect all vectors, just those with repr(packed), which specifically removes padding and reduces alignment. A user could always use a non-repr(packed) vector to pass by register.

@programmerjake
Member

Note that I also change the ABI, so it's not possible to pass these in vector registers (I think). So if target feature calling conventions ever get worked out, there may still be a benefit to the current implementation as well.

Given that we're working on defining target feature minima specifically so we can do that, I don't think that blocking passing in registers is acceptable.

I don't expect this to block passing in registers, because this is defining the in-memory ABI. The passing-by-value ABI can be totally different, since semantically you have to store that value somewhere in memory before you can look at the bytes.

@programmerjake
Member

Unfortunately I think we simply can't have it both ways.

I think we can have it both ways: just pass by value as an LLVM vector and use the array type for in-memory address calculations and allocating memory. Load/store with vector types only read/write the non-padding bytes, so we're fine.

@calebzulawski
Member Author

Unfortunately I think we simply can't have it both ways.

I think we can have it both ways: just pass by value as an LLVM vector and use the array type for in-memory address calculations and allocating memory. Load/store with vector types only read/write the non-padding bytes, so we're fine.

True, this is definitely possible once we are able to.

@workingjubilee
Member

I don't think that will optimize well at all. LLVM historically reasons about in-memory and in-register data in conflationary ways.

@calebzulawski
Member Author

calebzulawski commented Oct 24, 2023

I don't think there's any optimization involved. It's just a single vector load.

Either way, this PR doesn't change how existing vectors work. repr(packed) is specified to remove padding and it doesn't work correctly on SIMD types, which this change fixes.

I agree that the calling convention could become an issue in the future, but I'd like to deal with that in the future when it comes up, since there's still no consensus on a fix regardless of this PR. Also, fixing repr(packed) for SIMD doesn't require us to use it for std::simd, but I'd like to fix this issue first so we can try it out.

@workingjubilee
Member

"Legalizing to a vector load instead of a series of scalar loads" is an optimization.

@workingjubilee
Member

...Okay, I didn't quite divine from the messages/code changes that repr(packed) doesn't have any functionality currently with repr(simd), and that we don't even emit an error.

@workingjubilee
Member

workingjubilee commented Oct 25, 2023

I would prefer this to come with codegen/assembly tests to demonstrate what it actually looks like when these types are interacted with, and to prove it like... legalizes correctly, where "correctly" in this case is probably "not an LLVM error, I guess?"

@calebzulawski
Member Author

Since repr(simd, packed) can't really be used directly, I added a pretty simple test that just loads a packed vector and performs an operation on it, just to make sure it doesn't crash LLVM or anything like that.

@workingjubilee
Member

workingjubilee commented Nov 14, 2023

I want to approve this but I still can't see what the emitted LLVMIR is, and I can't e.g. increase my confidence by reaching for friends who know way more about LLVMIR and legalization if there's no codegen diffs to show them.

@calebzulawski
Member Author

calebzulawski commented Nov 15, 2023

Not quite as rigorous as a codegen test, but consider this reduced case for illustrative purposes:

#![feature(repr_simd, platform_intrinsics)]

#[repr(simd, packed)]
pub struct Simd<T, const N: usize>([T; N]);

#[repr(simd)]
#[derive(Copy, Clone)]
pub struct FullSimd<T, const N: usize>([T; N]);

extern "platform-intrinsic" {
    fn simd_mul<T>(a: T, b: T) -> T;
}

// non-powers-of-two have padding and need to be expanded to full vectors
fn load<T, const N: usize>(v: Simd<T, N>) -> FullSimd<T, N> {
    unsafe {
        let mut tmp = core::mem::MaybeUninit::<FullSimd<T, N>>::uninit();
        std::ptr::copy_nonoverlapping(&v as *const _, tmp.as_mut_ptr().cast(), 1);
        tmp.assume_init()
    }
}

pub fn square(x: Simd<f32, 3>) -> FullSimd<f32, 3> {
    let x = load(x);
    unsafe { simd_mul(x, x) }
}

With optimization, this simply generates the following:

; foo::square
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) uwtable
define void @_ZN3foo6square17h0c2320cc22379be1E(ptr noalias nocapture noundef writeonly sret(<3 x float>) align 16 dereferenceable(16) %_0, ptr noalias nocapture noundef readonly align 4 dereferenceable(12) %x) unnamed_addr #0 {
start:
  %x.val = load <3 x float>, ptr %x, align 4
  %0 = fmul <3 x float> %x.val, %x.val
  store <3 x float> %0, ptr %_0, align 16
  ret void
}

This is nearly the same as using regular vectors, but note that the input %x is passed as a pointer with align 4. Vectors are always passed by reference anyway, but for a regular vector it would be align 16. You will also see on the first line of the body that, where a regular vector load would be align 16, this load is align 4.

@calebzulawski
Member Author

Whoops, accidentally closed while writing my comment :)

@calebzulawski
Member Author

@workingjubilee I added my comment above as a codegen test


@bors
Contributor

bors commented Dec 9, 2023

💔 Test failed - checks-actions

@workingjubilee
Member

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 9, 2023
@calebzulawski
Member Author

@workingjubilee I removed the codegen test since it's so dependent on optimization (little things like attributes keep changing, making it impossible to write), and with the tests running at so many optimization levels I can't seem to make a useful test. Considering repr(simd) is unstable in the first place, hopefully you've seen enough codegen examples to be confident enough to merge as-is.

@workingjubilee
Member

Indeed, I mostly wanted an example! Sadness about the test, though. It really shouldn't be so hard...

@bors r+

@bors
Contributor

bors commented Dec 11, 2023

📌 Commit aa00bae has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 11, 2023
@bors
Contributor

bors commented Dec 11, 2023

⌛ Testing commit aa00bae with merge 8b1ba11...

@bors
Contributor

bors commented Dec 11, 2023

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing 8b1ba11 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 11, 2023
@bors bors merged commit 8b1ba11 into rust-lang:master Dec 11, 2023
12 checks passed
@rustbot rustbot added this to the 1.76.0 milestone Dec 11, 2023
@rust-timer
Collaborator

Finished benchmarking commit (8b1ba11): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count: this benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): this benchmark run did not return any relevant results for this metric.
Cycles: this benchmark run did not return any relevant results for this metric.
Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 669.249s -> 669.313s (0.01%)
Artifact size: 314.19 MiB -> 314.17 MiB (-0.01%)

@calebzulawski
Member Author

Thanks :)

@workingjubilee workingjubilee added A-simd Area: SIMD (Single Instruction Multiple Data) PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) labels Dec 14, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 2, 2024
…nsics, r=workingjubilee

Make repr(packed) vectors work with SIMD intrinsics

In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout.  This should be the last step in providing a solution to rust-lang/portable-simd#319
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Jun 2, 2024
…rinsics, r=workingjubilee

Make repr(packed) vectors work with SIMD intrinsics

In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout.  This should be the last step in providing a solution to rust-lang/portable-simd#319
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jun 2, 2024
Rollup merge of rust-lang#125311 - calebzulawski:repr-packed-simd-intrinsics, r=workingjubilee

Make repr(packed) vectors work with SIMD intrinsics

In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout.  This should be the last step in providing a solution to rust-lang/portable-simd#319
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 3, 2024
…, r=calebzulawski

Test codegen for `repr(packed,simd)` -> `repr(simd)`

This adds the codegen test originally requested in rust-lang#117116 but exploiting the collection of features in FileCheck and compiletest to make it more resilient to expectations being broken by optimization levels. Mostly by presetting optimization levels for each revision of the tests.

I do not think the dereferenceable attribute's presence or absence is that important.

r? `@calebzulawski`
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 3, 2024
…, r=calebzulawski

Test codegen for `repr(packed,simd)` -> `repr(simd)`

This adds the codegen test originally requested in rust-lang#117116 but exploiting the collection of features in FileCheck and compiletest to make it more resilient to expectations being broken by optimization levels. Mostly by presetting optimization levels for each revision of the tests.

I do not think the dereferenceable attribute's presence or absence is that important.

r? `@calebzulawski`
#![allow(non_camel_case_types)]

#[repr(simd, packed)]
struct Simd<T, const N: usize>([T; N]);
Member

From how packed usually works, I would expect this to mean that the type has alignment 1. But that doesn't seem to be the case; instead the alignment is the largest possible for the size, or something like that?

What happens with packed(N)?

Would be good to have the interaction of simd and packed documented somewhere.
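A sketch of how one could probe the current behavior (the type names are made up; the printed alignment is exactly the open question):

```rust
#![feature(repr_simd)]
#![allow(dead_code)]

#[repr(simd, packed)]
struct PackedVec([f32; 3]);

// `packed` on an ordinary struct forces alignment 1...
#[repr(packed)]
struct PackedStruct([f32; 3]);

fn main() {
    assert_eq!(core::mem::align_of::<PackedStruct>(), 1);
    // ...but for a `repr(simd, packed)` type the chosen alignment is what's undocumented.
    println!("align_of::<PackedVec>() = {}", core::mem::align_of::<PackedVec>());
    println!("size_of::<PackedVec>()  = {}", core::mem::size_of::<PackedVec>());
}
```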

@RalfJung
Member

RalfJung commented Jun 8, 2024

FWIW codegen has support for using different LLVM types in by-val vs by-ref situations: specifically, bool is i8 in memory but i1 as an SSA local. Maybe something similar could be done for these packed simd types?

    if ty.is_simd() && !matches!(arg.val, OperandValue::Immediate(_)) {
        return_error!(InvalidMonomorphization::SimdArgument { span, name, ty: *ty });
    }
}
Member

This comment seems no longer accurate... maybe that got changed by #125311? Doing simd_mul etc on a packed SIMD type works just fine now. Even in debug builds this generates the code one would hope for.
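A sketch of what that looks like now, reusing the intrinsic declaration style from the earlier example in this thread (assuming the `platform_intrinsics` spelling still applies on the nightly in question):

```rust
#![feature(repr_simd, platform_intrinsics)]

#[repr(simd, packed)]
#[derive(Copy, Clone)]
pub struct PackedSimd<T, const N: usize>([T; N]);

extern "platform-intrinsic" {
    fn simd_mul<T>(a: T, b: T) -> T;
}

// The intrinsic can take the packed type directly; no manual widening
// to a full (padded) vector is required first.
pub fn square(x: PackedSimd<f32, 3>) -> PackedSimd<f32, 3> {
    unsafe { simd_mul(x, x) }
}
```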

Labels
A-simd Area: SIMD (Single Instruction Multiple Data) merged-by-bors This PR was explicitly merged by bors. PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.