Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc_trans: rewrite mips64 ABI code #47964

Merged
merged 4 commits into from
Feb 25, 2018
Merged

Conversation

jcowgill
Copy link
Contributor

@jcowgill jcowgill commented Feb 2, 2018

This PR rewrites the ABI handling code for 64-bit MIPS and should fix various FFI issues including #47290.

To accomodate the 64-bit ABI I have had to add a new CastTarget variant which I've called Chunked (though maybe this isn't the best name). This allows an ABI to cast to some arbitrary structure of Reg types. This is required on MIPS which might need to cast to a structure containing a mixture of i64 and f64 types.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @estebank (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@Mark-Simulacrum
Copy link
Member

r? @eddyb

@eddyb
Copy link
Member

eddyb commented Feb 2, 2018

Does clang need to do this? How many fields is that struct limited to? I'd rather not add the general case here.

chunks.push(bits_to_int_reg(final_int_bits));
}

arg.cast_to(CastTarget::Chunked(chunks));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you don't need this! This is literally what Uniform { unit: Reg::i64(), total: trailing_int_size } does!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's only equivalent to Uniform if chunks is empty when entering the else block.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I didn't see it came from more code above 😞.

@shepmaster shepmaster added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Feb 3, 2018
@jcowgill
Copy link
Contributor Author

jcowgill commented Feb 3, 2018

Yes clang needs to do this. If you pass the structure directly to LLVM, it will allocate 1 register per field instead of merging them together into 64-bit chunks which is what happens here. There is no limit to the size of structures passed using this method. For example, if a function takes one structure argument containing 8 i64 and 8 f64 in any order, the entire structure will be passed in registers. Any extra fields get pushed onto the stack (after the early fields have been put in registers).


// We only care about aligned doubles
if let layout::Abi::Scalar(ref scalar) = field.abi {
if let layout::F64 = scalar.value {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come this isn't recursive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in why doesn't it recurse into structures within structures?

I originally thought this from reading the ABI document, but it's a bit ambiguous and it seems the ABI is intentionally not recursive. Eg this structure is passed in an integer register by clang and gcc even though it is an aligned double.

struct test {
  struct {
    double x;
  };
};

@eddyb
Copy link
Member

eddyb commented Feb 3, 2018

@jcowgill So only a prefix actually gets put into registers? How many registers are there to use? (is this something we can rely on, like we do on x64?)

@jcowgill
Copy link
Contributor Author

jcowgill commented Feb 4, 2018

Yes only a prefix is put into registers. My earlier comment about "8 i64 and 8 f64" was wrong - only the first 8 "chunks" are put into registers (for a total of 8 registers of either type being used). Everything after the 64th byte will be put on the stack.

@eddyb
Copy link
Member

eddyb commented Feb 4, 2018

@jcowgill Thanks for investigating this! I'm still not super eager to merge the version using a Vec, but maybe there's a formulation that can also cheaply supersede the CastTarget::Pair x64 uses.

How do you feel about ChunkedPrefix { prefix: [RegKind; 8], chunk: Size, total: Size }?

I might also want to maybe move towards a model that better match hardware register sizes, so we wouldn't need to store the 64-bit size of the used registers (because they can't be anything else).

@jcowgill
Copy link
Contributor Author

jcowgill commented Feb 5, 2018

How do you feel about ChunkedPrefix { prefix: [RegKind; 8], chunk: Size, total: Size }?

I think that would work, although it is a bit more complicated. Would this be implemented using a separate structure the same way Uniform is?

If I understand correctly, these would then also be equivalent?
Uniform { unit: unit, total: total }
ChunkedPrefix { prefix: [unit.kind; 8], chunk: unit.size, total: total }

@eddyb
Copy link
Member

eddyb commented Feb 5, 2018

I meant only a CastTarget variant (with named fields), not separate structure. To also combine the CastTarget::Uniform variant into this one you'd want to replace total: Sized with rest: Uniform.

@jcowgill
Copy link
Contributor Author

jcowgill commented Feb 5, 2018

I think my equivalence is wrong anyway since it doesn't work for floats. I don't like rest: Uniform because it would have to store the total size for the entire cast target instead of the size of the "rest" part.

If ChunkedPrefix is supposed to only be a variant with named fields, I'm not sure why Uniform gets its own separate structure.

I'll have a go at implementing this...

@estebank estebank assigned eddyb and unassigned estebank Feb 5, 2018
@eddyb
Copy link
Member

eddyb commented Feb 5, 2018

@jcowgill I thought Uniform was separate because it was used elsewhere but if that's not true at all we can probably inline its methods into CastTarget's?

@jcowgill
Copy link
Contributor Author

jcowgill commented Feb 8, 2018

I've pushed a new version using ChunkedPrefix.

cx.data_layout().aggregate_align
.max(Reg { kind: RegKind::Integer, size: chunk }.align(cx))
.max(Reg { kind: RegKind::Float, size: chunk }.align(cx))
.max(Reg { kind: RegKind::Vector, size: chunk }.align(cx))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather iterate the array and set 3 bool flags - some of these might not be used.

let dl = &cx.tcx.data_layout;
let size = arg.layout.size;
let align = arg.layout.align.max(dl.i32_align).min(dl.i64_align);

if arg.layout.is_aggregate() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should handle the simple (non-aggregate) case first with an early return, to avoid the indentation in the aggregate case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

use rustc::ty::layout::Size;
fn extend_integer_width_mips(arg: &mut ArgType, bits: u64) {
// This is like ArgType::extend_integer_with_to, but always sign extends
// integers >= 32 bits as required by the n64 ABI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty confusing - so specifically, u32 is the only value that gets passed differently? Can you use extend_integer_with_to except in the u32 case? That way it's more obvious where the exception lies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I've adjusted the code to do that.

},
_ => None
}
}).collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do this without a Vec? e.g. you can have a variable with the mutable iterator and call .next() three times, checking for combinations of Some and None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored the float reg testing code into a separate function which I call manually for the different numbers of fields. I think it looks a bit better.


if ret.layout.fields.count() == float_regs.len() {
if float_regs.len() == 1 {
ret.cast_to(Uniform { unit: float_regs[0], total: float_regs[0].size });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have conversion impls so you can specifically do ret.cast_to(float_regs[0]).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// Insert enough integers to cover [last_offset, offset)
assert!(last_offset.is_abi_aligned(dl.f64_align));
for _ in 0..((offset - last_offset).bits() / 64) {
chunks.push(RegKind::Integer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can keep an index into the final array to avoid allocating a Vec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

total: size
});
if !offset.is_abi_aligned(align) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no alignment padding anymore? I'm not even sure what clang does wrt that for other targets - especially since there's a stackalign attribute now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct. In 32-bit MIPS, we have to pad 64-bit aligned structures manually because we cast to an i32 Uniform so LLVM doesn't know that the structure is misaligned. In 64-bit MIPS we always use i64 chunks so this issue doesn't arise.

@@ -407,7 +407,8 @@ impl<'tcx> LayoutExt<'tcx> for TyLayout<'tcx> {
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum CastTarget {
Uniform(Uniform),
Pair(Reg, Reg)
Pair(Reg, Reg),
ChunkedPrefix { prefix: [RegKind; 8], chunk: Size, total: Size }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really starting to wonder if we can turn all 3 variants into one (such that CastTarget becomes a struct).

If the prefix array uses Option<RegKind> instead to indicate that not all 8 slots are full, then we can replace total with rest: Uniform and have it represent strictly what's after the prefix.

And then Pair(a, b) can become prefix: [Some(a.kind), None, ..., None], chunk: a.size, rest: Uniform::from(b).

Oh and chunk makes more sense as prefix_chunk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've attempted to implement this (in separate commits for the moment). It seems to work ok.

@pietroalbini pietroalbini added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 14, 2018
@pietroalbini
Copy link
Member

@eddyb the user pushed some commits that seems to have fixed your concerns, can you review this again?

// Simplify to a single unit when there is no prefix and size <= unit size
if self.rest.total <= self.rest.unit.size {
return rest_ll_unit;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Uniform need to be a separate struct at this point? Probably not, but we can revisit later.

@eddyb
Copy link
Member

eddyb commented Feb 22, 2018

Apologies for the delay, still digging through my notifications. This new CastTarget looks great!

@bors r+

@bors
Copy link
Contributor

bors commented Feb 22, 2018

📌 Commit 47c33f7 has been approved by eddyb

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 22, 2018
Manishearth added a commit to Manishearth/rust that referenced this pull request Feb 25, 2018
kennytm added a commit to kennytm/rust that referenced this pull request Feb 25, 2018
rustc_trans: rewrite mips64 ABI code

This PR rewrites the ABI handling code for 64-bit MIPS and should fix various FFI issues including rust-lang#47290.

To accomodate the 64-bit ABI I have had to add a new `CastTarget` variant which I've called `Chunked` (though maybe this isn't the best name). This allows an ABI to cast to some arbitrary structure of `Reg` types. This is required on MIPS which might need to cast to a structure containing a mixture of `i64` and `f64` types.
bors added a commit that referenced this pull request Feb 25, 2018
Rollup of 17 pull requests

- Successful merges: #47964, #47970, #48076, #48115, #48166, #48281, #48297, #48302, #48362, #48369, #48489, #48491, #48494, #48517, #48529, #48235, #48330
- Failed merges:
@bors bors merged commit 47c33f7 into rust-lang:master Feb 25, 2018
@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 28, 2018

@jcowgill I see you have a bunch of MIPS-related PRs. We are currently having trouble implementing MIPS SIMD intrinsics in coresimd (we can't compile binaries with the msa feature enabled). Let me know if you would be interested to help with that.

@jcowgill
Copy link
Contributor Author

@gnzlbg I can probably help with that - assuming I find some time :)
I see this issue (maybe we could move to it): rust-lang/stdarch#170. Is there any existing mips work?

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 28, 2018

@jcowgill this would be a start: #46310

rust-lang/stdarch#170. Is there any existing mips work?

I started implementing those intrinsics, but because of the issue above, did not progress much. Maybe once that issue is solved, implementing the intrinsics will "just work".

@jcowgill jcowgill deleted the mips64-abi branch March 8, 2018 17:33
jcowgill added a commit to jcowgill/rust that referenced this pull request Mar 9, 2018
Since rust-lang#47964 was merged, 64-bit mips started passing all structures
using 64-bit chunks regardless of their contents. The
repr-transparent-aggregates tests needs updating to cope with this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants