Incorrect optimization calling function with __m256d parameters #57427

Closed
dbdr opened this issue Jan 7, 2019 · 2 comments

Comments

dbdr commented Jan 7, 2019

The following code essentially calls _mm256_cmp_pd(a, b, _CMP_NEQ_UQ) to test four f64 values packed in a __m256d for inequality. It should therefore return all zeros when called with the same value for a and b. This works at opt-level 0 to 2 but fails at opt-level=3.

$ rustc -g -C opt-level=2 avx.rs && ./avx
$ rustc -g -C opt-level=3 avx.rs && ./avx
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"__m256d(0.0, 0.0, 0.0, 0.0)"`,
 right: `"__m256d(NaN, NaN, 0.0, 0.0)"`', avx.rs:37:2
note: Run with `RUST_BACKTRACE=1` for a backtrace.
$ rustc --version
rustc 1.31.1 (b6c32da9b 2018-12-18)
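
For reference, the expected per-lane result can be modelled in scalar Rust. This is only a sketch of the comparison semantics, not code from the report: `neq_uq_lane` and `neq_uq` are hypothetical helper names. `_CMP_NEQ_UQ` yields an all-ones lane when the operands are unequal or unordered (either is NaN) and an all-zeros lane otherwise, which matches Rust's `!=` on `f64`. An all-ones lane reinterpreted as an `f64` is a NaN, which is why wrongly set lanes show up as `NaN` in the Debug output above.

```rust
/// Scalar model of one lane of `_mm256_cmp_pd` with `_CMP_NEQ_UQ`
/// (hypothetical helper, for illustration only).
/// Returns an all-ones mask if `a != b` or either operand is NaN,
/// and an all-zeros mask otherwise.
fn neq_uq_lane(a: f64, b: f64) -> u64 {
    // IEEE 754 `!=` is already true for unordered (NaN) operands.
    if a != b { u64::MAX } else { 0 }
}

/// Applies the lane model to four packed values.
fn neq_uq(a: [f64; 4], b: [f64; 4]) -> [u64; 4] {
    let mut out = [0u64; 4];
    for i in 0..4 {
        out[i] = neq_uq_lane(a[i], b[i]);
    }
    out
}
```

Comparing a vector of ordinary (non-NaN) values with itself gives four zero lanes under this model, which is what the assertion in the test code checks.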

Test code:

#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "avx"))]
unsafe fn avx_pack(v0: f64, v1: f64, v2: f64, v3: f64) -> __m256d {
	_mm256_set_pd(v0, v1, v2, v3)
}

fn pack(v0: f64, v1: f64, v2: f64, v3: f64) -> __m256d {
	if is_x86_feature_detected!("avx") {
		return unsafe {
			avx_pack(v0, v1, v2, v3)
		}
	}
	panic!("Unsupported");
}

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "avx"))]
unsafe fn avx_cmp(a: __m256d, b: __m256d) -> __m256d {
	_mm256_cmp_pd(a, b, _CMP_NEQ_UQ)
}

fn cmp(a: __m256d, b: __m256d) -> __m256d {
	if is_x86_feature_detected!("avx") {
		return unsafe {
			avx_cmp(a, b)
		}
	}
	panic!("Unsupported");
}

fn main() {
	let p = pack(1.0, 2.0, 3.0, 4.0);
	let eq = cmp(p, p);
	assert_eq!("__m256d(0.0, 0.0, 0.0, 0.0)", format!("{:?}", eq));
}

Looking at the generated code (opt-level=3) using objdump, the avx_cmp function is compiled as:

0000000000009d20 <avx::avx_cmp::h56baa3e4f2dfd19e>:
    9d20:       c5 fd c2 c1 04          vcmpneqpd %ymm1,%ymm0,%ymm0
    9d25:       c5 fd 29 07             vmovapd %ymm0,(%rdi)
    9d29:       c5 f8 77                vzeroupper 
    9d2c:       c3                      retq   
    9d2d:       0f 1f 00                nopl   (%rax)

So it expects its arguments in registers ymm0 and ymm1, but the caller never sets them. The call site has:

    9dc4:       48 8d 9c 24 e0 00 00    lea    0xe0(%rsp),%rbx
    9dcb:       00 
    9dcc:       48 89 df                mov    %rbx,%rdi
    9dcf:       0f 28 d0                movaps %xmm0,%xmm2
    9dd2:       0f 28 d9                movaps %xmm1,%xmm3
    9dd5:       e8 46 ff ff ff          callq  9d20 <avx::avx_cmp::h56baa3e4f2dfd19e>

parched commented Jan 7, 2019

I think this may be a dup of #50154.

dbdr commented Jan 8, 2019

Thanks @parched, it does look like a dup of #50154. I confirm that if I use target_feature(enable = "avx") on the caller, the bug does not occur. I'm closing this.
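
A minimal sketch of that workaround, under some assumptions: every function that passes or receives a `__m256d` carries `target_feature(enable = "avx")`, and only plain arrays cross the boundary to code compiled without the feature. The names `avx_neq_bits` and `neq_bits` are hypothetical and not part of the original report. The sketch uses the const-generic form `_mm256_cmp_pd::<_CMP_NEQ_UQ>(a, b)` found in current toolchains, whereas the report, written against rustc 1.31.1, passes the predicate as a third argument.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Callee taking __m256d arguments, as in the report.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn avx_cmp(a: __m256d, b: __m256d) -> __m256d {
    _mm256_cmp_pd::<_CMP_NEQ_UQ>(a, b)
}

// The caller of `avx_cmp` also enables AVX, so both sides of the call
// are compiled with the same target features (hypothetical helper).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn avx_neq_bits(a: [f64; 4], b: [f64; 4]) -> [u64; 4] {
    let va = _mm256_loadu_pd(a.as_ptr());
    let vb = _mm256_loadu_pd(b.as_ptr());
    let r = avx_cmp(va, vb);
    // Store the mask back to memory and expose each lane's raw bits.
    let mut out = [0.0f64; 4];
    _mm256_storeu_pd(out.as_mut_ptr(), r);
    out.map(f64::to_bits)
}

// Feature-agnostic entry point: only plain arrays cross this boundary.
// Returns None when AVX is unavailable instead of panicking.
fn neq_bits(a: [f64; 4], b: [f64; 4]) -> Option<[u64; 4]> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just checked at run time.
            return Some(unsafe { avx_neq_bits(a, b) });
        }
    }
    // Silence unused-parameter warnings on other architectures.
    let _ = (a, b);
    None
}
```

With this layout, `neq_bits(p, p)` for `p = [1.0, 2.0, 3.0, 4.0]` is expected to yield four zero lanes on an AVX-capable machine, and `None` elsewhere.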

dbdr closed this as completed Jan 8, 2019