
Clang i386/i486/i586/i686 targets inaccurate #61347

Closed
Fierelier opened this issue Mar 11, 2023 · 34 comments
Labels: backend:X86, question (A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!)

Comments

@Fierelier

I have been trying to get Rust's definition of i686 changed to Pentium Pro; see my reasoning here: rust-lang/rust#82435 (comment). In the answer to this, I was told that Clang implies Pentium 4 for i686; see their godbolt: https://llvm.godbolt.org/z/PP69efvfx. This inconsistency is one of the reasons keeping the Rust team from changing the definition.

For the same reasoning as in my post, I think i686 should imply Pentium Pro, not Pentium 4. Clang is seemingly aware of this (see: https://clang.llvm.org/doxygen/Basic_2Targets_2X86_8cpp_source.html), but it still uses a Pentium 4 CPU target in the end for some reason.

Fierelier changed the title from "Clang i686-unknown-linux-gnu target should implies pentium4, should be pentiumpro" to "Clang i686-unknown-linux-gnu target implies pentium4, should be pentiumpro" Mar 11, 2023
@llvmbot
Collaborator

llvmbot commented Mar 11, 2023

@llvm/issue-subscribers-backend-x86

@topperc
Collaborator

topperc commented Mar 11, 2023

In LLVM and Clang, i686 in the triple is treated the same as i586, i486, and i386. See this code in Triple.cpp:

static Triple::ArchType parseArch(StringRef ArchName) {                          
  auto AT = StringSwitch<Triple::ArchType>(ArchName)                             
    .Cases("i386", "i486", "i586", "i686", Triple::x86) 

They all mean 32-bit x86.
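
For illustration, here is a self-contained sketch of that collapsing behavior (simplified C++; the names mirror Triple.cpp, but this is not the actual LLVM source):

  // Simplified sketch (not LLVM source): every i?86 spelling parses to the
  // same architecture value, so the i386-vs-i686 distinction is gone by the
  // time a default CPU is selected.
  #include <iostream>
  #include <string>

  enum class ArchType { UnknownArch, x86, x86_64 };

  ArchType parseArch(const std::string &ArchName) {
    if (ArchName == "i386" || ArchName == "i486" ||
        ArchName == "i586" || ArchName == "i686")
      return ArchType::x86; // all four mean "32-bit x86"
    if (ArchName == "x86_64" || ArchName == "amd64")
      return ArchType::x86_64;
    return ArchType::UnknownArch;
  }

  int main() {
    // Prints 1: i386 and i686 are indistinguishable after parsing.
    std::cout << (parseArch("i386") == parseArch("i686")) << '\n';
  }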

The selection of "pentium4" is done by this code, based on the OS being Linux and the triple being 32-bit.

There is a separate -march=i686 option that can override the default.
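
A quick way to observe both the default and the override is to probe the __SSE2__ macro, which is defined whenever the selected CPU has SSE2 (a sketch; the exact result depends on your Clang build and any distro patches):

  // sse2probe.cpp: report whether SSE2 was statically enabled.
  // Expected with an unpatched Clang:
  //   clang++ --target=i686-unknown-linux-gnu sse2probe.cpp              -> "SSE2 on"
  //   clang++ --target=i686-unknown-linux-gnu -march=i686 sse2probe.cpp  -> "SSE2 off"
  #include <cstdio>

  int main() {
  #ifdef __SSE2__
    std::puts("SSE2 on");  // the default CPU (pentium4) has SSE2
  #else
    std::puts("SSE2 off"); // -march=i686 (Pentium Pro) has no SSE2
  #endif
  }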

@Fierelier
Author

Fierelier commented Mar 11, 2023

Why isn't the triple more accurate? Why override the CPU at all?

EDIT: Actually, perhaps I should contact the rust devs instead, since I suspect distros would set the appropriate flags if they can? This is a big mess... Having this set straight would help regardless though, I'm sure.

@topperc
Copy link
Collaborator

topperc commented Mar 11, 2023

> Why isn't the triple more accurate? Why override the CPU at all?
>
> EDIT: Actually, perhaps I should contact the rust devs instead, since I suspect distros would set the appropriate flags if they can? This is a big mess... Having this set straight would help regardless though, I'm sure.

The first part of the triple isn't a CPU, it's an architecture; we don't allow arbitrary CPUs there. I think it comes from the GNU tools: gcc's autoconf-based build system uses triples. I don't know whether gcc makes a distinction between i386, i486, i586, and i686 in autoconf.

@Fierelier
Author

Fierelier commented Mar 12, 2023

Right, I'm aware it isn't a CPU. It's just that LLVM compiles for the Pentium 4 CPU if i686 is chosen, as you've said, and Pentium 4 is not the minimum for i686 (that's the Pentium Pro), which in my opinion should be targeted instead. Let's take a modern-day example with an x86-64 CPU: I compile binaries on my Intel 8000 series CPU, and a few years later the 13000 series comes out. Suddenly the x86-64 spec has been upgraded to include new instruction sets, and binaries compiled for x86-64 no longer work on my Intel 8000 series. That would be really annoying.

That example is hypothetical, not an actual issue. Assuming x86-64 doesn't do that, why should i686? https://en.wikipedia.org/wiki/P6_(microarchitecture)

@topperc
Collaborator

topperc commented Mar 12, 2023

Clang also uses pentium4 for Linux if the triple is i386, i486, or i586. They're all treated as meaning 32-bit. What is the default CPU for gcc when built for i686 or when compiling with -m32?

@Fierelier
Author

The only thing I could find was this:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
"The -m32 option sets int, long, and pointer types to 32 bits, and generates code that runs on any i386 system."

@Fierelier
Author

I think GCC thinks of i686-*-* and even i786-*-* as Pentium Pro: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config.gcc;h=f986224817a12888e51be3873b6f230ea900bccb;hb=HEAD#l3789

@phoebewang
Contributor

How about just making i686 and i786 override the OS part? https://reviews.llvm.org/D145857

@Fierelier
Author

Fierelier commented Mar 13, 2023

I think that looks in line with gcc for i686/i786, yes. But as said in #61347 (comment), this issue also affects i386, i486, and i586. I'm not sure what LLVM compiles those down to (and in the case of Linux, i486 might get dropped if it hasn't been already). Either way, this looks like a good first step to me, at the very least.

Fierelier changed the title from "Clang i686-unknown-linux-gnu target implies pentium4, should be pentiumpro" to "Clang i386/i486/i586/i686 targets inaccurate" Mar 13, 2023
@Fierelier
Author

Fierelier commented Mar 13, 2023

Since -m32 on gcc seems to mean i386 (further research may be required!), maybe the fallback should just be changed to i386, or whatever the minimum is that LLVM can compile to. I'm not so sure you should adopt the convention that i786 means Pentium Pro, since i786 is technically Pentium 4; gcc just doesn't treat it that way for some reason. It depends on how much you want to align with gcc.

@Fierelier
Author

Fierelier commented Mar 13, 2023

Maybe like this, assuming i386 is supported:

  switch (Triple.getOS()) {
  case llvm::Triple::NetBSD:
    return "i486";
  case llvm::Triple::Haiku:
  case llvm::Triple::OpenBSD:
    return "i586";
  case llvm::Triple::FreeBSD:
    return "i686";
  default:
    if (Triple.getArchName() == "i486")
      return "i486";
    if (Triple.getArchName() == "i586")
      return "i586";
    if (Triple.getArchName() == "i686")
      return "i686";
    if (Triple.getArchName() == "i786")
      return "i686"; // Basically what gcc does. Alternatively "pentium4", for technical correctness.
    return "i386";
  }

Or, similar to what phoebewang suggested, this variant could allow more control in niche use cases with custom kernels:

  if (Triple.getArchName() == "i486")
    return "i486";
  if (Triple.getArchName() == "i586")
    return "i586";
  if (Triple.getArchName() == "i686")
    return "i686";
  if (Triple.getArchName() == "i786")
    return "i686"; // Basically what gcc does. Alternatively "pentium4", for technical correctness.
  
  switch (Triple.getOS()) {
  case llvm::Triple::NetBSD:
    return "i486";
  case llvm::Triple::Haiku:
  case llvm::Triple::OpenBSD:
    return "i586";
  case llvm::Triple::FreeBSD:
    return "i686";
  default:
    return "i386";
  }

@topperc
Collaborator

topperc commented Mar 13, 2023

-m32 on gcc seems to have #define __SSE2__ 1: https://godbolt.org/z/GezGEcEoT

@Fierelier
Author

Fierelier commented Mar 13, 2023

Then I would suggest this:

  if (Triple.getArchName() == "i386")
    return "i386";
  if (Triple.getArchName() == "i486")
    return "i486";
  if (Triple.getArchName() == "i586")
    return "i586";
  if (Triple.getArchName() == "i686")
    return "i686";
  if (Triple.getArchName() == "i786")
    return "i686"; // Basically what gcc does. Alternatively "pentium4", for technical correctness.
  
  switch (Triple.getOS()) {
  case llvm::Triple::NetBSD:
    return "i486";
  case llvm::Triple::Haiku:
  case llvm::Triple::OpenBSD:
    return "i586";
  case llvm::Triple::FreeBSD:
    return "i686";
  default:
    return "pentium4";
  }

@topperc
Collaborator

topperc commented Mar 13, 2023

> Then I would suggest this:
>
>   if (Triple.getArchName() == "i486")
>     return "i486";
>   if (Triple.getArchName() == "i586")
>     return "i586";
>   if (Triple.getArchName() == "i686")
>     return "i686";
>   if (Triple.getArchName() == "i786")
>     return "i686"; // Basically what gcc does. Alternatively "pentium4", for technical correctness.
>
>   switch (Triple.getOS()) {
>   case llvm::Triple::NetBSD:
>     return "i486";
>   case llvm::Triple::Haiku:
>   case llvm::Triple::OpenBSD:
>     return "i586";
>   case llvm::Triple::FreeBSD:
>     return "i686";
>   default:
>     return "pentium4";
>   }

So "i386" would be "pentium4" but the others would be there respective CPUs?

That doesn't sound any closer to solving the rust problem. Since rust maps i386 to i386.

@Fierelier
Author

Sorry, I've edited the post since you responded.

@topperc
Collaborator

topperc commented Mar 13, 2023

> Sorry, I've edited the post since you responded.

So then there's no way to get pentium4? Those are the only legal strings, except i886 and i986, which don't make sense.

@Fierelier
Author

You are right. I suppose one could then stray from GCC here and make i786 a pentium4, by removing the i786 check:

  if (Triple.getArchName() == "i386")
    return "i386";
  if (Triple.getArchName() == "i486")
    return "i486";
  if (Triple.getArchName() == "i586")
    return "i586";
  if (Triple.getArchName() == "i686")
    return "i686";
  
  switch (Triple.getOS()) {
  case llvm::Triple::NetBSD:
    return "i486";
  case llvm::Triple::Haiku:
  case llvm::Triple::OpenBSD:
    return "i586";
  case llvm::Triple::FreeBSD:
    return "i686";
  default:
    return "pentium4"; // i786
  }

@phoebewang
Contributor

-m32 on Clang is i386 too. https://godbolt.org/z/WGaMbfKo5
I think we just need to override under i386. Updated in D145857

@Fierelier
Author

-m32 implies an i386-*-* target triple?

@topperc
Collaborator

topperc commented Mar 13, 2023

> -m32 implies an i386-*-* target triple?

There's really only one 32-bit x86 in the arch enum in Triple.h. -m32 picks that enum value, and the enum value is mapped to the "i386" string.
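
Roughly, the relationship looks like this (a simplified sketch of the Triple.h/Triple.cpp arrangement, not the actual source):

  // One enum value covers the whole 32-bit x86 family, and its canonical
  // string is "i386" -- which is why -m32 surfaces as an i386-*-* triple.
  enum class ArchType { x86 /* i386..i686 all land here */, x86_64 };

  const char *getArchTypeName(ArchType Kind) {
    switch (Kind) {
    case ArchType::x86:    return "i386"; // canonical spelling for the family
    case ArchType::x86_64: return "x86_64";
    }
    return "unknown";
  }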

@Fierelier
Author

Fierelier commented Mar 13, 2023

Maybe the more correct solution would be to have -m32 map to i786, and to use something closer to #61347 (comment) instead, so compiling with an i386-*-* target actually makes i386-compatible binaries?

@topperc
Collaborator

topperc commented Mar 13, 2023

> Maybe the more correct solution would be to have -m32 map to i786, and to use something closer to #61347 (comment) instead, so compiling with an i386-*-* target actually makes i386-compatible binaries?

That would have to be an LLVM change to Triple.cpp and Triple.h; we would need to split the x86 arch in the enum. But I really worry that would have an impact on other projects.
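
For what it's worth, such a split might look like the following (purely hypothetical, not proposed LLVM code, and exactly the kind of enum change that could ripple into every project matching on the current single x86 value):

  // Hypothetical split: distinct arch values per i?86 level, so the arch
  // name in the triple would survive parsing. Every switch over the current
  // single x86 value would need updating -- the compatibility risk above.
  enum class ArchType { x86_i386, x86_i486, x86_i586, x86_i686, x86_64 };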

@Fierelier
Author

Okay. Seeing as the i386 is, in my eyes, pretty antique and only useful with modern software in very limited ways, I suppose this is good enough.

@Fierelier
Author

Though the i386 target would still result in programs not running on i486 through i686. I'm not sure yet about this one, but it's an improvement at least.

@topperc
Collaborator

topperc commented Mar 13, 2023

> Okay. Seeing as the i386 is, in my eyes, pretty antique and only useful with modern software in very limited ways, I suppose this is good enough.

LLVM doesn't properly support i386; the 486-specific instructions aren't properly marked.

I'm not convinced that changing i686 won't break users. Clang has behaved this way for at least a decade, and to my knowledge no one has raised this issue before. It seems late to change this now, given that the Pentium 4 is over 20 years old.

I'm unconvinced by Rust saying i686 picks pentium4 because Clang does. Clang picks pentium4 for i386, i486, and i586 too, but Rust does not. How can the i686 behavior be justified by Clang's behavior, but not the others?

@Fierelier
Author

You know, that's correct. I will raise this argument in the issue.

@Fierelier
Author

Fierelier commented Mar 13, 2023

Also, yes, changing these definitions will break things. But those things, as they are, rely on what is, in my opinion, broken behavior. I personally think it's better to fix it now than to let this issue get worse over the years, as the desire to fix it becomes lower and lower.

@topperc
Collaborator

topperc commented Mar 13, 2023

Why will it get worse over the years?

@Fierelier
Author

Fierelier commented Mar 13, 2023

Since more and more people will adopt the viewpoint that i386 through i686 mean Pentium 4, it will increasingly obfuscate the true meaning, and much software compiled for i686 will not actually be i686.

I'm not sure how much you value the i686 platform, especially the Pentium 3 and later; I personally value it a lot. The performance is good enough for light tasks like office work and SD video. I want to say that, objectively, on their own, these machines are fast, and having them still be able to do some tasks is nice. There is still an enthusiastic community behind them, people who want to uphold these architectures' usefulness.

Backwards compatibility is one of the reasons I really dig open source software, and I'd hate to see it go. Compilers are the magic that makes all of that work in the first place, and if they set a bad standard, people will adopt it, and history is more likely to repeat itself in the future because people are used to it.

@nikic
Contributor

nikic commented Mar 13, 2023

@topperc Just to be clear, Rust will not be changing its i686-unknown-linux-gnu definition, and I would not recommend that Clang make any changes in this area either. The primary problem on the Rust side is backwards compatibility as well (in particular because it's not possible to have tier 1 targets using broken-by-design x87 FP arithmetic); the inconsistent definition of i686 across different operating systems and compilers is just another confounding factor.

@Fierelier
Author

@nikic i686 is a tier 1 target in the first place because it's assigned the wrong CPU. If i686 has broken floats, then i686 is not a tier 1 target.

They have no problem referring to the P5 as i586, but they won't refer to the P6 as i686. They're choosing these definitions arbitrarily. This entire naming scheme was based on the CPU architecture, and it still should be, for the reasons explained previously.

The compilers understand, if you set -march, that i686 is Pentium Pro. Why would targets not refer to the same thing, if they are using the exact same names?

@workingjubilee

Then Rust would add a new target for "i786" anyway, since we do have use for that target for various reasons, and remove the existing one, doing a lot of work to replace everything just to create agreement on a name that even the nearest other compiler, Clang, does not agree on.

Fierelier closed this as not planned May 10, 2023
EugeneZelenko added the "question" label May 10, 2023
davidben added a commit to google/boringssl that referenced this issue Jan 31, 2024
Update-Note: Building for 32-bit x86 may require fixing your builds to
pass -msse2 to the compiler. This will also speed up the rest of the
code in your project. If your project needs to support the Pentium III,
please contact BoringSSL maintainers.

As far as I know, all our supported 32-bit x86 consumers require SSE2.
I think, in the past, I've asserted that our assembly skips SSE2
capability detection. Looking at it again, I don't think that's true.
OPENSSL_IA32_SSE2 means to enable runtime detection of SSE2, not
compile-time.

Additionally, I don't believe we have *ever* tested the non-SSE2
assembly codepaths. Also, now that we want to take the OPENSSL_ia32cap_P
accesses out of assembly, those runtime checks are problematic, as we'd
need to bifurcate functions all the way down to bn_mul_words.

Unfortunately, the situation with compilers is... complicated. Ideally,
everyone would build with the equivalent of -msse2. 32-bit x86 is so
register-poor that running without SSE2 statically available seems
especially foolish. However, per
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9868, while
Clang defaults to enabling SSE2, GCC does not.

We once broke gRPC's build, in
grpc/grpc#17540, by inadvertently assuming
SSE2. In that discussion, gRPC maintainers were okay requiring Pentium 4
as the minimum CPU, but it's unclear if they actually changed their
build. That discussion also said GCC 8 assumes SSE2, but I'm not able to
reproduce this.

LLVM does indeed interpret "i686" as implying SSE2:
llvm/llvm-project#61347
rust-lang/rust#82435

However, Debian LLVM does *not*. Debian carries a patch to turn this
off!
https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/snapshot/debian/patches/disable-sse2-old-x86.diff?ref_type=heads

Meanwhile, Fedora fixed their baseline back in 2018.
https://fedoraproject.org/wiki/Changes/Update_i686_architectural_baseline_to_include_SSE2

So let's start by detecting builds that forgot to pass -msse2 and see if
we can get them fixed. If this sticks, I'll follow up by unwinding all
the SSE2 branches.

Bug: 673
Change-Id: I851184b358aaae2926c3e3fe618f3155e71c2f71
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/65875
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
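
The "detecting builds that forgot to pass -msse2" step the commit describes can be done with a compile-time guard along these lines (a sketch of the approach, not necessarily BoringSSL's exact code):

  // Fail the build loudly when a 32-bit x86 compile has no static SSE2,
  // instead of silently taking the untested non-SSE2 code paths.
  #if defined(__i386__) && !defined(__SSE2__)
  #error "32-bit x86 builds must enable SSE2; pass -msse2 to the compiler."
  #endif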
justsmth pushed four commits to justsmth/aws-lc that referenced this issue Aug 26–27, 2024, each cherry-picking the BoringSSL commit above (cherry picked from commit 56d3ad9d23bc130aa9404bfdd1957fe81b3ba498).