Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test suite failures #2288

Closed
jasontibbitts opened this issue May 20, 2019 · 15 comments
Closed

Test suite failures #2288

jasontibbitts opened this issue May 20, 2019 · 15 comments

Comments

@jasontibbitts
Copy link
Contributor

Version

2.0.0 rc2

Operating system type + version

Fedora 29

Behavior

I am trying to get the test suite to run as part of Fedora's package build process (and have largely succeeded) but several of the tests in the "integration" test suite fail. These are the tests in the top-level 't' directory of the tarball.

Basically, combineinfill.t, custom_gcode.t, fill.t, multi.t and retraction.t all abort with this message:

/usr/include/c++/8/bits/stl_vector.h:950: std::vector<_Tp, _Alloc>::const_reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) const [with _Tp = double; _Alloc = std::allocator<double>; std::vector<_Tp, _Alloc>::const_reference = const double&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
Aborted (core dumped

Some of their subtests may run but obviously nothing runs past the abort. I'm afraid that I haven't yet been able to untangle all of the C++ here, so I've no idea what's really gone wrong here, but I've extracted the following backtrace. (I removed 19 other threads that weren't doing anything.)

Thread 1 (Thread 0x7fa7bd442740 (LWP 22972)):
#0  0x00007fa7bd47d57f in raise () from /lib64/libc.so.6
#1  0x00007fa7bd467895 in abort () from /lib64/libc.so.6
#2  0x00007fa7b008fe08 in std::__replacement_assert (__file=__file@entry=0x7fa7b034d468 "/usr/include/c++/8/bits/stl_vector.h",
    __line=__line@entry=950,
    __function=__function@entry=0x7fa7b03a32c0 <_ZZNKSt6vectorIdSaIdEEixEmE19__PRETTY_FUNCTION__> "std::vector<_Tp, _Alloc>::const_reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) const [with _Tp = double; _Alloc = std::allocator<double>; std::vector<_Tp, _Alloc>:"..., __condition=__condition@entry=0x7fa7b034d438 "__builtin_expect(__n < this->size(), true)")
    at /usr/include/c++/8/x86_64-redhat-linux/bits/c++config.h:2391
#3  0x00007fa7b02ab9d9 in std::vector<double, std::allocator<double> >::operator[] (__n=<optimized out>, this=<optimized out>)
    at /usr/include/c++/8/bits/stl_vector.h:948
#4  Slic3r::ToolOrdering::fill_wipe_tower_partitions (this=0x7ffdf6787850, config=..., object_bottom_z=<optimized out>)
    at /builddir/build/BUILD/PrusaSlicer-version_2.0.0-rc2/src/libslic3r/GCode/ToolOrdering.cpp:302
#5  0x00007fa7b02ad36c in Slic3r::ToolOrdering::ToolOrdering (this=0x7ffdf6787850, print=..., first_extruder=4294967295,
    prime_multi_material=<optimized out>) at /builddir/build/BUILD/PrusaSlicer-version_2.0.0-rc2/src/libslic3r/Print.hpp:331
#6  0x00007fa7b01451a7 in Slic3r::GCode::_do_export (this=0x7ffdf6787ce0, print=..., file=0x56396bc13e60)
    at /usr/include/c++/8/bits/stl_vector.h:300
#7  0x00007fa7b0148556 in Slic3r::GCode::do_export (this=this@entry=0x7ffdf6787ce0, print=print@entry=0x56396c66c2b0,
    path=0x56396c69d3a0 "/builddir/build/BUILD/PrusaSlicer-version_2.0.0-rc2/t/combineinfill.t.gcode.temp",
    preview_data=preview_data@entry=0x0) at /builddir/build/BUILD/PrusaSlicer-version_2.0.0-rc2/src/libslic3r/GCode.cpp:448
#8  0x00007fa7b01e301f in Slic3r::Print::export_gcode (this=this@entry=0x56396c66c2b0,
    path_template="/builddir/build/BUILD/PrusaSlicer-version_2.0.0-rc2/t/combineinfill.t.gcode.temp", preview_data=preview_data@entry=0x0)
    at /usr/include/c++/8/bits/basic_string.h:2290
#9  0x00007fa7b00081a1 in XS_Slic3r__Print_export_gcode (my_perl=<optimized out>, cv=<optimized out>)
    at /usr/include/c++/8/bits/basic_string.h:252
#10 0x00007fa7bd90c5d9 in Perl_pp_entersub () from /lib64/libperl.so.5.28
#11 0x00007fa7bd9027e5 in Perl_runops_standard () from /lib64/libperl.so.5.28
#12 0x00007fa7bd87f7bf in perl_run () from /lib64/libperl.so.5.28
#13 0x000056396a8e434a in ?? ()
#14 0x00007fa7bd469413 in __libc_start_main () from /lib64/libc.so.6
#15 0x000056396a8e438e in ?? ()
@jasontibbitts
Copy link
Contributor Author

Asking someone who speaks C++ far better than I do, this is just an out of bounds array reference. Fedora builds with -D_GLIBCXX_ASSERTIONS so this is caught immediately; without that there's still the out of bounds memory reference but it may or may not crash depending on whatever.

@bubnikv
Copy link
Collaborator

bubnikv commented May 20, 2019 via email

@jasontibbitts
Copy link
Contributor Author

For the record, we are already building with -DCMAKE_BUILD_TYPE=Release. It's just that Fedora uses:

+ CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'

and this tends to catch all sorts of things. Better to catch out of bounds array references early.

@bubnikv
Copy link
Collaborator

bubnikv commented May 20, 2019 via email

@bubnikv
Copy link
Collaborator

bubnikv commented May 21, 2019

@jasontibbitts Would you please tell me:

  1. Which Fedora are you compiling on
  2. How do you provide the CFLAGS to the cmake? Did you patch the cmake scripts or do you pass it from the command line?
  3. What are the dependencies I have to install into the fedora? Maybe you can just drop your rpm source file here?

@bubnikv
Copy link
Collaborator

bubnikv commented May 21, 2019

I have to admit I am noob, but it sounds to me that the settings
+ CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
must have a negative effect on run time if it checks the access each access to an std container. I would understand such a check to be enabled on security packages, but not on computationally intensive engineering packages.

lukasmatena added a commit that referenced this issue May 22, 2019
There was a bug in unit tests that led to generating the wipe tower with non-normalized preset.
This caused out-of-bounds access into max_layer_height vector in fill_wipe_tower_partitions.
The problem surfaced in #2288.
I quickly patched additional normalization of the preset to prevent this from happening.

Also, an assert in the same function turned out to trip on one of the tests.
This one was commented out for now and will (hopefully) be looked into later.

Function Print::apply_config was renamed to apply_config_perl_tests_only so everyone
sees its current purpose and does not mistake it for the more important Print::apply.
@lukasmatena
Copy link
Collaborator

As I mentioned in the commit message above, there really was a bug in our unit tests which made PrusaSlicer do the out-of-bounds access. This bug only appeared in the test (not in the application) and we never saw it actually crash. Forcing the range-check revealed the error so we could fix it - hopefully the commit above did and the tests should now pass even with the range checks enabled.

That being said, I'm also surprised you're enabling the extra assert checks and would not expect that in C++ release build. But I'm by far not an expert.

@bubnikv
Copy link
Collaborator

bubnikv commented May 23, 2019

That being said, I'm also surprised you're enabling the extra assert checks and would not expect that in C++ release build. But I'm by far not an expert.

I am considering myself an expert in high performance C++ computing and I am very surprised that these checks are enabled in a release mode. My understanding is, that _GLIBCXX_ASSERTIONS enable run time checks, which will likely have a detrimental effect on the computation speed and power consumption. We shall consider our carbon footprint and the effects on the global warming when forcing unnecessary computation cycles by the distro packaging rules. Are these rules really necessary for a package, which poses a negligible security risk?

@jasontibbitts
Copy link
Contributor Author

jasontibbitts commented Jun 4, 2019

I'm so sorry that I didn't see this before; there are so many github notifications....

I compile on Fedora 29 and 31. Yes, the default Fedora flags are quite strict, but note that the definition of "security sensitive" is basically anything that will ever open a file that could potentially come from somewhere else so you end up including everything that runs on the desktop. After all, I think you'd want the browser to offer to start PrusaSlicer when the user clicks on an STL file, but only if you can trust it to run on untrusted data.

In any case, I can't comment on the overhead as I really have no idea. Do note that I'm not doing anything to enable these extra checks; they are used for all packages in Fedora. I would expect that most other distributions would eventually followed suit if they haven't already and I'm sure I can get some expert to provide more information on the tradeoffs involved if you would really like to know. Fedora is very closely linked into GCC upstream (as well as glibc) and decisions about what flags to enable are generally made by GCC or glibc developers.

As for the package I'm building, I did have a fork at https://src.fedoraproject.org/fork/tibbs/rpms/slic3r-prusa3d but I've moved on to getting the renamed version properly into the distribution. Hopefully that won't take too long as I'm nearly ready to submit it.

@bubnikv
Copy link
Collaborator

bubnikv commented Jun 4, 2019 via email

@bubnikv
Copy link
Collaborator

bubnikv commented Aug 8, 2019

We fixed the crash that was reported initially. Closing for now.
I have asked @tamasmeszaros to try to do profiling with the Fedora default compilation flags to compare the performance cost against our own builds with no run time checks.

@bubnikv bubnikv closed this as completed Aug 8, 2019
@jasontibbitts
Copy link
Contributor Author

For the record, I did ask about the performance impact of this and did get a reasonable response from Florian Wiemer, who I believe is a glibc developer:
https://lists.fedoraproject.org/archives/list/[email protected]/message/3WR4TAHT2XYDHZSVXZWBAVFDY3XN4DEF/

I'll quote the relevant text below. I am happy to ask for more information if it would be useful. (Personally I'm not sure just what you would look for in a profile; otherwise I could do some testing myself.)

For loops which use the correct type for the index type
(size_t/std::vector::size_type), the bounds check is optimized away
by the compiler. There is one SPEC2006 regression, but it uses the
wrong index type:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85929

Such 32-bit/64-bit mismatches are often bugs anyway and easily fixed in
the source code. This case is difficult to optimize for the compiler
because by using the wrong type, the behavior of the code is different
for large arrays.

For non-sequential access, the compiler cannot optimize away the bounds
check. Whether the additional comparison is visible in profiles depends
on many, many factors. I haven't seen it show up so far.

Thanks,
Florian

@jasontibbitts
Copy link
Contributor Author

So I really thought this had been fixed, but now with 2.1.0-alpha1 I'm seeing the same Assertion '__builtin_expect(__n < this->size(), true)' failed. error in the same tests, plus a new one:

  • t/combinefill.t
  • t/custom_gcode.t
  • t/fill.t
  • t/multi.t
  • t/retraction.t
  • t/skirt_brim.t

I guess the skirt_brim.t failure is new; the other files failed in 2.0.0 but were fixed with 07282eb which I applied in the Fedora package. Is it possible that was only applied on a branch or was somehow reverted? I've checked that this isn't due to additional checks in a new gcc version or something like that.

@lukasmatena
Copy link
Collaborator

lukasmatena commented Aug 23, 2019

I haven't yet looked into it properly but the workaround that fixed it was removed with ac6969c which unified config handling for the unit tests and the main application.

The whole function Print::apply_config_perl_tests_only was removed and Print::apply should have taken over. It is possible that it is missing something. @bubnikv

@bubnikv
Copy link
Collaborator

bubnikv commented Aug 23, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants