Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGABRT make check fails on OSX #573

Closed
adamretter opened this issue Apr 11, 2015 · 23 comments
Closed

SIGABRT make check fails on OSX #573

adamretter opened this issue Apr 11, 2015 · 23 comments
Assignees

Comments

@adamretter
Copy link
Collaborator

This on the current HEAD of master:

$ make clean jclean check

Eventually leads to:

[       OK ] DBTest.ChecksumTest (16 ms)
[ RUN      ] DBTest.FIFOCompactionTest
db/db_test.cc:9623: Failure
Value of: 5
Expected: NumTableFilesAtLevel(0)
Which is: 4
libc++abi.dylib: terminating with uncaught exception of type testing::internal::GoogleTestFailureException: db/db_test.cc:9623: Failure
Value of: 5
Expected: NumTableFilesAtLevel(0)
Which is: 4
Received signal 6 (Abort trap: 6)
#0   0x7fff5df1ac38 
#1   abort (in libsystem_c.dylib) + 129 
#2   __cxa_bad_cast (in libc++abi.dylib) + 0    
#3   default_terminate_handler() (in libc++abi.dylib) + 243 
#4   _objc_terminate() (in libobjc.A.dylib) + 124   
#5   std::__terminate(void (*)()) (in libc++abi.dylib) + 8  
#6   __cxa_rethrow (in libc++abi.dylib) + 99    
#7   bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in db_test) (gtest-all.cc:3863)    
#8   testing::UnitTest::Run() (in db_test) (gtest-all.cc:5678)  
#9   main (in db_test) (db_test.cc:12462)   
#10  start (in libdyld.dylib) + 1   
#11  0x00000001 (in db_test)    
/bin/sh: line 1: 89952 Abort trap: 6           ./$t
make: *** [check] Error 1
@adamretter
Copy link
Collaborator Author

Having repeated this 3 times now, the error above has only occurred 2 out of the 3 times. That worries me that we now have an instability in Rocks.

@igorcanadi
Copy link
Collaborator

I'm able to repro, although it's not 2/3 for me -- it's about 1/10.

Don't worry about instability :) Those things are usually due to instability in tests rather than RocksDB.

@adamretter
Copy link
Collaborator Author

Glad you can reproduce @igorcanadi. I guess My Mac is much older and therefore under more strain than yours; Perhaps that explains why I see it more frequently.

Either way, I'll stop worrying now I know it's in your hands. Thanks.

@igorcanadi
Copy link
Collaborator

Close to fixing :)

@igorcanadi
Copy link
Collaborator

igorcanadi added a commit that referenced this issue Apr 13, 2015
Summary:
The problem is that sometimes two memtables will be compacted together into a single file. In that case, our assertion

        ASSERT_EQ(NumTableFilesAtLevel(0), 5);

fails because same amount of data is in 4 files instead of 5. We should wait for flush so that we prevent two memtables merging into a single file.

Test Plan: `for i in `seq 20`; do mrtest FIFOCompactionTest; done` -- fails at least once before. fails zero times after.

Reviewers: rven

Reviewed By: rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36939
@igorcanadi
Copy link
Collaborator

Should be fixed now. Let me know if you still see issues.

@adamretter
Copy link
Collaborator Author

@igorcanadi Well the problem just seems to have moved now. Instead after your fix, on my first run of make clean jclean check I now get this:

[       OK ] WalManagerTest.WALArchivalSizeLimit (4027 ms)
[ RUN      ] WalManagerTest.WALArchivalTtl
db/wal_manager_test.cc:257: Failure
Value of: log_files.empty()
  Actual: false
Expected: true
libc++abi.dylib: terminating with uncaught exception of type testing::internal::GoogleTestFailureException: db/wal_manager_test.cc:257: Failure
Value of: log_files.empty()
  Actual: false
Expected: true
/bin/sh: line 1:  4493 Abort trap: 6           ./$t

@adamretter adamretter reopened this Apr 13, 2015
@igorcanadi
Copy link
Collaborator

I'm pretty sure those two are unrelated :) But I'll take a look at the other one.

@adamretter
Copy link
Collaborator Author

Thanks @igorcanadi... of course from my perspective, the subject of this issue ticket is still valid ;-)

So this new issue is intermittent also. It happened on the first run but not the second, doing a 3rd run now...

@igorcanadi
Copy link
Collaborator

How does it look like now? https://reviews.facebook.net/D36951

@adamretter
Copy link
Collaborator Author

@igorcanadi On the 3rd and 4th run, I see yet another different issue (i.e. before your D36951 is applied):

[ RUN      ] DBTest.DynamicLevelCompressionPerLevel2
db/db_test.cc:11268: Failure
Value of: size <= 30 || ct != kNoCompression
  Actual: false
Expected: true
libc++abi.dylib: terminating with uncaught exception of type testing::internal::GoogleTestFailureException: db/db_test.cc:11268: Failure
Value of: size <= 30 || ct != kNoCompression
  Actual: false
Expected: true
Received signal 6 (Abort trap: 6)
#0   0x1053ea0c8    
#1   abort (in libsystem_c.dylib) + 129 
#2   __cxa_bad_cast (in libc++abi.dylib) + 0    
#3   default_terminate_handler() (in libc++abi.dylib) + 243 
#4   _objc_terminate() (in libobjc.A.dylib) + 124   
#5   std::__terminate(void (*)()) (in libc++abi.dylib) + 8  
#6   __cxxabiv1::exception_cleanup_func(_Unwind_Reason_Code, _Unwind_Exception*) (in libc++abi.dylib) + 0   
#7   testing::UnitTest::AddTestPartResult(testing::TestPartResult::Type, char const*, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::alloca or<char> > const&) (in db_test) (gtest-all.cc:5581) 
#8   testing::internal::AssertHelper::operator=(testing::Message const&) const (in db_test) (gtest-all.cc:1824) 
#9   std::__1::__function::__func<rocksdb::DBTest_DynamicLevelCompressionPerLevel2_Test::TestBody()::$_12, std::__1::allocator<rocksdb::DBTest_DynamicLevelCompressionPerLevel2_Test::TestBody()::$_12>, void (rocksdb::CompressionType const&, unsigned long long) ::operator()(rocksdb::CompressionType const&, unsigned long long&&) (in db_test) (gtest.h:2280) 
#10  rocksdb::mock::MockTableBuilder::Finish() (in db_test) (mock_table.h:123)  
#11  rocksdb::BuildTable(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::Iterator*, rocksdb::FileMetaDa a*, rocksdb::InternalKeyComparator const&, std::__1::vector<std::__1::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::__1::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::__1::allocator<std::__1::unique_ptr<rocksdb::IntTblPropCollectorFa  tory, std::__1::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned long long, unsigned long long, rocksdb::CompressionType, rocksdb::CompressionOptions const&, rocksdb::Env::IOPriority) (in db_test) (status.h:157)   
#12  rocksdb::FlushJob::WriteLevel0Table(rocksdb::autovector<rocksdb::MemTable*, 8ul> const&, rocksdb::VersionEdit*, unsigned long long*) (in db_test) (status.h:157)   
#13  rocksdb::FlushJob::Run(unsigned long long*) (in db_test) (status.h:85) 
#14  rocksdb::DBImpl::FlushMemTableToOutputFile(rocksdb::ColumnFamilyData*, rocksdb::MutableCFOptions const&, bool*, rocksdb::JobContext*, rocksdb::LogBuffer*) (in db_test) (status.h:134) 
#15  rocksdb::DBImpl::BackgroundFlush(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*) (in db_test) (status.h:157) 
#16  rocksdb::DBImpl::BackgroundCallFlush() (in db_test) (status.h:134) 
#17  rocksdb::(anonymous namespace)::PosixEnv::ThreadPool::BGThreadWrapper(void*) (in db_test) (env_posix.cc:1639)  
#18  _pthread_body (in libsystem_pthread.dylib) + 131   
#19  _pthread_body (in libsystem_pthread.dylib) + 0 
#20  thread_start (in libsystem_pthread.dylib) + 13 
/bin/sh: line 1: 15111 Abort trap: 6           ./$t
make: *** [check] Error 1

I do wonder how my machine triggers these so easily compared with you guys...

@igorcanadi
Copy link
Collaborator

The second test should be addressed by the diff. @siying knows more about the third test failure, so I'm assigning this to him.

@adamretter
Copy link
Collaborator Author

Hmm I bet @siying really loves me right now ;-)

@siying
Copy link
Contributor

siying commented Apr 13, 2015

@adamretter your problems are always hard:) But thank you for reporting them.

@igorcanadi
Copy link
Collaborator

I get 4/50 failures for the last test. So I can repro as well :)

@siying
Copy link
Contributor

siying commented Apr 14, 2015

Have a patch to simplify the test: https://reviews.facebook.net/D36999 Hopefully it tells us more information if it still fails. I don't have a Mac to verify it.

@igorcanadi
Copy link
Collaborator

@siying I can trigger the failure on my dev server when I run parallel make check

@siying
Copy link
Contributor

siying commented Apr 14, 2015

What's the failure message?

@igorcanadi
Copy link
Collaborator

Same as @adamretter 's

@siying
Copy link
Contributor

siying commented Apr 14, 2015

@igorcanadi with my latest patch?

@igorcanadi
Copy link
Collaborator

Looks like it's passing with your latest pathc

@ghost
Copy link

ghost commented Aug 4, 2015

Thank you for reporting this issue and appreciate your patience. We've notified the core team for an update on this issue. We're looking for a response within the next 30 days or the issue may be closed.

@adamretter
Copy link
Collaborator Author

@igorcanadi @siying If the latest patch got applied, we can close this ticket.

@siying siying closed this as completed Aug 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants