Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky test shutdown race in seqno_time_test #12282

Closed

Conversation

pdillinger
Copy link
Contributor

Summary: Seen in build-macos-cmake:

Received signal 11 (Segmentation fault: 11)
	#1   rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0::operator()(void*) const (in seqno_time_test) (mock_time_env.cc:29)
	#2   decltype(std::declval<rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0&>()(std::declval<void*>())) std::__1::__invoke[abi:v15006]<rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0&, void*>(rocksdb::MockSystemClock::InstallTimedWait	ixCallback()::$_0&, void*&&) (in seqno_time_test) (invoke.h:394)
...

This is presumably because the std::function from the lambda only saves a copy of the SeqnoTimeTest* this pointer, which doesn't prevent it from being reclaimed on parallel shutdown. If we instead save a copy of the std::shared_ptr<MockSystemClock> in the std::function, this should prevent the crash. (Note that in SyncPoint::Data::Process() copies the std::function before releasing the mutex for calling the callback.)

Test Plan: watch CI

Summary: Seen in build-macos-cmake:

```
Received signal 11 (Segmentation fault: 11)
	facebook#1   rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0::operator()(void*) const (in seqno_time_test) (mock_time_env.cc:29)
	facebook#2   decltype(std::declval<rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0&>()(std::declval<void*>())) std::__1::__invoke[abi:v15006]<rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0&, void*>(rocksdb::MockSystemClock::InstallTimedWait	ixCallback()::$_0&, void*&&) (in seqno_time_test) (invoke.h:394)
...
```

This is presumably because the std::function from the lambda only saves
a copy of the SeqnoTimeTest* this pointer, which doesn't prevent it from
being reclaimed on parallel shutdown. If we instead save a copy of the
`std::shared_ptr<MockSystemClock>` in the std::function, this should
prevent the crash. (Note that in `SyncPoint::Data::Process()` copies the
std::function before releasing the mutex for calling the callback.)

Test Plan: watch CI
@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Member

@cbi42 cbi42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in b31f324.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants