
test/raft: segment appender concurrent flush and write sanitizer error #3841

Closed

andrewhsu opened this issue Feb 17, 2022 · 12 comments

Labels: area/raft, area/tests, kind/bug (Something isn't working)

@andrewhsu (Member) commented Feb 17, 2022

From the nightly test job on the dev branch: https://buildkite.com/redpanda/redpanda/builds/7400#23a4a0e4-52db-44b8-ada4-bd2cd6982708/6-5332

The following tests FAILED:
	 33 - test_raft_rpunit (Failed)

earlier in the same logs:

*** 1 failure is detected in the test module "../../../src/v/raft/tests/jitter_tests.cc"
Test Exit code 201
andrewhsu added the kind/bug and ci-failure labels on Feb 17, 2022
@dotnwat (Member) commented Feb 18, 2022

@andrewhsu if you scroll up further in the log you'll see:

TRACE 2022-02-17 18:53:26,723 [shard 0] raft - [group_id:0, {kafka/group_0/0}] replicate_entries_stm.cc:184 - Leader append result: {append_time:1208561, base_offset:{0}, last_offset:{0}, byte_size:167}
--
  | flush() called concurrently with other operations.
  | called from 0x12f8f44 0xec84fe 0x10419dd 0xd40360 0xd4cbad 0x10f02ef 0x10f3747 0x10f0b75 0x105921c 0x1056f00 0xbf5897 0x10b9e3f /lib64/libpthread.so.0+0x9298 /lib64/libc.so.6+0x1006a2
  | --------
  | seastar::continuation<seastar::internal::promise_base_with_type<void>, storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(unsigned long), seastar::future<void> seastar::future<unsigned long>::then_impl_nrvo<storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(unsigned long), seastar::future<void> >(storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(unsigned long)&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(unsigned long)&, seastar::future_state<unsigned long>&&), unsigned long>
....pending op: {sanitizer_op: ss::future<size_t>::write_dma(pos:32768, *void, len:16384)

Looks like it is coming from the segment appender:

   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(), false>, seastar::futurize<seastar::future<void> >::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(), false> >(seastar::future<void>::finally_body<storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(), false>&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(), false>&, seastar::future_state<seastar::internal::monostate>&&), void>

We made the segment appender safe for concurrent flush and append, but this error originates from the file handle sanitizer wrapper. It's not clear whether this is a false positive or a genuinely invalid use of the file handle.
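
Purely for illustration, here is a minimal, self-contained sketch of the kind of check such a wrapper could perform. This is not Redpanda's or Seastar's actual sanitizer; the names (checked_file, begin_op, end_op) are hypothetical, and plain std::thread stands in for seastar fibers. It only shows how a flush() issued while another operation is still in flight would produce a warning shaped like the one in the log above.

```cpp
// Hypothetical sketch of a sanitizing file wrapper: it tracks in-flight
// operations and reports when flush() is called while one is still pending.
#include <chrono>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class checked_file {                        // hypothetical name
    std::mutex _m;
    std::vector<std::string> _pending;      // descriptions of in-flight ops
public:
    void begin_op(const std::string& desc) {
        std::lock_guard<std::mutex> g(_m);
        _pending.push_back(desc);
    }
    void end_op(const std::string& desc) {
        std::lock_guard<std::mutex> g(_m);
        for (auto it = _pending.begin(); it != _pending.end(); ++it) {
            if (*it == desc) { _pending.erase(it); break; }
        }
    }
    void flush() {
        std::lock_guard<std::mutex> g(_m);
        if (!_pending.empty()) {
            // Same shape as the warning in the log above.
            std::printf("flush() called concurrently with other operations.\n");
            for (const auto& op : _pending) {
                std::printf("....pending op: %s\n", op.c_str());
            }
        }
        // ... the real wrapper would forward to the underlying flush here ...
    }
};

int main() {
    checked_file f;
    // A background head write that is still in flight when flush() is called.
    std::thread writer([&f] {
        f.begin_op("write_dma(pos:32768, len:16384)");
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        f.end_op("write_dma(pos:32768, len:16384)");
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    f.flush();                              // fires the warning
    writer.join();
    return 0;
}
```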

dotnwat changed the title from "failure in test_raft_rpunit" to "test/raft: segment appender concurrent flush and write sanitizer error" on Feb 18, 2022
@ztlpn (Contributor) commented Jun 10, 2022

@ztlpn (Contributor) commented Jul 28, 2022

CI failure triage: this is very rare, but I bet it is a genuine bug. Need to read the code carefully.

@dotnwat (Member) commented Jul 28, 2022

> CI failure triage: this is very rare, but I bet it is a genuine bug. Need to read the code carefully.

Agree. This needs someone to sit down expecting a full day of analyzing this.

@ztlpn (Contributor) commented Jul 28, 2022

OK, it looks like there are a fair number of red herrings in this issue :)

Red herring 1: jitter_tests.cc has nothing to do with the issue. It just happens to be the first file in the sources list for the test_raft_rpunit executable and that's what seastar sets as the main test module name. The actual failure is:

../../../src/v/raft/tests/raft_group_fixture.h(552): fatal error: in "replace_whole_group": Timeout elapsed while wating for: new nodes are up to date

Red herring 2: concurrent flush and append have nothing to do with the test failure! Confusingly, these "flush() called concurrently" messages contain stack traces, which makes them look similar to assertion failure messages, but really they are just warnings and don't affect any test outcomes. Moreover, the messages (there are several) appear after the test failure. So either there are two separate issues or the causality is reversed.

@ztlpn (Contributor) commented Jul 29, 2022

OK, it looks like the replace_whole_group test failure is a duplicate of #342 and was recently fixed in #5696 (it was a test issue; redpanda itself is fine).

But the concurrent flush and append message still requires investigation, so I'm not closing the issue yet.

@ztlpn (Contributor) commented Aug 1, 2022

I managed to reproduce a concurrent flush+write with some added sleeps; it is indeed allowed by segment_appender. The approximate scenario is (a self-contained sketch of the interleaving follows below):

  • raft appends 100 bytes
  • raft dispatches flush1
  • raft appends 15000 bytes
  • flush1 dispatches a background write on chunk 0-16K (bg_head_write1)
  • raft appends 5000 bytes, _prev_head_write mutex gets replaced, another write of the 0-16K chunk gets dispatched (bg_head_write2)
  • raft dispatches flush2
  • flush2 dispatches a background write of the 16K-32K chunk (bg_head_write3)
  • because bg_head_write1 and bg_head_write3 can be concurrent, the dma write from bg_head_write3 can run concurrently with the file flush from bg_head_write1.

I can't see an easy fix without sacrificing concurrency. But maybe concurrent flush/dma_writes are not that bad? Not sure.
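
To make the interleaving above concrete, here is a minimal, self-contained sketch of the overlap, using plain std::thread and sleeps instead of seastar futures. The names bg_head_write1 and bg_head_write3 are taken from the scenario; everything else is illustrative and not the segment_appender code.

```cpp
// Illustrative timeline of the race: bg_head_write3's dma write starts
// while bg_head_write1's flush is still running.
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

static std::mutex out_mutex;

static void trace(const char* msg) {
    std::lock_guard<std::mutex> g(out_mutex);
    std::printf("%s\n", msg);
}

static void sleep_ms(int ms) {
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
}

int main() {
    // bg_head_write1: writes chunk [0, 16K), then flushes the file.
    std::thread bg_head_write1([] {
        trace("bg_head_write1: write_dma(pos:0, len:16384) start");
        sleep_ms(20);
        trace("bg_head_write1: write_dma done, flush() start");
        sleep_ms(40);                 // flush still in progress...
        trace("bg_head_write1: flush() done");
    });

    // bg_head_write3: dispatched by flush2 on the next chunk; it does not
    // wait for bg_head_write1's flush because it grabbed a different
    // _prev_head_write mutex.
    std::thread bg_head_write3([] {
        sleep_ms(30);                 // starts while the flush above is pending
        trace("bg_head_write3: write_dma(pos:16384, len:16384) start");
        sleep_ms(20);
        trace("bg_head_write3: write_dma done");
    });

    bg_head_write1.join();
    bg_head_write3.join();
    return 0;
}
```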

cc @dotnwat

@dotnwat (Member) commented Aug 23, 2022

@ztlpn nice analysis.

When I added background flushing into the segment appender I wanted to allow appends and flushes to run concurrently; that was a key part of the optimization by which raft is able to pipeline I/O.

However, I definitely wasn't considering that the file handles the segment appender uses would be wrapped in the sanitizer, which is what catches the "flush() called concurrently with other operations" warning.

Is the sanitizer too conservative for use in the segment appender? If the following are true, then I think we could probably relax the rule (a hypothetical sketch of a relaxed check follows this list):

  • we aren't violating any constraints of the seastar::file API (I don't think we are)
  • the concurrent ops in the segment appender are correct
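
Purely as a thought experiment under the two assumptions above, a relaxed check might tolerate flush() racing with in-flight writes while still flagging concurrent writes whose byte ranges overlap. The names here (relaxed_checker, begin_write, end_write) are hypothetical and not part of Redpanda or Seastar; this is a sketch of the idea, not a proposed patch.

```cpp
// Hypothetical relaxed rule: flush() with writes in flight is tolerated,
// but two concurrent writes to overlapping byte ranges are still reported.
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <vector>

struct write_range { uint64_t pos; uint64_t len; };

class relaxed_checker {
    std::mutex _m;
    std::vector<write_range> _pending_writes;
public:
    void begin_write(uint64_t pos, uint64_t len) {
        std::lock_guard<std::mutex> g(_m);
        for (const auto& w : _pending_writes) {
            bool overlap = pos < w.pos + w.len && w.pos < pos + len;
            if (overlap) {
                std::printf("overlapping concurrent writes: [%llu,+%llu) vs [%llu,+%llu)\n",
                            (unsigned long long)pos, (unsigned long long)len,
                            (unsigned long long)w.pos, (unsigned long long)w.len);
            }
        }
        _pending_writes.push_back({pos, len});
    }
    void end_write(uint64_t pos, uint64_t len) {
        std::lock_guard<std::mutex> g(_m);
        for (auto it = _pending_writes.begin(); it != _pending_writes.end(); ++it) {
            if (it->pos == pos && it->len == len) { _pending_writes.erase(it); break; }
        }
    }
    void flush() {
        // Not an error under the relaxed rule: a flush only guarantees
        // durability for writes that completed before it was issued.
    }
};

int main() {
    relaxed_checker c;
    c.begin_write(0, 16384);        // first head write
    c.flush();                      // tolerated even though a write is in flight
    c.begin_write(16384, 16384);    // next chunk: no overlap, nothing reported
    c.begin_write(0, 16384);        // same chunk again: overlap is reported
    c.end_write(0, 16384);
    c.end_write(16384, 16384);
    c.end_write(0, 16384);
    return 0;
}
```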

@ajfabbri (Contributor) commented Aug 25, 2022

Seeing this in testing for #5318, build here.

> is the sanitizer too conservative for use in the segment appender?

From the filesystem (e.g. XFS) perspective, yes (assuming we honor the rule that flush only affects previously completed operations). Good point about the seastar part.

jcsp removed the ci-failure label on Nov 25, 2022
@jcsp (Contributor) commented Nov 25, 2022

Removing ci-failure because the test doesn't fail; the ticket is open to investigate a warning.

@dotnwat @ztlpn do you think we have next steps on this, or shall we close?

@dotnwat (Member) commented Nov 28, 2022

I don't think this is a proper bug in the segment appender (or a violation of the appender's API rules). If we start getting CI failures because of this we should look again at fixing it, but it's clearly very rare.

dotnwat closed this as completed on Nov 28, 2022
@dotnwat (Member) commented Nov 28, 2022

This one, #5433, though, we might want to investigate.
