
test/raft: segment appender concurrent flush and write sanitizer error #3841

Closed

andrewhsu opened this issue Feb 17, 2022 · 12 comments

Labels: area/raft, area/tests, kind/bug (Something isn't working)

@andrewhsu (Member) commented Feb 17, 2022

From the nightly test job on the dev branch: https://buildkite.com/redpanda/redpanda/builds/7400#23a4a0e4-52db-44b8-ada4-bd2cd6982708/6-5332

The following tests FAILED:
	 33 - test_raft_rpunit (Failed)

earlier in the same logs:

*** 1 failure is detected in the test module "../../../src/v/raft/tests/jitter_tests.cc"
Test Exit code 201
andrewhsu added the kind/bug and ci-failure labels on Feb 17, 2022
@dotnwat (Member) commented Feb 18, 2022

@andrewhsu if you scroll up further in the log you'll see:

TRACE 2022-02-17 18:53:26,723 [shard 0] raft - [group_id:0, {kafka/group_0/0}] replicate_entries_stm.cc:184 - Leader append result: {append_time:1208561, base_offset:{0}, last_offset:{0}, byte_size:167}
--
  | flush() called concurrently with other operations.
  | called from 0x12f8f44 0xec84fe 0x10419dd 0xd40360 0xd4cbad 0x10f02ef 0x10f3747 0x10f0b75 0x105921c 0x1056f00 0xbf5897 0x10b9e3f /lib64/libpthread.so.0+0x9298 /lib64/libc.so.6+0x1006a2
  | --------
  | seastar::continuation<seastar::internal::promise_base_with_type<void>, storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(unsigned long), seastar::future<void> seastar::future<unsigned long>::then_impl_nrvo<storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(unsigned long), seastar::future<void> >(storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(unsigned long)&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(unsigned long)&, seastar::future_state<unsigned long>&&), unsigned long>
....pending op: {sanitizer_op: ss::future<size_t>::write_dma(pos:32768, *void, len:16384)

Looks like it is coming from the segment appender:

   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(), false>, seastar::futurize<seastar::future<void> >::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(), false> >(seastar::future<void>::finally_body<storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(), false>&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<storage::segment_appender::dispatch_background_head_write()::$_20::operator()()::'lambda'(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::operator()(seastar::semaphore_units<seastar::semaphore_default_exception_factory, std::__1::chrono::steady_clock>)::'lambda'(), false>&, seastar::future_state<seastar::internal::monostate>&&), void>

We made the segment appender safe for concurrent flush and append, but this error originates from the file handle sanitizer wrapper. It's not clear whether this is a false positive or a genuinely invalid use of the file handle.
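
Purely for illustration, here is a minimal, self-contained sketch of the kind of check such a wrapper could perform. This is not Redpanda's or Seastar's actual sanitizer; the names (checked_file, begin_op, end_op) are hypothetical, and plain std::thread stands in for seastar fibers. It only shows how a flush() issued while another operation is still in flight would produce a warning shaped like the one in the log above.

```cpp
// Hypothetical sketch of a sanitizing file wrapper: it tracks in-flight
// operations and reports when flush() is called while one is still pending.
#include <chrono>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class checked_file {                        // hypothetical name
    std::mutex _m;
    std::vector<std::string> _pending;      // descriptions of in-flight ops
public:
    void begin_op(const std::string& desc) {
        std::lock_guard<std::mutex> g(_m);
        _pending.push_back(desc);
    }
    void end_op(const std::string& desc) {
        std::lock_guard<std::mutex> g(_m);
        for (auto it = _pending.begin(); it != _pending.end(); ++it) {
            if (*it == desc) { _pending.erase(it); break; }
        }
    }
    void flush() {
        std::lock_guard<std::mutex> g(_m);
        if (!_pending.empty()) {
            // Same shape as the warning in the log above.
            std::printf("flush() called concurrently with other operations.\n");
            for (const auto& op : _pending) {
                std::printf("....pending op: %s\n", op.c_str());
            }
        }
        // ... the real wrapper would forward to the underlying flush here ...
    }
};

int main() {
    checked_file f;
    // A background head write that is still in flight when flush() is called.
    std::thread writer([&f] {
        f.begin_op("write_dma(pos:32768, len:16384)");
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        f.end_op("write_dma(pos:32768, len:16384)");
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    f.flush();                              // fires the warning
    writer.join();
    return 0;
}
```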

dotnwat changed the title from "failure in test_raft_rpunit" to "test/raft: segment appender concurrent flush and write sanitizer error" on Feb 18, 2022
@ztlpn (Contributor) commented Jun 10, 2022

@ztlpn (Contributor) commented Jul 28, 2022

CI failure triage: this is very rare, but I bet it is a genuine bug. Need to read the code carefully.

@dotnwat (Member) commented Jul 28, 2022

> CI failure triage: this is very rare, but I bet it is a genuine bug. Need to read the code carefully.

Agree. This needs someone to sit down expecting a full day of analyzing this.

@ztlpn (Contributor) commented Jul 28, 2022

OK, it looks like there are a fair number of red herrings in this issue :)

Red herring 1: jitter_tests.cc has nothing to do with the issue. It just happens to be the first file in the sources list for the test_raft_rpunit executable and that's what seastar sets as the main test module name. The actual failure is:

../../../src/v/raft/tests/raft_group_fixture.h(552): fatal error: in "replace_whole_group": Timeout elapsed while wating for: new nodes are up to date

Red herring 2: concurrent flush and append have nothing to do with the test failure! Confusingly, these "flush() called concurrently" messages contain stack traces, which makes them look similar to assertion failure messages, but really they are just warnings and don't affect any test outcomes. Moreover, the messages (there are several) appear after the test failure. So either there are two separate issues or the causality is reversed.

@ztlpn (Contributor) commented Jul 29, 2022

OK, it looks like the replace_whole_group test failure is a duplicate of #342 and was recently fixed in #5696 (it was a test issue; redpanda itself is fine).

But the concurrent flush and append message still requires investigation, so I'm not closing the issue yet.

@ztlpn (Contributor) commented Aug 1, 2022

I managed to reproduce a concurrent flush+write with some added sleeps; it is indeed allowed by segment_appender. The approximate scenario is (a self-contained sketch of the interleaving follows below):

  • raft appends 100 bytes
  • raft dispatches flush1
  • raft appends 15000 bytes
  • flush1 dispatches a background write on chunk 0-16K (bg_head_write1)
  • raft appends 5000 bytes, _prev_head_write mutex gets replaced, another write of the 0-16K chunk gets dispatched (bg_head_write2)
  • raft dispatches flush2
  • flush2 dispatches a background write of the 16K-32K chunk (bg_head_write3)
  • because bg_head_write1 and bg_head_write3 can be concurrent, the dma write from bg_head_write3 can run concurrently with the file flush from bg_head_write1.

I can't see an easy fix without sacrificing concurrency. But maybe concurrent flush/dma_writes are not that bad? Not sure.
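
To make the interleaving above concrete, here is a minimal, self-contained sketch of the overlap, using plain std::thread and sleeps instead of seastar futures. The names bg_head_write1 and bg_head_write3 are taken from the scenario; everything else is illustrative and not the segment_appender code.

```cpp
// Illustrative timeline of the race: bg_head_write3's dma write starts
// while bg_head_write1's flush is still running.
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

static std::mutex out_mutex;

static void trace(const char* msg) {
    std::lock_guard<std::mutex> g(out_mutex);
    std::printf("%s\n", msg);
}

static void sleep_ms(int ms) {
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
}

int main() {
    // bg_head_write1: writes chunk [0, 16K), then flushes the file.
    std::thread bg_head_write1([] {
        trace("bg_head_write1: write_dma(pos:0, len:16384) start");
        sleep_ms(20);
        trace("bg_head_write1: write_dma done, flush() start");
        sleep_ms(40);                 // flush still in progress...
        trace("bg_head_write1: flush() done");
    });

    // bg_head_write3: dispatched by flush2 on the next chunk; it does not
    // wait for bg_head_write1's flush because it grabbed a different
    // _prev_head_write mutex.
    std::thread bg_head_write3([] {
        sleep_ms(30);                 // starts while the flush above is pending
        trace("bg_head_write3: write_dma(pos:16384, len:16384) start");
        sleep_ms(20);
        trace("bg_head_write3: write_dma done");
    });

    bg_head_write1.join();
    bg_head_write3.join();
    return 0;
}
```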

cc @dotnwat

@dotnwat (Member) commented Aug 23, 2022

@ztlpn nice analysis.

When I added background flushing into the segment appender I wanted to allow appends and flushes to run concurrently; that was a key part of the optimization by which raft is able to pipeline I/O.

However, I definitely wasn't considering that the file handles the segment appender uses would be wrapped in the sanitizer, which is what catches the "flush() called concurrently with other operations" warning.

Is the sanitizer too conservative for use in the segment appender? If the following are true, then I think we could probably relax the rule (a hypothetical sketch of a relaxed check follows this list):

  • we aren't violating any constraints of the seastar::file API (I don't think we are)
  • the concurrent ops in the segment appender are correct
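
Purely as a thought experiment under the two assumptions above, a relaxed check might tolerate flush() racing with in-flight writes while still flagging concurrent writes whose byte ranges overlap. The names here (relaxed_checker, begin_write, end_write) are hypothetical and not part of Redpanda or Seastar; this is a sketch of the idea, not a proposed patch.

```cpp
// Hypothetical relaxed rule: flush() with writes in flight is tolerated,
// but two concurrent writes to overlapping byte ranges are still reported.
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <vector>

struct write_range { uint64_t pos; uint64_t len; };

class relaxed_checker {
    std::mutex _m;
    std::vector<write_range> _pending_writes;
public:
    void begin_write(uint64_t pos, uint64_t len) {
        std::lock_guard<std::mutex> g(_m);
        for (const auto& w : _pending_writes) {
            bool overlap = pos < w.pos + w.len && w.pos < pos + len;
            if (overlap) {
                std::printf("overlapping concurrent writes: [%llu,+%llu) vs [%llu,+%llu)\n",
                            (unsigned long long)pos, (unsigned long long)len,
                            (unsigned long long)w.pos, (unsigned long long)w.len);
            }
        }
        _pending_writes.push_back({pos, len});
    }
    void end_write(uint64_t pos, uint64_t len) {
        std::lock_guard<std::mutex> g(_m);
        for (auto it = _pending_writes.begin(); it != _pending_writes.end(); ++it) {
            if (it->pos == pos && it->len == len) { _pending_writes.erase(it); break; }
        }
    }
    void flush() {
        // Not an error under the relaxed rule: a flush only guarantees
        // durability for writes that completed before it was issued.
    }
};

int main() {
    relaxed_checker c;
    c.begin_write(0, 16384);        // first head write
    c.flush();                      // tolerated even though a write is in flight
    c.begin_write(16384, 16384);    // next chunk: no overlap, nothing reported
    c.begin_write(0, 16384);        // same chunk again: overlap is reported
    c.end_write(0, 16384);
    c.end_write(16384, 16384);
    c.end_write(0, 16384);
    return 0;
}
```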

@ajfabbri (Contributor) commented Aug 25, 2022

Seeing this in testing for #5318, build here.

> is the sanitizer too conservative for use in the segment appender?

From the filesystem (e.g. XFS) perspective, yes (assuming we honor the rule that flush only affects previously completed operations). Good point about the seastar part.

jcsp removed the ci-failure label on Nov 25, 2022
@jcsp (Contributor) commented Nov 25, 2022

Removing ci-failure because the test doesn't fail; the ticket is open to investigate a warning.

@dotnwat @ztlpn do you think we have next steps on this, or shall we close?

@dotnwat (Member) commented Nov 28, 2022

I don't think this is a proper bug in the segment appender (or a violation of the appender's API rules). If we start getting CI failures because of this we should look again at fixing it, but it's clearly very rare.

dotnwat closed this as completed on Nov 28, 2022
@dotnwat (Member) commented Nov 28, 2022

This one, #5433, though, we might want to investigate.
