
cloud_storage: Segment deleted while S3 upload in progress (Failure in FranzGoVerifiableWithSiTest.test_si_with_timeboxed and test_si_without_timeboxed) #4624

Closed
jcsp opened this issue May 9, 2022 · 8 comments · Fixed by #4642

jcsp commented May 9, 2022

Failing 1/18 runs in last 24h.

Failures:
https://buildkite.com/redpanda/redpanda/builds/9898#38cf2489-dd4e-4457-86db-9d3d1d27f849

test_id:    rptest.tests.franz_go_verifiable_test.FranzGoVerifiableWithSiTest.test_si_without_timeboxed.segment_size=20971520
status:     FAIL
run time:   2 minutes 54.148 seconds


    <BadLogLines nodes=docker-rp-4(4) example="ERROR 2022-05-09 07:27:08,466 [shard 1] cloud_storage - [fiber79~3~0|1|24722ms] - remote.cc:109 - System error std::__1::system_error (error system:9, Bad file descriptor)">
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 47, in wrapped
    self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
  File "/root/tests/rptest/services/redpanda.py", line 1071, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-4(4) example="ERROR 2022-05-09 07:27:08,466 [shard 1] cloud_storage - [fiber79~3~0|1|24722ms] - remote.cc:109 - System error std::__1::system_error (error system:9, Bad file descriptor)">

There is a "Bad file descriptor" error in the log of one of the redpanda nodes.

jcsp commented May 9, 2022

Looking at the recent change, the main thing is the segment sizes changing from 5MB to 20MB or 100MB. The larger segments could go some way to explaining the bad_alloc, but the file descriptor error is a bit surprising (bigger segments = fewer FDs).

jcsp commented May 9, 2022

On the FD error (https://buildkite.com/redpanda/redpanda/builds/9898#38cf2489-dd4e-4457-86db-9d3d1d27f849), the specific error is:

ERROR 2022-05-09 07:27:08,466 [shard 1] cloud_storage - [fiber79~3~0|1|24722ms] - remote.cc:109 - System error std::__1::system_error (error system:9, Bad file descriptor)
WARN  2022-05-09 07:27:08,466 [shard 1] cloud_storage - [fiber79~3~0|1|24722ms] - remote.cc:391 - Uploading segment {"19be5f92/kafka/topic-kmqizqears/17_18/18-1-v1.log.1"} to {panda-bucket-5827d038-cf69-11ec-a9a5-0242ac12001b}, {failed}, segment not uploaded

So this was happening during upload.

I think it's happening because a segment is getting deleted out from under us.

INFO  2022-05-09 07:26:57,460 [shard 1] storage - segment.cc:626 - Creating new segment /var/lib/redpanda/data/kafka/topic-kmqizqears/5_18/18-1-v1.log
DEBUG 2022-05-09 07:27:03,042 [shard 1] archival - [fiber72~3|0|30000ms kafka/topic-kmqizqears/5] - ntp_archiver_service.cc:230 - Uploading segment {source segment offsets: {term:1, base_offset:18, committed_offset:35, dirty_offset:35}, exposed_name: {18-1-v1.log}, starting_offset: 18, file_offset: 0, content_length: 17001819, final_offset: 35, final_file_offset: 17001819} to {"40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1"}
DEBUG 2022-05-09 07:27:03,042 [shard 1] cloud_storage - [fiber72~3~0|1|30000ms] - remote.cc:331 - Uploading segment to path {"40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1"}, length 17001819
DEBUG 2022-05-09 07:27:03,042 [shard 1] cloud_storage - [fiber72~3~0|1|30000ms] - remote.cc:337 - Uploading segment to path {"40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1"}
/40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1
PUT /40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1 HTTP/1.1
1 0 PUT /40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1 HTTP/1.1
TRACE 2022-05-09 07:27:03,042 [shard 1] http - [/40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1] - client.cc:83 - client.make_request PUT HTTP/1.1
DEBUG 2022-05-09 07:27:03,042 [shard 1] http - [/40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1] - client.cc:128 - about to start connecting, false, is-closed false
TRACE 2022-05-09 07:27:04,581 [shard 1] http - [/40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1] - client.cc:153 - connection timeout
TRACE 2022-05-09 07:27:05,993 [shard 1] http - [/40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1] - client.cc:153 - connection timeout
TRACE 2022-05-09 07:27:07,036 [shard 1] http - [/40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1] - client.cc:153 - connection timeout
DEBUG 2022-05-09 07:27:07,810 [shard 1] http - [/40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1] - client.cc:159 - connected, true
INFO  2022-05-09 07:27:08,382 [shard 1] storage - disk_log_impl.cc:938 - remove_prefix_full_segments - tombstone & delete segment: {offset_tracker:{term:1, base_offset:18, committed_offset:35, dirty_offset:35}, compacted_segment=0, finished_self_compaction=0, reader={/var/lib/redpanda/data/kafka/topic-kmqizqears/5_18/18-1-v1.log, (17001819 bytes)}, writer=nullptr, cache={cache_size=0}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/topic-kmqizqears/5_18/18-1-v1.base_index, offsets:{18}, index:{header_bitflags:0, base_offset:{18}, max_offset:{35}, base_timestamp:{timestamp: 1652081213569}, max_timestamp:{timestamp: 1652081219640}, index(17,17,17)}, step:32768, needs_persistence:0}}
TRACE 2022-05-09 07:27:08,382 [shard 1] storage - segment.cc:84 - closing segment: {offset_tracker:{term:1, base_offset:18, committed_offset:35, dirty_offset:35}, compacted_segment=0, finished_self_compaction=0, reader={/var/lib/redpanda/data/kafka/topic-kmqizqears/5_18/18-1-v1.log, (17001819 bytes)}, writer=nullptr, cache={cache_size=0}, compaction_index:nullopt, closed=1, tombstone=1, index={file:/var/lib/redpanda/data/kafka/topic-kmqizqears/5_18/18-1-v1.base_index, offsets:{18}, index:{header_bitflags:0, base_offset:{18}, max_offset:{35}, base_timestamp:{timestamp: 1652081213569}, max_timestamp:{timestamp: 1652081219640}, index(17,17,17)}, step:32768, needs_persistence:0}} 
INFO  2022-05-09 07:27:08,405 [shard 1] storage - segment.cc:106 - removing: {"/var/lib/redpanda/data/kafka/topic-kmqizqears/5_18/18-1-v1.log", "/var/lib/redpanda/data/kafka/topic-kmqizqears/5_18/18-1-v1.base_index"}
WARN  2022-05-09 07:27:08,471 [shard 1] cloud_storage - [fiber72~3~0|1|24571ms] - remote.cc:391 - Uploading segment {"40d9a701/kafka/topic-kmqizqears/5_18/18-1-v1.log.1"} to {panda-bucket-5827d038-cf69-11ec-a9a5-0242ac12001b}, {failed}, segment not uploaded

It's possible this is showing up with larger segment sizes, because they take longer to upload, creating a longer time window for the delete to overlap with the upload.

AIUI we're not meant to delete things until they're in S3, no matter the retention settings, so this could be a pretty serious bug.
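For reference, a minimal sketch of that invariant as I read it, using hypothetical names rather than the actual Redpanda types: local retention/GC should be capped at the highest offset already confirmed in S3, so a delete can never race with an in-flight upload.

```cpp
// Hypothetical sketch only -- not Redpanda's actual API.
#include <algorithm>
#include <cstdint>
#include <iostream>

struct partition_state {
    int64_t last_uploaded_offset; // highest offset confirmed as uploaded to S3
    int64_t retention_offset;     // offset local retention would like to truncate to
};

// GC may only collect up to whichever bound is lower: retention must not
// outrun the uploads.
int64_t max_collectible_offset(const partition_state& p) {
    return std::min(p.retention_offset, p.last_uploaded_offset);
}

int main() {
    partition_state p{/*last_uploaded_offset=*/17, /*retention_offset=*/35};
    // Retention allows truncating to 35, but only offsets <= 17 are in S3,
    // so GC must stop at 17; deleting past it races with an in-flight upload.
    std::cout << "collect up to offset " << max_collectible_offset(p) << '\n';
}
```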

jcsp added the area/cloud-storage (Shadow indexing subsystem) label May 9, 2022
jcsp commented May 9, 2022

The FD issue just reproduced on clustered ducktape, so it's not related to the resource constraints in docker.

This reproduced 1 time in 40 runs (a --repeat 10 run of franz_go_verifiable_test.py::FranzGoVerifiableWithSiTest), so not super frequent, but not super rare.

jcsp changed the title from "Failure in FranzGoVerifiableWithSiTest.test_si_with_timeboxed (and FranzGoVerifiableWithSiTest.test_si_without_timeboxed)" to "cloud_storage: Segment deleted while S3 upload in progress (Failure in FranzGoVerifiableWithSiTest.test_si_with_timeboxed)" May 9, 2022
jcsp changed the title from "cloud_storage: Segment deleted while S3 upload in progress (Failure in FranzGoVerifiableWithSiTest.test_si_with_timeboxed)" to "cloud_storage: Segment deleted while S3 upload in progress (Failure in FranzGoVerifiableWithSiTest.test_si_with_timeboxed and test_si_without_timeboxed)" May 9, 2022
NyaliaLui commented May 9, 2022

> @NyaliaLui I think these are a recent addition from you

Yes, the changes were in response to a need to test varying SI params.

rystsov commented May 10, 2022

Another instance, same symptoms (removal before upload) - https://buildkite.com/redpanda/redpanda/builds/9879#758a00a6-d312-4be5-b266-b12986129c3e

jcsp commented May 11, 2022

abhijat commented May 12, 2022

Discussed with @Lazin and @ztlpn - thanks for pointing me in the right direction. This issue is more related to leadership transfer than to a segment simply being deleted while being uploaded.

Basically leadership is transferred while the leader is waiting for an http client to upload, and by the time it gets a handle to the http client it has lost leadership. Because the new leader uploads the segment, the archival metadata stm offset is forwarded and gc deletes the file from underneath the old leader's upload.

The rough sequence of events (in one failure instance with two nodes, rp-4 and rp-15):

rp4 starts upload as leader, has to wait for http client from pool
DEBUG 2022-05-09 07:27:03,193 [shard 1] cloud_storage - [fiber79~3~0|1|30000ms] - remote.cc:331 - Uploading segment to path {"19be5f92/kafka/topic-kmqizqears/17_18/18-1-v1.log.1"}, length 17001819

rp4 transfer of leadership to rp15
INFO  2022-05-09 07:27:04,239 [shard 1] raft - [group_id:18, {kafka/topic-kmqizqears/17}] consensus.cc:2754 - Starting leadership transfer from {id: {1}, revision: {18}} to {id: {3}, revision: {18}} in term 1

rp15 (leader) starts upload, acquires client, upload successful
DEBUG 2022-05-09 07:27:04,304 [shard 1] archival - [fiber65~1|0|30000ms kafka/topic-kmqizqears/17] - ntp_archiver_service.cc:230 - Uploading segment {source segment offsets: {term:1, base_offset:18, committed_offset:35, dirty_offset:35}, exposed_name: {18-1-v1.log}, starting_offset: 18, file_offset: 0, content_length: 17001819, final_offset: 35, final_file_offset: 17001819} to {"19be5f92/kafka/topic-kmqizqears/17_18/18-1-v1.log.2"}
DEBUG 2022-05-09 07:27:04,304 [shard 1] cloud_storage - [fiber65~1~0|1|30000ms] - remote.cc:331 - Uploading segment to path {"19be5f92/kafka/topic-kmqizqears/17_18/18-1-v1.log.2"}, length 17001819
DEBUG 2022-05-09 07:27:04,304 [shard 1] cloud_storage - [fiber65~1~0|1|30000ms] - remote.cc:337 - Uploading segment to path {"19be5f92/kafka/topic-kmqizqears/17_18/18-1-v1.log.2"}

rp15 propagates last offset to followers (this should update rp4._last_offset to 35)
INFO  2022-05-09 07:27:04,771 [shard 1] cluster - ntp: {kafka/topic-kmqizqears/17} - archival_metadata_stm.cc:179 - new remote segment added (name: {18-1-v1.log}, base_offset: 18 last_offset: 35), remote start_offset: 0, last_offset: 35

rp4 finally acquired http client (upload should be able to progress)
DEBUG 2022-05-09 07:27:07,274 [shard 1] cloud_storage - [fiber79~3~0|1|25919ms] - remote.cc:337 - Uploading segment to path {"19be5f92/kafka/topic-kmqizqears/17_18/18-1-v1.log.1"}

rp4 gc activity (now max collectible offset == last_offset == 35 so delete is ok)
INFO  2022-05-09 07:27:08,221 [shard 1] storage - segment.cc:106 - removing: {"/var/lib/redpanda/data/kafka/topic-kmqizqears/73_18/17-1-v1.log", "/var/lib/redpanda/data/kafka/topic-kmqizqears/73_18/17-1-v1.base_index"}

rp4 upload failed because file deleted
ERROR 2022-05-09 07:27:08,466 [shard 1] cloud_storage - [fiber79~3~0|1|24722ms] - remote.cc:109 - System error std::__1::system_error (error system:9, Bad file descriptor)
WARN  2022-05-09 07:27:08,466 [shard 1] cloud_storage - [fiber79~3~0|1|24722ms] - remote.cc:391 - Uploading segment {"19be5f92/kafka/topic-kmqizqears/17_18/18-1-v1.log.1"} to {panda-bucket-5827d038-cf69-11ec-a9a5-0242ac12001b}, {failed}, segment not uploaded

The only suspicious part is that the gc happened a second after the old leader acquired the http client, so the old leader should have been able to finish the upload in that window. Looking at the http client acquisition, though, many uploads were queued up during that period, so the uploads may simply have been slow.

I'm currently confirming with another failure instance that this understanding is consistent. If so, it can be fixed with a leadership check before doing the upload, after waiting for the http client, since we may have lost leadership while waiting for a client from the pool (see the sketch at the end of this comment).

The existing read lock added in the linked PR can be removed.
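A minimal sketch of that check, with hypothetical names (the real code lives in ntp_archiver_service and the remote/http client pool, and is asynchronous): re-check leadership once the client has been acquired, and skip the upload if it was lost while waiting.

```cpp
// Hypothetical sketch only -- illustrates the proposed leadership re-check.
#include <iostream>
#include <string>

struct http_client {};

struct archiver {
    bool is_leader = true;            // leadership flag for this partition

    // Stand-in for waiting on the shared client pool; in the failure above
    // this wait took several seconds.
    http_client acquire_client() { return {}; }

    bool upload_segment(const std::string& path) {
        http_client client = acquire_client();
        (void)client;                 // placeholder for the actual PUT
        if (!is_leader) {
            // Leadership moved while we were queued for a client: the new
            // leader owns this upload, and GC may already have removed the
            // local file, so bail out instead of opening it.
            std::cout << "skip upload of " << path
                      << ": leadership lost while waiting for client\n";
            return false;
        }
        // ... read the segment and PUT it using `client` ...
        return true;
    }
};

int main() {
    archiver a;
    a.is_leader = false;              // simulate losing leadership during the wait
    a.upload_segment("kafka/topic-kmqizqears/17_18/18-1-v1.log");
}
```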

abhijat commented May 12, 2022

In the other instance, it looks like the upload had already started when leadership transferred and the file was deleted; the leader did not wait on acquiring a client.

 $ egrep 'kafka/topic-eqxmcazhmu/6.*Starting leadership transfer|topic-eqxmcazhmu/6_18/0-1-v1.log|Bad file' redpanda.log 
INFO  2022-05-11 10:22:20,245 [shard 0] storage - segment.cc:626 - Creating new segment /var/lib/redpanda/data/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log
DEBUG 2022-05-11 10:22:53,467 [shard 0] archival - [fiber51~1|0|30000ms kafka/topic-eqxmcazhmu/6] - ntp_archiver_service.cc:230 - Uploading segment {source segment offsets: {term:1, base_offset:0, committed_offset:107, dirty_offset:107}, exposed_name: {0-1-v1.log}, starting_offset: 0, file_offset: 0, content_length: 106010741, final_offset: 106, final_file_offset: 106010741} to {"2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1"}
DEBUG 2022-05-11 10:22:53,467 [shard 0] cloud_storage - [fiber51~1~0|1|30000ms] - remote.cc:331 - Uploading segment to path {"2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1"}, length 106010741
DEBUG 2022-05-11 10:22:53,467 [shard 0] cloud_storage - [fiber51~1~0|1|30000ms] - remote.cc:337 - Uploading segment to path {"2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1"}
/2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1
PUT /2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1 HTTP/1.1
1 0 PUT /2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1 HTTP/1.1
TRACE 2022-05-11 10:22:53,467 [shard 0] http - [/2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1] - client.cc:83 - client.make_request PUT HTTP/1.1
DEBUG 2022-05-11 10:22:53,467 [shard 0] http - [/2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1] - client.cc:128 - about to start connecting, false, is-closed false
DEBUG 2022-05-11 10:22:53,475 [shard 0] http - [/2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1] - client.cc:159 - connected, true
INFO  2022-05-11 10:22:53,540 [shard 0] raft - [group_id:7, {kafka/topic-eqxmcazhmu/6}] consensus.cc:2754 - Starting leadership transfer from {id: {1}, revision: {18}} to {id: {2}, revision: {18}} in term 1
INFO  2022-05-11 10:22:56,123 [shard 0] storage - disk_log_impl.cc:938 - remove_prefix_full_segments - tombstone & delete segment: {offset_tracker:{term:1, base_offset:0, committed_offset:107, dirty_offset:107}, compacted_segment=0, finished_self_compaction=0, reader={/var/lib/redpanda/data/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log, (107010838 bytes)}, writer=nullptr, cache={cache_size=10}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/topic-eqxmcazhmu/6_18/0-1-v1.base_index, offsets:{0}, index:{header_bitflags:0, base_offset:{0}, max_offset:{107}, base_timestamp:{timestamp: 1652264540245}, max_timestamp:{timestamp: 1652264569690}, index(108,108,108)}, step:32768, needs_persistence:0}}
TRACE 2022-05-11 10:22:56,123 [shard 0] storage - segment.cc:84 - closing segment: {offset_tracker:{term:1, base_offset:0, committed_offset:107, dirty_offset:107}, compacted_segment=0, finished_self_compaction=0, reader={/var/lib/redpanda/data/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log, (107010838 bytes)}, writer=nullptr, cache={cache_size=10}, compaction_index:nullopt, closed=1, tombstone=1, index={file:/var/lib/redpanda/data/kafka/topic-eqxmcazhmu/6_18/0-1-v1.base_index, offsets:{0}, index:{header_bitflags:0, base_offset:{0}, max_offset:{107}, base_timestamp:{timestamp: 1652264540245}, max_timestamp:{timestamp: 1652264569690}, index(108,108,108)}, step:32768, needs_persistence:0}} 
INFO  2022-05-11 10:22:56,130 [shard 0] storage - segment.cc:106 - removing: {"/var/lib/redpanda/data/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log", "/var/lib/redpanda/data/kafka/topic-eqxmcazhmu/6_18/0-1-v1.base_index"}
ERROR 2022-05-11 10:22:56,425 [shard 0] cloud_storage - [fiber51~1~0|1|27048ms] - remote.cc:109 - System error std::__1::system_error (error system:9, Bad file descriptor)
WARN  2022-05-11 10:22:56,425 [shard 0] cloud_storage - [fiber51~1~0|1|27048ms] - remote.cc:391 - Uploading segment {"2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1"} to {panda-bucket-2e7daaf8-d114-11ec-89fd-0242ac120033}, {failed}, segment not uploaded

We might have to handle the error with a leadership check on upload failure, and ignore the error if leadership was lost during the upload.
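A hedged sketch of that handling, again with hypothetical names: when an upload fails mid-flight, check whether leadership was lost and, if so, treat the failure as benign rather than surfacing it as an ERROR log line, since the new leader now owns the segment.

```cpp
// Hypothetical sketch only -- classifies a failed upload after the fact.
#include <iostream>
#include <string>

enum class upload_result { success, failed, cancelled_lost_leadership };

struct upload_outcome {
    bool io_error;     // e.g. EBADF because GC removed the file mid-upload
    bool still_leader; // leadership status checked after the failure
};

upload_result classify(const upload_outcome& o, const std::string& path) {
    if (!o.io_error) {
        return upload_result::success;
    }
    if (!o.still_leader) {
        // The new leader has taken over this segment; the local file being
        // gone is expected, so don't report it as an ERROR.
        std::cout << "upload of " << path << " interrupted by leadership change\n";
        return upload_result::cancelled_lost_leadership;
    }
    std::cout << "upload of " << path << " genuinely failed\n";
    return upload_result::failed;
}

int main() {
    classify({/*io_error=*/true, /*still_leader=*/false},
             "2757a095/kafka/topic-eqxmcazhmu/6_18/0-1-v1.log.1");
}
```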

This issue was closed.