filesystem errors during KgoVerifierWithSiTestLargeSegments.test_si_without_timeboxed #5878

Closed
abhijat opened this issue Aug 6, 2022 · 5 comments
Labels: area/cloud-storage (Shadow indexing subsystem), kind/bug (Something isn't working)

Comments

abhijat commented Aug 6, 2022

[INFO  - 2022-08-06 07:54:38,171 - runner_client - log - lineno:278]: RunnerClient: rptest.scale_tests.franz_go_verifiable_test.FranzGoVerifiableWithSiTest.test_si_without_timeboxed.segment_size=104857600: FAIL: <BadLogLines nodes=ip-172-31-50-177(32),ip-172-31-50-215(15),ip-172-31-58-52(24) example="ERROR 2022-08-06 05:03:17,873 [shard 2] cloud_storage - [fiber3~8~37~0|1|46632ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/08cdff59/kafka/topic-nfwsdorgnf/76_21/0-1-v1.log.1.index"])">
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 48, in wrapped
    self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1122, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-50-177(32),ip-172-31-50-215(15),ip-172-31-58-52(24) example="ERROR 2022-08-06 05:03:17,873 [shard 2] cloud_storage - [fiber3~8~37~0|1|46632ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/08cdff59/kafka/topic-nfwsdorgnf/76_21/0-1-v1.log.1.index"])">

These errors are repeated quite often during the scale test:

[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:00:49,780 [shard 3] cloud_storage - [fiber6~75~9~0|1|49342ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/6d4ce52f/kafka/topic-nfwsdorgnf/9_21/40-2-v1.log.2"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:03:16,346 [shard 2] cloud_storage - [fiber13~70~159~0|1|46513ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/980a4514/kafka/topic-nfwsdorgnf/52_21/101-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:06:35,736 [shard 2] cloud_storage - [fiber13~72~9~0|1|47297ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/a9ea197e/kafka/topic-nfwsdorgnf/52_21/0-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:09:48,815 [shard 1] cloud_storage - [fiber18~76~29~0|1|58942ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/20743e08/kafka/topic-nfwsdorgnf/15_21/0-1-v1.log.2.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:18:07,990 [shard 1] cloud_storage - [fiber18~82~66~0|1|55568ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/20743e08/kafka/topic-nfwsdorgnf/15_21/0-1-v1.log.2"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:18:22,213 [shard 3] cloud_storage - [fiber10~75~13~0|1|45536ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/9314b74e/kafka/topic-nfwsdorgnf/97_21/110-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:21:31,008 [shard 1] rpc - server.cc:126 - kafka rpc protocol - Error[applying protocol] remote address: 205.210.31.19:57193 - std::runtime_error (Unexpected EOF for client ID)
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:29:50,532 [shard 3] cloud_storage - [fiber21~45~87~0|1|46615ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/228ea5bd/kafka/topic-nfwsdorgnf/45_21/0-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:42:05,136 [shard 3] cloud_storage - [fiber10~87~12~0|1|45809ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/9314b74e/kafka/topic-nfwsdorgnf/97_21/110-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,170 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:50:37,605 [shard 3] cloud_storage - [fiber6~92~319~0|1|56266ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/12426a3f/kafka/topic-nfwsdorgnf/9_21/0-1-v1.log.2"])
[WARNING - 2022-08-06 07:54:38,170 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:52:44,012 [shard 2] cloud_storage - [fiber13~85~145~0|1|50993ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/a9ea197e/kafka/topic-nfwsdorgnf/52_21/0-1-v1.log.1.index"])
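
To see which cached files are hit most often, the unexpected log lines above can be tallied by path. This is only a minimal sketch and not part of the test suite: `MISSING_PATH` and `tally_missing_paths` are hypothetical names, and the regex assumes the message format shown in the lines above. The sample holds four of those lines, reduced to the Redpanda portion.

```python
import re
from collections import Counter

# Assumption: the missing path is the quoted string inside the
# 'stat failed: No such file or directory ["..."]' part of the message.
MISSING_PATH = re.compile(r'stat failed: No such file or directory \["([^"]+)"\]')


def tally_missing_paths(lines):
    """Count how often each path appears in 'stat failed' error lines."""
    counts = Counter()
    for line in lines:
        match = MISSING_PATH.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Redpanda portion of four log lines copied from the report above.
sample = [
    'ERROR 2022-08-06 07:18:22,213 [shard 3] cloud_storage - [fiber10~75~13~0|1|45536ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/9314b74e/kafka/topic-nfwsdorgnf/97_21/110-1-v1.log.1.index"])',
    'ERROR 2022-08-06 07:21:31,008 [shard 1] rpc - server.cc:126 - kafka rpc protocol - Error[applying protocol] remote address: 205.210.31.19:57193 - std::runtime_error (Unexpected EOF for client ID)',
    'ERROR 2022-08-06 07:42:05,136 [shard 3] cloud_storage - [fiber10~87~12~0|1|45809ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/9314b74e/kafka/topic-nfwsdorgnf/97_21/110-1-v1.log.1.index"])',
    'ERROR 2022-08-06 07:50:37,605 [shard 3] cloud_storage - [fiber6~92~319~0|1|56266ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/12426a3f/kafka/topic-nfwsdorgnf/9_21/0-1-v1.log.2"])',
]

counts = tally_missing_paths(sample)
for path, n in counts.most_common():
    print(n, path)
```

The rpc line is skipped because it does not match the pattern, so only the cloud_storage cache misses are counted.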
abhijat added the kind/bug (Something isn't working) and area/cloud-storage (Shadow indexing subsystem) labels Aug 6, 2022
abhijat self-assigned this Aug 8, 2022

Lazin commented Aug 8, 2022

Looks like cache eviction removes files right after they're downloaded or generated.
Or maybe the node is out of disk space.
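
The eviction hypothesis can be illustrated in isolation. The sketch below is a Python simulation, not Redpanda's C++ code: `stat_after_eviction` is a hypothetical helper, and the `os.remove` call only stands in for a cache eviction that wins the race against the reader. Calling stat on a file that was removed just after being written fails with ENOENT, which is the `error system:2` seen in the logs.

```python
import errno
import os
import tempfile


def stat_after_eviction():
    """Write a file, delete it (standing in for eviction), then stat it.

    Returns the errno of the resulting failure, or None if stat succeeded.
    """
    with tempfile.TemporaryDirectory() as cache_dir:
        path = os.path.join(cache_dir, "0-1-v1.log.1.index")
        with open(path, "wb") as f:
            f.write(b"index")  # the "download" finishes here
        os.remove(path)  # hypothetical eviction racing with the reader
        try:
            os.stat(path)
        except FileNotFoundError as exc:
            return exc.errno
    return None


result = stat_after_eviction()
print(result == errno.ENOENT)
```

An out-of-disk-space condition would normally surface as a different error (ENOSPC on write), so the errno in the logs may help tell the two hypotheses apart.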

dotnwat commented Aug 9, 2022

abhijat commented Aug 9, 2022

These errors should hopefully decrease or disappear with #5915, but we still need to investigate the root cause, so we can keep the ticket open.

They are easy to reproduce for investigation: we can just set the cache size to 5 MB on the CDT driver node in the scale_tests/franz_go..py test module.
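
A rough size comparison suggests why such a small cache would trigger the errors so readily. This is back-of-the-envelope arithmetic only: the segment size is the `segment_size=104857600` test parameter from the failure above, the cache size is taken to mean 5 MiB (an assumption), and how the cache actually decides what to evict is not shown in this ticket.

```python
# Segment size from the failing test's parameter (segment_size=104857600).
SEGMENT_SIZE_BYTES = 104857600

# Assumption: the small cache size used for reproduction means 5 MiB.
CACHE_SIZE_BYTES = 5 * 1024 * 1024

# Number of whole segments that fit in the cache at once.
segments_that_fit = CACHE_SIZE_BYTES // SEGMENT_SIZE_BYTES

# How many times larger one segment is than the whole cache.
ratio = SEGMENT_SIZE_BYTES / CACHE_SIZE_BYTES

print(segments_that_fit, ratio)
```

Under these assumptions not even one full segment fits in the cache, so any size-based eviction would be under constant pressure while segments are being downloaded.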

rystsov changed the title from "filesystem errors during FranzGoVerifiableWithSiTest/test_si_without_timeboxed/segment_size=104857600" to "filesystem errors during KgoVerifierWithSiTest/test_si_without_timeboxed/segment_size=104857600" Aug 25, 2022
rystsov changed the title from "filesystem errors during KgoVerifierWithSiTest/test_si_without_timeboxed/segment_size=104857600" to "filesystem errors during KgoVerifierWithSiTestLargeSegments.test_si_without_timeboxed" Aug 25, 2022
piyushredpanda commented

This hasn't been seen in a long while, but @abhijat wants to investigate and do an RCA. Adding to the v22.3 stretch items, which is a placeholder for such tickets.

jcsp commented Nov 3, 2022

This looks the same as #6601.

It was fixed by #6794.

jcsp closed this as completed Nov 3, 2022