test_kafka_streams_page_view: "Unexpected files in data directory" #7680
Comments
@NyaliaLui -- this is back again? Can you please triage this quickly? Does the test need an overhaul? I would love for this test to pass consistently instead of coming back with failures so often (as I think it has been, but I could be wrong).
The test failed because the |
Yeah, it looks like the failing ConsumerGroupTest at #7681 may have left state in the data directory. This is still a working hypothesis while I look for confirmation, but it is likely that #7681 also caused all the other failures in that DT run.
@dotnwat why do you think it's |
@rystsov my interpretation of @NyaliaLui's investigation was that this test passes the vast majority of the time, but in this case some other test had a clean-up issue, and this test discovered it and failed as a result. so there was no indication of an underlying problem with redpanda. happy to change the sev level. still trying to understand how to apply the sev label.
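For anyone following along, the pre-start check that produces "Unexpected files in data directory" can be pictured roughly as below. This is only a minimal sketch, not the actual code in rptest/services/redpanda.py: the helper name `assert_data_dir_clean` is hypothetical, and treating every leftover entry as unexpected is an assumption made for illustration.

```python
import os
import tempfile


def assert_data_dir_clean(data_dir: str) -> None:
    """Raise if data_dir still holds entries from an earlier test.

    Hypothetical helper: the real check may allow some files.
    """
    if not os.path.isdir(data_dir):
        return  # nothing on disk yet, so nothing can be left over
    leftovers = sorted(os.listdir(data_dir))
    if leftovers:
        raise RuntimeError(
            f"Unexpected files in data directory: {leftovers}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        assert_data_dir_clean(d)  # an empty directory passes
        # Simulate state left behind by a test that was not cleaned up.
        open(os.path.join(d, "stale.log"), "w").close()
        try:
            assert_data_dir_clean(d)
        except RuntimeError as err:
            print(err)
```

A check like this fails in whichever test starts next on the node, which is why the test named in the ticket is not necessarily the one that left the files behind.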
Looking back at this failure, it looks like the This is the same reason that #6490 failed but in that ticket, the consumer failed on So, I think this failure is not related to this test and therefore can be closed. |
TLDR - I confirmed locally that the failure in this ticket is caused by the consumer in
I attempted to recreate this error locally by:
def stop_node(self, node):
    self._stopping.set()
    # node.account.kill_process("java", clean_shutdown=True)
    # try:
    #     wait_until(lambda: self._done is None or self._done == True,
    #                timeout_sec=10)
    # except:
    #     self.logger.warn(
    #         f"{self._instance_name} running on {node.name} failed to stop gracefully"
    #     )
    #     node.account.kill_process("java", clean_shutdown=False)
    #     wait_until(
    #         lambda: self._done is None or self._done == True,
    #         timeout_sec=5,
    #         err_msg=
    #         f"{self._instance_name} running on {node.name} failed to stop after SIGKILL"
    #     )
I then ran the consumer group test with the kafka streams test to see what happens.
First we see that the consumer group test with
[INFO:2023-02-09 21:22:23,682]: RunnerClient: rptest.tests.consumer_group_test.ConsumerGroupTest.test_dead_group_recovery.static_members=False: Setting up...
[WARNING - 2023-02-09 21:22:24,789 - redpanda - cov_enabled - lineno:2590]: enable_cov should be one of 'ON', or 'OFF'
[WARNING - 2023-02-09 21:22:24,792 - redpanda - cov_enabled - lineno:2590]: enable_cov should be one of 'ON', or 'OFF'
[WARNING - 2023-02-09 21:22:24,826 - redpanda - cov_enabled - lineno:2590]: enable_cov should be one of 'ON', or 'OFF'
[INFO:2023-02-09 21:22:28,906]: RunnerClient: rptest.tests.consumer_group_test.ConsumerGroupTest.test_dead_group_recovery.static_members=False: Running...
[WARNING - 2023-02-09 21:22:31,232 - service - clean_node - lineno:304]: KafkaCliConsumer-0-140525057230112: clean_node has not been overriden. This may be fine if the service leaves no persistent state.
[WARNING - 2023-02-09 21:22:31,233 - service - clean_node - lineno:304]: KafkaCliConsumer-1-140525057239904: clean_node has not been overriden. This may be fine if the service leaves no persistent state.
[WARNING - 2023-02-09 21:22:35,531 - service - clean_node - lineno:304]: RpkProducer-0-140525057230400: clean_node has not been overriden. This may be fine if the service leaves no persistent state.
[ERROR - 2023-02-09 21:32:41,686 - cluster - wrapped - lineno:41]: Test failed, doing failure checks...
Traceback (most recent call last):
File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
r = f(self, *args, **kwargs)
File "/root/tests/rptest/tests/consumer_group_test.py", line 391, in test_dead_group_recovery
c.wait()
File "/usr/local/lib/python3.10/dist-packages/ducktape/services/background_thread.py", line 72, in wait
super(BackgroundThreadService, self).wait(timeout_sec)
File "/usr/local/lib/python3.10/dist-packages/ducktape/services/service.py", line 267, in wait
raise TimeoutError("Timed out waiting %s seconds for service nodes to finish. " % str(timeout_sec)
ducktape.errors.TimeoutError: Timed out waiting 600 seconds for service nodes to finish. These nodes are still alive: ['KafkaCliConsumer-0-140525057230112 node 1 on docker-rp-4']
[INFO:2023-02-09 21:32:41,888]: RunnerClient: rptest.tests.consumer_group_test.ConsumerGroupTest.test_dead_group_recovery.static_members=False: FAIL: TimeoutError("Timed out waiting 600 seconds for service nodes to finish. These nodes are still alive: ['KafkaCliConsumer-0-140525057230112 node 1 on docker-rp-4']")
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
data = self.run_test()
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
return self.test_context.function(self.test)
File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
r = f(self, *args, **kwargs)
File "/root/tests/rptest/tests/consumer_group_test.py", line 391, in test_dead_group_recovery
c.wait()
File "/usr/local/lib/python3.10/dist-packages/ducktape/services/background_thread.py", line 72, in wait
super(BackgroundThreadService, self).wait(timeout_sec)
File "/usr/local/lib/python3.10/dist-packages/ducktape/services/service.py", line 267, in wait
raise TimeoutError("Timed out waiting %s seconds for service nodes to finish. " % str(timeout_sec)
ducktape.errors.TimeoutError: Timed out waiting 600 seconds for service nodes to finish. These nodes are still alive: ['KafkaCliConsumer-0-140525057230112 node 1 on docker-rp-4']
Then the next tests fail because of unexpected files in the data directory:
test_id: rptest.tests.consumer_group_test.ConsumerGroupTest.test_dead_group_recovery.static_members=True
status: FAIL
run time: 11.379 seconds
RuntimeError('Unexpected files in data directory')
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 133, in run
self.setup_test()
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 218, in setup_test
self.test.setup()
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/test.py", line 91, in setup
self.setUp()
File "/root/tests/rptest/tests/redpanda_test.py", line 99, in setUp
self.redpanda.start()
File "/root/tests/rptest/services/redpanda.py", line 1033, in start
raise RuntimeError("Unexpected files in data directory")
RuntimeError: Unexpected files in data directory
Test requested 6 nodes, used only 3
--------------------------------------------------------------------------------
test_id: rptest.tests.compatibility.kafka_streams_test.KafkaStreamsPageView.test_kafka_streams_page_view
status: FAIL
run time: 11.359 seconds
RuntimeError('Unexpected files in data directory')
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 133, in run
self.setup_test()
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 218, in setup_test
self.test.setup()
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/test.py", line 91, in setup
self.setUp()
File "/root/tests/rptest/tests/redpanda_test.py", line 99, in setUp
self.redpanda.start()
File "/root/tests/rptest/services/redpanda.py", line 1033, in start
raise RuntimeError("Unexpected files in data directory")
RuntimeError: Unexpected files in data directory
Test requested 5 nodes, used only 3
So I look at the debug log to see what files are in the data directory
We see that Therefore, I am closing this ticket because it seems that the |
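As a reference for the stop logic that was commented out in the repro above, here is a minimal standalone sketch of the same "ask nicely, then force" pattern. It assumes a plain `subprocess.Popen` child rather than the ducktape node/account API, and `stop_process` plus its timeout defaults are hypothetical names chosen to mirror the 10 s and 5 s waits in that snippet.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen,
                 grace_sec: float = 10.0,
                 kill_sec: float = 5.0) -> bool:
    """Stop proc; return True if it exited without needing a forced kill."""
    proc.terminate()  # graceful request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_sec)
        return True
    except subprocess.TimeoutExpired:
        # The child ignored the graceful request, so force it down.
        proc.kill()  # SIGKILL on POSIX
        proc.wait(timeout=kill_sec)  # raises TimeoutExpired if still alive
        return False


if __name__ == "__main__":
    # A stand-in for a long-running consumer process.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    graceful = stop_process(child)
    print("graceful" if graceful else "forced", child.returncode)
```

With both steps removed, as in the repro, nothing ever stops the child, which matches the 600 second timeout and the "still alive" consumer in the log above.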
https://buildkite.com/redpanda/redpanda/builds/19571#0184f582-22ce-49b9-80cb-5c237459b23f