CI Failure (franz_go_verifiable_services stuck) in EndToEndTopicRecovery.test_restore #6099

Closed
rystsov opened this issue Aug 19, 2022 · 3 comments
Labels: ci-failure, kind/bug, pr-blocker

Comments

rystsov commented Aug 19, 2022

https://buildkite.com/redpanda/redpanda/builds/14311#0182b1e9-cd33-421b-a908-d7400ba039df

Module: rptest.tests.e2e_topic_recovery_test
Class:  EndToEndTopicRecovery
Method: test_restore
Arguments:
{
  "message_size": 5000,
  "num_messages": 100000,
  "recovery_overrides": {
    "retention.bytes": 1024
  }
}
test_id:    rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore.message_size=5000.num_messages=100000.recovery_overrides=.retention.bytes.1024
status:     FAIL
run time:   11 minutes 24.628 seconds

    TimeoutError("Timed out waiting 600 seconds for service nodes to finish. These nodes are still alive: ['FranzGoVerifiableSeqConsumer-0-140012869777200 node 1 on docker-rp-24']")
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/e2e_topic_recovery_test.py", line 181, in test_restore
    self._consumer.wait()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/services/background_thread.py", line 72, in wait
    super(BackgroundThreadService, self).wait(timeout_sec)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/services/service.py", line 267, in wait
    raise TimeoutError("Timed out waiting %s seconds for service nodes to finish. " % str(timeout_sec)
ducktape.errors.TimeoutError: Timed out waiting 600 seconds for service nodes to finish. These nodes are still alive: ['FranzGoVerifiableSeqConsumer-0-140012869777200 node 1 on docker-rp-24']
rystsov added the kind/bug and ci-failure labels Aug 19, 2022
rystsov commented Aug 19, 2022

franz_go_verifiable_services got stuck and blocked shutdown (the wait() call timed out). Its log shows getOffsets retrying after a NOT_LEADER_FOR_PARTITION error:

[DEBUG - 2022-08-18 17:50:06,129 - franz_go_verifiable_services - execute_cmd - lineno:74]: time="2022-08-18T17:50:06Z" level=info msg="Loading offsets for topic topic-hspkvsvycm t=-1..."
[DEBUG - 2022-08-18 17:50:06,337 - franz_go_verifiable_services - execute_cmd - lineno:74]: time="2022-08-18T17:50:06Z" level=warning msg="error fetching topic-hspkvsvycm/0 metadata: NOT_LEADER_FOR_PARTITION: This server is not the leader for that topic-partition."
[DEBUG - 2022-08-18 17:50:06,337 - franz_go_verifiable_services - execute_cmd - lineno:74]: time="2022-08-18T17:50:06Z" level=warning msg="Retrying getOffsets in 2s"
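
For illustration only: a minimal Python sketch of putting an overall deadline on a retry loop like the getOffsets retry in the log above, so that a persistent error surfaces as a failure instead of keeping the process alive until the harness times out. The real service is a Go program and this thread does not show its retry logic, so this is not its implementation. `retry_with_deadline`, `RetryDeadlineExceeded`, and every parameter are hypothetical names, and the 2-second default only mirrors the "Retrying getOffsets in 2s" message.

```python
import time


class RetryDeadlineExceeded(Exception):
    """Raised when the operation keeps failing past the overall deadline."""


def retry_with_deadline(op, deadline_sec, backoff_sec=2.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Call op() until it succeeds or the overall deadline would be exceeded.

    Hypothetical helper: op, deadline_sec and backoff_sec are illustrative
    names, not part of franz_go_verifiable_services or ducktape.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        try:
            return op()
        except Exception as err:
            elapsed = clock() - start
            # Give up if another backoff would take us past the deadline.
            if elapsed + backoff_sec > deadline_sec:
                raise RetryDeadlineExceeded(
                    f"gave up after {attempts} attempts in {elapsed:.1f}s: {err}"
                ) from err
            sleep(backoff_sec)
```

A caller would pick a `deadline_sec` shorter than the harness timeout (600 seconds in the failure above), so the underlying error is reported rather than a generic "nodes are still alive" timeout.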

rystsov added the pr-blocker label Aug 19, 2022
rystsov commented Aug 19, 2022

This blocks #6003.

jcsp commented Aug 24, 2022

These services have been mostly rewritten at the tip of dev, so this is likely fixed (or, if something is still wrong, it will fail in a different way). We can always reopen if not.

jcsp closed this as completed Aug 24, 2022