Crash: assertion in offset_translator_state (EndToEndTopicRecovery test_restore failed) #5241

Closed
LenaAn opened this issue Jun 27, 2022 · 15 comments
Labels: area/cloud-storage (Shadow indexing subsystem), area/tests, ci-failure, kind/bug (Something isn't working)

Comments

@LenaAn
Contributor

LenaAn commented Jun 27, 2022

https://buildkite.com/redpanda/redpanda/builds/11728#0181a4d3-f70e-46e8-8299-0d796bdd5947

test_id:    rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore.message_size=10000.num_messages=100000.recovery_overrides=.retention.bytes.1024
status:     FAIL
run time:   6 minutes 14.865 seconds

    NodeCrash([(<ducktape.cluster.cluster.ClusterNode object at 0xffff88708610>, "ERROR 2022-06-27 11:57:55,423 [shard 1] assert - Assert failure: (../../../src/v/storage/offset_translator_state.cc:135) 'base_offset > rbegin->first' ntp {kafka/topic-pblgqsmnyx/0}: trying to add batch to offset translator at offset 9223372036854775807 that is not higher than the previous last offset 9223372036854775807\n"), (<ducktape.cluster.cluster.ClusterNode object at 0xffff88708c10>, "ERROR 2022-06-27 11:57:55,423 [shard 1] assert - Assert failure: (../../../src/v/storage/offset_translator_state.cc:135) 'base_offset > rbegin->first' ntp {kafka/topic-pblgqsmnyx/0}: trying to add batch to offset translator at offset 9223372036854775807 that is not higher than the previous last offset 9223372036854775807\n")])
Traceback (most recent call last):
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/e2e_topic_recovery_test.py", line 156, in test_restore
    self._restore_topic(topic_spec, recovery_overrides)
  File "/root/tests/rptest/tests/e2e_topic_recovery_test.py", line 106, in _restore_topic
    rpk.describe_topic_configs(topic)
  File "/root/tests/rptest/clients/rpk.py", line 250, in describe_topic_configs
    output = self._run_topic(cmd)
  File "/root/tests/rptest/clients/rpk.py", line 421, in _run_topic
    return self._execute(cmd, stdin=stdin, timeout=timeout)
  File "/root/tests/rptest/clients/rpk.py", line 537, in _execute
    raise RpkException(
rptest.clients.rpk.RpkException: RpkException<command /var/lib/buildkite-agent/builds/arm64-xfs-builders-i-00e9cbf44a7a92cca-1/redpanda/redpanda/vbuild/release/clang/dist/local/redpanda/bin/rpk topic --brokers docker-rp-23:9092,docker-rp-19:9092,docker-rp-9:9092 describe topic-pblgqsmnyx -c returned 1, output:  error: unable to request configs: unable to dial: dial tcp 172.18.0.18:9092: connect: connection refused
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 38, in wrapped
    self.redpanda.raise_on_crash()
  File "/root/tests/rptest/services/redpanda.py", line 993, in raise_on_crash
    raise NodeCrash(crashes)
rptest.services.utils.NodeCrash: <NodeCrash (docker-rp-23,docker-rp-19) docker-rp-23: ERROR 2022-06-27 11:57:55,423 [shard 1] assert - Assert failure: (../../../src/v/storage/offset_translator_state.cc:135) 'base_offset > rbegin->first' ntp {kafka/topic-pblgqsmnyx/0}: trying to add batch to offset translator at offset 9223372036854775807 that is not higher than the previous last offset 9223372036854775807

@VladLazar
Contributor

@twmb
Contributor

twmb commented Jul 5, 2022

@twmb
Contributor

twmb commented Jul 5, 2022

@LenaAn
Contributor Author

LenaAn commented Jul 6, 2022

@VadimPlh
Contributor

VadimPlh commented Jul 7, 2022

piyushredpanda assigned Lazin and unassigned graphcareful Jul 7, 2022
@mmaslankaprv
Member

@NyaliaLui
Contributor

@ztlpn
Contributor

ztlpn commented Jul 13, 2022

@NyaliaLui
Contributor

Same test, but a different failure. I'm noting it here because it seems related:
https://buildkite.com/redpanda/redpanda/builds/12511#0181f85e-2379-494e-8a8b-28fc629a1d95/1634-8782

@VadimPlh
Contributor

@VadimPlh
Contributor

@Lazin
Contributor

Lazin commented Aug 2, 2022

This should be fixed now by #5544.
The offset translator assertion is triggered by empty segments: recovery removes config batches, so a segment that held only config batches ends up empty.
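
To make the failure mode concrete, here is a toy Python model of the invariant behind the `base_offset > rbegin->first` assertion. It is only a sketch and not Redpanda code: `ToyOffsetTranslator`, `segment_base_offset`, and `replay` are hypothetical names, and the idea that an empty segment reports the int64 maximum (9223372036854775807, the value in the crash log) as its offset is an assumption made for illustration. The `skip_empty` guard is one plausible mitigation and is not necessarily what #5544 does.

```python
INT64_MAX = 2**63 - 1  # 9223372036854775807, the offset printed in the crash log


class ToyOffsetTranslator:
    """Hypothetical stand-in that only models the strictly-increasing invariant."""

    def __init__(self):
        self.last_offset = None  # last base offset that was accepted

    def add_batch(self, base_offset):
        # Mirrors the shape of the failed check: base_offset > rbegin->first
        if self.last_offset is not None and base_offset <= self.last_offset:
            raise AssertionError(
                f"trying to add batch to offset translator at offset {base_offset} "
                f"that is not higher than the previous last offset {self.last_offset}"
            )
        self.last_offset = base_offset


def segment_base_offset(segment):
    # ASSUMPTION: an empty segment yields a sentinel offset (int64 max).
    return segment[0] if segment else INT64_MAX


def replay(segments, skip_empty):
    """Feed one base offset per segment into the toy translator."""
    translator = ToyOffsetTranslator()
    for segment in segments:
        if skip_empty and not segment:
            continue  # hypothetical guard: ignore segments emptied by recovery
        translator.add_batch(segment_base_offset(segment))
    return translator


if __name__ == "__main__":
    # Two segments left empty after config batches were removed.
    segments = [[0, 5], [], []]
    try:
        replay(segments, skip_empty=False)
    except AssertionError as err:
        print("assert:", err)
    print("with guard:", replay(segments, skip_empty=True).last_offset)
```

Under these assumptions, the second empty segment repeats the same sentinel offset, which reproduces the "not higher than the previous last offset" message with both numbers equal.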

Lazin closed this as completed Aug 2, 2022
@BenPope
Member

BenPope commented Oct 11, 2022

Saw this on v22.1.x: https://buildkite.com/redpanda/redpanda/builds/16459#0183c6d5-beaa-4b01-bdbe-dd8ca1e8aeff.

@Lazin I guess #5544 is too big to backport?

@BenPope
Member

BenPope commented Oct 24, 2022

BenPope reopened this Oct 24, 2022
@jcsp
Contributor

jcsp commented Nov 10, 2022

I think we probably won't do a bespoke version of #5544 for 22.1.x unless this becomes an issue in the field

jcsp closed this as completed Nov 10, 2022