Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data corruption CRC mismatch in MultiRestartTest.test_recovery_after_multiple_restarts #4638

Closed
rystsov opened this issue May 10, 2022 · 6 comments
Assignees
Labels

Comments

@rystsov
Copy link
Contributor

rystsov commented May 10, 2022

https://buildkite.com/redpanda/redpanda/builds/9879#4eecdac0-67d4-4142-b665-6deab5cb9bdf

Module: rptest.tests.multi_restarts_with_archival_test
Class:  MultiRestartTest
Method: test_recovery_after_multiple_restarts
test_id:    rptest.tests.multi_restarts_with_archival_test.MultiRestartTest.test_recovery_after_multiple_restarts
status:     FAIL
run time:   4 minutes 2.889 seconds


    <BadLogLines nodes=docker-rp-22(1) example="ERROR 2022-05-07 07:17:59,648 [shard 1] storage - parser.cc:176 - detected header corruption. stopping parser. Expected CRC of 3704304617, but got header CRC: 945991046 - {header_crc:945991046, size_bytes:91, base_offset:{172}, type:batch_type::kvstore, crc:2039451389, attrs:{compression:none, type:CreateTime, transactional: 0, control: 0}, last_offset_delta:0, first_timestamp:{timestamp: 1651907878388}, max_timestamp:{timestamp: 1651907878388}, producer_id:72057594037927935, producer_epoch:0, base_sequence:0, record_count:0, ctx:{term:{-9223372036854775808}, owner_shard:{1}}}. consumer:storage::checksumming_consumer segment {offset_tracker:{term:0, base_offset:136, committed_offset:136, dirty_offset:136}, compacted_segment=0, finished_self_compaction=0, reader={/var/lib/redpanda/data/redpanda/kvstore/1_0/136-0-v1.log, (36864 bytes)}, writer=nullptr, cache=nullptr, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/redpanda/kvstore/1_0/136-0-v1.base_index, offsets:{136}, index:{header_bitflags:0, base_offset:{136}, max_offset:{171}, base_timestamp:{timestamp: 1651907863237}, max_timestamp:{timestamp: 1651907878372}, index(1,1,1)}, step:32768, needs_persistence:1}}">
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 47, in wrapped
    self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
  File "/root/tests/rptest/services/redpanda.py", line 1071, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-22(1) example="ERROR 2022-05-07 07:17:59,648 [shard 1] storage - parser.cc:176 - detected header corruption. stopping parser. Expected CRC of 3704304617, but got header CRC: 945991046 - {header_crc:945991046, size_bytes:91, base_offset:{172}, type:batch_type::kvstore, crc:2039451389, attrs:{compression:none, type:CreateTime, transactional: 0, control: 0}, last_offset_delta:0, first_timestamp:{timestamp: 1651907878388}, max_timestamp:{timestamp: 1651907878388}, producer_id:72057594037927935, producer_epoch:0, base_sequence:0, record_count:0, ctx:{term:{-9223372036854775808}, owner_shard:{1}}}. consumer:storage::checksumming_consumer segment {offset_tracker:{term:0, base_offset:136, committed_offset:136, dirty_offset:136}, compacted_segment=0, finished_self_compaction=0, reader={/var/lib/redpanda/data/redpanda/kvstore/1_0/136-0-v1.log, (36864 bytes)}, writer=nullptr, cache=nullptr, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/redpanda/kvstore/1_0/136-0-v1.base_index, offsets:{136}, index:{header_bitflags:0, base_offset:{136}, max_offset:{171}, base_timestamp:{timestamp: 1651907863237}, max_timestamp:{timestamp: 1651907878372}, index(1,1,1)}, step:32768, needs_persistence:1}}">
@rystsov
Copy link
Contributor Author

rystsov commented May 10, 2022

@mmaslankaprv
Copy link
Member

mmaslankaprv commented May 16, 2022

Another instance: https://buildkite.com/redpanda/redpanda/builds/10165#545d6f66-3042-4775-9513-c8714e1ae49a

[INFO  - 2022-05-15 07:13:07,801 - runner_client - log - lineno:278]: RunnerClient: rptest.tests.multi_restarts_with_archival_test.MultiRestartTest.test_recovery_after_multiple_restarts: Summary: <BadLogLines nodes=docker-rp-17(1) example="ERROR 2022-05-15 07:10:44,193 [shard 0] storage - parser.cc:176 - detected header corruption. stopping parser. Expected CRC of 1436479202, but got header CRC: 1887436520 - {header_crc:1887436520, size_bytes:129, base_offset:{1513}, type:batch_type::unknown{0}, crc:0, attrs:{compression:none, type:CreateTime, transactional: 0, control: 0}, last_offset_delta:0, first_timestamp:{timestamp: 0}, max_timestamp:{timestamp: 0}, producer_id:0, producer_epoch:0, base_sequence:0, record_count:0, ctx:{term:{-9223372036854775808}, owner_shard:{0}}}. consumer:storage::checksumming_consumer segment {offset_tracker:{term:6, base_offset:1482, committed_offset:1482, dirty_offset:1482}, compacted_segment=1, finished_self_compaction=0, reader={/var/lib/redpanda/data/kafka/__consumer_offsets/0_26/1482-6-v1.log, (8192 bytes)}, writer=nullptr, cache=nullptr, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/__consumer_offsets/0_26/1482-6-v1.base_index, offsets:{1482}, index:{header_bitflags:0, base_offset:{1482}, max_offset:{1512}, base_timestamp:{timestamp: 1652598639001}, max_timestamp:{timestamp: 1652598642496}, index(1,1,1)}, step:32768, needs_persistence:1}}">
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 47, in wrapped
    self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
  File "/root/tests/rptest/services/redpanda.py", line 1074, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-17(1) example="ERROR 2022-05-15 07:10:44,193 [shard 0] storage - parser.cc:176 - detected header corruption. stopping parser. Expected CRC of 1436479202, but got header CRC: 1887436520 - {header_crc:1887436520, size_bytes:129, base_offset:{1513}, type:batch_type::unknown{0}, crc:0, attrs:{compression:none, type:CreateTime, transactional: 0, control: 0}, last_offset_delta:0, first_timestamp:{timestamp: 0}, max_timestamp:{timestamp: 0}, producer_id:0, producer_epoch:0, base_sequence:0, record_count:0, ctx:{term:{-9223372036854775808}, owner_shard:{0}}}. consumer:storage::checksumming_consumer segment {offset_tracker:{term:6, base_offset:1482, committed_offset:1482, dirty_offset:1482}, compacted_segment=1, finished_self_compaction=0, reader={/var/lib/redpanda/data/kafka/__consumer_offsets/0_26/1482-6-v1.log, (8192 bytes)}, writer=nullptr, cache=nullptr, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/__consumer_offsets/0_26/1482-6-v1.base_index, offsets:{1482}, index:{header_bitflags:0, base_offset:{1482}, max_offset:{1512}, base_timestamp:{timestamp: 1652598639001}, max_timestamp:{timestamp: 1652598642496}, index(1,1,1)}, step:32768, needs_persistence:1}}">

@ajfabbri
Copy link
Contributor

Cool thanks for the details. I'll take a look.

@NyaliaLui
Copy link
Contributor

@ajfabbri
Copy link
Contributor

It looks like this is expected during recovery due to replaying truncated logs (i.e. from abrupt shutdown). Seems like this most recent CI failed build was just after #4969, but I'm double-checking.

@ajfabbri
Copy link
Contributor

The actual commit for lowering the level of this log message (and thus removing the CI failure) was 9bb79f7c, which appears to be after the last reproduction of this bug:

$ FIX=9bb79f7c6ce7933fda5043537cc8470967905f22
$ BUG=bf0d7a6d1f6c7b1e2bb77d3f65c2cd7b371169a1
$ git merge-base --is-ancestor $BUG $FIX; echo $?
1

Based on this information, closing this bug. If the message appears again after that commit which lowered the log level, please reopen this bug.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants