Failure in `TopicRecoveryTest.test_size_based_retention` #4887

ZeDRoman · 2022-05-23T15:11:52Z

Build: https://buildkite.com/redpanda/redpanda/builds/10396#677124b6-8fb4-418b-bd49-d89e63578bd7

FAIL test: TopicRecoveryTest.test_size_based_retention (1/19 runs)
  failure at 2022-05-23T07:38:51.539Z: AssertionError('Too much or not enough data restored, expected 10485760 got 10209301')
      in job https://buildkite.com/redpanda/redpanda/builds/10396#677124b6-8fb4-418b-bd49-d89e63578bd7

Error:



test_id:    rptest.tests.topic_recovery_test.TopicRecoveryTest.test_size_based_retention
--
  | status:     FAIL
  | run time:   51.011 seconds
  |  
  |  
  | AssertionError('Too much or not enough data restored, expected 10485760 got 10209301')
  | Traceback (most recent call last):
  | File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
  | data = self.run_test()
  | File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
  | return self.test_context.function(self.test)
  | File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
  | r = f(self, *args, **kwargs)
  | File "/root/tests/rptest/tests/topic_recovery_test.py", line 1293, in test_size_based_retention
  | self.do_run(test_case)
  | File "/root/tests/rptest/tests/topic_recovery_test.py", line 1180, in do_run
  | test_case.validate_cluster(baseline, restored)
  | File "/root/tests/rptest/tests/topic_recovery_test.py", line 776, in validate_cluster
  | assert is_close_size(size_bytes, self.restored_size_bytes), \
  | AssertionError: Too much or not enough data restored, expected 10485760 got 10209301

ZeDRoman · 2022-05-24T08:00:51Z

Another instance
https://buildkite.com/redpanda/redpanda/builds/10430#bf475072-ee06-4cf1-b034-d113419c57ce

twmb · 2022-05-25T02:48:33Z

https://buildkite.com/redpanda/redpanda/builds/10497#d2403fb1-cfed-4737-95b0-b71a5302541b

ZeDRoman · 2022-05-26T09:26:13Z

+1 https://buildkite.com/redpanda/redpanda/builds/10588#0180ff1a-dff2-4299-9999-249a10842283

jcsp · 2022-05-27T09:52:41Z

6/97 runs failed in last 72h -- this one is quite frequent.

VadimPlh · 2022-05-29T14:12:57Z

Again https://buildkite.com/redpanda/redpanda/builds/10689#01810e8c-1e39-4a42-b13c-0fb654cd2373

andrewhsu · 2022-05-31T22:51:48Z

seen again https://buildkite.com/redpanda/redpanda/builds/10693#0181137f-16d2-4b57-8b92-1c7b8ff7c5ee/1561-8435

in PR #4940

VadimPlh · 2022-06-01T09:28:07Z

Again https://buildkite.com/redpanda/redpanda/builds/10751#01811ab4-2845-486a-93d3-8649f66bc5f2

NyaliaLui · 2022-06-01T21:40:36Z

Seen again in https://buildkite.com/redpanda/redpanda/builds/10797#01811ded-c17c-4599-a7b1-10bae0e0238e/1565-8091

VadimPlh · 2022-06-02T09:09:26Z

Again https://buildkite.com/redpanda/redpanda/builds/10876#01812326-8604-4165-a379-f580b3a8e712

NyaliaLui · 2022-06-03T21:39:27Z

Another https://buildkite.com/redpanda/redpanda/builds/10901#0181274c-bc00-496d-b823-09d36d047edc/1567-8053

ztlpn · 2022-06-06T13:18:59Z

one more https://buildkite.com/redpanda/redpanda/builds/10970#018137b8-8fad-403f-b8cc-2e5fce55fb60

ztlpn · 2022-06-07T12:28:39Z

https://buildkite.com/redpanda/redpanda/builds/11002#01813cd4-e0b3-4e92-ac3e-681fe2d6e08b

ajfabbri · 2022-06-07T18:03:52Z

https://buildkite.com/redpanda/redpanda/builds/10998#01813c5e-1bd6-4fb7-aaad-c52d03bdca78

BenPope · 2022-06-15T20:15:45Z

https://buildkite.com/redpanda/redpanda/builds/11327#018165bf-98fa-416c-95c2-3d8470ddb1a0

jcsp · 2022-07-04T10:08:31Z

4/738 failures in last 30 days.

Most recent failure on dev https://buildkite.com/redpanda/redpanda/builds/11002#01813cd4-e0b3-4e92-ac3e-681fe2d6e08b

BenPope · 2022-10-11T07:56:06Z

v22.2.x https://buildkite.com/redpanda/redpanda/builds/15369#018342c9-26ad-4e53-803b-3d84e126aa8d

piyushredpanda · 2022-10-11T19:52:15Z

@ZeDRoman is helping pick this up. Thanks, Roman.

ZeDRoman · 2022-10-14T16:22:09Z

Reason for the failure:

Shadow Indexing (SI) recovery is allowed to restore an amount of data greater than or equal to retention.bytes: it downloads segments until the sum of their sizes reaches or exceeds retention.bytes (partition_recovery_manager.cc, download_log_with_capped_size).

Disk log GC starts deleting segments when their total size exceeds retention.bytes, so after GC the total size is less than or equal to retention.bytes (disk_log_impl.cc, size_based_gc_max_offset).

When the two work together, the behavior is: SI downloads segments totalling more than retention.bytes, then disk log GC removes one segment because the total size exceeds retention.bytes.

This is what happens in TopicRecoveryTest.test_size_based_retention: SI downloads the segments, disk log GC automatically deletes some of them, and then the test checks that SI restored retention.bytes or more and fails because a segment has already been deleted.
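
The interaction can be reproduced with a small model. This is only a sketch of the two rules described above, not the actual code in partition_recovery_manager.cc or disk_log_impl.cc: the function names, the newest-first download order, and the segment sizes are assumptions made for illustration.

```python
from typing import List


def download_capped(segment_sizes: List[int], retention_bytes: int) -> List[int]:
    """Model of recovery: take segments until their total is >= retention_bytes.

    Assumption: segments are listed oldest to newest and picked newest first.
    """
    picked: List[int] = []
    total = 0
    for size in reversed(segment_sizes):  # walk from newest to oldest
        if total >= retention_bytes:
            break
        picked.append(size)
        total += size
    picked.reverse()  # back to oldest-to-newest order
    return picked


def size_based_gc(segment_sizes: List[int], retention_bytes: int) -> List[int]:
    """Model of disk log GC: drop oldest segments while total > retention_bytes."""
    kept = list(segment_sizes)
    while kept and sum(kept) > retention_bytes:
        kept.pop(0)  # remove the oldest segment
    return kept


retention = 10                # hypothetical retention.bytes
uploaded = [3, 3, 3, 3, 3]    # hypothetical segment sizes in the bucket
restored = download_capped(uploaded, retention)
after_gc = size_based_gc(restored, retention)
print(sum(restored), sum(after_gc))  # 12 9
```

In this model recovery restores 12 bytes, GC trims the log to 9 bytes, and a check that expects at least 10 bytes on disk fails.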

Solution:
Evgeny Lazin proposed adjusting this behavior so that recovery downloads strictly less than retention.bytes.
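
A minimal sketch of the proposed rule, under the same illustrative assumptions (hypothetical function name, segments listed oldest to newest and picked newest first); the real change may be implemented differently:

```python
from typing import List


def download_below_limit(segment_sizes: List[int], retention_bytes: int) -> List[int]:
    """Take segments only while the running total stays strictly below the limit."""
    picked: List[int] = []
    total = 0
    for size in reversed(segment_sizes):  # walk from newest to oldest
        if total + size >= retention_bytes:
            break  # adding this segment would reach or exceed the limit
        picked.append(size)
        total += size
    picked.reverse()  # back to oldest-to-newest order
    return picked


restored = download_below_limit([3, 3, 3, 3, 3], 10)
print(sum(restored))  # 9
```

Because the restored total never reaches retention.bytes, a GC pass that only deletes when the total exceeds retention.bytes has nothing to remove after recovery.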

ZeDRoman added the kind/bug and ci-failure labels May 23, 2022

piyushredpanda assigned Lazin May 24, 2022

twmb mentioned this issue May 25, 2022

rpk: rename httpError for httpResponseError #4912

Merged

abhijat mentioned this issue May 26, 2022

cloud_storage: redacts fields from header #4939

Merged

dotnwat mentioned this issue May 26, 2022

non-functional changes spotted in review post-merge #4938

Merged

This was referenced May 28, 2022

application: handle unwanted positional args #4941

Merged

tests: use rpk producer for even record distribution #4928

Merged

dotnwat mentioned this issue Jun 1, 2022

[v22.1.x] admin: reject maintenance mode req on 1 node cluster #4986

Merged

ajfabbri mentioned this issue Jun 2, 2022

Reject writes as needed to avoid full disks (v1) #4803

Merged

mmaslankaprv mentioned this issue Jun 3, 2022

Fixed serialization of group tombstones #4901

Merged

jcsp mentioned this issue Jun 7, 2022

redpanda: remove _redpanda_enabled=false mode #4324

Merged

Lazin mentioned this issue Jun 8, 2022

rptest: Disable two flaky topic recovery tests #5070

Merged

r-vasquez mentioned this issue Jun 24, 2022

[v22.1.x] rpk: fix ipv6 parsing of ParseHostMaybeScheme function #5230

Merged

mmedenjak added the area/tests label Jul 5, 2022

mmedenjak added the area/cloud-storage label Jul 21, 2022

rystsov mentioned this issue Sep 9, 2022

CI Failure (restored partition is too small) in TopicRecoveryTest.test_fast2 #6356

Closed

rystsov added the ci-disabled-test label Sep 16, 2022

mmedenjak assigned andijcr and unassigned Lazin Sep 28, 2022

piyushredpanda assigned ZeDRoman and unassigned andijcr Oct 11, 2022

ZeDRoman mentioned this issue Oct 17, 2022

Topic recovery download with capped size. Download not more than retention.policy #6797

Merged

6 tasks

mmedenjak closed this as completed in #6797 Oct 19, 2022
