Topic recovery download with capped size. Download not more than retention.policy #6797

ZeDRoman · 2022-10-17T17:17:30Z

Cover letter

In Shadow Indexing we have an option to recover a total size greater than or equal to retention.bytes. Shadow Indexing downloads segments until the sum of their sizes becomes greater than or equal to the retention.bytes property. (partition_recovery_manager.cc download_log_with_capped_size)

In disk log GC we start deleting segments if their total size is more than retention.bytes. So after GC the total size is less than or equal to retention.bytes. (disk_log_impl.cc size_based_gc_max_offset)

When the two work together, the behavior is: SI downloads segments totalling more than retention.bytes, then disk log GC removes one segment because the total size is more than retention.bytes.

This showed up in TopicRecoveryTest.test_size_based_retention. SI downloads segments, then disk log GC automatically deletes segments, then the test checks that SI downloaded more than retention.bytes and fails (because a segment was deleted).
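
The interaction described above can be reproduced with a toy model. This is an illustrative sketch only, not the Redpanda code: the function names are hypothetical, segments are reduced to a list of sizes ordered oldest to newest, the old condition is assumed to exit the loop, and the GC model ignores details such as the active segment.

```python
from typing import List


def download_old(segment_sizes: List[int], retention_bytes: int) -> List[int]:
    """Model of the old check: stop only once the running total already exceeds the limit."""
    picked: List[int] = []
    total = 0
    for size in reversed(segment_sizes):  # newest segment first
        if total > retention_bytes:       # old condition from the diff
            break
        picked.append(size)
        total += size
    picked.reverse()                      # back to oldest-to-newest order
    return picked


def size_based_gc(segment_sizes: List[int], retention_bytes: int) -> List[int]:
    """Simplified GC: drop the oldest segments while the total exceeds the limit."""
    kept = list(segment_sizes)
    while kept and sum(kept) > retention_bytes:
        kept.pop(0)
    return kept


downloaded = download_old([40, 40, 40, 40], retention_bytes=100)
after_gc = size_based_gc(downloaded, retention_bytes=100)
```

With four 40-byte segments and a 100-byte limit, the modeled download keeps three segments (120 bytes) and the modeled GC trims them to two (80 bytes), so a check that expects at least retention.bytes on disk cannot pass.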

Fixes #4887

Backport Required

jcsp · 2022-10-18T09:42:01Z

src/v/cloud_storage/partition_recovery_manager.cc

@@ -364,7 +364,7 @@ partition_downloader::download_log_with_capped_size(
    model::offset_delta start_delta{0};
    for (auto it = offset_map.rbegin(); it != offset_map.rend(); it++) {
        const auto& meta = it->second.meta;
-        if (total_size > max_size) {
+        if (total_size + meta.size_bytes > max_size) {


This probably needs a special case to download at least one segment. Otherwise if someone has e.g. 1GB segments, but sets retention to 100MB, then they will recover nothing.

Yeah, you are right
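
One way to combine the new condition with the special case discussed above is sketched below. It is a hedged model in Python rather than the C++ that was merged: select_segments_capped is a hypothetical name, segments are plain sizes ordered oldest to newest, and whether the final patch handles the special case exactly this way is an assumption.

```python
from typing import List


def select_segments_capped(segment_sizes: List[int], max_size: int) -> List[int]:
    """Pick the newest segments whose total stays within max_size.

    The newest segment is always taken, even if it alone is larger than
    max_size, so that recovery never returns an empty log.
    """
    picked: List[int] = []
    total = 0
    for size in reversed(segment_sizes):  # newest segment first
        # New condition from the diff, guarded so the first segment is kept.
        if picked and total + size > max_size:
            break
        picked.append(size)
        total += size
    picked.reverse()                      # back to oldest-to-newest order
    return picked
```

For example, with 1000-byte segments and a 100-byte limit the sketch still returns the newest segment, while with 40-byte segments it returns two of them (80 bytes) and stays under the limit.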

jcsp · 2022-10-18T09:43:16Z

tests/rptest/utils/si_utils.py

+    """
+    size_bytes_per_ntp = {}
+    segments_sizes_per_ntp = {}
+    for _, data in chk.items():


This is hard to follow, names like data and chk are quite obscure

As well as nicer names, you could declare types in the function definition (chk: SomeType, retention_policy: SomeType) to help the reader understand

Initially, Shadow Indexing downloaded segments until their total size was greater than or equal to retention.policy. But in the disk log implementation, GC removes segments until the total size is less than or equal to retention.policy. Downloading segments beyond retention.policy is therefore unnecessary, because they will be deleted by GC. This change makes SI download no more than retention.policy.

Lazin · 2022-10-18T12:48:29Z

tests/rptest/utils/si_utils.py

+    """
+    size_bytes_per_ntp = {}
+    segments_sizes_per_ntp = {}
+    for _node, node_segments_reports in nodes_segments_report.items():


node_segments_report shadows the parameter and should probably be renamed

The parameter has data from multiple nodes, so its name is nodes_segments_report

Maybe for _node, report in nodes_segments_report.items()

Even if the variables don't literally shadow each other, it's a bit too subtle to have them differ by just one s in the middle.

changed to report
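
To illustrate the naming and typing suggestions from this thread, here is a small sketch of a typed helper. Everything about it is hypothetical: the real shape of the report in si_utils.py is not shown in this PR excerpt, so the sketch assumes a mapping of node name to NTP string to a list of recovered segment sizes in bytes, and ntps_over_retention is an invented name.

```python
from typing import Dict, List

# Hypothetical shapes: node name -> NTP string -> recovered segment sizes (bytes).
SegmentSizes = List[int]
NodeReport = Dict[str, SegmentSizes]


def ntps_over_retention(nodes_segments_report: Dict[str, NodeReport],
                        retention_bytes: int) -> List[str]:
    """Return "node/ntp" keys whose recovered size exceeds retention_bytes.

    A single recovered segment is tolerated even when it is larger than the
    limit, mirroring the "download at least one segment" special case.
    """
    offenders: List[str] = []
    for node, report in nodes_segments_report.items():
        for ntp, segment_sizes in report.items():
            if len(segment_sizes) > 1 and sum(segment_sizes) > retention_bytes:
                offenders.append(f"{node}/{ntp}")
    return offenders
```

A test could then assert that the returned list is empty, which reads more directly than inspecting intermediate dictionaries.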

Now the tests check that SI recovers no more than the retention.policy size

github-actions bot added the area/redpanda label Oct 17, 2022

mmedenjak added kind/bug, area/tests, ci-failure, area/cloud-storage labels and removed area/redpanda label Oct 18, 2022

jcsp reviewed Oct 18, 2022

ZeDRoman force-pushed the issue-4887 branch from a688fe2 to d9354e6 Compare October 18, 2022 12:20

github-actions bot added the area/redpanda label Oct 18, 2022

Lazin reviewed Oct 18, 2022

ducktape: fix for SI test_size_based_retention tests

d3728b8

Now the tests check that SI recovers no more than the retention.policy size

ZeDRoman force-pushed the issue-4887 branch from d9354e6 to d3728b8 Compare October 18, 2022 15:18

ZeDRoman marked this pull request as ready for review October 18, 2022 15:19

ZeDRoman requested review from Lazin and jcsp October 19, 2022 08:26

jcsp approved these changes Oct 19, 2022

mmedenjak merged commit fdc9b57 into redpanda-data:dev Oct 19, 2022
