[WIP] storage: support tombstone deletion from local storage
#23071
base: dev
Conversation
Force-pushed from bcea3df to 997e416
new failures in https://buildkite.com/redpanda/redpanda/builds/53587#0191922c-b7b6-4d78-b444-b263992c20e0:
new failures in https://buildkite.com/redpanda/redpanda/builds/53587#0191922c-b7b4-4fbd-a703-01479e803964:
new failures in https://buildkite.com/redpanda/redpanda/builds/53587#01919243-2760-465b-80a5-c9f47c58c8f8:
new failures in https://buildkite.com/redpanda/redpanda/builds/53587#01919243-275f-487d-9db9-479c6ea8a0d9:
new failures in https://buildkite.com/redpanda/redpanda/builds/53587#01919243-2762-4723-b723-8d945538b78e:
new failures in https://buildkite.com/redpanda/redpanda/builds/53587#01919243-275d-4c40-a027-dd23429cbd9b:
new failures in https://buildkite.com/redpanda/redpanda/builds/53587#0191922c-b7b2-4e4a-af49-64a93fc496f8:
new failures in https://buildkite.com/redpanda/redpanda/builds/53587#0191922c-b7b0-4235-bef3-a44ffd4db104:
new failures in https://buildkite.com/redpanda/redpanda/builds/53712#01919af2-9c3f-4ed9-b50d-f97d61e2c5be:
new failures in https://buildkite.com/redpanda/redpanda/builds/53719#01919b86-74f8-4cf4-bcdb-c914e778b976:
new failures in https://buildkite.com/redpanda/redpanda/builds/53719#01919b9f-8a1d-4545-9d2f-a4fa52280484:
Force-pushed from 76e6110 to 4b57747
/ci-repeat 1
Force-pushed from 4b57747 to 68d258e
/ci-repeat 1
Force-pushed from 68d258e to aec0057
/ci-repeat 1
Force-pushed from aec0057 to 5264d4c
/ci-repeat 1
Force-pushed from 5264d4c to 65930d8
/ci-repeat 1
Force-pushed from 65930d8 to a487743
/ci-repeat 1
Force-pushed from a487743 to 99f11cf
/ci-repeat 1
Force-pushed from 99f11cf to ab1a387
Force-pushed from 1a2d890 to 7b56fa8
This property should have been marked as needing a restart, since `redpanda` only attempts to construct archival/tiered-storage related objects at start-up, when it first checks this cluster property. Correct the property by marking it as `needs_restart::yes`.
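As a rough illustration (a simplified model, not Redpanda's real `config::property<>` machinery, and the property name here is hypothetical), the point of the flag is that a property consulted only while constructing subsystems at start-up must advertise `needs_restart::yes`, otherwise a live update appears to succeed but has no effect until the broker restarts:

```cpp
#include <string>

// Simplified stand-in for the restart metadata attached to cluster properties.
enum class needs_restart { no, yes };

template <typename T>
struct cluster_property {
    std::string name;
    needs_restart restart_required; // surfaced to operators on config changes
    T default_value;
};

// Hypothetical archival-related property, read once during start-up when the
// tiered-storage objects are constructed, hence needs_restart::yes.
inline const cluster_property<bool> example_archival_property{
  "example_archival_property", needs_restart::yes, false};
```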
In keeping with the style of other prints in `redpanda`, when the `tristate` is in a `Not Set` state, output `{{nullopt}}` instead of simply `{{}}`.
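A minimal sketch of the intended output change, using a simplified stand-in rather than Redpanda's actual `tristate<>` (the `{{...}}` in the commit message is fmt escaping for a literal `{...}`):

```cpp
#include <iostream>
#include <optional>
#include <utility>

// Simplified three-state holder: disabled, not set, or holding a value.
template <typename T>
class tristate {
public:
    static tristate disabled() { return tristate{state::disabled}; }
    static tristate not_set() { return tristate{state::not_set}; }
    static tristate of(T v) {
        tristate t{state::set};
        t._value = std::move(v);
        return t;
    }

    bool is_disabled() const { return _state == state::disabled; }
    bool has_optional_value() const { return _state == state::set; }
    const T& value() const { return *_value; }

private:
    enum class state { disabled, not_set, set };
    explicit tristate(state s) : _state(s) {}
    state _state;
    std::optional<T> _value;
};

template <typename T>
std::ostream& operator<<(std::ostream& os, const tristate<T>& t) {
    if (t.is_disabled()) {
        return os << "{disabled}";
    }
    if (!t.has_optional_value()) {
        return os << "{nullopt}"; // previously rendered as an empty "{}"
    }
    return os << '{' << t.value() << '}';
}

int main() {
    std::cout << tristate<int>::not_set() << '\n'; // {nullopt}
    std::cout << tristate<int>::of(42) << '\n';    // {42}
}
```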
Non-functional change. The logic here is equivalent to the helper function `is_engaged()`. Use it for improved readability and code de-duplication.
Non-functional change to make this segment of code entirely explicit and not rely on knowledge of `tristate`'s default constructor.
There was buggy behavior for "sticky" properties and the empty `std::optional<>`s created as a result of `rpk topic alter-config [TOPIC] --delete [PROPERTY]`.

"Sticky" properties take the cluster config default at topic construction time, but do not indicate their source as DYNAMIC_TOPIC_CONFIG (we have an override mechanism to lie and force it to DEFAULT_CONFIG), and furthermore do not "fall back" on the cluster default at any point in the future. Currently, `redpanda.remote.read` and `redpanda.remote.write` exhibit this "sticky" topic property trait.

We had a bug, triggered by this sequence of events:

1. `cloud_storage_enable_remote_read=false` at the cluster level.
2. `rpk topic create t`
3. `rpk topic describe t` (correctly) describes `redpanda.remote.read=false`
4. `rpk cluster config set cloud_storage_enable_remote_read=true`
5. `rpk topic alter-config t --delete redpanda.remote.read`
6. `rpk topic describe t` (incorrectly) describes `redpanda.remote.read=true`

At step 6, `rpk topic describe` prints the cluster default and indicates that topic `t` has remote read permissions, but this is not actually the case as far as the topic is concerned (the topic property is `std::nullopt`, and, once again, sticky properties do not fall back on the cluster default).

Fix the behavior by always providing an `override` value in `make_topic_configs()` for these two properties.
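A much-simplified model of the failure mode (none of these types or function names are Redpanda's real ones; they only illustrate why the describe path must be handed an explicit override for sticky properties):

```cpp
#include <iostream>
#include <optional>

struct cluster_config {
    bool cloud_storage_enable_remote_read{false};
};

struct topic_state {
    // What the topic actually enforces, captured ("stuck") at creation time.
    bool effective_remote_read{false};
    // The stored override; becomes nullopt after `--delete`.
    std::optional<bool> remote_read_override;
};

// Buggy describe: an empty override falls back to the *current* cluster
// default, which may no longer match what the topic enforces.
bool describe_buggy(const topic_state& t, const cluster_config& c) {
    return t.remote_read_override.value_or(c.cloud_storage_enable_remote_read);
}

// Fixed describe: always report an explicit override (the value the topic
// actually enforces), regardless of later cluster-level changes.
bool describe_fixed(const topic_state& t) { return t.effective_remote_read; }

int main() {
    cluster_config c{};                        // 1. remote read off cluster-wide
    topic_state t{};                           // 2. topic created, sticks to false
    c.cloud_storage_enable_remote_read = true; // 4. cluster default flips
    t.remote_read_override.reset();            // 5. --delete clears the override
    std::cout << describe_buggy(t, c) << '\n'; // 6. prints 1 (wrong)
    std::cout << describe_fixed(t) << '\n';    //    prints 0 (matches the topic)
}
```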
Consider the case in which we have 3 segments:

S: [S1]  [S2]  [ S3 ]
K: |K1|  |K2|  | K1 |
V: |V1|  |V2|  |null|

The current condition `num_compactible_records > 1` in `may_have_compactible_records()` would result in these segments being removed from the range used for window compaction, preventing the tombstone value for `K1` in `S3` from being applied to `K1` in `S1`.

This condition exists mostly for historical reasons, from when we didn't want completely empty segments post-compaction. That issue is now solved by the placeholder feature. Adjust the condition to `num_compactible_records > 0` to allow the above case to work as expected. This change should not have any other dramatic effects on the compaction process.

Also modify tests that use `may_have_compactible_records()` to reflect the updated behavior.
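A tiny sketch of the predicate change in isolation (stand-alone code, not the real `segment_index`/compaction implementation):

```cpp
#include <cstddef>
#include <iostream>

// Old predicate: a segment contributing at most one compactible record was
// dropped from the window, so S3's tombstone for K1 never reached S1.
bool may_have_compactible_records_old(std::size_t num_compactible_records) {
    return num_compactible_records > 1;
}

// New predicate: any compactible record keeps the segment in the sliding
// window; fully emptied segments are now handled by the placeholder feature.
bool may_have_compactible_records_new(std::size_t num_compactible_records) {
    return num_compactible_records > 0;
}

int main() {
    // Each of S1, S2, S3 above holds a single record.
    constexpr std::size_t records_per_segment = 1;
    std::cout << may_have_compactible_records_old(records_per_segment) << '\n'; // 0: excluded
    std::cout << may_have_compactible_records_new(records_per_segment) << '\n'; // 1: included
}
```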
Force-pushed from 7b56fa8 to 3c44f80
Force-pushed from 79c0e85 to aa5abe8
Previously, `has_value()` would treat an empty value as a present value, which is a confusing definition, since Kafka producers intend an empty value in a key-value pair to represent a tombstone. Alter the logic of `has_value()` to treat empty values in a record as the `false` case, and add `is_tombstone()` as a more descriptive helper function.
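A simplified sketch of the revised semantics (a stand-in struct, not the actual record class):

```cpp
#include <iostream>
#include <optional>
#include <string>

struct record {
    std::string key;
    // nullopt => no value at all; "" => present but empty. Kafka producers use
    // an absent/empty value on a keyed record to mean "delete this key".
    std::optional<std::string> value;

    bool has_value() const { return value.has_value() && !value->empty(); }
    bool is_tombstone() const { return !has_value(); }
};

int main() {
    record live{"k1", "v1"};
    record tomb{"k1", std::nullopt};
    record empty_val{"k1", ""};

    std::cout << live.is_tombstone() << '\n';      // 0
    std::cout << tomb.is_tombstone() << '\n';      // 1
    std::cout << empty_val.is_tombstone() << '\n'; // 1 (previously has_value() was true)
}
```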
Persist, in the `index_state`, the timestamp at which every record up to and including those in this segment was first compacted via sliding window. This indicates whether a segment should still be considered "clean" or "dirty" during compaction.
...in the `segment_index`.
We use `seg->mark_as_finished_window_compaction()` to indicate that a segment has been through a full round of window compaction, whether it is completely de-duplicated ("clean") or only partially indexed (still "dirty"). Add `mark_segment_as_finished_window_compaction()` to `segment_utils` as a helper that marks a segment as having completed window compaction and, if it is "clean", sets the `clean_compact_timestamp` in the `segment_index`.
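A rough sketch of the helper's intent, with simplified stand-ins for the real `segment`/`segment_index` types:

```cpp
#include <chrono>
#include <optional>

using timestamp = std::chrono::system_clock::time_point;

struct segment_index {
    std::optional<timestamp> clean_compact_timestamp;
};

struct segment {
    segment_index index;
    bool finished_window_compaction{false};

    void mark_as_finished_window_compaction() {
        finished_window_compaction = true;
    }
};

// Stand-in for the segment_utils helper: every segment leaving a window
// compaction pass is marked finished; only fully de-duplicated ("clean")
// segments also record the clean-compaction timestamp in their index.
void mark_segment_as_finished_window_compaction(segment& seg, bool is_clean) {
    seg.mark_as_finished_window_compaction();
    if (is_clean && !seg.index.clean_compact_timestamp.has_value()) {
        seg.index.clean_compact_timestamp = std::chrono::system_clock::now();
    }
}
```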
For use during the self-compaction and window compaction processes to tell whether a record should be retained (in the case that it is a tombstone record and a value is set for `tombstone_delete_horizon`).
Utility function for getting the optional `timestamp` past which tombstones can be removed. This returns a value iff the segment `s` has been marked as cleanly compacted, and the compaction_config has a value assigned for `tombstone_retention_ms`. In all other cases, `std::nullopt` is returned, indicating that tombstone records will not be removed if encountered.
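In code, the contract described above looks roughly like this (simplified types, not the real `segment`/`compaction_config` API):

```cpp
#include <chrono>
#include <optional>

using timestamp = std::chrono::system_clock::time_point;

struct segment_state {
    std::optional<timestamp> clean_compact_timestamp;
};

struct compaction_config {
    std::optional<std::chrono::milliseconds> tombstone_retention_ms;
};

// Returns a horizon iff the segment has been cleanly compacted AND tombstone
// retention is configured; std::nullopt means tombstones are never removed.
std::optional<timestamp> get_tombstone_delete_horizon(
  const segment_state& s, const compaction_config& cfg) {
    if (!s.clean_compact_timestamp.has_value()
        || !cfg.tombstone_retention_ms.has_value()) {
        return std::nullopt;
    }
    return *s.clean_compact_timestamp + *cfg.tombstone_retention_ms;
}
```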
During the copying process in self-compaction, we can check whether a tombstone record belongs to a segment that has been marked clean by the sliding window compaction process. If so, and the current timestamp is past the tombstone delete horizon defined by `clean_compact_timestamp + tombstone.retention.ms`, the record is eligible for deletion. Add logic to the `should_keep()` function used in the `copy_reducer` to remove such tombstones during the copy process.
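A simplified sketch of the retention decision added to the copy path (record and clock types are stand-ins, not the real `copy_reducer` internals):

```cpp
#include <chrono>
#include <optional>

using timestamp = std::chrono::system_clock::time_point;

struct record_view {
    bool is_tombstone{false};
};

// Tombstones are dropped only once the segment's delete horizon
// (clean_compact_timestamp + tombstone.retention.ms) has passed.
bool should_keep(
  const record_view& r,
  std::optional<timestamp> tombstone_delete_horizon,
  timestamp now = std::chrono::system_clock::now()) {
    if (!r.is_tombstone) {
        return true; // live records follow the usual de-duplication rules
    }
    return !(
      tombstone_delete_horizon.has_value() && now > *tombstone_delete_horizon);
}
```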
During the deduplication process in sliding window compaction, if a tombstone record has already been seen and is past the tombstone horizon set by the `clean_compact_timestamp + tombstone.retention.ms`, it is eligible for deletion. Add logic to the `copy_reducer` which removes tombstones during the deduplication process.
For ease of adding tombstone records to a partition in fixture tests.
…mpacted` This fixture test was timing out often in debug mode. Reduce the number of test cases to allow it to pass without timing out.
Add support for `tombstone_retention_ms`, a cluster property that will support topic-level overrides. This is equivalent to Kafka's `delete_retention_ms`, but is more precisely named, and represents the retention time for tombstone records in compacted topics in `redpanda`.
Plumb the `tombstone_retention_ms` config through `topic_configuration`, `topic_properties`, and `ntp_config`, and modify the necessary compatibility sites. Also update `tools/offline_log_viewer` to be compatible with the updated `serde`.
When applying `update_topic_properties_cmd`, there may be some checks that depend on multiple properties. The current motivating case is ensuring that `tombstone.retention.ms` is not enabled at the same time as any tiered storage properties. Add `topic_table::topic_multi_property_validation()` to perform these checks and abort the topic property update if the `properties` are found to be invalid.
Also fix tests that use Kafka topic configuration properties.
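A hypothetical sketch of the multi-property check described above (field and function names simplified, not the real `topic_table` API):

```cpp
#include <chrono>
#include <optional>

struct topic_properties_view {
    std::optional<std::chrono::milliseconds> tombstone_retention_ms;
    bool remote_read{false};
    bool remote_write{false};
};

// Stand-in for topic_table::topic_multi_property_validation(): returns true if
// the combination of properties is acceptable, false to abort the update.
bool topic_multi_property_validation(const topic_properties_view& props) {
    const bool tiered_storage_enabled = props.remote_read || props.remote_write;
    if (props.tombstone_retention_ms.has_value() && tiered_storage_enabled) {
        return false; // tombstone.retention.ms cannot coexist with tiered storage
    }
    return true;
}
```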
Force-pushed from aa5abe8 to a3788da
WIP of support for Kafka's `delete_retention_ms` (more aptly named in `redpanda` as `tombstone_retention_ms` 💯) and tombstone removal.

Likely to be split up into several PRs in the future for ease of review. Some of the first few commits are clean-up for the current compaction implementation.

Many, many more fixture and ducktape tests to come.
Backports Required
Release Notes