[WIP] storage: support tombstone deletion from local storage #23071

Draft: wants to merge 24 commits into base: dev

@WillemKauf (Contributor) commented Aug 27, 2024

WIP of support for Kafka's delete_retention_ms (more aptly named in redpanda as tombstone_retention_ms 💯) and tombstone removal.

Likely to be split up into several PRs in the future for ease of review. Some of the first few commits are clean-up for the current compaction implementation.

Many, many more fixture and ducktape tests to come.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

@vbotbuildovich (Collaborator) commented Aug 27, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/53587#0191922c-b7b6-4d78-b444-b263992c20e0:

"rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.ABS.retention_type=retention.ms"
"rptest.tests.cloud_retention_test.CloudRetentionTest.test_gc_entire_manifest.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=20.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTest.test_reset_spillover.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestCompactedTopic.test_compacting_during_leadership_transfer.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast2.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_missing_segment.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.archival_test.ArchivalTest.test_timeboxed_uploads.acks=0.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=check_manifest_and_segment_metadata"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_no_data.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_vcluster_id.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_empty_segments.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_size_based_retention.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/53587#0191922c-b7b4-4fbd-a703-01479e803964:

"rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore_with_aborted_tx.recovery_overrides=.retention.local.target.bytes.1024.redpanda.remote.write.True.redpanda.remote.read.True.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.ABS.retention_type=retention.bytes"
"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.topic_delete_cloud_storage_test.disable_delete=False.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore.message_size=5000.num_messages=100000.recovery_overrides=.retention.local.target.bytes.1024.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.drop_lifecycle_marker_test.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_cloud_size_based_retention.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_cloud_time_based_retention.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/53587#01919243-2760-465b-80a5-c9f47c58c8f8:

"rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.ABS.retention_type=retention.bytes"
"rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore_with_config_batches.num_messages=2.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.drop_lifecycle_marker_test.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_missing_segment.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast2.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=check_manifest_and_segment_metadata"
"rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_cloud_size_based_retention.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_cloud_time_based_retention.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_no_data.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_vcluster_id.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_empty_segments.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_size_based_retention.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/53587#01919243-275f-487d-9db9-479c6ea8a0d9:

"rptest.tests.e2e_shadow_indexing_test.ShadowIndexingInfiniteRetentionTest.test_segments_not_deleted.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_cloud_size_based_retention_application.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTest.test_reset_from_cloud.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.archival_test.ArchivalTest.test_timeboxed_uploads.acks=-1.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.archival_test.ArchivalTest.test_timeboxed_uploads.acks=1.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/53587#01919243-2762-4723-b723-8d945538b78e:

"rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.ABS.retention_type=retention.ms"
"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTest.test_reset_spillover.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.archival_test.ArchivalTest.test_timeboxed_uploads.acks=0.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.cluster_config_test.ClusterConfigAzureSharedKey.test_live_shared_key_change.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=check_manifest_existence"

new failures in https://buildkite.com/redpanda/redpanda/builds/53587#01919243-275d-4c40-a027-dd23429cbd9b:

"rptest.tests.e2e_shadow_indexing_test.ShadowIndexingInfiniteRetentionTest.test_segments_not_deleted.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_cloud_size_based_retention_application.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_topic_recovery_retention_settings"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast1.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_missing_partition.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_time_based_retention.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_prevent_recovery.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=no_check"

new failures in https://buildkite.com/redpanda/redpanda/builds/53587#0191922c-b7b2-4e4a-af49-64a93fc496f8:

"rptest.tests.e2e_shadow_indexing_test.ShadowIndexingInfiniteRetentionTest.test_segments_not_deleted.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_cloud_size_based_retention_application.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=None.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestCompactedTopic.test_write.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.cloud_retention_test.CloudRetentionTimelyGCTest.test_retention_with_node_failures.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTest.test_reset_from_cloud.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast1.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_missing_partition.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_time_based_retention.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.archival_test.ArchivalTest.test_timeboxed_uploads.acks=-1.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.archival_test.ArchivalTest.test_timeboxed_uploads.acks=1.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_prevent_recovery.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=no_check"

new failures in https://buildkite.com/redpanda/redpanda/builds/53587#0191922c-b7b0-4235-bef3-a44ffd4db104:

"rptest.tests.e2e_shadow_indexing_test.ShadowIndexingInfiniteRetentionTest.test_segments_not_deleted.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_cloud_size_based_retention_application.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_topic_recovery_retention_settings"
"rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore_with_aborted_tx.recovery_overrides=.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.partition_movement_test.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.topic_delete_cloud_storage_test.disable_delete=True.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore_with_config_batches.num_messages=2.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore.message_size=5000.num_messages=100000.recovery_overrides=.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=check_manifest_existence"
"rptest.tests.cluster_config_test.ClusterConfigAzureSharedKey.test_live_shared_key_change.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/53712#01919af2-9c3f-4ed9-b50d-f97d61e2c5be:

"rptest.tests.describe_topics_test.DescribeTopicsTest.test_describe_topics_with_documentation_and_types"

new failures in https://buildkite.com/redpanda/redpanda/builds/53719#01919b86-74f8-4cf4-bcdb-c914e778b976:

"rptest.tests.describe_topics_test.DescribeTopicsTest.test_describe_topics_with_documentation_and_types"

new failures in https://buildkite.com/redpanda/redpanda/builds/53719#01919b9f-8a1d-4545-9d2f-a4fa52280484:

"rptest.tests.describe_topics_test.DescribeTopicsTest.test_describe_topics_with_documentation_and_types"

@WillemKauf force-pushed the tombstone_deletion branch 2 times, most recently from 76e6110 to 4b57747 on August 27, 2024 19:00

@WillemKauf (Contributor Author) commented:

/ci-repeat 1


This property should have been marked as requiring a restart, since
`redpanda` only attempts to construct archival/tiered storage related
objects at start-up, when it first checks this cluster property.

Correct the property by marking it as `needs_restart::yes`.

To match the style of other prints in `redpanda`, when the `tristate`
is in a `Not Set` state, output `{{nullopt}}` instead of simply `{{}}`.
Non-functional change.

The logic here is equivalent to the helper function `is_engaged()`.
Use it for improved readability and code de-duplication.

Non-functional change to make this segment of code entirely explicit
and not rely on knowledge of `tristate`'s default constructor.

There was buggy behavior for "sticky" properties and empty `std::optional<>`s created
as a result of `rpk topic alter-config [TOPIC] --delete [PROPERTY]`.

"Sticky" properties take the cluster config default at topic construction time,
but do not report their source as DYNAMIC_TOPIC_CONFIG (we have an override
mechanism to lie and force it as DEFAULT_CONFIG), and furthermore, do not
"fall back" on the cluster default at any time in the future.

Currently, `redpanda.remote.read` and `redpanda.remote.write` exhibit this "sticky"
topic property trait.

We had a bug, triggered by this sequence of events:
  1. `cloud_storage_enable_remote_read=false` at cluster level.
  2. `rpk topic create t`
  3. `rpk topic describe t` (correctly) describes `redpanda.remote.read=false`
  4. `rpk cluster config set cloud_storage_enable_remote_read=true`
  5. `rpk topic alter-config t --delete redpanda.remote.read`
  6. `rpk topic describe t` (incorrectly) describes `redpanda.remote.read=true`

At this point, `rpk topic describe` prints the cluster default and indicates
that topic `t` has remote read enabled, but this is not actually the case as
far as the topic is concerned: the topic property is `std::nullopt`, and, once
again, we don't fall back on the cluster default.

Fix the behavior here by always providing an `override` value in `make_topic_configs()`
for these two properties.
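A minimal sketch of the mismatch described above. The names (`topic_props_sketch`, `describe_buggy`, `describe_fixed`) are hypothetical and are not redpanda's actual types or functions. It also assumes that an empty optional behaves as "disabled" for the topic, which is what the sequence of events suggests.

```cpp
#include <optional>

// Hypothetical stand-in for a topic's properties (not redpanda's real type).
struct topic_props_sketch {
    // "Sticky": copied from the cluster default at creation, never re-read.
    std::optional<bool> remote_read;
};

// What the topic actually does. Assumption: an empty optional means
// "disabled", with no fallback on the cluster default.
inline bool effective_remote_read(const topic_props_sketch& t) {
    return t.remote_read.value_or(false);
}

// Buggy describe path: falls back on the *current* cluster default.
inline bool describe_buggy(const topic_props_sketch& t, bool cluster_default) {
    return t.remote_read.value_or(cluster_default);
}

// Fixed describe path: always report an explicit override taken from the
// topic's own state, ignoring the cluster default.
inline bool describe_fixed(const topic_props_sketch& t, bool /*cluster_default*/) {
    return effective_remote_read(t);
}
```

After step 5 the topic property is empty and the cluster default is `true`, so the buggy path reports `true` while the topic itself still behaves as `false`. The fixed path reports what the topic actually does.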

Consider the case in which we have 3 segments:

S: [S1] [S2] [ S3 ]
K: |K1| |K2| | K1 |
V: |V1| |V2| |null|

The current condition, `num_compactible_records > 1`, in
`may_have_compactible_records()` would result in these segments being removed
from the range used for window compaction, since each holds only a single
record. This would prevent the tombstone value for `K1` in `S3` from being
applied to `K1` in `S1`.

This condition exists mostly for historical reasons: we didn't want to have
completely empty segments post compaction. That issue is now solved by the
placeholder feature.

Adjust it to `num_compactible_records > 0` to allow the above case to work
as expected. This change should not have any other dramatic effects on the
process of compaction.

Also modify tests that use `may_have_compactible_records()` to reflect
the updated behavior.
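The effect of the threshold change on the three-segment case can be sketched as follows. This is an illustration only: `segment_sketch`, the `_old`/`_new` suffixes, and `segments_in_window` are hypothetical names, and the real `may_have_compactible_records()` may take other inputs into account.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-segment summary; only the record count matters here.
struct segment_sketch {
    std::size_t num_compactible_records;
};

// Old condition: a segment needs more than one compactible record.
inline bool may_have_compactible_records_old(const segment_sketch& s) {
    return s.num_compactible_records > 1;
}

// New condition: a single compactible record is enough.
inline bool may_have_compactible_records_new(const segment_sketch& s) {
    return s.num_compactible_records > 0;
}

// Count how many segments stay in the range used for window compaction.
template<typename Pred>
std::size_t segments_in_window(
  const std::vector<segment_sketch>& segs, Pred keep) {
    std::size_t n = 0;
    for (const auto& s : segs) {
        if (keep(s)) {
            ++n;
        }
    }
    return n;
}
```

With three single-record segments, the old predicate keeps none of them in the window, so the tombstone in `S3` never meets `K1` in `S1`. The new predicate keeps all three.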

@WillemKauf force-pushed the tombstone_deletion branch 5 times, most recently from 79c0e85 to aa5abe8 on September 6, 2024 18:39

Previously, `has_value()` treated an empty value as a present value,
which is a confusing definition, as Kafka producers intend
for an empty value in a key-value pair to represent a tombstone.

Alter the logic of `has_value()` to consider empty values in a record
as the `false` case, and add `is_tombstone()` as a more descriptive helper
function.
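A rough sketch of the before and after semantics, assuming a hypothetical `record_sketch` that tracks only the value size. The `>= 0` form of the old check is an assumption inferred from the description, not a quote of the actual code.

```cpp
#include <cstdint>

// Hypothetical record header; only the value size is modelled.
struct record_sketch {
    // Assumption: a negative size encodes a null value, 0 an empty one.
    std::int32_t value_size;

    // Assumed previous behaviour: an empty value still counted as a value.
    bool has_value_old() const { return value_size >= 0; }

    // Updated behaviour: empty values are the `false` case.
    bool has_value() const { return value_size > 0; }

    // More descriptive helper for the same condition.
    bool is_tombstone() const { return !has_value(); }
};
```

Under these assumptions, a record with `value_size == 0` passes the old check but is reported as a tombstone by the new helpers.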

Persist in the `index_state` the timestamp at which every record up to and
including those in this segment was first compacted via sliding window.

This indicates whether a segment can be considered "clean" or is still
"dirty" during compaction.

We use `seg->mark_as_finished_window_compaction()` to indicate
that a segment has been through a full round of window compaction,
whether it is completely de-duplicated ("clean") or only partially
indexed (still "dirty").

Add `mark_segment_as_finished_window_compaction()` to `segment_utils`
as a helper function that marks a segment as having completed window
compaction and records whether it is "clean" (in which case we set the
`clean_compact_timestamp` in the `segment_index`).

For use during the self-compaction and window compaction process in order to
tell whether a record should be retained or not (in the case that it is a tombstone
record, with a value set for `tombstone_delete_horizon`).

Utility function for getting the optional `timestamp` past which
tombstones can be removed.

This returns a value iff the segment `s` has been marked as cleanly
compacted, and the compaction_config has a value assigned for
`tombstone_retention_ms`. In all other cases, `std::nullopt` is returned,
indicating that tombstone records will not be removed if encountered.
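The rule above can be sketched as a small function. All names here (`timestamp`, `segment_sketch`, `compaction_config_sketch`, `tombstone_delete_horizon`) are hypothetical stand-ins, and the horizon is computed as `clean_compact_timestamp + tombstone_retention_ms`, following the description in this PR.

```cpp
#include <chrono>
#include <optional>

// Hypothetical alias: milliseconds since some epoch.
using timestamp = std::chrono::milliseconds;

// Hypothetical segment: the timestamp is set only once the segment has
// been marked as cleanly compacted.
struct segment_sketch {
    std::optional<timestamp> clean_compact_timestamp;
};

// Hypothetical compaction config carrying the optional retention setting.
struct compaction_config_sketch {
    std::optional<std::chrono::milliseconds> tombstone_retention_ms;
};

// Returns the horizon iff the segment is clean AND retention is configured.
// In all other cases returns std::nullopt, meaning tombstones are kept.
inline std::optional<timestamp> tombstone_delete_horizon(
  const segment_sketch& s, const compaction_config_sketch& cfg) {
    if (!s.clean_compact_timestamp || !cfg.tombstone_retention_ms) {
        return std::nullopt;
    }
    return *s.clean_compact_timestamp + *cfg.tombstone_retention_ms;
}
```

Returning an optional lets callers treat "segment is not clean" and "retention is not configured" identically: no horizon, so no tombstone removal.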

During the copying process in self compaction, we can check for any tombstone
record in a segment that has been marked clean by the sliding window compaction process.

If the segment has been marked clean, and the current timestamp is past the tombstone
delete horizon defined by `clean_compact_timestamp + tombstone.retention.ms`,
the tombstone is eligible for deletion.

Add logic to the `should_keep()` function used in the `copy_reducer` which
removes tombstones during the copy process.
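A minimal sketch of the tombstone part of such a check, assuming the delete horizon has already been computed and is passed in as an optional. The name `should_keep_sketch` and its signature are hypothetical. The real `should_keep()` presumably also handles de-duplication, which is omitted here, and treating "exactly at the horizon" as still retained is an assumption.

```cpp
#include <chrono>
#include <optional>

// Hypothetical alias: milliseconds since some epoch.
using timestamp = std::chrono::milliseconds;

// Decide whether a record survives the copy, looking only at tombstone
// expiry.
inline bool should_keep_sketch(
  bool is_tombstone, std::optional<timestamp> delete_horizon, timestamp now) {
    if (!is_tombstone) {
        return true; // only tombstones are subject to this check
    }
    if (!delete_horizon) {
        return true; // segment not clean, or retention not configured
    }
    // Drop the tombstone only once `now` is strictly past the horizon.
    return now <= *delete_horizon;
}
```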

During the deduplication process in sliding window compaction,
if a tombstone record has already been seen and is past the tombstone
horizon set by `clean_compact_timestamp + tombstone.retention.ms`,
it is eligible for deletion.

Add logic to the `copy_reducer` which removes tombstones during the
deduplication process.

For ease of adding tombstone records to a partition in fixture tests.

…mpacted`

This fixture test was timing out often in debug mode.

Reduce the number of test cases to allow it to pass without timing out.

Add support for `tombstone_retention_ms`, a cluster property
that will support topic-level overrides. It is equivalent
to Kafka's `delete_retention_ms`, but is more precisely named, and
represents the retention time for tombstone records in compacted topics
in `redpanda`.

Plumb the `tombstone_retention_ms` config through `topic_configuration`,
`topic_properties`, and `ntp_config`.

Also modify necessary compatibility sites.

Also modify `tools/offline_log_viewer` to be compatible with the updated
`serde`.

When applying `update_topic_properties_cmd`, there may be some checks
that depend on multiple properties. The current motivating case is
ensuring that `tombstone.retention.ms` is not enabled at the same time
as any tiered storage properties.

Add `topic_table::topic_multi_property_validation()` to perform these
checks and abort the topic property update if the `properties` are
found to be invalid.
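A multi-property check of this kind might look like the sketch below. The struct and function names are hypothetical, and treating `remote_read` and `remote_write` as the relevant tiered storage properties is an assumption based on the two properties named earlier in this PR.

```cpp
#include <chrono>
#include <optional>

// Hypothetical subset of topic properties relevant to the check.
struct topic_properties_sketch {
    std::optional<std::chrono::milliseconds> tombstone_retention_ms;
    bool remote_read = false;  // assumed tiered storage property
    bool remote_write = false; // assumed tiered storage property
};

// Returns true when the combination of properties is valid, i.e. tombstone
// retention is not enabled together with any tiered storage property.
inline bool topic_multi_property_validation_sketch(
  const topic_properties_sketch& p) {
    const bool tiered_storage_enabled = p.remote_read || p.remote_write;
    return !(p.tombstone_retention_ms.has_value() && tiered_storage_enabled);
}
```

A caller applying a property update would run this on the resulting properties and abort the update when it returns `false`.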

Also fix tests that use Kafka topic configuration properties.
2 participants