
storage: fix integer overflow in max_timestamp #4010

Merged: 1 commit into redpanda-data:dev on Mar 17, 2022

Conversation

@jcsp (Contributor) commented on Mar 15, 2022

Cover letter

Batches with timestamps behind the base_timestamp of the segment could have a negative relative timestamp, which overflows when cast to uint32_t for insertion into index_state::relative_time_index.

index_state::relative_time_index is later used to update max_timestamp when truncating the log. Hence max_timestamp shooting about 6 weeks into the future.
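
For illustration, here is a minimal standalone C++ sketch of the wrap described above; the constants are hypothetical and the code is not taken from the codebase:

    #include <cstdint>
    #include <iostream>

    int main() {
        // Hypothetical values: a batch stamped 5 ms before the segment's base.
        int64_t base_timestamp = 1'647'300'000'000; // ms since epoch
        int64_t last_timestamp = base_timestamp - 5;
        int64_t relative = last_timestamp - base_timestamp; // -5
        // Narrowing the negative value to uint32_t wraps it to 4294967291.
        uint32_t stored = static_cast<uint32_t>(relative);
        // A timestamp reconstructed from this index entry lands roughly 2^32 ms
        // (several weeks) in the future.
        std::cout << base_timestamp + int64_t{stored} << "\n";
    }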

Fixes #3924

Release notes

  • none

@jcsp added the kind/bug (Something isn't working) and area/storage labels on Mar 15, 2022
@dotnwat (Member) left a comment:

nice catch. There is something here that needs to be fixed. Two things I'm thinking about, but am not sure of:

  1. iirc the only query we do on time is related to finding a lower bound, so if a timestamp was artificially further ahead in the future, the effect should merely be to return more data than necessary (sketched after this list), which seems like compliant behavior. I'm less certain about the interaction with the max_timestamp calculation. What is the bug that results from "max_timestamp shooting about 6 weeks into the future"?
  2. iirc Kafka also permits clients to send in arbitrary timestamps, so presumably Kafka has to tackle a similar scenario. I wonder if we should poke around and see what the upstream behavior is in this case?
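
To make point 1 concrete, here is a minimal standalone sketch of a lower-bound style time lookup over index entries; the struct, the values, and the linear scan are hypothetical illustrations, not the actual segment_index code:

    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct entry {
        uint32_t relative_time; // ms past the segment's base_timestamp
        uint32_t offset_delta;  // offset past the segment's base offset
    };

    int main() {
        // Hypothetical index; the middle entry's time has wrapped to a huge value.
        std::vector<entry> index = {{100, 0}, {4294967291u, 10}, {300, 20}};
        uint32_t query = 250; // "first batch at or after base + 250 ms"
        auto it = std::find_if(index.begin(), index.end(), [&](const entry& e) {
            return e.relative_time >= query;
        });
        // The wrapped entry matches at offset_delta 10 instead of 20, so the
        // reader starts earlier and returns extra data: wasteful, but compliant.
        std::cout << it->offset_delta << "\n";
    }

Note that a wrapped entry also breaks the ordering a real index would rely on for binary search, so the linear scan above is only for illustration.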

@@ -64,7 +64,7 @@ bool index_state::maybe_index(
     // We know that a segment cannot be > 4GB
     add_entry(
       batch_base_offset() - base_offset(),
-      last_timestamp() - base_timestamp(),
+      std::max(last_timestamp() - base_timestamp(), int64_t{0}),

subtle.

I was thinking of base_timestamp() as the bottom time, but you are right, it has to be 0.
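
As an aside on why clamping to zero behaves sensibly, here is a hedged standalone sketch with hypothetical constants (not the actual code path): the patched expression clamps the negative delta before the narrowing cast, so the reconstructed timestamp degrades to base_timestamp instead of jumping forward.

    #include <algorithm>
    #include <cstdint>
    #include <iostream>

    int main() {
        int64_t base_timestamp = 1'647'300'000'000; // hypothetical, ms since epoch
        int64_t last_timestamp = base_timestamp - 5; // batch stamped behind the base
        // The patched expression: negative deltas clamp to 0, so no wrap occurs.
        uint32_t stored = static_cast<uint32_t>(
          std::max(last_timestamp - base_timestamp, int64_t{0}));
        std::cout << base_timestamp + int64_t{stored} << "\n"; // == base_timestamp
    }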

@jcsp (Contributor, Author) commented on Mar 15, 2022

> what is the bug that results from "max_timestamp shooting about 6 weeks into the future"?

That bogus timestamp is used for retention.ms enforcement, so we end up failing to delete segments during gc.
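
A rough standalone sketch of that failure mode; the retention comparison is a simplification of gc and every constant is hypothetical:

    #include <cstdint>
    #include <iostream>

    int main() {
        int64_t now_ms = 1'700'000'000'000;                      // hypothetical "now"
        int64_t retention_ms = 7LL * 24 * 3600 * 1000;           // 7-day retention.ms
        int64_t true_max_ts = now_ms - 10LL * 24 * 3600 * 1000;  // segment is 10 days old
        int64_t bogus_max_ts = true_max_ts + (int64_t{1} << 32); // wrapped ~49.7 days ahead

        std::cout << std::boolalpha
                  << (now_ms - true_max_ts > retention_ms) << "\n"   // true: would be gc'd
                  << (now_ms - bogus_max_ts > retention_ms) << "\n"; // false: never deleted
    }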

I went with a minimal fix here because this is a relatively frequent CI failure that I'd like to get fixed quickly. The general case where clients can send arbitrary timestamps (they can go backward, they can differ by more than 2**32) is incompatible with the way we store the relative timestamp in a uint32_t.

In terms of why we're ending up with timestamps <= base_timestamp in our tests, I'm not quite sure, but I don't think we really prevent it: the timestamps are generated in produce.cc before any ordering rules are in effect, so it's entirely possible for incoming batches to be stamped in a different order than they land in the segment. Maybe we should be applying timestamps later, once the order of storage is established.

@jcsp (Contributor, Author) commented on Mar 15, 2022

I checked how we were getting base_timestamps in the future wrt subsequent messages -- it's raft configuration messages, which can easily be timestamped after actual kafka batches, where the kafka batches are held up waiting for the consensus group to be ready (after the configuration lands).

afaict we don't implement the timestamp->offsets Kafka feature (segment_index::find_nearest(model::timestamp t) is unused). As/when we do implement that:

  • it might be necessary to extend the index to track a minimum timestamp per segment, which would be permitted to be lower than the base, and would enable a searcher to correctly find the segment that contains the earliest message with a given timestamp (a sketch follows after this list).
  • or, maybe we can exclude raft configuration messages from the index so that at least the base+max timestamps reflect the kafka timestamps (including client timestamps if that's what they used).
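
A hedged sketch of that first option, using hypothetical types rather than existing Redpanda code: a per-segment minimum timestamp, which may sit below the base_timestamp, lets a lookup test the full [min_ts, max_ts] range of each segment instead of treating base_timestamp as the lower bound.

    #include <cstdint>
    #include <iostream>
    #include <optional>
    #include <vector>

    struct segment_summary {
        int64_t min_ts;      // smallest batch timestamp; may be < base_timestamp
        int64_t max_ts;      // largest batch timestamp
        int64_t base_offset; // first offset in the segment
    };

    // Return the base offset of the first segment whose timestamp range
    // contains t (segments ordered by offset).
    std::optional<int64_t> find_segment_containing_timestamp(
      const std::vector<segment_summary>& segments, int64_t t) {
        for (const auto& s : segments) {
            // Testing min_ts rather than the base timestamp means a segment
            // whose first batch was stamped late is not skipped incorrectly.
            if (t >= s.min_ts && t <= s.max_ts) {
                return s.base_offset;
            }
        }
        return std::nullopt;
    }

    int main() {
        // Toy timestamps; the first segment's minimum sits below its
        // (hypothetical) base timestamp of 100.
        std::vector<segment_summary> segments = {{90, 200, 0}, {210, 400, 50}};
        auto off = find_segment_containing_timestamp(segments, 95);
        std::cout << (off ? *off : -1) << "\n"; // 0: found via min_ts, not the base
    }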

@jcsp merged commit e4f344f into redpanda-data:dev on Mar 17, 2022
@jcsp deleted the issue-3924-storage-max-timestamp branch on March 17, 2022 at 21:31
Linked issues (may be closed by merging this pull request):
  • Failure in test_changing_topic_retention.test_changing_topic_retention.test_changing_topic_retention