cloud_storage: use larger cache for scale test #5915
Conversation
Looks good to me. Is this a good heuristic for the minimum SI cache size: parallel_consumers_count * segment_size? If so, should we document it anywhere?
My reasoning behind this size was that in the worst case, where each parallel read is reading from a different segment, we should have roughly enough space for those reads to continue without evicting each other. We could probably get away with something smaller, like 0.75 x parallel_reads x segment_size, but that would be borderline and probably fail sometimes. In my tests, this failure (if we set the cache to the smaller 0.75x ratio) happens especially with smaller segment sizes. I suspect that with larger segments the random reads often find their offset in the large segments already in cache, while with smaller segments the probability of that offset being in cache decreases. On the other hand, if the cache is much larger, we won't exercise eviction often enough. The official recommendation is 20GiB: https://docs.redpanda.com/docs/data-management/tiered-storage/#caching. Please let me know if this makes sense; we can tune it further if it seems too relaxed a setting. cc @Lazin
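The heuristic discussed above could be sketched as a small helper. Note the function and parameter names here are hypothetical, chosen only to mirror the quantities mentioned in this thread; they are not actual test parameters:

```python
def min_si_cache_size(parallel_consumers: int, segment_size: int) -> int:
    """Heuristic minimum SI (Shadow Indexing) cache size, per the thread above.

    Worst case: every parallel read touches a different segment, so the
    cache needs roughly one segment's worth of space per consumer to
    keep the reads from evicting each other's segments.
    """
    return parallel_consumers * segment_size


# Example: 8 consumers reading 128 MiB segments suggest a ~1 GiB cache.
print(min_si_cache_size(8, 128 * 1024 * 1024))  # 1073741824
```

A 0.75x ratio, as noted above, would be borderline: some reads would have to evict segments still in use by other consumers.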
Makes sense. Thanks for the extra context.
LGTM
LGTM
The cloud storage cache used to be cleared at regular intervals. This allowed the franz-go verifier test to grow the cache larger than the limit in short bursts between cleanups, and to pass because the consumers could progress.
With the new, stricter real-time cache eviction, this no longer happens and the test fails for multiple reasons. This change sets the cache size to a multiple of the segment size, so that the code does not enter a continuous cycle of hydrate current segment -> read from segment -> evict old segment.
Partially fixes #5753. Still need to investigate the memory usage growing in the pathological case.
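The hydrate -> read -> evict cycle can be illustrated with a toy LRU model. This is not Redpanda code, just a sketch (with made-up names) of why a cache sized at several segments evicts far less than one that can only hold a couple of segments for the same read workload:

```python
import random
from collections import OrderedDict


def count_evictions(cache_segments, consumers, total_segments, rounds, seed=0):
    """Toy model: consumers read random segments through an LRU cache
    that holds `cache_segments` whole segments; returns eviction count."""
    random.seed(seed)
    cache = OrderedDict()
    evictions = 0
    for _ in range(rounds):
        for _ in range(consumers):
            seg = random.randrange(total_segments)
            if seg in cache:
                cache.move_to_end(seg)       # cache hit: refresh LRU order
                continue
            if len(cache) >= cache_segments:
                cache.popitem(last=False)    # evict least recently used segment
                evictions += 1
            cache[seg] = True                # "hydrate" the segment into cache


    return evictions


# Same workload, different cache budgets: the undersized cache churns more.
small = count_evictions(cache_segments=2, consumers=8, total_segments=100, rounds=200)
large = count_evictions(cache_segments=8, consumers=8, total_segments=100, rounds=200)
assert small > large
```

Because LRU has the inclusion property (the smaller cache's contents are always a subset of the larger cache's for the same access sequence), the larger cache can never evict more often, which is the intuition behind sizing the test cache as a multiple of the segment size.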
Backport Required
UX changes
None
Release notes
None