cloud_storage: use larger cache for scale test #5915

abhijat · 2022-08-09T10:22:28Z

The cloud storage cache used to be cleared at regular intervals. This allowed the franz-go verifier test to grow the cache beyond its limit in short bursts between cleanups, and to pass because the consumers could make progress.

With the new, stricter real-time cache eviction, this no longer happens and the test fails for multiple reasons. This change sets the cache size to a multiple of the segment size so that the code does not fall into a continuous cycle of hydrate current segment -> read from segment -> evict old segment.
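The hydrate -> read -> evict cycle can be illustrated with a toy model. This is only a sketch: it assumes a strict least-recently-used cache that holds whole segments, which is not necessarily how the Redpanda cache chooses what to evict, and `count_hydrations` is a hypothetical helper, not part of the test suite.

```python
from collections import OrderedDict


def count_hydrations(capacity_segments, reads):
    """Count downloads ("hydrations") for a toy LRU cache of whole segments.

    `reads` is a sequence of segment ids in the order they are read.
    Assumption: eviction is strict LRU and happens immediately on overflow.
    """
    cache = OrderedDict()  # segment id -> None, least recently used first
    hydrations = 0
    for segment in reads:
        if segment in cache:
            cache.move_to_end(segment)  # cache hit: mark as most recently used
            continue
        hydrations += 1  # cache miss: the segment has to be downloaded
        cache[segment] = None
        if len(cache) > capacity_segments:
            cache.popitem(last=False)  # evict the least recently used segment
    return hydrations


# Four readers, each on its own segment, taking turns for 100 rounds.
reads = [reader for _ in range(100) for reader in range(4)]
print(count_hydrations(1, reads))  # 400: every read triggers a hydration
print(count_hydrations(4, reads))  # 4: only the first read of each segment
```

In this model a cache smaller than the number of segments being read in parallel misses on every read, while a cache that fits all of them only pays for the initial downloads.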

Partially fixes #5753. The memory usage growth in the pathological case still needs to be investigated.

Backport Required

UX changes

None

Release notes

None

VladLazar

Looks good to me. Is this a good heuristic for the minimum SI cache size: parallel_consumers_count * segment_size? If so, should we document it anywhere?

tests/rptest/scale_tests/franz_go_verifiable_test.py

abhijat · 2022-08-09T14:28:24Z

> Looks good to me. Is this a good heuristic for the minimum SI cache size: parallel_consumers_count * segment_size? If so, should we document it anywhere?

My reasoning behind this size is that in the worst case, where each parallel read is reading from a different segment, we should have roughly enough space for these reads to continue without evicting each other. We could probably get away with something smaller, like 0.75 x parallel_reads x segment_size, but it would be borderline and would probably fail sometimes.

In my tests, this failure (with the cache set to the smaller 0.75x ratio) happens especially with smaller segment sizes. I guess that with larger segments the random reads often find their offset in the large segments already in the cache, while with smaller segments the probability of that offset being in the cache decreases.

On the other hand, if the cache is much larger, we won't exercise eviction often enough.
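As a rough sketch of the sizing heuristic discussed above: the helper name `si_cache_size_bytes`, the reader count, and the segment size below are illustrative assumptions, not values taken from the test file.

```python
MIB = 1024 * 1024


def si_cache_size_bytes(parallel_reads, segment_size_bytes, factor=1.0):
    """Hypothetical helper: cache size as a multiple of the segment size.

    factor=1.0 leaves room for every parallel read to hold its own segment;
    a smaller factor (such as 0.75) is the borderline setting from the thread.
    """
    if parallel_reads <= 0 or segment_size_bytes <= 0:
        raise ValueError("parallel_reads and segment_size_bytes must be positive")
    return int(factor * parallel_reads * segment_size_bytes)


# Illustrative numbers only: 4 parallel reads over 100 MiB segments.
print(si_cache_size_bytes(4, 100 * MIB))        # 419430400 bytes (400 MiB)
print(si_cache_size_bytes(4, 100 * MIB, 0.75))  # 314572800 bytes (300 MiB)
```

Both results are far below the 20GiB production recommendation, which is the point for a test: the cache stays small enough that eviction is still exercised.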

The official recommendation for production is 20GiB: https://docs.redpanda.com/docs/data-management/tiered-storage/#caching

Please let me know if this makes sense. We can tune it further if this seems too relaxed a setting.

cc @Lazin

VladLazar · 2022-08-09T14:33:36Z

Makes sense. Thanks for the extra context.

LenaAn

LGTM

Lazin

LGTM

tests/rptest/scale_tests/franz_go_verifiable_test.py

abhijat requested review from dotnwat and NyaliaLui as code owners August 9, 2022 10:22

abhijat force-pushed the adjust-cache-size-for-scale-test branch 4 times, most recently from 748b274 to b9b7f08 August 9, 2022 12:00

abhijat requested review from Lazin and LenaAn August 9, 2022 13:29

mmedenjak added the kind/bug, ci-failure, and area/cloud-storage labels Aug 9, 2022

abhijat mentioned this pull request Aug 9, 2022

Failure of ConnectionRateLimitTest.connection_rate_test #5276

Closed

VladLazar previously approved these changes Aug 9, 2022

VladLazar mentioned this pull request Aug 9, 2022

Semaphore timeout in FranzGoVerifiableWithSiTest.test_si_with_timeboxed #5895

Closed

abhijat dismissed VladLazar’s stale review via 84ea1ac August 9, 2022 14:28

abhijat force-pushed the adjust-cache-size-for-scale-test branch from b9b7f08 to 84ea1ac August 9, 2022 14:28

abhijat mentioned this pull request Aug 9, 2022

filesystem errors during KgoVerifierWithSiTestLargeSegments.test_si_without_timeboxed #5878

Closed

VladLazar mentioned this pull request Aug 9, 2022

Franz-go sequential consumer timeout in KgoVerifierWithSiTestLargeSegments.test_si_without_timeboxed and KgoVerifierWithSiTestLargeSegments.test_si_with_timeboxed #5898

Closed

LenaAn approved these changes Aug 9, 2022

Lazin approved these changes Aug 9, 2022

abhijat merged commit 8ad74a2 into redpanda-data:dev Aug 10, 2022
