Store Gateway: Index header disk space management #7029

Open
yeya24 opened this issue Jan 3, 2024 · 3 comments · May be fixed by #7118

@yeya24
Contributor

yeya24 commented Jan 3, 2024

Is your proposal related to a problem?

#6984 adds support for lazily downloaded index headers. This is useful when some blocks are accessed far more often than others; for example, data from the most recent month is usually queried more often than data older than one year. With lazy download, we can reduce Store Gateway startup time and index header disk usage because fewer index headers are downloaded.

However, there is no way to clean up index headers on local disk once they are no longer accessed. If an index header file is downloaded and accessed only once, it remains on disk forever. We only unload the mmap; there is no retention policy for the local files.

Describe the solution you'd like

  1. Purge local index headers periodically if the files have not been used for some time.
  2. Also implement a size-based strategy for the purge above.

We can start with option 1 and add the size-based strategy later.

Describe alternatives you've considered

NA

@douglascamata
Contributor

@yeya24 I think we can plug into the idle timeout for unloading lazy index readers. After the mmap gets removed, we would also remove the downloaded file.

Other options:

  • Add yet one more CLI flag to specify some sort of delete-idle-lazy-index-reader-timeout.
  • Use an LRU list and remove the least used ones. This sounds better than a size strategy to me because if an index is constantly used, the size doesn't matter -- you better keep it in the disk cache to avoid having to redownload it often.

@douglascamata
Contributor

Ah, I just noticed we already have a PR for my first alternative option: #7118. WDYT?

@yeya24
Contributor Author

yeya24 commented Aug 4, 2024

Yeah, I think #7118 is a good start. But that PR would delete any index header that has not been accessed within the idle timeout. I hope we can also check whether the index header is eagerly or lazily downloadable, and skip cleanup for blocks that are eagerly downloadable so that query latency is preserved.

> Use an LRU list and remove the least used ones. This sounds better than a size strategy to me because if an index is constantly used, the size doesn't matter -- you better keep it in the disk cache to avoid having to redownload it often.

LRU is just an eviction policy. Size is what triggers the LRU-based eviction, so it will be a combination of both. It can be a complex strategy, and we can start small.
