feat(mito): support write cache for index file #3144

Merged

Contributor

@zhongzc zhongzc commented Jan 11, 2024

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

The upcoming SST Inverted Index Creator for the Parquet writer unavoidably has to be coupled with ParquetWriter::write_all (I am very sorry about this; if there is a better solution, I am open to suggestions). Consequently, depending on the WriteOptions configuration, ParquetWriter::write_all will generate either a Parquet data file alone or a Parquet data file plus a Puffin index file.

This breaks some existing assumptions. For deletion, the index file has to be deleted along with the data file; for disk usage, the size of the index file has to be counted as well. The recently added Write Cache feature also assumes that ParquetWriter::write_all produces only a single Parquet data file, an assumption that will no longer hold.
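To make the deletion and disk-usage points concrete, here is a minimal sketch. `WriteOutput`, its fields, and the `.parquet` / `.puffin` path suffixes are hypothetical names chosen for illustration; they are not the types or paths used in mito2.

```rust
/// Hypothetical summary of what one write produced (not mito2's actual type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct WriteOutput {
    /// Size of the Parquet data file in bytes.
    data_size: u64,
    /// Size of the Puffin index file in bytes, if one was generated.
    index_size: Option<u64>,
}

impl WriteOutput {
    /// Disk usage has to include the index file when it exists.
    fn total_size(&self) -> u64 {
        self.data_size + self.index_size.unwrap_or(0)
    }

    /// Paths a delete has to remove for the given file id.
    /// The suffixes are assumptions made for this sketch.
    fn paths_to_delete(&self, file_id: &str) -> Vec<String> {
        let mut paths = vec![format!("{file_id}.parquet")];
        if self.index_size.is_some() {
            paths.push(format!("{file_id}.puffin"));
        }
        paths
    }
}
```

With `index_size: None` the output behaves like the old single-file case, so callers that account for size or delete files only need to go through these two methods.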

This PR fixes the soon-to-be outdated assumptions of the Write Cache:

  • Add FileType to the IndexKey in FileCache to indicate that the WriteCache contains not only Parquet files but also Puffin files.
  • Have the WriteCache's write_and_upload_sst method upload Puffin files as well (if generated).
  • Add a FileCache dependency to SstIndexApplier to benefit from local cache acceleration.
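The first bullet can be sketched roughly as follows. Only the idea of adding a `FileType` to `IndexKey` comes from this PR; the other fields (`region_id`, `file_id`), the map-based `FileCache`, and the method names are simplified assumptions, not the actual mito2 definitions.

```rust
use std::collections::HashMap;

/// Kind of file stored in the cache.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum FileType {
    Parquet,
    Puffin,
}

/// Cache key. `region_id` and `file_id` are assumed fields for this sketch;
/// `file_type` is what lets a data file and its index file coexist.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct IndexKey {
    region_id: u64,
    file_id: String,
    file_type: FileType,
}

/// Toy stand-in for a file cache: maps a key to a local path.
struct FileCache {
    entries: HashMap<IndexKey, String>,
}

impl FileCache {
    fn new() -> Self {
        FileCache { entries: HashMap::new() }
    }

    /// Records where the file for `key` lives on local disk.
    fn put(&mut self, key: IndexKey, local_path: String) {
        self.entries.insert(key, local_path);
    }

    /// Returns the local path for `key`, if the file is cached.
    fn get(&self, key: &IndexKey) -> Option<&str> {
        self.entries.get(key).map(String::as_str)
    }
}
```

Because `file_type` is part of the key, the same region and file id can map to two separate entries, one for the Parquet file and one for the Puffin file, and an index reader can look up the Puffin entry without touching the data file's entry.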

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

Refer to a related PR or issue link (optional)

#2705 #2965

@github-actions github-actions bot added docs-not-required This change does not impact docs. Size: M labels Jan 11, 2024
@zhongzc zhongzc force-pushed the zhongzc/write-cache-support-index branch 2 times, most recently from f70325f to 782724b Compare January 11, 2024 10:36
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

codecov bot commented Jan 11, 2024

Codecov Report

Attention: 25 lines in your changes are missing coverage. Please review.

Comparison is base (8ec1e42) 85.41% compared to head (d432890) 84.97%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3144      +/-   ##
==========================================
- Coverage   85.41%   84.97%   -0.45%     
==========================================
  Files         823      823              
  Lines      134797   134917     +120     
==========================================
- Hits       115138   114640     -498     
- Misses      19659    20277     +618     

Resolved review threads:

  • src/mito2/src/access_layer.rs (outdated)
  • src/mito2/src/cache/file_cache.rs (outdated)
  • src/mito2/src/cache/file_cache.rs
  • src/mito2/src/error.rs (outdated)
  • src/mito2/src/sst/parquet/writer.rs (outdated)

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Contributor

@QuenKar QuenKar left a comment

LGTM

@killme2008 killme2008 added this pull request to the merge queue Jan 12, 2024
Merged via the queue into GreptimeTeam:main with commit c1190ba Jan 12, 2024
15 checks passed
@zhongzc zhongzc deleted the zhongzc/write-cache-support-index branch January 12, 2024 05:29
This pull request was closed.
4 participants