
[GraphBolt] CPU RAM Feature Cache for DiskBasedFeature #7339

Closed
4 tasks done
mfbalin opened this issue Apr 22, 2024 · 10 comments · Fixed by #7559
Assignees: mfbalin
Labels: feature request, Work Item (Work items tracked in project tracker)

Comments

mfbalin (Collaborator) commented Apr 22, 2024

🚀 Feature

When we use a DiskBasedFeatureStore, we will need to cache frequently accessed items in a CPU RAM cache so that the disk read bandwidth requirement is reduced.

Motivation

This will improve performance immensely on large datasets whose feature data do not fit in CPU RAM.

mfbalin added the feature request label on Apr 22, 2024
mfbalin changed the title from [GraphBolt] CPU RAM Feature Cache for DiskBasedFeatureStore to [GraphBolt] CPU RAM Feature Cache for DiskBasedFeature on Apr 22, 2024
Rhett-Ying (Collaborator) commented:

@mfbalin what is the difference between manually caching frequently accessed items with DiskBasedFeature and using TorchBasedFeature with in_memory=False, where caching is applied automatically by the OS?
Actually, this raises a more basic question for me: in what kind of scenario would we prefer DiskBasedFeature over TorchBasedFeature with in_memory=False? What are the advantages of DiskBasedFeature?

mfbalin (Collaborator, Author) commented Apr 26, 2024

@Rhett-Ying io_uring is more efficient and faster than mmap; with io_uring, you need fewer threads to saturate the SSD bandwidth. When it comes to caching, the OS caches whole pages, usually 4 KB in size, whereas feature_dimension * dtype_bytes is usually smaller than that. Thus, when the OS caches a page, it also caches the unnecessary vertex features that happen to share the page with the requested one, which makes the cache less effective.
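
To make the page-granularity point concrete (the numbers here are illustrative, not from any specific dataset): a 128-dimensional float32 feature occupies 128 * 4 = 512 bytes, so a single 4 KB OS page holds the features of 8 consecutive vertex IDs. Under random access only one of those 8 features was actually requested, so up to 7/8 of every cached page can be data nobody asked for, whereas an application-level cache keyed on vertex IDs stores only the 512 bytes that were read.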

mfbalin (Collaborator, Author) commented Apr 26, 2024

And I believe we can use a better caching strategy than the one used inside the Linux kernel. For example, see this paper on a state-of-the-art simple caching policy: https://dl.acm.org/doi/10.1145/3600006.3613147

Rhett-Ying (Collaborator) commented:

As the indices of feature data are random and scattered, each index requires a separate I/O request to be submitted to the submission queue unless there is explicit optimization at the application level. As for the cache, io_uring also requires app-level optimization to perform comparably to mmap, which gets caching automatically.

With io_uring, you need fewer threads to saturate the SSD bandwidth.

Is this achieved by submitting many I/O requests to the submission queue and waiting for completion?

mfbalin (Collaborator, Author) commented Apr 28, 2024

As the indices of feature data are random and scattered, each index requires a separate I/O request to be submitted to the submission queue unless there is explicit optimization at the application level. As for the cache, io_uring also requires app-level optimization to perform comparably to mmap, which gets caching automatically.

With io_uring, you need fewer threads to saturate the SSD bandwidth.

Is this achieved by submitting many I/O requests to the submission queue and waiting for completion?

Yes, that is how io_uring works: you batch your requests and submit them with a single Linux system call. When we also have a cache, it will outperform the mmap approach significantly.
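
For illustration, here is a minimal sketch of this batched-submission pattern using liburing. The file name, feature size, vertex IDs, and queue depth are made up for the example; this is not GraphBolt's actual I/O path.

```cpp
// Sketch only: batched random feature reads with liburing.
// "features.bin", kFeatureBytes, and the vertex IDs are illustrative.
#include <liburing.h>

#include <fcntl.h>
#include <unistd.h>

#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  constexpr unsigned kQueueDepth = 64;   // submission queue size
  constexpr size_t kFeatureBytes = 512;  // e.g. a 128-dim float32 feature
  const std::vector<int64_t> ids = {7, 1024, 31, 900000};  // hypothetical vertex IDs

  int fd = open("features.bin", O_RDONLY);  // hypothetical on-disk feature file
  if (fd < 0) return 1;

  io_uring ring;
  if (io_uring_queue_init(kQueueDepth, &ring, 0) < 0) return 1;

  std::vector<std::vector<char>> bufs(ids.size(), std::vector<char>(kFeatureBytes));

  // Queue one read per requested vertex feature...
  // (assumes ids.size() <= kQueueDepth, so io_uring_get_sqe never returns nullptr)
  for (size_t i = 0; i < ids.size(); ++i) {
    io_uring_sqe* sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, bufs[i].data(), kFeatureBytes,
                       ids[i] * kFeatureBytes);
  }
  // ...then hand the whole batch to the kernel with a single system call.
  io_uring_submit(&ring);

  // Reap one completion per queued read.
  for (size_t i = 0; i < ids.size(); ++i) {
    io_uring_cqe* cqe = nullptr;
    io_uring_wait_cqe(&ring, &cqe);
    if (cqe->res != static_cast<int>(kFeatureBytes))
      std::fprintf(stderr, "short or failed read: %d\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);
  }

  io_uring_queue_exit(&ring);
  close(fd);
  return 0;
}
```

A real feature reader would also consult the RAM cache before queueing any read and copy completed reads into the output tensor; those parts are omitted in the sketch above.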

Rhett-Ying (Collaborator) commented:

I am not sure it is easy and clean to implement a caching policy at the application level. The trade-off between performance improvement and code complexity needs to be taken into consideration.

@pyynb Please read the paper @mfbalin suggested on the caching policy: https://dl.acm.org/doi/10.1145/3600006.3613147.

pyynb (Collaborator) commented Jun 17, 2024

Last month, we compared three different cache libraries and various cache eviction policies. Regarding eviction policies, we found that the hit rate of the S3-FIFO cache was higher than LRU's, though its time usage was slightly higher; both are significantly better than the other eviction methods (see the document below for details).
As for cache libraries, cachelib performed the best. However, cachelib uses the CXX11 ABI and Torch does not support it (as noted in its TorchConfig.cmake file), so cachelib is not compatible with Torch. The performance of the cachetools and cachemoncache libraries was not very good (see the document below for details), so we have decided to temporarily suspend the development of the cache for DiskBasedFeature.
https://docs.google.com/document/d/1idVOwZTc_wX9u1UUFC4Ms-lBTkmEavEDnDocE1-Qp5E/edit

mfbalin (Collaborator, Author) commented Jun 17, 2024

Thank you for the preliminary study.

Rhett-Ying added this to the 2024 Graphbolt Misc milestone on Jun 18, 2024
mfbalin self-assigned this on Jun 21, 2024
mfbalin (Collaborator, Author) commented Jun 21, 2024

I have decided to implement a parallel S3-FIFO cache in the upcoming weeks. Assigning the issue to myself.

mfbalin (Collaborator, Author) commented Jun 29, 2024

#7492 implements the S3-FIFO caching policy and the FeatureCache classes. The design is made to be easily extensible in case we want to try more caching policies in the future. @frozenbugs @Rhett-Ying
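
For readers who have not seen the policy, here is a minimal single-threaded sketch of the S3-FIFO idea from the paper linked earlier: a small FIFO on probation in front of a main FIFO, plus a ghost FIFO that remembers recently evicted keys. The class and method names are made up for illustration and do not mirror the FeatureCache API in #7492, which is parallel and operates on feature tensors rather than strings.

```cpp
// Illustrative single-threaded S3-FIFO sketch; not the GraphBolt FeatureCache API.
#include <algorithm>
#include <cstdint>
#include <deque>
#include <iterator>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>

class S3FifoCache {
  struct Entry {
    int64_t key;
    std::string value;  // stands in for a feature row
    int freq;           // small saturating access counter
  };

 public:
  explicit S3FifoCache(size_t capacity)
      : small_cap_(std::max<size_t>(1, capacity / 10)),  // ~10% probationary queue
        main_cap_(capacity - small_cap_) {}

  std::optional<std::string> Get(int64_t key) {
    auto it = table_.find(key);
    if (it == table_.end()) return std::nullopt;            // miss
    it->second->freq = std::min(it->second->freq + 1, 3);   // lazy promotion: bump counter only
    return it->second->value;
  }

  void Put(int64_t key, std::string value) {
    if (table_.count(key)) return;
    if (ghost_.erase(key)) {  // seen recently: admit straight into main
      main_.push_back({key, std::move(value), 0});
      table_[key] = std::prev(main_.end());
      if (main_.size() > main_cap_) EvictMain();
    } else {                  // brand new: probation in the small queue
      small_.push_back({key, std::move(value), 0});
      table_[key] = std::prev(small_.end());
      if (small_.size() > small_cap_) EvictSmall();
    }
  }

 private:
  void EvictSmall() {
    Entry e = std::move(small_.front());
    table_.erase(e.key);
    small_.pop_front();
    if (e.freq > 0) {  // reused while on probation: promote to main
      e.freq = 0;
      main_.push_back(std::move(e));
      table_[main_.back().key] = std::prev(main_.end());
      if (main_.size() > main_cap_) EvictMain();
    } else {           // one-hit wonder: keep only its key in the ghost FIFO
      ghost_.insert(e.key);
      ghost_fifo_.push_back(e.key);
      if (ghost_fifo_.size() > main_cap_) {
        ghost_.erase(ghost_fifo_.front());
        ghost_fifo_.pop_front();
      }
    }
  }

  void EvictMain() {
    while (!main_.empty()) {
      Entry e = std::move(main_.front());
      table_.erase(e.key);
      main_.pop_front();
      if (e.freq > 0) {  // second chance: reinsert with a decremented counter
        --e.freq;
        main_.push_back(std::move(e));
        table_[main_.back().key] = std::prev(main_.end());
      } else {
        return;          // evicted for good
      }
    }
  }

  size_t small_cap_, main_cap_;
  std::list<Entry> small_, main_;
  std::unordered_map<int64_t, std::list<Entry>::iterator> table_;
  std::unordered_set<int64_t> ghost_;
  std::deque<int64_t> ghost_fifo_;
};
```

The parallel version discussed in this issue would additionally need atomic frequency counters and either per-queue locks or lock-free FIFOs; those concerns are out of scope for this sketch.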

mfbalin added the Work Item label on Jul 18, 2024
mfbalin linked a pull request (8 tasks) on Jul 22, 2024 that will close this issue
Project status: Done
3 participants