Add versioning to all DGLDatasets to force reloading when codes are changed #4293
Review of the current practice
We use the table below to track the current practice of cache versioning for the existing datasets and the cases it fails to handle.
In addition, the versioning mechanism should detect:
In both cases, the cache files need to be re-generated.
Proposal
In general, hashing is an effective way to prevent loading an undesired cached file. Its downside is that many cache files can accumulate when there is a huge number of possible combinations of preprocessing options. One solution is to instead save a small file that stores only the hash code or the preprocessing steps, and to use it as a sanity check whenever data loading is attempted. If the check fails, the data is re-processed from scratch.
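The hash-file sanity check described in the proposal could look roughly like the sketch below. It is not DGL code: `config_hash`, `load_or_process`, the `code_version` tag, and the file names `cache.hash` and `data.json` are all hypothetical names chosen for illustration, and JSON stands in for whatever on-disk format a real dataset would use.

```python
import hashlib
import json
import os


def config_hash(options, code_version):
    """Hash the preprocessing options together with a code version tag.

    Both arguments are hypothetical: `options` is a JSON-serializable dict
    and `code_version` is a string bumped whenever processing code changes.
    """
    payload = json.dumps(
        {"options": options, "code_version": code_version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_or_process(cache_dir, options, code_version, process_fn):
    """Return (data, cache_hit).

    Only one cached copy is kept. A small side file stores the hash of the
    configuration that produced it; on mismatch the data is re-processed
    and both files are overwritten.
    """
    os.makedirs(cache_dir, exist_ok=True)
    hash_path = os.path.join(cache_dir, "cache.hash")  # hypothetical name
    data_path = os.path.join(cache_dir, "data.json")   # hypothetical name
    expected = config_hash(options, code_version)

    # Sanity check: load the cache only if the stored hash matches.
    if os.path.exists(hash_path) and os.path.exists(data_path):
        with open(hash_path) as f:
            stored = f.read().strip()
        if stored == expected:
            with open(data_path) as f:
                return json.load(f), True

    # Missing or stale cache: re-process from scratch and refresh both files.
    data = process_fn(options)
    with open(data_path, "w") as f:
        json.dump(data, f)
    with open(hash_path, "w") as f:
        f.write(expected)
    return data, False
```

Because only a single cached copy and one small hash file are kept, disk use does not grow with the number of option combinations; the trade-off is that switching back and forth between two configurations re-processes the data each time.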
🚀 Feature
Add versioning to all DGLDatasets to detect:
Motivation
Brought up by #3987, which asks to revert the default reordering behavior of DGL builtin datasets. The issue is that even after we implement the request, users may still load cached datasets from local disk that do not reflect the latest change. Therefore, we need a versioning mechanism to detect such changes.
cc @mufeili
Alternatives
Use a different dataset folder whenever DGL updates. This could cause excessive disk storage use.