
SQLite for cache #12863

Closed

w-e-w wants to merge 12 commits into from

Conversation

w-e-w (Collaborator) commented Aug 30, 2023

Description

use a SQLite database for the cache, as opposed to a json file

the reason for this is that we have seen some users with extremely large cache files
for example, catboxanon's cache.json is > 236MB
it becomes increasingly impractical to use a json file as a method of storage
and we have seen some cases of the cache.json being corrupted for unknown reasons #11773

the data will now be stored under cache_database.db, or the path specified by the env var SD_WEBUI_CACHE_DATABASE

to make as few changes as possible, I essentially implemented a translation layer that represents the database as a python object, the same as if we had loaded and saved a json file
assuming that I have done everything correctly, it should be basically transparent
the implementation is meant to be simple, but this also means it is not truly optimized for an SQL database
I write json strings into the value column of the table for each individual entry
this means there will be a slight overhead for the json conversion when reading from and writing to the database
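
As a rough sketch of that idea (illustrative only; the helper below and its names are not the PR's actual code):

```python
import json
import sqlite3

conn = sqlite3.connect("cache_database.db")
conn.execute(
    'CREATE TABLE IF NOT EXISTS "hashes" (path TEXT PRIMARY KEY, mtime REAL, value TEXT)'
)

def write_entry(table, path, mtime, value):
    """Write one cache entry; `value` is stored as a JSON string."""
    conn.execute(
        f'INSERT OR REPLACE INTO "{table}" (path, mtime, value) VALUES (?, ?, ?)',
        (path, mtime, json.dumps(value)),  # json.dumps is the conversion overhead
    )
    conn.commit()
```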

currently this is implemented as a toggle, which can be changed under settings > system > Use sqlite for cache
I store the value under the key experimental_sqlite_cache; if this is proven stable with no compatibility issues, we can remove the toggle and enable this by default in the future

Other changes

I changed the file modification time comparison that triggers a cache refresh from greater than to not equal
this makes sense because you might replace a file with another file that has an older modification date
if that were the case, the cache refresh would not be triggered, which is undesirable
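
For illustration, the changed check amounts to something like this (function and variable names are hypothetical):

```python
import os

def cache_is_stale(path, cached_mtime):
    # old check: os.path.getmtime(path) > cached_mtime
    # that missed files replaced by ones with an older modification date;
    # any difference in mtime should invalidate the entry instead
    return os.path.getmtime(path) != cached_mtime
```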

TODO

if in the future this is going to be set as the default
it should be a simple matter to write a migration script to migrate cache.json to the db, so that users don't have to spend time recalculating

Database table structure

Table: hashes

| path (string) | mtime (float) | value (json string) |
| --- | --- | --- |
| checkpoint/Drawing\Stable Diffusion\v1-5-pruned-emaonly.ckpt | 1666314275.47775 | "cc6cb2710......" (the full hash) |
| ...... | ...... | ...... |

Table: safetensors-metadata

| path (string) | mtime (float) | value (json string) |
| --- | --- | --- |
| checkpoint/Drawing\Stable Diffusion\sd_xl_base_1.0.safetensors | 1690539434.22002 | {"modelspec.sai_model_spec": "1.0.0", ......} (json metadata) |
| ...... | ...... | ...... |

this will be read into memory as one big object, with a structure mimicking cache.json:

```python
cache_data = {
    "hashes": {
        "checkpoint/Drawing\\Stable Diffusion\\v1-5-pruned-emaonly.ckpt": {
            "mtime": 1666314275.47775,
            "value": "cc6cb2710......",  # (the full hash)
        },
        # ......
    },
    "safetensors-metadata": {
        "checkpoint/Drawing\\Stable Diffusion\\sd_xl_base_1.0.safetensors": {
            "mtime": 1690539434.22002,
            "value": {
                "modelspec.sai_model_spec": "1.0.0",
                # ......
            },
        },
        # ......
    },
}
```
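
A sketch of how the full read at launch could assemble that object (assuming the schema above; not the PR's exact code):

```python
import json
import sqlite3

def load_cache(db_path="cache_database.db"):
    conn = sqlite3.connect(db_path)
    cache_data = {}
    for table in ("hashes", "safetensors-metadata"):
        rows = conn.execute(f'SELECT path, mtime, value FROM "{table}"')
        # each row becomes one entry of the in-memory dict, with the
        # JSON string in the value column parsed back into an object
        cache_data[table] = {
            path: {"mtime": mtime, "value": json.loads(value)}
            for path, mtime, value in rows
        }
    return cache_data
```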


wfjsw (Contributor) commented Aug 30, 2023

I'm reading this on a mobile phone, so I'm probably missing things

doing r/w at the scale of the full file probably won't help with performance. I might be able to come up with something better, but only on weekends

from my experience, anything other than full WAL is unreliable. it has to be enabled for SQLite to be robust against corruption
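
For reference, turning on WAL with Python's sqlite3 module is a one-line pragma:

```python
import sqlite3

conn = sqlite3.connect("cache_database.db")
# write-ahead logging: writes go to a separate -wal file first,
# making the main database file far more robust against corruption
conn.execute("PRAGMA journal_mode=WAL")
```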

w-e-w (Collaborator, Author) commented Aug 30, 2023

@wfjsw
I think you probably misread
I'm only doing a full read, not a full write

I read the entire database once at the launch of the web UI
this is what we already do with the json, and it is faster than querying the entries individually as needed
however it is less memory efficient, so if we want, we can change this easily

subsequent writes, if any, are only per entry; it does not rewrite the entire database
so it should perform better as the data size gets larger

@w-e-w w-e-w marked this pull request as draft August 30, 2023 15:19
@w-e-w w-e-w marked this pull request as ready for review August 30, 2023 23:44
freecoderwaifu commented:

[screenshots of cache.json size]
Can confirm, almost 20k Loras (send help).

w-e-w (Collaborator, Author) commented Aug 31, 2023

wow 500MB 🤯
help test if you can

freecoderwaifu commented:

Finally got around to testing it

[screenshot of the new cache database size]

Nice size reduction. Startup took a while to create the new database, though; unsure how long since I AFK'd while it was doing it, but .json cache rebuild times are also long.

Everything else seems to work exactly as with the older cache; a fresh launch after a (re)boot takes about the same time but "seemed" a tiny bit faster, and subsequent relaunches are as fast as before.

w-e-w (Collaborator, Author) commented Sep 3, 2023

> Nice size reduction. Startup took a while to create the new database, though; unsure how long since I AFK'd while it was doing it, but .json cache rebuild times are also long.

when and if this is merged, I intend to write a migration sequence that uses the json file to build the database

because I currently consider this to be in an "experimental" phase, I didn't want to add extra logic for migration yet

the migration logic should be relatively simple:
if the sqlite cache is enabled and cache.json exists and sqlite_cache.db does not exist,
create the db and dump the contents of cache.json into the individual tables in the database
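
a rough sketch of that migration, assuming the table layout described above (not final code):

```python
import json
import os
import sqlite3

def migrate_json_cache(json_path="cache.json", db_path="cache_database.db"):
    # only migrate when the old json cache exists and the database doesn't
    if not os.path.exists(json_path) or os.path.exists(db_path):
        return
    with open(json_path, "r", encoding="utf8") as file:
        cache_data = json.load(file)
    conn = sqlite3.connect(db_path)
    for table, entries in cache_data.items():
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{table}" (path TEXT PRIMARY KEY, mtime REAL, value TEXT)'
        )
        for path, entry in entries.items():
            conn.execute(
                f'INSERT OR REPLACE INTO "{table}" (path, mtime, value) VALUES (?, ?, ?)',
                (path, entry.get("mtime"), json.dumps(entry.get("value"))),
            )
    conn.commit()
```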

> Everything else seems to work exactly as with the older cache; a fresh launch after a (re)boot takes about the same time but "seemed" a tiny bit faster, and subsequent relaunches are as fast as before.

the main benefit of this should be that when writing new cache entries, it doesn't have to rewrite the entire file
there should then be a far lesser chance of the database being corrupted
which, when it happens, means rebuilding the entire cache, like what we have seen happen to some users

akx (Collaborator) commented Sep 10, 2023

TBH I think we should just use https://pypi.org/project/diskcache/ instead. It's well battle-tested, and SQLite (+disk for large objects) backed. Also, for the app it looks just like a dict, which is very useful.
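
For a sense of what that would look like, a minimal diskcache usage sketch (directory name and keys are made up):

```python
import diskcache

cache = diskcache.Cache("cache_dir")  # SQLite-backed cache in a directory
cache["hashes/some-model.ckpt"] = {"mtime": 1666314275.47775, "value": "..."}
print(cache.get("hashes/some-model.ckpt"))
cache.close()
```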

w-e-w (Collaborator, Author) commented Sep 11, 2023

cool, I didn't know about diskcache; that's why I just put together my own implementation

AUTOMATIC1111 (Owner) commented:

merged #15287

@w-e-w w-e-w deleted the sqlite-cache branch June 28, 2024 03:06