Disable garbage collection in the default IPFS node #277

Closed
evan-forbes opened this issue Apr 9, 2021 · 3 comments

@evan-forbes
Member

Previously we pinned all block data during proposal and during testing. The overhead was high enough that the tests timed out (see #275). We can solve this by simply not pinning by default, but that outsources our data retention policy to IPFS's garbage collection. In the short term, this should work fine. Beyond the short term, we need a well-defined default data retention policy.

@Wondertan suggested turning off garbage collection. This avoids the overhead of both GC and pinning, and has basically the same effect as pinning (at least to my understanding).

@liamsi suggested not waiting for pinning to finish, or pinning at a different point in time.

If possible, our strategy should also include some details on which IPLD nodes to delete once disk space is X% full. The IPFS docs mention StorageGCWatermark and GCPeriod, but these only determine when GC is triggered, not which IPLD nodes are deleted.
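
For reference, here is a minimal Go sketch of what those two settings control. It assumes the documented meaning of `StorageGCWatermark` (a percentage of `StorageMax`) and `GCPeriod` (how often the check runs). The `gcConfig` type and `shouldRunGC` function are hypothetical names for illustration, not go-ipfs APIs, and the values in `main` are only examples. The check answers when to collect; it says nothing about which nodes get deleted.

```go
package main

import (
	"fmt"
	"time"
)

// gcConfig mirrors the meaning of the IPFS datastore GC settings.
// The struct itself is hypothetical and is not a go-ipfs type.
type gcConfig struct {
	StorageMax   uint64        // maximum repo size in bytes
	WatermarkPct uint64        // percentage of StorageMax that triggers GC
	Period       time.Duration // how often the usage check would run
}

// shouldRunGC reports whether current usage has crossed the watermark.
// It only decides *when* GC runs, never *what* is removed.
func shouldRunGC(cfg gcConfig, usedBytes uint64) bool {
	// Compare in integer arithmetic to avoid floating-point rounding.
	return usedBytes*100 > cfg.StorageMax*cfg.WatermarkPct
}

func main() {
	// Example values only.
	cfg := gcConfig{StorageMax: 10_000_000_000, WatermarkPct: 90, Period: time.Hour}
	for _, used := range []uint64{5_000_000_000, 9_500_000_000} {
		fmt.Printf("used=%d runGC=%v (checked every %s)\n", used, shouldRunGC(cfg, used), cfg.Period)
	}
}
```

A retention policy that picks specific IPLD nodes would have to live outside a check like this one.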

@liamsi
Member

liamsi commented Apr 9, 2021

I'm in favor of @Wondertan's suggestion. It's certainly the simplest and probably the fastest as well. Later we can explore how to pin and tweak GC.

evan-forbes changed the title from "Figure out a default data retention strategy" to "Disable garbage collection in the default IPFS node" on Apr 12, 2021
@evan-forbes
Member Author

It sounds like we're going to disable the garbage collection of IPFS, so I'm going to go ahead and change the title of this issue to reflect that.

Wondertan self-assigned this on Apr 16, 2021
@Wondertan
Member

Wondertan commented Apr 16, 2021

After investigating the code, it turns out that GC is not enabled in IPFS by default and we were never actually using it, so I don't understand why we were pinning at all. Maybe we misunderstood its purpose. Anyway, I am closing the issue and suggest we stop worrying about pinning from now on.

The only reason pinning exists is to prevent GC from cleaning up stale data blocks. That makes sense for typical IPFS use cases, but not for ours, as we don't have a concept of "outdated" or "transient" data blocks. Even if some light client configuration needed to store chain blocks or square samples temporarily, IPFS's GC wouldn't help there.

The IPFS daemon enables GC only if a flag is provided. When IPFS is used as a library (our case), it never runs GC at all, and its GC configuration is ignored. It seems a library user is expected to dig out the place deep in the package tree that actually runs GC through an undocumented API.
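
To illustrate the point about pinning, here is a toy Go sketch. It is an in-memory model, not go-ipfs code; every type and function in it is hypothetical, and it ignores that real pins are recursive over a DAG. It shows that pins only matter when a sweep actually runs. If nothing ever calls `gc`, unpinned blocks stay in the store just like pinned ones.

```go
package main

import "fmt"

// store is a toy blockstore: blocks keyed by an identifier, plus a pin set.
type store struct {
	blocks map[string][]byte
	pinned map[string]bool
}

func newStore() *store {
	return &store{blocks: map[string][]byte{}, pinned: map[string]bool{}}
}

// put saves a block and optionally pins it.
func (s *store) put(id string, data []byte, pin bool) {
	s.blocks[id] = data
	if pin {
		s.pinned[id] = true
	}
}

// gc deletes every unpinned block and returns how many were removed.
// Pinning has no effect unless this function is actually called.
func (s *store) gc() int {
	removed := 0
	for id := range s.blocks {
		if !s.pinned[id] {
			delete(s.blocks, id) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	s := newStore()
	s.put("block-a", []byte("pinned data"), true)
	s.put("block-b", []byte("unpinned data"), false)

	// Without a sweep, both blocks are retained regardless of pin state.
	fmt.Println("blocks before any GC:", len(s.blocks))

	// Only an explicit sweep makes the pin meaningful.
	removed := s.gc()
	fmt.Println("removed by GC:", removed, "remaining:", len(s.blocks))
}
```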
