blockstore: add an option to skip duplicate Puts by CID, mimicking carv1's Selective Writer API #123

Closed
mvdan opened this issue Jun 29, 2021 · 1 comment
@mvdan
Contributor

mvdan commented Jun 29, 2021

Filecoin writes proofs into CAR files which are hashed, so we need their contents to be deterministic.

The way Filecoin currently generates those CARv1 files is via v1's selective writer API, which ensures canonical ordering via traversals, and also deduplicates by CID:

go-car/selectivecar.go

Lines 229 to 230 in 71cfa2f

if !sct.cidSet.Has(c) {
	sct.cidSet.Add(c)

Ignite's current project receives blocks via graphsync, which delivers them in the order given by the IPLD selector, just like v1's selective writer. However, a client might send us duplicate blocks, and every block graphsync receives is written with Put to our carv2 read-write blockstore, so duplicates would end up in the resulting CAR file.

If we want to be compatible with what the v1 selective writer produces, we should support deduplicating by CID. I propose a ReadWrite blockstore option for it, like DeduplicateByCID; if one calls Put on the same CID twice, the second call will simply do nothing and return a nil error.
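To make the proposal concrete, here is a rough sketch of how such an option could behave. It is not the go-car API: Block, Option, ReadWrite, NewReadWrite and Len are hypothetical stand-ins, CIDs are plain strings, and only the DeduplicateByCID name comes from the proposal above.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a hypothetical stand-in for a CID-addressed block.
// A real blockstore would use the CID and block types from the IPFS libraries.
type Block struct {
	CID  string
	Data []byte
}

// Option configures a ReadWrite store (hypothetical shape).
type Option func(*ReadWrite)

// DeduplicateByCID makes Put a no-op for CIDs that were already written.
func DeduplicateByCID(enabled bool) Option {
	return func(rw *ReadWrite) { rw.dedupe = enabled }
}

// ReadWrite is a toy in-memory store that keeps blocks in insertion order.
type ReadWrite struct {
	dedupe  bool
	index   map[string]int // CID -> position of its first occurrence
	written []Block
}

// NewReadWrite builds a store and applies the given options.
func NewReadWrite(opts ...Option) *ReadWrite {
	rw := &ReadWrite{index: make(map[string]int)}
	for _, opt := range opts {
		opt(rw)
	}
	return rw
}

// Put appends the block. With deduplication enabled, a second Put of the
// same CID does nothing and returns a nil error.
func (rw *ReadWrite) Put(b Block) error {
	if b.CID == "" {
		return errors.New("block has an empty CID")
	}
	_, seen := rw.index[b.CID]
	if seen && rw.dedupe {
		return nil
	}
	if !seen {
		rw.index[b.CID] = len(rw.written)
	}
	rw.written = append(rw.written, b)
	return nil
}

// Len reports how many blocks were actually written.
func (rw *ReadWrite) Len() int { return len(rw.written) }

func main() {
	rw := NewReadWrite(DeduplicateByCID(true))
	for _, c := range []string{"root", "leaf", "root"} {
		if err := rw.Put(Block{CID: c}); err != nil {
			fmt.Println("put failed:", err)
			return
		}
	}
	fmt.Println(rw.Len()) // prints 2
}
```

Without the option, the same three Puts would write three blocks, so existing callers keep their current behavior.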

In the future we could satisfy this need by porting Selective Writers to carv2 (#104), but that can't happen for another month or two.

I could also ask Ignite to implement a Blockstore wrapper that does this deduplication on Put calls, but deduplicating by CID also seems like a reasonable opt-in feature that others might want in the future. It wouldn't make the API significantly more complex or the read-write blockstore significantly slower, either.
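For comparison, the wrapper alternative could look roughly like the sketch below. Everything here is hypothetical: Putter is a one-method stand-in for a blockstore interface, CIDs are plain strings, and recorder is a toy inner store used only to show the effect.

```go
package main

import (
	"fmt"
	"sync"
)

// Block is a hypothetical stand-in for a CID-addressed block.
type Block struct {
	CID  string
	Data []byte
}

// Putter is a minimal, hypothetical slice of a blockstore interface.
type Putter interface {
	Put(Block) error
}

// dedupPutter forwards only the first Put of each CID to the inner store.
type dedupPutter struct {
	mu    sync.Mutex
	seen  map[string]struct{}
	inner Putter
}

// NewDedupPutter wraps inner so that repeated Puts of a CID are skipped.
func NewDedupPutter(inner Putter) Putter {
	return &dedupPutter{seen: make(map[string]struct{}), inner: inner}
}

// Put returns nil without touching the inner store for an already-seen CID.
// A CID is only marked as seen once the inner Put has succeeded.
func (d *dedupPutter) Put(b Block) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[b.CID]; ok {
		return nil
	}
	if err := d.inner.Put(b); err != nil {
		return err
	}
	d.seen[b.CID] = struct{}{}
	return nil
}

// recorder is a toy inner store that records the order of writes.
type recorder struct {
	order []string
}

func (r *recorder) Put(b Block) error {
	r.order = append(r.order, b.CID)
	return nil
}

func main() {
	rec := &recorder{}
	store := NewDedupPutter(rec)
	for _, c := range []string{"root", "leaf", "root"} {
		if err := store.Put(Block{CID: c}); err != nil {
			fmt.Println("put failed:", err)
			return
		}
	}
	fmt.Println(rec.order) // prints [root leaf]
}
```

The wrapper has to keep its own set of seen CIDs and its own lock, which duplicates bookkeeping the blockstore may already be doing; that is part of why an opt-in option on the blockstore itself looks attractive.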

@mvdan mvdan added the P1 High: Likely tackled by core team if no one steps up label Jun 29, 2021
@mvdan mvdan added this to the CAR v2 milestone Jun 29, 2021
mvdan added a commit that referenced this issue Jul 1, 2021
And a test that uses duplicate hashes as well as duplicate CIDs.

We reuse the same insertion index, since it's enough for this purpose.
There's no need to keep a separate map or set of CIDs.

While at it, make the index package not silently swallow errors,
and improve the tests to handle errors more consistently.

Fixes #123.
Fixes #125.
@mvdan
Contributor Author

mvdan commented Jul 1, 2021

Fixed by #127.

@mvdan mvdan closed this as completed Jul 1, 2021
mvdan added a commit that referenced this issue Jul 16, 2021