This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Quirks of ipfs.add API #3137

Closed
Gozala opened this issue Jul 3, 2020 · 9 comments · Fixed by #3167

Comments

@Gozala
Contributor

Gozala commented Jul 3, 2020

I'm working on #3029 and having a really hard time making changes to ipfs.add without changing its current behavior. In the process I'm discovering some quirks that I wanted to point out & hopefully find a way to address:

So it takes arbitrary input and normalizes it. However, the rules for single vs. multiple files are of particular interest here:

* AsyncIterable<Bytes> [single file]
* AsyncIterable<Bloby> [multiple files]
* AsyncIterable<String> [multiple files]
* AsyncIterable<{ path, content: Bytes }> [multiple files]
* AsyncIterable<{ path, content: Bloby }> [multiple files]
* AsyncIterable<{ path, content: String }> [multiple files]
* AsyncIterable<{ path, content: Iterable<Number> }> [multiple files]
* AsyncIterable<{ path, content: Iterable<Bytes> }> [multiple files]
* AsyncIterable<{ path, content: AsyncIterable<Bytes> }> [multiple files]

So all AsyncIterables represent input with multiple files, except when they are iterables of ArrayBufferView|ArrayBuffer. This leads to some interesting questions:

  • Is this supposed to produce a single file with hi\nbye content or two files?

    ipfs.add(asyncIterable([Buffer.from('hi\n'), 'bye']))

    According to the docs, AsyncIterable<Bytes> is interpreted as a single file, while AsyncIterable<string> is interpreted as multiple files. However, the implementation only checks the first chunk; the rest are just yielded as content, so I'm guessing it would produce a single file.

  • And how about if we switch those around?

    ipfs.add(asyncIterable(['hi\n', Buffer.from('bye')]))

    According to the documentation, AsyncIterable<string> is interpreted as multiple files; however, since only the first chunk is checked, I'm guessing this one would produce two files.

  • Even more interesting would be if we did this:

    ipfs.add(asyncIterable([Buffer.from('hi\n'), { content: 'bye' }]))

    Which would produce an error, although one might expect two files.

Maybe this is not as confusing as I find it to be, but I really wish there was a clearer way to differentiate a multiple-file add from a single-file add, so the implementation would not need to probe async input to decide.
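To make the probing concrete, here is a hypothetical sketch (these helper names are made up for illustration, this is not the actual js-ipfs normalizer) of what "peek at the first chunk, classify the whole input from it, then reattach the chunk" looks like:

```javascript
// Bytes in the docs' sense: ArrayBuffer or any ArrayBufferView (TypedArray, Buffer, ...)
function isBytes (chunk) {
  return chunk instanceof ArrayBuffer || ArrayBuffer.isView(chunk)
}

// Pull one chunk off an async iterable, then rebuild an iterable that
// yields the probed chunk first so nothing is lost.
async function peek (source) {
  const it = source[Symbol.asyncIterator]()
  const first = await it.next()
  async function * rest () {
    if (!first.done) yield first.value
    yield * { [Symbol.asyncIterator]: () => it }
  }
  return { first: first.done ? undefined : first.value, rest: rest() }
}

// Only the FIRST chunk decides the interpretation, which is exactly the
// quirk above: later chunks of a different type are never re-checked.
async function classify (source) {
  const { first } = await peek(source)
  return isBytes(first) ? 'single file' : 'multiple files'
}
```

Under this rule asyncIterable([Buffer.from('hi\n'), 'bye']) classifies as a single file even though the second chunk is a string, matching the guess above, and the classification cannot happen without awaiting the input.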

I think things would be much simpler for both the user and the implementation if ipfs.add always worked with multiples. The user would have to wrap Bytes|Blob|string|FileObject|AsyncIterable<Bytes> into an array [content], but it would make the API much cleaner, and AsyncIterable<*> as a result would make more sense since it's many-to-many.

Alternatively there could be ipfs.add and ipfs.addAll APIs to deal with single files and multiple files respectively. That would also reduce the complexity of adding a single file, which currently requires something like const cid = (await ipfs.add(buffer).next()).value.cid.
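A minimal sketch of that split (names assumed, this is not the real js-ipfs API): addAll keeps the streaming many-to-many shape, while add wraps its single input and returns the one result directly, so callers never touch iterators in the common case.

```javascript
// Drain an async iterable, keeping only the final item
async function last (source) {
  let value
  for await (value of source) {}
  return value
}

// Derive a single-file add from any streaming addAll implementation
function makeAdd (addAll) {
  return async function add (input, options) {
    // Wrap the single input so addAll sees a one-element collection
    return last(addAll([input], options))
  }
}
```

A consumer could then write const { cid } = await add(buffer) with no helper library in sight.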

@Gozala Gozala added the need/triage Needs initial labeling and prioritization label Jul 3, 2020
@Gozala
Contributor Author

Gozala commented Jul 3, 2020

I think ipfs.add tries to do several things under the same API:

  1. Add a single file to the IPFS (I have a file I want CID back, no streams no nothing).
  2. Add batch of files to the IPFS (I have these n files (synchronously) and I want n CIDs back).
  3. Import some files into IPFS (I want to import content as I crawl / traverse some source & don't want to let importer drive the process).

I think the current API is great for case 3, but at the expense of 1 and 2. The same code path also makes it really difficult to special-case and optimize cases 1 and 2, because deciding on a code path requires probing the input, sometimes asynchronously, so everything gets normalized into import-style input.

@achingbrain
Member

Is this supposed to produce a single file with hi\nbye content or two files?

ipfs.add(asyncIterable([Buffer.from('hi\n'), 'bye']))

The user should pass homogeneous input. I don't think it's reasonable to try to cater for this sort of use.

If the input is an (async) iterable we could watch the type of its contents and throw if it changes, or take some other action, but so far no-one has asked for this, and if they did, we'd certainly recommend that they pass homogeneous input instead.

Alternatively there could be ipfs.add and ipfs.addAll APIs

I'm not massively against this; it's just that it's trivial to accomplish the same thing with:

const { cid } = await last(ipfs.add(blob))

..which is why it never made it into the API in the first place.

If you dislike using the it-* modules or streaming-iterables or what have you, https://github.com/tc39/proposal-iterator-helpers is at the draft stage and will hopefully make that all obsolete one day, but right now there's no need to suffer through weird contortions like const cid = (await ipfs.add(buffer).next()).value.cid.

In fact all of the examples above are (to my eye anyway) quite straightforward:

Add a single file to the IPFS (I have a file I want CID back, no streams no nothing).

const { cid } = await last(ipfs.add(blob))

Add batch of files to the IPFS (I have these n files (synchronously) and I want n CIDs back).

const added = await all(ipfs.add(blobs))

Or if you really just want an array of CIDs:

const cids = await all(map(ipfs.add(blobs), added => added.cid))

Import some files into IPFS (I want to import content as I crawl / traverse some source & don't want to let importer drive the process).

// some sort of filter criteria
const onlyFiles = (file) => Boolean(file.content)

const added = await all(ipfs.add(filter(globSource('/dir/**/*'), onlyFiles)))
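The it-* helpers used in these snippets are tiny. Minimal sketches of what it-last, it-all and it-map do (not the actual modules) look roughly like:

```javascript
// Keep overwriting until the async iterable is exhausted, return the final item
async function last (source) {
  let result
  for await (result of source) {}
  return result
}

// Collect every item of an async iterable into an array
async function all (source) {
  const items = []
  for await (const item of source) items.push(item)
  return items
}

// Lazily transform each item of an async iterable
async function * map (source, fn) {
  for await (const item of source) yield fn(item)
}
```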

Same code path also makes it really difficult to special case and optimize cases 1 and 2 because to decide the code path input needs to be probed, sometimes asynchronously, so it gets all normalized into import style input.

Having ipfs.add only operate on single items will make this a bit easier in that case, but ipfs.addAll will still have to look at the contents of the input so you still have to solve the problem somewhere.

@achingbrain achingbrain added exploration and removed need/triage Needs initial labeling and prioritization labels Jul 3, 2020
@Gozala
Contributor Author

Gozala commented Jul 3, 2020

I'm not massively against this, it's just trivial to accomplish the same thing with:

It is trivial if you import a library, and it will be even more so once JS engines provide built-in equivalents... However, having to use a library just to get the result of an API call seems like incidental complexity.

To me this feels like an async function returning Array<Promise<T>> instead of Promise<T>: it is trivial to de-structure and get the result you want, but then again, why not just return the actual thing?

Another sign of this complexity is that the collection can be empty, which should never happen here. Good APIs make impossible states impossible, and not considering the empty case is both discomforting and something that type-checkers (if you use one) will point out every time.

Even if the implementation under the hood just did (input) => last(ipfs.addAll(input)), that would be a great improvement because it:

  • Removes the burden on the user to pull in an extra library.
  • Makes pedants and type checkers happy by avoiding the empty-collection case.
  • Allows an alternative, more optimal implementation without increasing the complexity of an already complex implementation.

const added = await all(ipfs.add(blobs))

Or if you really just want an array of CIDs:

const cids = await all(map(ipfs.add(blobs), added => added.cid))

The problem I'm hinting at is not that it is hard to get the result you want, but rather that the API is tailored to a specific use case, and all the other (more common) use cases suffer that extra bit of added complexity (be it importing a library, doing extra mapping, etc.).

It also shows in the implementation. In #3022 I ended up using different encoding strategies for each of these use cases, and now I'm facing the very same issue with #3029, where the logic to differentiate between them is really complex. Implementation complexity alone would not be a good argument, but if you consider that users also need to do a little bit extra to go from async collections back to a single result (in use case 1) or a sync collection (in use case 2), I really don't see what the value is.

@Gozala
Contributor Author

Gozala commented Jul 3, 2020

If the input is an (async) iterable we could watch the type of its contents and throw if it changes, or take some other action, but so far no-one has asked for this, and if they did, we'd certainly recommend that they pass homogeneous input instead.

My problem is that being able to assume homogeneous input would have streamlined the logic; however, the current implementation happens to work even when inputs are not homogeneous, and I'm guessing that is by chance, due to all the Buffer.from uses, which I am trying to rip out in the browser scenario.

If we're ok with changes that assume homogeneous inputs and start breaking when they are not, please let me know; in that case I would like to reflect that in both the code and the docs.

@achingbrain
Member

If we're ok with changes that assume homogeneous inputs and start breaking when they are not, please let me know; in that case I would like to reflect that in both the code and the docs.

AFAIK we do assume homogeneous inputs and it'd be a case of garbage-in, garbage-out if they are not. That is, we don't have tests for non-homogeneous inputs and the docs do not say they are supported, so I think that's ok.

Please submit PRs with tests & fixes where the docs can be improved or there are bugs in the implementation, but changing this API is not a priority.

@d10r

d10r commented Jul 7, 2020

@achingbrain

In fact all of the examples above are (to my eye anyway) quite straightforward:
Add a single file to the IPFS (I have a file I want CID back, no streams no nothing).
const { cid } = await last(ipfs.add(blob))

This is imo not straightforward for the average developer.
Looking at this, I have no idea what last is; it's not part of the JavaScript language.
I'm a random dev who just started using this API wrapper (I'm not new to IPFS itself, however) and I have just spent about an hour looking for a less weird way to add a single file than the one proposed in https://github.com/ipfs/js-ipfs/tree/master/packages/ipfs-http-client#readme:

for await (const file of ipfs.add(urlSource('https://ipfs.io/images/ipfs-logo.svg'))) {
  console.log(file)
}

Sure, this does the job. But it's imo rather awkward to have a loop here.
For an expert dev, it may not be a big deal to refactor it to something more elegant without a loop.
But like many others I'm not a fulltime JS dev who's intimately familiar with advanced concepts like generator functions.
I think a good API intended for a broad audience should not rely on advanced knowledge of the underlying language, but be designed such that basic operations can be done with simple code which can easily be "guessed" by an average dev based on the intuitions given by the interface and its naming.

Sticking with this example, I then found an alternative way in the tests: https://github.com/ipfs/js-ipfs/blob/master/packages/interface-ipfs-core/src/add.js#L75:

const all = require('it-all')
...
const filesAdded = await all(ipfs.add(new self.File(['should add a File'], 'filename.txt', { type: 'text/plain' })))

That results in more elegant code. But adding an unknown dependency just to "fix" the lack of elegance of the API doesn't look like a great choice either. It adds complexity and cognitive overhead (e.g. devs reading over this may have a hard time telling what exactly is happening here).

This API reminds me of v0.x of web3.js, which had a lot of magic like this (individual endpoints trying to cover a plethora of possible uses based on the parameters given).
Based on developer feedback, in v1 they somewhat gave up on that approach, making the interfaces more focused and explicit. I recommend considering that, as it helps reduce brain aches (scnr) for everybody starting to use the API. This should be a priority if adoption of IPFS is a priority.

@drewstaylor

drewstaylor commented Jul 11, 2020

I seriously miss the old IPFS release where you could simply add a single file without importing async iterators. In my 10 years of development I don't think I've struggled so much, and I've been using IPFS since 2017.

In the current example from the API:

for await (const file of ipfs.add(urlSource('https://ipfs.io/images/ipfs-logo.svg'))) {
  console.log(file)
}

You need to serve a file from urlSource, or you can iterate over a globSource; however, the core API documentation claims it supports JSON string uploads (https://github.com/ipfs/js-ipfs/blob/master/docs/core-api/FILES.md#parameters).

It almost seems as if the current version of js-ipfs is trying to be intentionally difficult to use.

@achingbrain
Member

Like I said, I'm not massively against this. There's even precedent in our own codebase in that interface-datastore and ipld have separate methods for single/multiple inputs.

Anyway thanks, this feedback is useful.

however the core API documentation claims it supports JSON string uploads

Where does it say that? There's no mention of JSON on the page you linked to.

@drewstaylor

Where does it say that? There's no mention of JSON on the page you linked to.

@achingbrain it actually does seem to work with string input. No need to focus on the word JSON; I was just confused about whether it would accept a string data type, which it does seem to do. The page I linked says:

data may be:

Bytes (alias for Buffer|ArrayBuffer|TypedArray) [single file]
Bloby (alias for: Blob|File) [single file]
string [single file]
// and so on...

For example, my current upload script contains:

  try {
    const filesAdded = await all(
      ipfs.add([json], { type: 'application/json' })
    );
    console.log('IPFS files added: \n', filesAdded);
    return filesAdded[0];
  } catch (e) {
    console.error(e);
    return false;
  }

which is working fine to produce content like: https://cloudflare-ipfs.com/ipfs/QmVFqnDeGZVMS6aK8QuRcWH6xyWmyR2rnTR8VPHw4o32nU

@achingbrain achingbrain linked a pull request Jul 15, 2020 that will close this issue