continuation of use string instead of struct backing #71

Merged
merged 18 commits into from
Sep 12, 2018

kevina
Contributor

@kevina kevina commented Aug 25, 2018

Rebased #47 and made some improvements. Most of the changes were noted in my review of #47.

I tried to keep each change in its own commit, so if any of my changes are not agreeable I can back them out.

Closes #3. Closes #38.

@ghost ghost assigned kevina Aug 25, 2018
@ghost ghost added the status/in-progress In progress label Aug 25, 2018
@kevina kevina changed the title use string instead of struct backing continuation of use string instead of struct backing Aug 25, 2018
@kevina kevina mentioned this pull request Aug 25, 2018
cid.go Outdated
if c.Version() == 0 {
return DagProtobuf
}
bytes := c.Bytes()
Member

We may want to either:

  1. Implement a basic non-allocating "byte reader" for strings and use binary.ReadUvarint.
  2. Manually parse these.

Note: for the first one, we can just check c.string[0] == 1 and then skip it.

The ByteReader would probably look like:

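// stringReader is a non-allocating io.ByteReader over a string.
type stringReader string

// ReadByte returns the next byte and advances the reader past it.
func (r *stringReader) ReadByte() (byte, error) {
    if len(*r) == 0 {
        return 0, io.EOF
    }
    b := (*r)[0]
    *r = (*r)[1:]
    return b, nil
}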
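
As a rough sketch of option 1, the reader can be handed straight to binary.ReadUvarint, since a pointer to it satisfies io.ByteReader. The helper name parsePrefix and the sample bytes are assumptions made for illustration; they are not taken from go-cid.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
)

// stringReader implements io.ByteReader over a string without copying it.
type stringReader string

func (r *stringReader) ReadByte() (byte, error) {
	if len(*r) == 0 {
		return 0, io.EOF
	}
	b := (*r)[0]
	*r = (*r)[1:]
	return b, nil
}

// parsePrefix is a hypothetical helper: it reads the version and codec
// uvarints from the front of a CIDv1-style string and returns what follows.
func parsePrefix(s string) (version, codec uint64, rest string, err error) {
	r := stringReader(s)
	if version, err = binary.ReadUvarint(&r); err != nil {
		return 0, 0, "", err
	}
	if codec, err = binary.ReadUvarint(&r); err != nil {
		return 0, 0, "", err
	}
	return version, codec, string(r), nil
}

func main() {
	// 0x01 is the version, 0x71 the codec; the remaining bytes stand in
	// for a multihash and are placeholders only.
	v, c, rest, err := parsePrefix("\x01\x71\x12\x20rest")
	fmt.Println(v, c, len(rest), err)
}
```

Because the reader is just a re-sliced string, advancing it does not copy the underlying bytes.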

cid.go Outdated
@@ -159,9 +159,9 @@ func NewCidV1(codecType uint64, mhash mh.Multihash) Cid {
 // - version uvarint
 // - codec uvarint
 // - hash mh.Multihash
-type Cid string
+type Cid struct{ string }
Member

Good idea. This will also allow us to add fields if we need them (as long as we can still put the struct in a map).

@Stebalien
Member

This looks like a good way to do this. We should probably, eventually, do the same with Multihash but we can do that later.

@kevina
Contributor Author

kevina commented Aug 25, 2018

> This looks like a good way to do this. We should probably, eventually, do the same with Multihash but we can do that later.

The fact that the Cid type changes from a pointer to a non-pointer type is going to make this change painful. I would vote to change multihash to a string at the same time and get it over with, but I don't have a strong opinion on this if you are eager to push this through now.

@Stebalien
Member

> The fact that the Cid type changes from a pointer to a non-pointer type is going to make this change painful. I would vote to change multihash to a string at the same time and get it over with, but I don't have a strong opinion on this if you are eager to push this through now.

I think the multihash change will be painful either way, will it not?

@kevina
Contributor Author

kevina commented Aug 27, 2018

@Stebalien Probably, it might be easier to do it at the same time though.

cid.go Outdated
// - hash mh.Multihash
type Cid struct{ str string }

var Nil = Cid{}
Member

How about calling this Zero?

"Zero" would more closely match https://golang.org/ref/spec#The_zero_value

Contributor Author

Nil is a more intuitive name to me. Zero implies a numeric value to me.

Member

You are correct in that go would call this the "Zero" value. However, I really don't like it.

Arguments against Zero:

  • To me, "Zero" implies a number.
  • Zero could mean the CID for the number 0 (z7xsiH4JqWQudech83). I expect that users will define "constant" CIDs all the time although this one might be a stretch.
  • We've historically used nil for "empty CIDs".
  • We have a CIDv0 ("version zero").
  • Zero "looks funny" (entirely subjective) in the return position.
func Foo() (Cid, error) {
    return Nil, fmt.Errorf("bad stuff")
}
func Bar() (Cid, error) {
    return Zero, fmt.Errorf("bad stuff")
}

Member

Some other options just to toss out:

  • cid.Invalid -- this shows up in a lot of other idiomatic Go code... but mostly in enum situations, so maybe less appropriate here.
  • cid.None -- avoids all the numeric connotations, less "looks funny" in returns I think?

Contributor Author

I still like Nil better...

Contributor Author

@warpfork @Stebalien instead of Nil, how about Undef? I like it a bit better than Invalid and about the same as Nil.

Member

I could go for Undef 👍

Member

👎 on undef and invalid. The value isn't undefined and isn't the only invalid value. I'd prefer None (although I still prefer Nil).

Member

Actually, seeing cid.Defined(), I'm onboard with this (if we return Undef and check with cid.Defined()).

Contributor Author

Okay I am going with Undef.
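
To make the outcome concrete, here is a minimal sketch of the Undef plus Defined() pairing this thread settled on. The Cid type is a local stand-in, the body of Defined() is an assumption about how such a check could be written, and lookup is a hypothetical caller.

```go
package main

import (
	"errors"
	"fmt"
)

// Cid is a local stand-in for the struct-wrapped string discussed above.
type Cid struct{ str string }

// Undef is the zero value, used to mean "no CID".
var Undef = Cid{}

// Defined reports whether c holds anything other than the zero value.
// (Assumed implementation: an empty backing string means undefined.)
func (c Cid) Defined() bool { return c.str != "" }

// lookup is a hypothetical function that returns Undef alongside an error.
func lookup(key string) (Cid, error) {
	if key == "" {
		return Undef, errors.New("empty key")
	}
	return Cid{"\x01\x55" + key}, nil
}

func main() {
	c, err := lookup("")
	fmt.Println(c.Defined(), err)

	c, err = lookup("abc")
	fmt.Println(c.Defined(), err)
}
```

Checking with Defined() instead of comparing against Undef keeps callers working if the struct ever gains additional fields.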

@Stebalien
Member

@kevina up to you.

@kevina
Contributor Author

kevina commented Aug 28, 2018

> @kevina up to you.

Once we switch to using Multihashes in the blockstore, the cid.Hash() method is going to be called a lot more. To prevent performance regressions it would probably be a good idea if that method doesn't alloc.

@Stebalien
Member

Go ahead.

@kevina
Contributor Author

kevina commented Aug 29, 2018

> Go ahead.

Done. See multiformats/go-multihash#82

@Stebalien
Member

> Probably, it might be easier to do it at the same time though.

Given that this isn't looking to be the case, let's go ahead with this and then try tackling the multihash issue.

@kevina
Contributor Author

kevina commented Aug 29, 2018

> Given that this isn't looking to be the case, let's go ahead with this and then try tackling the multihash issue.

I left an additional option on the p.r. that might make this easier.

Please let me know what your intended time frame for landing this is. My current plan was to add the --cid-base to go-ipfs and then swing back around to merging this p.r. and fixing any broken code along the way.

@Stebalien
Member

I'd like to get this done sooner or later. It's blocking ipfs/go-ipld-cbor#30.

@kevina
Contributor Author

kevina commented Aug 29, 2018

> Given that this isn't looking to be the case, let's go ahead with this and then try tackling the multihash issue.

Okay I factored out those changes into a separate p.r.

Member

@Stebalien Stebalien left a comment

There are probably some optimizations we can toy with later but this looks good for now (from my end).

@kevina
Contributor Author

kevina commented Aug 30, 2018

> There are probably some optimizations we can toy with later but this looks good for now (from my end)

Exactly what did you have in mind for this?

@Stebalien
Member

> Exactly what did you have in mind for this?

  • We can assume that the version is one byte (and check against the constant 0x1).
  • We can probably rearrange some of the logic related to Prefix() to avoid parsing varints multiple times.
  • etc...
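
The first bullet can be sketched roughly as follows. This is an illustration of the idea, not the code from this p.r.: the function name version is hypothetical, and it relies on a CIDv0 being a bare 34-byte sha2-256 multihash (0x12, 0x20, then the digest) while a CIDv1 begins with the single byte 0x01.

```go
package main

import (
	"fmt"
	"strings"
)

// version is a hypothetical fast path that inspects leading bytes instead
// of decoding a varint. The bool is false when the input matches neither form.
func version(s string) (uint64, bool) {
	// CIDv0: a bare sha2-256 multihash, 0x12 (code), 0x20 (length), 32 digest bytes.
	if len(s) == 34 && s[0] == 0x12 && s[1] == 0x20 {
		return 0, true
	}
	// CIDv1: the version varint fits in one byte, so compare against 0x01.
	if len(s) > 0 && s[0] == 0x01 {
		return 1, true
	}
	return 0, false
}

func main() {
	v0 := "\x12\x20" + strings.Repeat("a", 32) // placeholder digest bytes
	v1 := "\x01\x71" + v0

	fmt.Println(version(v0))
	fmt.Println(version(v1))
	fmt.Println(version(""))
}
```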

But let's handle that later. The key part here is changing the interfaces.

@kevina
Contributor Author

kevina commented Aug 30, 2018

> We can assume that the version is one byte (and check against the constant 0x1).

Actually I already did this, but it was part of the commit that used the new Multihash. (#73)

> We can probably rearrange some of the logic related to Prefix() to avoid parsing varints multiple times.

Ditto. I think.

@Stebalien
Member

@kevina want to go ahead and do the refactor?

@kevina
Contributor Author

kevina commented Aug 30, 2018

> @kevina want to go ahead and do the refactor?

I'm confused? To be clear: I did most of those optimizations as part of the switch to using a string multihash, which is in itself an optimization. Did you want me to split #73 to separate out those optimizations?

@Stebalien
Member

> I'm confused?

No no, I'm talking about the whole cid type refactor (that is, fixing all the dependent packages). We can do any additional optimizations to this package later.

@kevina
Contributor Author

kevina commented Aug 30, 2018

Sure, not sure I will get to it tonight though.

@kevina
Contributor Author

kevina commented Sep 8, 2018

OK, I bubbled this up all the way to go-ipfs, but created p.r.s to give others a chance to review and know what is going on. I will merge within a week.

@Stebalien @warpfork On second thought I am not super happy with using Undef and still think Nil (or maybe None) is better, but I don't want to bikeshed on this.

While bubbling this up I often used cid.Cid{} when returning an error from a function that also returns a Cid (i.e. return cid.Cid{}, err). That may be ugly, but it will be immediately obvious what is going on. In a few limited cases I used cid.Undef, in particular when assigning to a variable where we care if the Cid is defined or not. I used Defined() to test if a Cid is defined and never c == Undef.

All the p.r.s involved have a gx publish commit as part of the p.r. I will rebase that commit out and republish as I merge them.

}

return Cid{string(buf[:n+hashlen])}
}

// Cid represents a self-describing content adressed
// identifier. It is formed by a Version, a Codec (which indicates
// a multicodec-packed content type) and a Multihash.

@kevina Could you please add a high level description of how the version, codec, and hash are packed into a string, similar to how the multihash docs do, to the godoc.

Contributor Author

We decided against that. The format varies according to the CID type (see #71 (comment)). The layout follows the CID spec; see https://github.com/ipld/cid.

Fair, thoughts on adding a link to the cid spec to the comment then? The location of various specs isn't the easiest thing to deduce, i.e. the cid spec is under the ipld org whereas the implementation is under the ipfs org. Just a suggestion, not a blocker.

Contributor Author

The entire documentation could use cleaning up. The format is mentioned in multiple places, and the link to the spec is in the documentation for the package: https://godoc.org/github.com/ipfs/go-cid.

cid.go Outdated
}
if dec.Code != mh.SHA2_256 || dec.Length != 32 {
panic("invalid hash for cidv0")
}
return Cid{string(mhash)}
Contributor Author

@Stebalien I panic here in order to avoid introducing another API change. However, the patch to strict CIDs (#40) does change the return type to (Cid, error) for both NewCidV0 and NewCidV1. Should we just change them now and get it over with?

I don't have a strong feeling either way.

Member

Let's keep this simple.

Contributor Author

Okay, the only reason I suggested doing it now is to avoid another API change in the near future. But we can keep this simple and change the return type later.
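
For reference, the deferred alternative might look roughly like the sketch below: the same validation as the panicking constructor, but reported through an error return. The name newCidV0Checked is hypothetical, the Cid type is a local stand-in, and the hand-rolled byte check (0x12 for sha2-256, 0x20 for a 32-byte digest) replaces the mh.Decode call used in the real code so that the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// Cid is a local stand-in for the struct-wrapped string discussed above.
type Cid struct{ str string }

// newCidV0Checked is a hypothetical error-returning variant of NewCidV0.
// It accepts only a sha2-256 multihash: 0x12, 0x20, then 32 digest bytes.
func newCidV0Checked(mhash []byte) (Cid, error) {
	if len(mhash) != 34 || mhash[0] != 0x12 || mhash[1] != 0x20 {
		return Cid{}, errors.New("invalid hash for cidv0")
	}
	return Cid{string(mhash)}, nil
}

func main() {
	good := make([]byte, 34) // zero digest, placeholder only
	good[0], good[1] = 0x12, 0x20

	_, err := newCidV0Checked(good)
	fmt.Println(err)

	_, err = newCidV0Checked([]byte{0x11, 0x14})
	fmt.Println(err)
}
```

The trade-off raised in the thread still applies: every caller has to handle the extra return value, which is another API change.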

Member

I just don't want to drag this out.

@Stebalien
Member

@kevina are we going to need to bubble that multibase update? If so, we should do that first. The list of changes we'll need to make for that will be much larger.

@kevina
Contributor Author

kevina commented Sep 11, 2018

@Stebalien actually it won't, I checked.

Using my new tool:

go-ipfs$ gx-update-helper preview go-multibase
github.com/multiformats/go-multibase

github.com/ipfs/go-cid :: go-multibase

github.com/ipfs/go-block-format :: go-cid
github.com/ipfs/go-cidutil :: go-cid
github.com/ipfs/go-ipfs-ds-help :: go-cid
github.com/libp2p/go-libp2p-routing :: go-cid
github.com/ipfs/go-verifcid :: go-cid

github.com/ipfs/go-ipfs-blocksutil :: go-block-format
github.com/ipfs/go-ipfs-chunker :: go-block-format
github.com/ipfs/go-ipfs-exchange-interface :: go-block-format
github.com/ipfs/go-ipld-format :: go-block-format
github.com/libp2p/go-libp2p-kad-dht :: go-libp2p-routing
github.com/libp2p/go-libp2p-routing-helpers :: go-libp2p-routing
github.com/ipfs/go-ipfs-blockstore :: go-block-format go-ipfs-ds-help
github.com/ipfs/go-ipfs-routing :: go-ipfs-ds-help go-libp2p-routing

github.com/ipfs/go-ipfs-posinfo :: go-ipld-format
github.com/ipfs/go-ipld-cbor :: go-ipld-format
github.com/ipfs/go-ipld-git :: go-ipld-format
github.com/libp2p/go-libp2p-pubsub-router :: go-libp2p-routing-helpers
github.com/ipfs/go-ipfs-exchange-offline :: go-ipfs-blockstore go-ipfs-blocksutil go-ipfs-exchange-interface
github.com/ipfs/go-bitswap :: go-ipfs-blockstore go-ipfs-blocksutil go-ipfs-exchange-interface go-ipfs-routing

github.com/ipfs/go-blockservice :: go-bitswap go-ipfs-exchange-offline

github.com/ipfs/go-merkledag :: go-blockservice

github.com/ipfs/go-path :: go-merkledag
github.com/ipfs/go-unixfs :: go-merkledag

github.com/ipfs/go-mfs :: go-path go-unixfs

github.com/ipfs/go-ipfs :: go-mfs

Which is saying anything that depends on go-multibase also depends on go-cid.
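
The grouping in that output is consistent with ordering packages by their longest dependency chain back to the updated package. How gx-update-helper actually computes it is not shown here, so the following is only a sketch of that layering idea under that assumption; the function name levels and the trimmed dependency map are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// levels groups packages by the length of their longest dependency chain
// back to root. deps maps a package to its direct dependencies and is
// assumed to be acyclic. Packages that never reach root are left out.
func levels(deps map[string][]string, root string) [][]string {
	depth := map[string]int{root: 0}

	var visit func(pkg string) int
	visit = func(pkg string) int {
		if d, ok := depth[pkg]; ok {
			return d
		}
		d := -1 // -1 means "does not depend on root"
		for _, dep := range deps[pkg] {
			if dd := visit(dep); dd >= 0 && dd+1 > d {
				d = dd + 1
			}
		}
		depth[pkg] = d
		return d
	}

	var out [][]string
	for pkg := range deps {
		d := visit(pkg)
		if d <= 0 {
			continue // skip root itself and unrelated packages
		}
		for len(out) < d {
			out = append(out, nil)
		}
		out[d-1] = append(out[d-1], pkg)
	}
	for _, level := range out {
		sort.Strings(level)
	}
	return out
}

func main() {
	// A small subset of the relationships listed in the output above.
	deps := map[string][]string{
		"go-cid":             {"go-multibase"},
		"go-block-format":    {"go-cid"},
		"go-ipfs-ds-help":    {"go-cid"},
		"go-ipfs-blockstore": {"go-block-format", "go-ipfs-ds-help"},
	}
	for i, level := range levels(deps, "go-multibase") {
		fmt.Println(i+1, level)
	}
}
```

Updating packages one level at a time means each package is republished only after everything it depends on has already been updated.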

@Stebalien
Member

Hm. Yeah, you're right. But whatever we do, let's really try to avoid tacking on random additions at the last minute.

@kevina kevina merged commit 6e296c5 into master Sep 12, 2018
@ghost ghost removed the status/in-progress In progress label Sep 12, 2018