Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

Warn on use of abbreviated sha1 commit hash #582

Closed
wants to merge 5 commits into from

Conversation

zknill
Copy link
Contributor

@zknill zknill commented May 15, 2017

Update manifest validation to warn when an abbreviated version of a
sha1 commit hash is used.

Add test for warn.

Fixes #567

Update manifest validation to warn when an abbreviated version of a
sha1 commit hash is used.

Add test for warn.
Copy link
Collaborator

@carolynvs carolynvs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! Thanks for picking this up.

manifest.go Outdated
// valid key
case "revision":
// Check if sha1 hash is abbreviated
if valueStr, ok := value.(string); ok && len(valueStr) < 40 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that revisions are only hg/git (which are both commit hashes) but dep also supports other version control systems, such as bazaar. Let's add a test with bazaar revision id's as well to make sure that those won't be affected by this new validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know if it's the bazaar revision-id or revno that's used as the revision in the manifest?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure it's a revision-id, see https://github.com/golang/dep/blob/master/cmd/dep/ensure_test.go#L24 for what we are validating against in our tests. I think that's a revision-id, yeah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant! thanks for incredibly speedy reply

Update manifest validation to accept a valid bzr revision id.
Update git abbreviated hash check to use a regex instead of len check.

Add tests for passing and failing bzr revision.
@zknill zknill force-pushed the warn-on-abbreviateed-sha1-hash branch from 811717f to 57a79c5 Compare May 15, 2017 19:15
Copy link
Collaborator

@carolynvs carolynvs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks better to me but I worry that I'm missing something around why the code didn't originally use regex. 😀

@jessfraz Was there a reason why you didn't originally use regex for validating a bazaar revision id that we should watch out for here?

manifest.go Outdated
case "revision":
if valueStr, ok := value.(string); ok {
// Check if sha1 hash is abbreviated
validGit := regexp.MustCompile("[a-f0-9]{40}").MatchString(valueStr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend renaming this variable to validHash instead, since it could be from git or hg

@darkowlzz
Copy link
Collaborator

darkowlzz commented May 16, 2017

@zknill Can you also correct this typo I introduced in the test file along with this PR? 😅
constains to contains

@zknill
Copy link
Contributor Author

zknill commented May 16, 2017

Yeah absolutely! I was looking that and wondering

@sdboyer
Copy link
Member

sdboyer commented May 17, 2017

@carolynvs the validation logic we use there is originally inherited from me - @jessfraz is in the blame b/c she discovered that i was totally wrong about the final component of the revid being hex-encoded 😄

No, there's no reason we don't use a regex. Before we go down that path, though...maybe it'd be best to look through the bzr codebase to find where they validate the shape of these identifiers, and mimic that directly?

@ibrasho
Copy link
Collaborator

ibrasho commented May 17, 2017

this is how bzr generate new rev ids

I'm not able to find the code that validate the format of revids yet.

@ibrasho
Copy link
Collaborator

ibrasho commented May 17, 2017

The tests validates revids through regex.

@ibrasho
Copy link
Collaborator

ibrasho commented May 17, 2017

Also, I would recommend storing the result of regexp.MustCompile() calls instead of generating a new one for each check.

manifest.go Outdated
case "revision":
if valueStr, ok := value.(string); ok {
// Check if sha1 hash is abbreviated
validHash := regexp.MustCompile("[a-f0-9]{40}").MatchString(valueStr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's prefer to use encoding/hex.DecodeString() here - faster than compiling a regexp, and also just accurately reflects the data.

Update the regex to match that from the bzr revision-id generation tests
and reuse it's compiled form.

Update to match sha1 revision hash on hex string and length rather than
regex.

Add a test for an even length, abbreviated, sha1 revision hash.
Use raw string with backticks to avoid having to double escape inside a
regex.
@sdboyer
Copy link
Member

sdboyer commented May 19, 2017

OK, my apologies, but I'm just actually thinking this all through now properly, and I've realized that we're kinda going about this the wrong way. 😦

First, we've disregarded svn here in a way that might be problematic. svn revision numbers will eventually be able to go into this field, then the algorithm as-written will mistake those numbers for abbreviated hex-encoded SHA1s. So, that's no good. We need to narrow the scope of our search to a particular, expected form, rather than trying to validate the field more broadly.

To that end, we actually need to be looking for just one specific pattern - a 7-char long (git's abbrev length) or 12-char long (hg's abbrev length, I believe) string containing only [0-9a-f], with at least one character being in [a-f] (so as to disambiguate from actual numbers). My suggestion about using hex decoding to validate won't work, as that necessarily can't work on an odd-length string.

Being that we've already done the work to figure out how to validate bzr revs, though, let's not throw that work away. Let's open up a separate PR in which we apply it to deduceConstraint(), where we actually are trying to pattern-match these inputs.

@zknill
Copy link
Contributor Author

zknill commented May 19, 2017

Ok, I agree this seems wrong and / or problematic
However if the ultimate goal is to ensure that the full revision id from [insert vcs] is given in the manifest, I struggle to see how matching just the most common abbreviations will satisfy that goal?

I am not super familiar with the code base, and I don't know if it's even possible, but it feels to me that it would be better to know which vcs we were using at the point at which we validate the revision, and ensure that we are using the correct validation rules for that vcs. Else you are right, it does feel a little like blindly guessing at what's acceptable.

@sdboyer
Copy link
Member

sdboyer commented May 19, 2017

I am not super familiar with the code base, and I don't know if it's even possible, but it feels to me that it would be better to know which vcs we were using at the point at which we validate the revision, and ensure that we are using the correct validation rules for that vcs.

Yes, this definitely would make it easier. Getting that information, though, isn't zero-cost:

  1. If we allow the user to explicitly state it (e.g. with a vcs field), then it a) could, and will, be wrong, introducing a new failure mode and b) becomes yet one more annoying thing that the user has to specify.
  2. If we want to infer it for these validation purposes, then now we have a network dependency for some of our most basic validation routines. Avoiding network access where we can is a major way that we can avoid making the tool slooooow.

Fortunately, we can make these changes now, then improve them down the line - say, once we have #431 and can reliably avoid network access to get vcs type. (Note that we actually probably could do this all locally with the information we cache now, but because , I think it may be better to not try to tackle this until after #431)

@zknill
Copy link
Contributor Author

zknill commented May 19, 2017

Ok nice! Thanks for the explanation. So plan going forward? Closing this and opening another with changes to match the specific cases you laid out above?

Should that logic live in validateManifest or else where?

@ibrasho
Copy link
Collaborator

ibrasho commented May 19, 2017

@sdboyer I'm not sure about this part:

To that end, we actually need to be looking for just one specific pattern - a 7-char long (git's abbrev length) or 12-char long (hg's abbrev length, I believe) string containing only [0-9a-f], with at least one character being in [a-f] (so as to disambiguate from actual numbers). My suggestion about using hex decoding to validate won't work, as that necessarily can't work on an odd-length string.

In git it's not uncommon to get commits where the first 7 (or more) chars are all digits. see ibrasho-forks@28792060519c And I assume the same happens in hg.

Unless I misunderstood your point...

@sdboyer
Copy link
Member

sdboyer commented May 19, 2017

Ok nice! Thanks for the explanation. So plan going forward? Closing this and opening another with changes to match the specific cases you laid out above?

Sure 😄 I'd say this PR should implement the character-counting solution I outlined; if it's easier for you to move that to a separate PR, though, we can close this and do that.

Should that logic live in validateManifest or else where?

Yep, that's still the right place.

In git it's not uncommon to get commits where the first 7 (or more) chars are all digits. see ibrasho-forks/dep@2879206 And I assume the same happens in hg.

Unless I misunderstood your point...

Right - assuming a perfectly random distribution, it would happen (10/16)^7 = 3.7% of the time. My thinking here is that it's worse to have false positives than miss a small percentage of cases, especially when this is a relative rare situation we're guarding against, anyway.

@sdboyer
Copy link
Member

sdboyer commented May 23, 2017

@zknill have you decided? would you rather close this and start fresh, or just push up revised commits to this PR?

@zknill
Copy link
Contributor Author

zknill commented May 28, 2017

just making the changes, I will close and push to another PR. Should be easier to review etc.

@zknill
Copy link
Contributor Author

zknill commented May 28, 2017

Closing in favour of #671

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants