Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to file: URL formatting in Node v7 cause problems for npm #9521

Closed
iarna opened this issue Nov 8, 2016 · 13 comments
Closed

Changes to file: URL formatting in Node v7 cause problems for npm #9521

iarna opened this issue Nov 8, 2016 · 13 comments
Labels
url Issues and PRs related to the legacy built-in url module.

Comments

@iarna
Copy link
Member

iarna commented Nov 8, 2016

  • Version: v7.0.0
  • Platform: platform agnostic
  • Subsystem: url.format

IMPACT: Before I get into the specifics, I want to be clear about what the impact is for npm. It's actually pretty limited. As best as we can tell this only impacts when saving a file-type specifier to a package.json with npm, not when installing from one. As such, it can be worked around by editing the package.json.

#7234 introduced a change to how url.format printed invalid file: type URLs. Specifically if someone gave it the invalid file:/absolute/path/to/file.txt it would translate that into the valid file:///absolute/path/to/file.txt. Previously invalid URLs were left unaltered.

npm's use of file URLs are usually to point at relative paths, which isn't something file URLs are really setup to support. The way npm does this is tack file: on the front, so npm -i -S ./foo/ becomes file:foo and npm -i -S ../foo becomes file:../foo.

As of v7, those become file://foo and file://../foo respectively. The first being a reference to / on the foo host, and the latter being a reference to /foo on the .. host.

Ok, so Node v7 is making legacy URLs "healthy" where it didn't used to and that bit our particular use case. We can update npm to work around that, but there is an additional wrinkle…

We don't actually call url.format('file:foo'). What we call is:

url.format({protocol: 'file', slashes: false, pathname: 'foo'})

It'd be legit to call this a doc bug and update there, imo, though ignoring properties in the object form feels odd to me. Either way, we'll be updating our code to not rely onurl.format.

@addaleax addaleax added the url Issues and PRs related to the legacy built-in url module. label Nov 8, 2016
@rvagg
Copy link
Member

rvagg commented Nov 9, 2016

Thanks for the use-case @iarna

@nodejs/ctc since we approved this we ought to consider the impact and whether and how we want to apply mitigation in this case. I see a few ways forward:

  • Accept that the new behaviour is what we intended and make our docs more clear about the non-absolute case (edit: also a test case to cover this!)
  • Revert our changes and embrace url.URL as the future to fix these kinds of mistakes and just leave url alone
  • Find a middle-ground that takes the relative path case into account differently to the absolute. If you look at url: return valid file: urls fom url.format() #7234 you can see the intention was to deal with absolute paths, there's no explicit test included for relative paths. When the input is sufficiently vague, as is the case with url.format({protocol: 'file', slashes: false, pathname: 'foo'}), it's arguably bad form of us to make an assumption that it's an absolute path, even if file urls officially don't do relative.

@Trott
Copy link
Member

Trott commented Nov 9, 2016

@iarna Just to make sure I'm understanding: npm can and will work around this, so even if we update the docs and that's it, things will be OK with npm (assuming people upgrade npm). In other words, you are informing us of a use case that we seem to have failed to consider and we should make a decision as to whether or not we should accommodate that use case. But either way, npm will be fine. Or am I misunderstanding?

@Trott
Copy link
Member

Trott commented Nov 9, 2016

I'm OK with any of the options @rvagg listed, but (assuming @iarna's answer to my question in the preceding comment is "yes, that is correct") I'd prefer the first one: retain the current behavior and update the docs.

And my second choice is the last one (basically, keep the current behavior generally, but add relative paths as a possibility even though relative paths are not permitted in file urls according to spec).

But I can see reasons to be for and against each of the possibilities.

And I would (of course) feel differently if this were causing widespread breakage in the ecosystem. If that were the case, I'd probably be all for reverting.

@Trott
Copy link
Member

Trott commented Nov 9, 2016

Actually...I just did a test in a browser, and Chrome, at least, understands relative file paths. Given that "do what the browser does" seems to be the general philosophy for url, I have to change my preference to: Support relative paths.

@jasnell
Copy link
Member

jasnell commented Nov 9, 2016

While I would generally prefer url.parse() to work correctly, the key reason I implemented url.URL separately was to avoid ecosystem breakage. I'd rather leave url.parse() as it is, warts and all, and move users over to url.URL as we can.

@sam-github
Copy link
Contributor

I gotta say, while I understand that there are times when a strictly conformant URL implementation is useful, I have found url's easy going nature to be very useful, in both formatting and parsing, it allows it to be used for all kinds of non-internet standard encoding of url-like data, as npm found.

@Trott
Copy link
Member

Trott commented Nov 10, 2016

I have found url's easy going nature to be very useful

At the risk of getting a little too far off topic: I mostly agree, but...there are things lurking... Well, here's an example: Given a FQDN with one component longer than the officially-allowed 63 characters, url.parse() will truncate it to 63 characters (which is acceptable IMO) and decide that the leftover part should be pushed over to the path (which is a problem IMO). Behold:

> url.parse('http://www.ThisIsAnAbsurdlyLongHostNameButSurelyNoneOfThisShouldEndUpAsPartOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com/CorrectPathBeginning')
Url {
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'www.thisisanabsurdlylonghostnamebutsurelynoneofthisshouldendupaspar',
  port: null,
  hostname: 'www.thisisanabsurdlylonghostnamebutsurelynoneofthisshouldendupaspar',
  hash: null,
  search: null,
  query: null,
  pathname: '/tOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com/CorrectPathBeginning',
  path: '/tOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com/CorrectPathBeginning',
  href: 'http://www.thisisanabsurdlylonghostnamebutsurelynoneofthisshouldendupaspar/tOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com/CorrectPathBeginning' }
>

Maybe I just have failed to think of an appropriate use case (a 63-character TLD maybe?), but I cannot imagine a situation where the above behavior is desirable. It's easy to think of situations where it leads to bugs. (Basically: Every situation where it might happen.) And I would not be surprised to find that it has the potential for security issues in the right context, although I may lack the creativity to contrive a reasonable example of that myself.

In any event, I think we should do what browsers do, which appears to be to not bother validating the char length of components for the FQDN. Garbage in/garbage out, but at least make it sensible garbage that clearly maps to the input.

But that's not what's being discussed here. Just a long aside. Thanks for indulging me. (If that topic is of interest, it's being discussed in #9292.)

@iarna
Copy link
Member Author

iarna commented Nov 11, 2016

@Trott

Just to make sure I'm understanding: npm can and will work around this, so even if we update the docs and that's it, things will be OK with npm (assuming people upgrade npm).

Yes, that's right.

@iarna
Copy link
Member Author

iarna commented Nov 11, 2016

@jasnell It's not url.parse that's causing issues. It's url.format. Though I suppose your point still stands. =)

@Trott
Copy link
Member

Trott commented Nov 22, 2016

Looking at the docs right now, I don't think this is a doc omission (although the docs could be more clear). For url.format(), it says:

  • If either the urlObject.slashes property is true, urlObject.protocol
    begins with one of http, https, ftp, gopher, or file, or
    urlObject.protocol is undefined, the literal string // will be appended
    to result.

That's not exactly the clearest prose expression of all time. But it does say that if the protocol property is set to file, the slashes are appended.

(FWIW, the word either should ideally be removed, as there are more than two options listed.)

So, unless I'm missing something, I think we're covered in the docs if it is determined that sticking with the current behavior is the most desirable course of action.

@Trott
Copy link
Member

Trott commented Nov 22, 2016

(I am going to file a PR to reword it for clarity, though.)

Trott added a commit to Trott/io.js that referenced this issue Nov 22, 2016
Trott added a commit to Trott/io.js that referenced this issue Nov 24, 2016
PR-URL: nodejs#9731
Ref: nodejs#9521
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
addaleax pushed a commit that referenced this issue Dec 5, 2016
PR-URL: #9731
Ref: #9521
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
MylesBorins pushed a commit that referenced this issue Dec 20, 2016
PR-URL: #9731
Ref: #9521
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
MylesBorins pushed a commit that referenced this issue Dec 21, 2016
PR-URL: #9731
Ref: #9521
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
@iarna
Copy link
Member Author

iarna commented Dec 25, 2016

We shipped our own change for this in npm@4.1.0 which I'll be downstreaming to you all this Thursday, Dec 22nd. (Well, 4.1.1 actually, but yeah.)

@Trott
Copy link
Member

Trott commented Jul 15, 2017

This issue has been inactive for sufficiently long that it seems like perhaps it should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.

@Trott Trott closed this as completed Jul 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
url Issues and PRs related to the legacy built-in url module.
Projects
None yet
Development

No branches or pull requests

6 participants