incorrect URL parsing #2114

trevnorris · 2015-07-06T19:32:55Z

A regression was introduced between v0.10 and v0.12 in how http.get() handles the request URL. Multi-byte characters are encoded as 'binary' instead of either

being encoded as UTF-8, or
being properly percent-encoded into their '%XX' escapes.
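A minimal sketch of the difference, assuming (as the report describes) that the request path ends up converted to bytes with Node's 'binary' encoding, which is an alias for 'latin1'. The path '/€' is only an illustration: 'latin1' keeps just the low byte of each UTF-16 code unit, so U+20AC becomes the single byte 0xAC.

```javascript
// Illustrative path containing a multi-byte character: '/€' (U+20AC EURO SIGN).
const path = '/\u20AC';

// What the report describes: 'binary' (alias of 'latin1') keeps only the
// low 8 bits of each UTF-16 code unit, so U+20AC collapses to 0xAC.
const asBinary = Buffer.from(path, 'latin1');

// Option 1: encode the string as UTF-8 bytes.
const asUtf8 = Buffer.from(path, 'utf8');

// Option 2: percent-encode the UTF-8 bytes into an ASCII-only string.
const asPercent = encodeURI(path);

console.log(asBinary);  // <Buffer 2f ac>
console.log(asUtf8);    // <Buffer 2f e2 82 ac>
console.log(asPercent); // /%E2%82%AC
```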

A test case and additional information are in nodejs/node-v0.x-archive#25634 (comment).


rvagg · 2015-07-07T01:54:03Z

@trevnorris so to clarify, you're saying that io.js is impacted by this too (since we're not strictly a 0.12 fork, that's not obvious)?

trevnorris · 2015-07-07T08:08:49Z

Yes. The test in the linked issue gives the same result in io.js. A possible solution would be to run the string through encodeURI() before converting it into a buffer with 'binary' encoding.
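A sketch of that suggestion, using a hypothetical helper name `pathToBuffer` (not an existing Node API). Once the string has been through encodeURI() it is pure ASCII, so the 'binary' conversion no longer has any high bits to drop.

```javascript
// Hypothetical helper illustrating the suggestion: percent-encode first,
// then convert with 'latin1' (the same encoding Node calls 'binary').
function pathToBuffer(path) {
  const encoded = encodeURI(path);       // non-ASCII -> %XX escapes of its UTF-8 bytes
  return Buffer.from(encoded, 'latin1'); // safe now: every character is ASCII
}

console.log(pathToBuffer('/caf\u00E9').toString('latin1')); // /caf%C3%A9
```

One caveat: encodeURI() also escapes '%', so a path the caller has already percent-encoded gets double-encoded, e.g. encodeURI('/a%20b') returns '/a%2520b'.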

vkurchatkin · 2015-07-07T09:45:21Z

This could be related: #1693

bnoordhuis · 2015-07-07T10:01:29Z

It's probably the result of this change. Before f674b09, headers and the status line were parsed as UTF-8; now they're parsed as ISO-8859-1.
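To illustrate what that change means for incoming data, here are the same three bytes (the UTF-8 encoding of '/é', chosen only as an example) decoded both ways. Node's 'latin1' maps each byte to the code point with the same value, which is how ISO-8859-1 is treated in this sketch.

```javascript
// Raw bytes as they might appear in a request line: UTF-8 for '/é'.
const raw = Buffer.from([0x2f, 0xc3, 0xa9]);

// Decoded as UTF-8: the two-byte sequence becomes one character.
console.log(raw.toString('utf8'));   // /é

// Decoded as ISO-8859-1 ('latin1'): one character per byte, i.e. mojibake.
console.log(raw.toString('latin1')); // /Ã©
```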

trevnorris · 2015-07-07T16:38:00Z

@bnoordhuis Parsing the string this way follows the spec more closely. Even though browsers may show the Unicode characters in the URL bar, checking the network request shows that they percent-encode them before sending it. io.js' http module would also barf on this request, since we decode incoming headers as ISO-8859-1.
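For comparison, a sketch using the WHATWG URL parser (the URL Standard that browsers follow), assuming a Node version where `URL` is a global (v10 or later). The example URL is made up; the parser percent-encodes the non-ASCII path character as UTF-8, while decoding the pathname gives back roughly what a URL bar displays.

```javascript
// Example URL with a non-ASCII path character ('é', U+00E9).
const url = new URL('http://example.com/caf\u00E9');

// The parser stores the path percent-encoded as UTF-8, as a browser would send it.
console.log(url.pathname); // /caf%C3%A9

// Decoding gives back the human-readable form a URL bar typically displays.
console.log(decodeURIComponent(url.pathname)); // /café
```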

@vkurchatkin It does look like the same issue. IMO the options are to let the user know they need to encode their header strings before sending them, or we should consider doing that automatically before turning them into a buffer.
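A minimal sketch of the second option (encoding automatically), with a hypothetical helper `encodeNonAscii` that is not part of Node. Unlike running the whole string through encodeURI(), it escapes only non-ASCII characters, so existing %XX escapes are left alone; ASCII characters such as spaces also pass through untouched.

```javascript
// Hypothetical helper: percent-encode only runs of non-ASCII characters.
// ASCII characters, including existing %XX escapes, pass through unchanged.
function encodeNonAscii(path) {
  // encodeURIComponent UTF-8-encodes each matched run (surrogate pairs included)
  // and throws a URIError if the run contains a lone surrogate.
  return path.replace(/[\u0080-\uFFFF]+/g, (run) => encodeURIComponent(run));
}

console.log(encodeNonAscii('/caf\u00E9/a%20b')); // /caf%C3%A9/a%20b
console.log(encodeNonAscii('/\u{1F600}'));      // /%F0%9F%98%80
```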

Fishrock123 · 2015-08-31T18:26:08Z

Duplicate of #1693. See also: #2629

vkurchatkin · 2015-08-31T18:32:53Z

I wouldn't say it's a duplicate. There are two problems with UTF-8, parsing and writing, and they seem unrelated.

http would previously accept paths with non-ASCII characters. This proved problematic, because multi-byte characters were encoded as 'binary', that is, only the low byte of each UTF-16 code unit was kept and the high byte was dropped. There is no sensible way to fix this without breaking backwards compatibility for paths containing U+0080 to U+00FF characters. We already reject paths with unescaped spaces with an exception. This commit does the same for paths with non-ASCII characters. The alternative would have been to encode paths in UTF-8, but this would cause the behaviour to silently change for paths with single-byte non-ASCII characters (e.g. the copyright character U+00A9 ©). I find it preferable to add to the existing prohibition of bad paths with spaces. Bug report: nodejs#2114
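A sketch of the validation the commit message describes, written as a standalone hypothetical function `assertValidPath`; it is not Node's actual implementation. Paths containing a space or any character at or above U+0080 are refused, including single-byte characters such as ©, which is the backwards-compatibility trade-off the message chooses over silently switching to UTF-8.

```javascript
// Hypothetical check modelled on the commit message above: reject a request
// path that contains an unescaped space or any non-ASCII code unit.
function assertValidPath(path) {
  if (/[ \u0080-\uFFFF]/.test(path)) {
    throw new TypeError('Request path contains unescaped characters');
  }
  return path;
}

assertValidPath('/caf%C3%A9'); // accepted: already percent-encoded

for (const bad of ['/a b', '/caf\u00E9', '/\u00A9']) {
  try {
    assertValidPath(bad);
  } catch (err) {
    console.log(JSON.stringify(bad), '->', err.message);
  }
}
```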

Trott · 2016-04-05T23:17:38Z

Quoting @trevnorris: "IMO the options are to let the user know they need to encode their header strings before sending them, or we should consider doing that automatically before turning them into a buffer."

Is there consensus at this time that one of these two options is superior to the other?

trevnorris · 2016-04-06T20:42:17Z

@Trott Nope. We may just want to put this on the CTC meeting agenda for a quick vote and a fast resolution.

jasnell · 2016-07-03T04:55:56Z

This should be addressed by the WHATWG URL implementation here: #7448

ChALkeR · 2016-07-03T06:00:47Z

@jasnell Does that change what http.get sends?

jasnell · 2016-07-20T15:59:23Z

@ChALkeR ... actually no, it doesn't, you're right.

jasnell · 2017-05-01T19:38:16Z

Does this need to remain open?

jasnell · 2017-05-30T04:13:57Z

Closing given the lack of any further progress on this. It's not even clear whether this is still an issue.

Flimm · 2017-05-30T13:27:44Z

I've created a very similar issue (with a failing test-case) here: #13296

trevnorris added the http label Jul 6, 2015

Fishrock123 added the confirmed-bug label Aug 27, 2015

Fishrock123 closed this as completed Aug 31, 2015

Fishrock123 added the duplicate label Aug 31, 2015

Fishrock123 reopened this Aug 31, 2015

Fishrock123 removed the duplicate label Aug 31, 2015

This was referenced Sep 25, 2015

http: Reject paths containing non-ASCII characters #3062 (closed)

Check unicode links correctly DavidAnson/check-pages#3 (closed)

ChALkeR mentioned this issue Feb 15, 2016

Node.js 4.2.0 encoding problems #3382 (closed)

jasnell added the url label Jun 7, 2016

jasnell mentioned this issue Jun 7, 2016

proposal: WHATWG URL standard implementation nodejs/node-eps#28 (closed)

sotarok mentioned this issue Apr 5, 2017

Unfurl Slack URL crowi/crowi#204 (closed)

1 task

jasnell closed this as completed May 30, 2017

snyk-bot mentioned this issue Apr 1, 2020

[Snyk] Upgrade prismjs from 1.17.1 to 1.19.0 O330oei/node#14 (open)

