[whatwg/url] Forbidden host code points should include all C0 controls & U+007F (#627)

Right now, the [forbidden host code points](https://url.spec.whatwg.org/#forbidden-host-code-point) include only U+0000 NULL and the ASCII whitespace characters TAB, LF, and CR from the C0 controls range. However, there seems to be no good reason to allow the other unprintable characters. Per the current spec, `new URL('https://a\u0002b.com')` parses successfully and yields a `hostname` of `'a\u0002b.com'`, which deviates from what a user would expect from a URL parser.
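
To make the difference concrete, here is a minimal sketch comparing the control characters rejected today with those rejected under the proposal. It models only the control-character part of the set (the punctuation members of forbidden host code points are omitted), the function names are hypothetical, and the "current" set of NULL, TAB, LF, and CR is an assumption based on the description above.

```javascript
// Assumption: the C0 controls currently in the forbidden host code point set
// are U+0000 NULL, U+0009 TAB, U+000A LF, and U+000D CR.
const CURRENT_FORBIDDEN_CONTROLS = new Set([0x00, 0x09, 0x0a, 0x0d]);

// Hypothetical helper: is this control character rejected today?
function isCurrentlyForbiddenControl(codePoint) {
  return CURRENT_FORBIDDEN_CONTROLS.has(codePoint);
}

// Hypothetical helper: proposed rule, all C0 controls (U+0000 to U+001F)
// plus U+007F DELETE.
function isProposedForbiddenControl(codePoint) {
  return (codePoint >= 0x00 && codePoint <= 0x1f) || codePoint === 0x7f;
}

// Returns the code points in `host` that are accepted today but would be
// rejected under the proposal.
function newlyRejectedControls(host) {
  const hits = [];
  for (const ch of host) {
    const cp = ch.codePointAt(0);
    if (isProposedForbiddenControl(cp) && !isCurrentlyForbiddenControl(cp)) {
      hits.push(cp);
    }
  }
  return hits;
}
```

For the host in the example above, `newlyRejectedControls('a\u0002b.com')` returns `[2]`, while a host containing only TAB returns an empty array because TAB is already forbidden.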

Most implementations agree with this assessment. Chrome and Firefox both forbid all C0 controls and U+007F (whether verbatim or percent-encoded). Safari aligns with the current spec. For non-browsers:

* Go forbids any percent-encoding in the hostname
* curl forbids C0 controls but not `%7F`
* Python and Ruby accept any percent-encoded characters in hosts
* Node.js legacy parser breaks when it sees any percent-encoded characters in hosts (it tries to add a `/` before the `%`)
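
As an illustration of the "verbatim or percent-encoded" behavior described for Chrome and Firefox, the following sketch flags a host string that contains a C0 control or U+007F in either form. It is not taken from any browser's source; the helper names are hypothetical, and the byte-by-byte percent-decoding is a simplification that is sufficient only because every code point of interest is ASCII.

```javascript
// True for any C0 control (U+0000 to U+001F) or U+007F DELETE.
function isC0ControlOrDelete(codePoint) {
  return codePoint <= 0x1f || codePoint === 0x7f;
}

// Hypothetical helper: replace each %XX escape with the character whose
// code equals that byte. No UTF-8 decoding is attempted.
function decodePercentEscapes(host) {
  return host.replace(/%([0-9a-fA-F]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16)));
}

// Hypothetical helper: does the string contain a C0 control or U+007F?
function containsControl(text) {
  return [...text].some((ch) => isC0ControlOrDelete(ch.codePointAt(0)));
}

// Flags the host if a control appears verbatim or after percent-decoding.
function hostHasControl(host) {
  return containsControl(host) || containsControl(decodePercentEscapes(host));
}
```

Under this sketch, `hostHasControl('a%02b.com')` and `hostHasControl('a%7Fb.com')` both return `true`, while `hostHasControl('example.com')` returns `false`. A parser matching curl's behavior as described above would differ on the `%7F` case.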

https://github.com/whatwg/url/issues/627

Received on Tuesday, 10 August 2021 18:52:37 UTC