Re: [whatwg/url] Disallow all C0 control and DEL codepoints in hosts (PR #673)

One thing I'd like to add (before I'm too full of "festive cheer" and forget it):

Failing on certain forbidden host code-points can unlock optimisation opportunities. For example, when parsing the hostname portion of a URL string, we currently have to track precisely whether we're inside or outside a pair of square brackets (in order to determine whether a colon is part of an IPv6 address or is the hostname/port delimiter). This slows down parsing for every URL string. I've been experimenting with an approach that simply splits on the first colon after a closing square bracket.

So for the hostname `"][::][:80"`, the split is:
| Method | Host | Port |
| ---- | --- | --- |
| Std | `"][::]["` | `"80"` |
| Mine | `"]["` | `":][:80"` |
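As a rough illustration of the "Mine" row, here is a minimal Python sketch of splitting on the first colon after a closing square bracket. The function name `split_host_port` is hypothetical, and two details are assumptions rather than anything stated above: the search starts from the *first* `]` in the string (which is what reproduces the row in the table), and when there is no `]` at all it falls back to the first colon in the string. It is not the parser from the standard.

```python
from typing import Optional, Tuple


def split_host_port(hostname: str) -> Tuple[str, Optional[str]]:
    """Split on the first colon that follows the first ']' (hypothetical helper).

    Assumption: with no ']' present, the search simply starts at index 0.
    No bracket state is tracked while scanning.
    """
    # str.find returns -1 when ']' is absent, so close + 1 is 0 in that case.
    close = hostname.find("]")
    colon = hostname.find(":", close + 1)
    if colon == -1:
        # No delimiter found: everything is host, there is no port.
        return hostname, None
    return hostname[:colon], hostname[colon + 1:]


print(split_host_port("][::][:80"))   # ('][', ':][:80')
print(split_host_port("[::1]:8080"))  # ('[::1]', '8080')
```

The point of the sketch is that it needs only two forward searches and no per-character state, which is where the potential speed-up would come from.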

Currently that's fine (I think?). Since the hostname doesn't begin with an opening bracket, it won't be parsed as IPv6, and ultimately the opaque host parser will reject both results because square brackets are forbidden. If we changed to percent-encoding forbidden code-points instead, this input could potentially be accepted, and the difference between the two splits would become observable.

Perhaps this could be limited to the setter function?

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/pull/673#issuecomment-1000768296

Message ID: <whatwg/url/pull/673/c1000768296@github.com>

Received on Friday, 24 December 2021 10:10:05 UTC