Re: [whatwg/url] Forbid < and > in hosts (#458)

On Thu, Dec 12, 2019 at 4:10 PM achristensen07 <notifications@github.com>
wrote:

> I have three concerns:
> 0. Are there any real registered domains with '*', '^', '|', or '"' in
> them? I imagine there are rules somewhere in ICANN preventing this, but it
> would be good to reference them.
>

RFC 1034 sets out the LDH (letters, digits, hyphen) rule for preferred name
syntax, and the ICANN Registry Agreement (specifically, Spec 6 -
Interoperability and Continuity) restricts registrable domains to that.
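
As a rough illustration of the LDH rule, here is a minimal Python sketch. The
function name is_ldh_hostname is hypothetical, and the check accepts a leading
digit (the later RFC 1123 relaxation), which is an assumption on top of what
RFC 1034 itself says.

```python
import re

# One LDH label: letters, digits and hyphens only, 1 to 63 characters,
# with no hyphen at either end. A leading digit is accepted here
# (the RFC 1123 relaxation); RFC 1034's grammar starts with a letter.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(name: str) -> bool:
    """Return True if every dot-separated label of `name` is an LDH label."""
    if name.endswith("."):
        name = name[:-1]  # tolerate one trailing root dot
    if not name or len(name) > 253:
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in name.split("."))
```

Under this sketch, a name containing '*', '^', '|', '"' or '_' in any label is
rejected, while ordinary names and punycode (xn--) labels pass.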

Of course, beyond that, madness lies. Because the DNS wire format is 8-bit,
a name can contain arbitrary octets, and even though A/AAAA records are
“supposed” to follow preferred name syntax (as host records), buggy servers
combined with generic client libraries (which also support non-DNS
resolution paths) can let anything through. The most obvious case is
underscores.
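
To make the 8-bit point concrete, here is a small Python sketch of the wire
encoding of a name (length-prefixed labels ending in a zero-length root
label). The helper name encode_wire_name is hypothetical; the length limits
are the usual 63 octets per label and 255 octets per name.

```python
from typing import Iterable


def encode_wire_name(labels: Iterable[bytes]) -> bytes:
    """Encode labels as a DNS wire-format name: length octet, then raw octets."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("each label must be 1 to 63 octets long")
        out.append(len(label))  # one-octet length prefix
        out += label            # label octets are copied as-is, no syntax check
    out.append(0)               # zero-length root label terminates the name
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(out)
```

Nothing in the encoding cares what the octets are, so a label such as b"a*b"
or b"foo_bar" encodes just as readily as b"www".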

So these URLs would appear in private networks, in non-DNS host schemes, or
as subdomains of registered names that take advantage of lax client
behavior. While you can’t issue TLS certificates to these names directly,
you can, sadly, sneak by with wildcards.
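
A minimal Python sketch of how the wildcard case can slip through, assuming a
client that compares labels as opaque strings and lets '*' stand for exactly
one leftmost label. The function wildcard_matches is hypothetical and is not
taken from any particular TLS library.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Match a certificate name against a hostname, label by label."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if p_labels[0] != "*":
        return p_labels == h_labels  # no wildcard: exact comparison
    # Wildcard: same number of labels, identical suffix, and any non-empty
    # leftmost label. The leftmost label's syntax is never inspected.
    return (
        len(p_labels) == len(h_labels)
        and h_labels[0] != ""
        and p_labels[1:] == h_labels[1:]
    )
```

With this logic, a certificate for *.example.com covers foo_bar.example.com
even though that name could not be issued a certificate on its own.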


>    1. What led to these characters being forbidden in Gecko? Will we want
>    to change this set of forbidden characters again after this?
>    2. Are there any URLs with custom schemes with those in their host?
>    This is harder to find out. I hope the compatibility risk is minimal but I
>    don't have a good way to find out except changing it and seeing which
>    things break.
>
>
Yeah, this is the analysis I mentioned we’d have to do for Chrome. URL
parsing changes are generally accompanied by analyzing corpora like the
entire Google search index to see what compatibility risk there might be,
and that’s a Lot Of Work compared to changing a few characters in a lookup
table to zero. 😕

I think it’s worth doing, and I think it’s worth aligning on.
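
For readers unfamiliar with the lookup-table approach mentioned above, here is
a hedged Python sketch. It is not Chrome's actual table: the names
build_host_table and first_forbidden are hypothetical, and the forbidden set
is just the characters raised in this thread.

```python
from typing import List, Optional


def build_host_table(forbidden: str) -> List[int]:
    """Build a 128-entry ASCII table: entry 0 means 'forbidden in a host'."""
    table = list(range(128))   # identity mapping: each code point is allowed
    for ch in forbidden:       # (NUL maps to 0, so it starts out forbidden)
        table[ord(ch)] = 0     # forbidding a character is a one-entry change
    return table


def first_forbidden(host: str, table: List[int]) -> Optional[str]:
    """Return the first forbidden ASCII character in `host`, or None."""
    for ch in host:
        code = ord(ch)
        # Non-ASCII input is out of scope for this sketch and passes through.
        if code < 128 and table[code] == 0:
            return ch
    return None


# Illustrative only: the characters discussed in this thread.
ILLUSTRATIVE_TABLE = build_host_table('<>*^|"')
```

The code change itself is tiny, which is the point: the cost lies in measuring
what breaks, not in editing the table.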


>    1. This is also a nudge to Chrome and Firefox to implement hosts in
>    URLs with custom schemes according to the spec.
>
>
Yes. It’s well deserved and the biggest issue with our URL parsing 😔


https://github.com/whatwg/url/issues/458#issuecomment-565273240

Received on Friday, 13 December 2019 02:26:34 UTC