Re: [whatwg/url] IPv4 host parser + site definition seems potentially dangerous. (#560)

I've added the logging to Chrome, and we now have some numbers, which vary a lot by platform.  Sorry for the delay - I didn't have as much time for this as I thought I would.

Some numbers by platform. The values are: the % of total DNS attempts that are for non-numeric domains with a numeric component, the % of those lookups that succeeded, and the % of successful lookups that were for hostnames with only a single terminal numeric component (as the options here are to ban 2+ such terminal components, ban 1+ of them, or do nothing). Values are rounded a fair bit:

Linux: 0.0002% of lookups, 60% succeed, 99% one numeric component.
OSX: 0.0003% of lookups, 90% succeed, 2% one numeric component.
Windows: < 0.0001% of lookups, 2% succeed.
Android: 0.0003% of lookups, 92% succeed, 1% one numeric component.
ChromeOS: Looks a fair bit like Windows - few lookups, almost all fail.
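
To make the buckets above concrete, here is a rough sketch of how a hostname could be sorted by its number of terminal numeric components. This is not Chrome's logging code: the function names are hypothetical, and "numeric" is assumed to mean a label made only of ASCII digits, which is narrower than what the URL spec's IPv4 parser accepts (it also handles hex and octal forms).

```python
def is_numeric_label(label: str) -> bool:
    # Assumption: a "numeric" component is a non-empty run of ASCII digits.
    return label.isascii() and label.isdigit()


def trailing_numeric_labels(host: str) -> int:
    """Count how many consecutive labels at the end of the host are numeric."""
    labels = host.rstrip(".").split(".")
    count = 0
    for label in reversed(labels):
        if not is_numeric_label(label):
            break
        count += 1
    return count


def classify(host: str) -> str:
    """Sort a hostname into one of the buckets discussed in this thread."""
    labels = host.rstrip(".").split(".")
    n = trailing_numeric_labels(host)
    if n == len(labels):
        return "all-numeric"  # e.g. an IPv4-looking host; not counted here
    if n == 0:
        return "no-terminal-numeric"
    if n == 1:
        return "one-terminal-numeric"
    return "multiple-terminal-numeric"


if __name__ == "__main__":
    for host in ("example.com", "foo.1", "foo.1.2", "1.2.3.4"):
        print(host, "->", classify(host))
```

Under these assumptions, "ban 1+" would reject both the one-terminal-numeric and multiple-terminal-numeric buckets, while "ban 2+" would reject only the latter.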

For file URLs, where Chrome applies the same logic on the domain portion (on Windows), the numbers are low enough not to be a concern (it's also unclear whether other browsers apply the same hostname validation logic here).

So these are surprisingly common on Linux, Android, and OSX.  If we only allowed a single terminal numeric component, we would only have minimal breakage on Linux, but we would see real breakage on OSX and Android, though admittedly, for under 3 in 1,000,000 DNS lookups.  It's also unclear if these were DNS lookups for actual URL requests or for something else.
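
As a back-of-the-envelope check on that figure, the rounded numbers above can be multiplied out. This sketch assumes that "breakage" means lookups that succeed today and whose hostname has more than one terminal numeric component; the dictionary and function names are made up for illustration, and Windows and ChromeOS are left out because no single-component percentage was given for them.

```python
# Rounded values from the list above:
# (% of all DNS lookups, % of those that succeed, % of successes with one numeric component)
PLATFORMS = {
    "Linux": (0.0002, 60, 99),
    "OSX": (0.0003, 90, 2),
    "Android": (0.0003, 92, 1),
}


def broken_per_million(pct_lookups: float, pct_succeed: float, pct_single: float) -> float:
    """Estimate affected lookups per million if only one terminal numeric component is allowed."""
    share = pct_lookups / 100        # fraction of all lookups in this category
    succeed = pct_succeed / 100      # fraction of those that currently resolve
    multi = 1 - pct_single / 100     # fraction of successes with 2+ numeric components
    return share * succeed * multi * 1_000_000


if __name__ == "__main__":
    for name, values in PLATFORMS.items():
        print(f"{name}: {broken_per_million(*values):.3f} per million lookups")
```

With these inputs the estimate comes out to roughly 0.01 per million on Linux and between 2.6 and 2.8 per million on OSX and Android, consistent with "under 3 in 1,000,000". Since the inputs are heavily rounded, the results are only indicative.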

Note that we don't have numbers for how often URLs have these weird hostnames but don't make it to the DNS resolver (e.g., used by the JavaScript URL API, or intercepted by a ServiceWorker and remapped to something more reasonable).  Updating the URL spec would affect those use cases as well.

Since this can potentially cause real security issues, I do think we should go ahead and update the spec, but it is a breaking change, and it looks like there are likely some dependencies on the current behavior.

Anyhow, thoughts/concerns/feedback welcome, but I plan to write a draft change to the spec.  I will, of course, CC everyone who has commented here on that issue.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/560#issuecomment-875162770

Received on Wednesday, 7 July 2021 00:05:12 UTC