Re: [whatwg/url] IPv4 host parser + site definition seems potentially dangerous. (#560) from Matt Menke on 2020-11-30 (public-webapps-github@w3.org from November 2020)

From: Matt Menke <notifications@github.com>
Date: Mon, 30 Nov 2020 12:17:11 -0800
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/560/736018280@github.com>

@mozfreddyb: You are indeed correct - Chrome behaves that way as well. I must have made a mistake when testing it (or thought I knew how it worked in Chrome without testing).

It looks to me like at least some paths in Chrome extract a URL from a site for storage quota, so Chrome, at least, has some real bugs if non-IPv4 hostnames ending in two numbers in [0,255] ever resolve.

I'm not an expert in this space, but a solution that rejected these at the URL parsing layer seems safest to me.

It would be simpler to reject any hostname that ends in a number but is not a valid IPv4 address, making for a more predictable web platform for developers, though a more targeted carveout might result in less fallout.

I think the simplest thing to do would be to update the IPv4 parser as follows:

...

4. If parsing the last item in parts using the IPv4 number parser is failure, return input.

5. If parts’s size is greater than 4, then return failure.

6. <Previous step 5 here; all further steps are unchanged, except replace "return input" cases with "return failure">

...
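
A rough Python sketch of how the reordered steps might behave. This is only an illustration under assumptions: the names `FAILURE`, `parse_ipv4_number`, and `parse_ipv4` are hypothetical, validation-error tracking is omitted, and the number parser is written to fail on empty input as suggested later in this comment. It is not the spec text.

```python
FAILURE = object()  # hypothetical sentinel standing in for the spec's "failure"


def parse_ipv4_number(s):
    """Sketch of an IPv4 number parser (decimal, 0x hex, leading-0 octal)."""
    if s == "":
        return FAILURE  # assumption: empty input fails instead of returning 0
    radix = 10
    if len(s) >= 2 and s[:2] in ("0x", "0X"):
        s, radix = s[2:], 16
    elif len(s) >= 2 and s[0] == "0":
        s, radix = s[1:], 8
    if s == "":
        return 0  # e.g. a bare "0x"
    digits = "0123456789abcdef"[:radix]
    if any(c not in digits for c in s.lower()):
        return FAILURE
    return int(s, radix)


def parse_ipv4(host):
    """Returns an int address, FAILURE, or the input itself (treat as a domain)."""
    parts = host.split(".")
    if len(parts) > 1 and parts[-1] == "":
        parts.pop()  # ignore a single trailing dot
    # Proposed step 4: only a numeric last label makes this an IPv4 candidate.
    if parse_ipv4_number(parts[-1]) is FAILURE:
        return host
    # Proposed step 5: too many parts is now a hard failure.
    if len(parts) > 4:
        return FAILURE
    numbers = []
    for part in parts:
        n = parse_ipv4_number(part)
        if n is FAILURE:
            return FAILURE  # previously "return input"
        numbers.append(n)
    if any(n > 255 for n in numbers[:-1]):
        return FAILURE
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        return FAILURE
    address = numbers[-1]
    for i, n in enumerate(numbers[:-1]):
        address += n * 256 ** (3 - i)
    return address
```

With this sketch, `parse_ipv4("example.com")` hands the input back as a domain, while `parse_ipv4("foo.1.2")` and `parse_ipv4("1.2.3.4.5")` are rejected outright rather than falling through to domain handling.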

Also, I think the IPv4 number parser has a bug - it should fail if input is the empty string, not return 0.  The claim that 0..0x300 is a domain and not an IPv4 address seems to require that behavior.
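
To make the empty-string point concrete, here is a small Python sketch that contrasts the two behaviors. The function `parse_number` and its `empty_is_failure` flag are hypothetical and exist only for this comparison; `None` stands in for failure.

```python
def parse_number(s, empty_is_failure):
    """Sketch of an IPv4 number parser; returns None to signal failure."""
    if s == "":
        # The behavior in question: fail, or treat empty input as 0.
        return None if empty_is_failure else 0
    radix = 10
    if len(s) >= 2 and s[:2] in ("0x", "0X"):
        s, radix = s[2:], 16
    elif len(s) >= 2 and s[0] == "0":
        s, radix = s[1:], 8
    if s == "":
        return 0  # e.g. a bare "0x"
    digits = "0123456789abcdef"[:radix]
    if any(c not in digits for c in s.lower()):
        return None
    return int(s, radix)


parts = "0..0x300".split(".")  # ["0", "", "0x300"]
as_zero = [parse_number(p, empty_is_failure=False) for p in parts]
as_failure = [parse_number(p, empty_is_failure=True) for p in parts]
```

If the empty middle part parses as 0, every part of `0..0x300` looks like a number, so nothing in the number parser itself keeps the host from being treated as an IPv4 candidate. If empty input fails, the middle part fails and the host is not parsed as an address.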

That having been said, I'm completely open to other ways to resolve this.

-- 
View this comment on GitHub:
https://github.com/whatwg/url/issues/560#issuecomment-736018280

Received on Monday, 30 November 2020 20:17:24 UTC