Re: [whatwg/url] IPv4 host parser + site definition seems potentially dangerous. (#560)

> > and then we invoke the IPv4 number parser 4 times on empty strings
> 
> Why? The first step of the _parts_ loop does:
> 
> > If part is the empty string, then return input.
> 
> (I did misunderstand though as you were talking about a different empty string, i.e., one resulting from splitting a sequence of dots.)

Ah, you're right about that.

> > I'm not seeing why this is?
> 
> Yeah, I clearly misread that. Sorry about that. I guess the one change needed to your set of steps is that we also want to return input if the last part (at that point, i.e., after removing an initial last part empty string, if any) is the empty string.
> 
> And we could also consider moving that into the host parser as being the decider between IPv4 and domain names. (Though I'm not sure how to restructure it nicely yet.)

I agree that would make things cleaner.  I think we should also consider anything that ends in all numbers to be IPv4, even if it doesn't parse with the current logic (e.g. 0.0.0.09, which my proposed modification considers a valid non-IPv4 hostname; I don't want that to go from a valid hostname to an invalid one if/when we remove base-8 support).
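
As a rough sketch of that classification rule (not spec text): the helper name `last_label_is_numeric` is hypothetical, and it assumes "ends in all numbers" means the last dot-separated part consists only of ASCII digits, after dropping a single trailing empty part. A host for which it returns true would go to the IPv4 parser, and would be an invalid host if that parser rejects it, instead of falling back to being a domain name.

```python
ASCII_DIGITS = "0123456789"


def last_label_is_numeric(host: str) -> bool:
    """Return True if the host's last non-empty-trailing part is all ASCII digits.

    Hypothetical helper illustrating the proposed IPv4-vs-domain decider;
    it does not validate the address itself.
    """
    parts = host.split(".")
    # Drop one trailing empty part, i.e. a single trailing dot ("1.2.3.4.").
    if parts[-1] == "":
        parts.pop()
    # Nothing left, or the new last part is still empty ("1.2.3.."):
    # treat the host as a non-IPv4 name.
    if not parts or parts[-1] == "":
        return False
    # Explicit ASCII check; str.isdigit() would also accept non-ASCII digits.
    return all(ch in ASCII_DIGITS for ch in parts[-1])
```

Under these assumptions, `0.0.0.09` is classified as IPv4 (and would then fail to parse as one), while `example.com` and `1.2.3..` are left as ordinary hostnames.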

Before writing anything up, I think I'll invest in instrumentation to gather data for estimating breakage, which will take a couple of months to roll out.  We'll need numbers before I can land any behavior changes anyway.

I'm thinking I'll instrument the DNS layer, rather than the URL parser, to see how often we see such domain names and how often they resolve, instead of how often they're typed.  If they don't resolve to anything, I'm fine with failing with an invalid hostname error instead of a DNS error.  I may need to instrument both layers, though, as I suppose the JavaScript URL API does expose which URLs we consider invalid to the web platform.

Thanks so much for the feedback!

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/560#issuecomment-737676388

Received on Thursday, 3 December 2020 05:32:43 UTC