Re: [whatwg/url] Change host parser for non-special schemes (#148)

@annevk Sorry, somehow I missed these pings in my inbox because I had accidentally muted the thread (I likely fat-fingered something).

I admit I haven't read the WHATWG spec closely enough to have the full context of this issue; much of our GURL implementation reflected (or tried to reflect) RFC 3986. However, I suspect the incompatibility has arisen with respect to standard URLs (i.e. those with authority components in a structured form). Despite https://tools.ietf.org/html/rfc3986#section-1.1.1 remarking that

> It thus defines the syntax and semantics needed to implement a scheme-independent parsing mechanism for URI references, by which the scheme-dependent handling of a URI can be postponed until the scheme-dependent semantics are needed.

namely, that all unrecognized schemes can be assumed to be generic unless and until they're explicitly supported, the GURL implementation treats all unrecognized schemes as non-standard (specifically, https://cs.chromium.org/chromium/src/url/url_util.cc?rcl=1478435172&l=105 ). That is, if GURL doesn't explicitly know how to parse a scheme, it treats the URL as opaque and everything after the scheme as the path.
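To make the divergence concrete, here is a minimal Python sketch; it is not GURL's actual code. It contrasts scheme-independent parsing, using the regular expression from RFC 3986 Appendix B, with a simplified model of the behaviour described above, where any scheme outside a known set is treated as opaque. The names `KNOWN_STANDARD_SCHEMES`, `parse_generic`, and `parse_unknown_as_opaque` are hypothetical, and the scheme list is illustrative only.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B. It splits any URI reference
# into its generic components without knowing anything about the scheme.
RFC3986_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Components(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


# Hypothetical, illustrative set of schemes the parser "knows" are standard.
KNOWN_STANDARD_SCHEMES = {"http", "https", "ftp", "ws", "wss", "file"}


def parse_generic(url: str) -> Components:
    """Scheme-independent parsing, as RFC 3986 section 1.1.1 describes."""
    m = RFC3986_RE.match(url)
    # Every part of the pattern is optional, so it matches any string.
    return Components(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def parse_unknown_as_opaque(url: str) -> Components:
    """Simplified model of the behaviour described above (not GURL's code)."""
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError("URL has no scheme")
    if scheme.lower() in KNOWN_STANDARD_SCHEMES:
        return parse_generic(url)
    # Unknown scheme: keep everything after "scheme:" as the path,
    # without extracting an authority, query, or fragment.
    return Components(scheme, None, rest, None, None)
```

With this model, `parse_generic("asdf://example.com/a?b#c")` yields authority `"example.com"` and path `"/a"`, whereas `parse_unknown_as_opaque` on the same input yields no authority and the path `"//example.com/a?b#c"`.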

This likely explains both the divergence here and its rationale. That decision has since cascaded into a number of design decisions throughout Chrome regarding the non-standard schemes it exposes (e.g. chrome-extension://, chrome://, chrome-guest://), all of which assume they can safely be implemented as non-standard schemes (omitting an authority and following any number of internal structural rules).

As such, it's beyond my ken to know the extent of the complexity or negative effects. Much of it lies with the consumers of our low-level URL parser, and I don't have good knowledge of those consumers or their implementation implications. To get the desired behaviour, we would effectively have to treat unrecognized schemes (asdf://) as generic/standard schemes so that they get the same behaviours. But because we don't 'force' internal callers to pre-register their schemes (... something we definitely should do, to prevent unknown-unknowns like this), I don't know who would break, and I don't personally have the time to make the change, see what breaks, and guide it through. I suspect the overall migration would be non-trivial, but that the code change itself really is as simple as changing the logic linked above so that non-standard schemes are explicitly registered as such, in line with RFC 3986.
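A rough sketch of the inverted default suggested above, under stated assumptions: unrecognized schemes get generic RFC 3986 parsing, and a scheme is treated as opaque only if a caller has explicitly registered it. The registry `_OPAQUE_SCHEMES` and the functions `register_opaque_scheme` and `parse` are hypothetical names for illustration, not Chromium APIs. Note that an internal caller which forgets to register its scheme would silently start getting generic parsing, which is exactly the "who would break" risk.

```python
import re
from typing import NamedTuple, Optional, Set

# RFC 3986, Appendix B: generic, scheme-independent component splitting.
RFC3986_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Components(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


# Hypothetical registry: schemes must opt *out* of generic parsing explicitly.
_OPAQUE_SCHEMES: Set[str] = set()


def register_opaque_scheme(scheme: str) -> None:
    """Hypothetical: mark a scheme whose contents should not be split."""
    _OPAQUE_SCHEMES.add(scheme.lower())


def parse(url: str) -> Components:
    m = RFC3986_RE.match(url)
    scheme = m.group(2)
    if scheme is not None and scheme.lower() in _OPAQUE_SCHEMES:
        # Registered non-standard scheme: everything after "scheme:" is the path.
        return Components(scheme, None, url[len(scheme) + 1:], None, None)
    # Default for any other scheme, including unrecognized ones like asdf://.
    return Components(scheme, m.group(4), m.group(5), m.group(7), m.group(9))
```

For example, after `register_opaque_scheme("chrome-extension")`, `parse("asdf://example.com/a")` reports authority `"example.com"`, while `parse("chrome-extension://abc/page.html")` keeps `"//abc/page.html"` as an opaque path.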

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/148#issuecomment-258945464

Received on Monday, 7 November 2016 20:00:28 UTC