Re: [whatwg/url] How should "everything after the scheme" URLs work? (#385)

Maybe it makes sense to define a few “special exceptions”, with `http:`, `https:`, etc. treated one way and `javascript:`, `data:`, etc. treated another, but then also allow URLs to opt into one of those parsing modes explicitly, e.g. with a scheme prefix, so that `web-myscheme:` would be parsed the same way as `http[s]:` and `raw-myscheme:` the same way as `javascript:` and `data:`.
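To make the prefix idea a bit more concrete, here is a rough Python sketch of how a parser might pick a mode from the scheme alone. None of this is in the spec: the `ParseMode` names, the `parse_mode` function, and the `web-` / `raw-` prefixes are just my hypothetical illustration, and the two scheme sets are deliberately incomplete.

```python
from enum import Enum


class ParseMode(Enum):
    SPECIAL = "special"  # parsed like http:/https: (host, port, path, ...)
    OPAQUE = "opaque"    # parsed like javascript:/data: (one opaque string)
    UNKNOWN = "unknown"  # no explicit opt-in, left to today's rules


# Illustrative subsets only, not the spec's actual lists.
SPECIAL_SCHEMES = {"http", "https"}
OPAQUE_SCHEMES = {"javascript", "data"}


def parse_mode(url: str) -> ParseMode:
    """Choose a parsing mode from the scheme (hypothetical prefix rule)."""
    scheme, sep, _rest = url.partition(":")
    if not sep or not scheme:
        raise ValueError("URL has no scheme")
    scheme = scheme.lower()
    if scheme in SPECIAL_SCHEMES or scheme.startswith("web-"):
        return ParseMode.SPECIAL
    if scheme in OPAQUE_SCHEMES or scheme.startswith("raw-"):
        return ParseMode.OPAQUE
    return ParseMode.UNKNOWN
```

With this, `parse_mode("web-myscheme://example.com/")` would land in the same bucket as `https:`, and `parse_mode("raw-myscheme:whatever")` in the same bucket as `data:`, without the parser needing to know `myscheme` in advance.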

However, maybe it also makes sense to allow implementations that care about specific URLs to interpret and parse them specially. I know that `hyper:` URLs (for the Hypercore stuff) actually use a hash instead of an address for the host. I think the WHATWG spec currently parses that hash as a domain, but that’s not accurate to what it actually represents (for example, it can’t have a port).

Of course, that would be awful in a way, because then different implementations would parse the same URL differently, so people couldn’t rely on URL manipulation working the same way across implementations, which is exactly the problem this spec is aiming to solve.

Maybe a good approach could be to establish a (limited) set of normalization rules that can be applied to URLs by implementations, enforcing specific normalization rules for certain URLs like `http[s]:`, but allowing the implementations to choose among other normalization rules for their own URLs.

So, for example, the spec could allow implementations to change the port of URLs freely depending on the scheme, without requiring the URL to be fetched and redirected (as long as they do it consistently). Then e.g. `http:` would drop the port if it is `80` (enforced by the spec), and `hyper:` URLs would always drop the port in implementations that support the scheme (allowed by the spec).
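As a rough sketch of what I mean (Python, with every name made up for illustration: `REQUIRED_RULES`, `OPTIONAL_RULES`, `normalize_port`, and the `supports_optional` flag are assumptions, not anything the spec defines), the port rules could live in two tables, one every implementation must apply and one an implementation may opt into:

```python
from typing import Callable, Dict, Optional

# A port rule maps the parsed port (or None) to the normalized port.
PortRule = Callable[[Optional[int]], Optional[int]]


def drop_default(default: int) -> PortRule:
    """Rule: remove the port only when it equals the scheme's default."""
    def rule(port: Optional[int]) -> Optional[int]:
        return None if port == default else port
    return rule


def drop_always(port: Optional[int]) -> Optional[int]:
    """Rule: the scheme has no notion of a port, so always remove it."""
    return None


# Enforced by the (hypothetical) spec text for well-known schemes.
REQUIRED_RULES: Dict[str, PortRule] = {"http": drop_default(80)}

# Allowed, but only applied by implementations that support the scheme.
OPTIONAL_RULES: Dict[str, PortRule] = {"hyper": drop_always}


def normalize_port(scheme: str, port: Optional[int],
                   supports_optional: bool = True) -> Optional[int]:
    scheme = scheme.lower()
    if scheme in REQUIRED_RULES:
        return REQUIRED_RULES[scheme](port)
    if supports_optional and scheme in OPTIONAL_RULES:
        return OPTIONAL_RULES[scheme](port)
    return port  # no rule applies: leave the port untouched
```

Here `normalize_port("http", 80)` gives `None` everywhere, while `normalize_port("hyper", 1234)` gives `None` only in implementations that opt in and stays `1234` in the rest. The point is that both outcomes come from a fixed, known rule set rather than from arbitrary per-implementation behaviour.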

Other modifications and normalizations could be handled the same way: required for well‐known URLs, and allowed for other URLs.

The key here, I think, is that the set of normalization rules that can be applied to URLs at all is well known beforehand and not arbitrary, so authors can enjoy consistent URL handling across implementations.

-- 
View it on GitHub:
https://github.com/whatwg/url/issues/385#issuecomment-884045305

Received on Wednesday, 21 July 2021 09:37:46 UTC