Re: [whatwg/url] Provide a succinct grammar for valid URL strings (#479)

> Content wise there is one significant thing that I’d change right now, which is to handle opaque paths in the grammar. I’d say that this is the most significant grammatical difference between the RFC and the WHATWG standard. I’m too tired to get into that now though.

To complete my remark above:

To be compatible with the WHATWG standard, it is useful to distinguish two parsing behaviours:

1. URLs that always have a parsed (a.k.a. hierarchical) path.
2. URLs that have a hierarchical path if the path starts with `/`, and an opaque path otherwise.

I find it useful to complete the picture by adding a third: URLs that always have an opaque path (and no authority). This could be useful for data, javascript and mailto URLs. However, the WHATWG standard only uses the first two:

- Behaviour 1 for special URLs, i.e. URLs whose scheme is http, https, ws, wss, ftp, or file.
- Behaviour 2 for all other URLs.
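
As a rough sketch of this classification (Python, purely illustrative): the names `SPECIAL_SCHEMES` and `path_kind` are my own and not taken from either specification, and `rest` is assumed to be whatever follows `scheme:` in the input.

```python
# Schemes listed above as "special".
SPECIAL_SCHEMES = {"http", "https", "ws", "wss", "ftp", "file"}

def path_kind(scheme: str, rest: str) -> str:
    """Classify the path of a URL as 'hierarchical' or 'opaque'.

    `rest` is the part of the URL string after 'scheme:' (an assumption
    of this sketch; no real parsing is performed here).
    """
    if scheme.lower() in SPECIAL_SCHEMES:
        # Behaviour 1: special URLs always get a hierarchical path.
        return "hierarchical"
    # Behaviour 2: hierarchical only if what follows the scheme starts with '/'.
    return "hierarchical" if rest.startswith("/") else "opaque"
```

For example, `path_kind("mailto", "user@example.com")` gives `"opaque"`, while `path_kind("foo", "/bar")` gives `"hierarchical"`.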

The other key difference is at the semantic level, in the resolve operation.

Specifically, the WHATWG standard uses what RFC 3986 calls non-strict resolution for special URLs, and strict resolution for other URLs. However, it does not allow resolving a reference against a base URL that has an opaque path, unless the reference has a scheme or consists of only a fragment.
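
These two rules could be sketched roughly as follows (Python). The function names are hypothetical, and the scheme check is a simplified regular expression based on the RFC 3986 `scheme` production, not a full parser.

```python
import re

SPECIAL_SCHEMES = {"http", "https", "ws", "wss", "ftp", "file"}

# Simplified test for "the reference has a scheme":
# ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
_SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")

def resolution_mode(scheme: str) -> str:
    """Non-strict resolution for special URLs, strict for all others."""
    return "non-strict" if scheme.lower() in SPECIAL_SCHEMES else "strict"

def can_resolve(reference: str, base_has_opaque_path: bool) -> bool:
    """Decide whether `reference` may be resolved against the base URL."""
    if not base_has_opaque_path:
        return True
    # Against an opaque-path base, only a reference that has its own
    # scheme, or that consists of only a fragment, is allowed.
    has_scheme = _SCHEME_PREFIX.match(reference) is not None
    return has_scheme or reference.startswith("#")
```

So with a base such as `mailto:user@example.com`, `can_resolve("#top", True)` holds, but `can_resolve("other", True)` does not.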

A further difference is that the WHATWG standard forces special URLs to always have an authority after resolution. You can simulate this behaviour by first using non-strict resolution as defined in the RFC, and then checking whether the resulting URL has an authority. If it does not, the first non-empty path segment must be parsed and converted to the URL's authority.
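
A minimal sketch of that last step (Python), assuming resolution has already produced an `authority` (or `None` when absent) and a `path` string. `force_authority` is a made-up name, and the sketch skips host validation and any scheme-specific handling.

```python
from typing import Optional, Tuple

def force_authority(authority: Optional[str], path: str) -> Tuple[Optional[str], str]:
    """Promote the first non-empty path segment to the authority if none is present."""
    if authority is not None:
        return authority, path
    segments = path.split("/")
    for i, segment in enumerate(segments):
        if segment:
            # The first non-empty segment becomes the authority;
            # the remaining segments form the new path.
            return segment, "/" + "/".join(segments[i + 1:])
    # No non-empty segment is available to promote; left unchanged here.
    return None, path
```

For instance, a result with no authority and the path `/example.com/a/b` becomes authority `example.com` with path `/a/b`.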

These differences are quite natural and really not large if you characterise them in this way.




-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/479#issuecomment-1231491543

Message ID: <whatwg/url/issues/479/1231491543@github.com>

Received on Tuesday, 30 August 2022 10:45:38 UTC