Re: [whatwg/url] Support relative URLs (#531)

Passing a fallback protocol to the parser, to select a behaviour for the otherwise ambiguous scheme-less URLs, does work, and it is what I have done so far.

But having to pass options around becomes cumbersome, and I can see how it would cause confusing problems. So I’ve taken on the challenge of structuring things in a way that avoids that as much as possible, and there are some interesting things to note about this.

What works well for most issues is to loosen the constraints on URLs somewhat whilst modifying or combining them, and to enforce them later by calling a separate method that converts the (possibly) relative URL to an absolute/resolved URL. This was suggested to me by @zamfofex.

An example: The URL standard requires that special-scheme URLs have an authority with a host that is either an IP address or a valid domain. If the domain is not valid, then the parser fails.

Now, first of all, `http:foo` is actually a usable relative URL. It is host-relative: when it is resolved, it takes the host from the base URL, provided that the base is an http URL too. (RFC 3986 describes this as the behaviour of a non-strict parser.) Thus, a relative http URL need not have a host, and if you enforce the host requirement too soon, you cannot express reference resolution in a way that matches the standard.

Second, if the API allows modifying e.g. the scheme, then it is possible to create an http URL with an opaque host that cannot be parsed as a domain (it could contain encoded forbidden domain code points, for example). Here, rather than throwing an error right away, it is again useful to allow relative/non-resolved http URLs to temporarily have an opaque host. The host might be changed later in code, or it may turn out to be parsable as a domain just in time, before resolving.
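To make the `http:foo` case concrete, here is a rough sketch of non-strict resolution, again in Python. The `Ref` tuple and the `parse_ref` and `resolve` functions are hypothetical helpers written for this illustration; they handle only scheme, host and path, ignore dot segments, queries and fragments, and assume the base is absolute with a path that is empty or starts with `/`.

```python
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    scheme: Optional[str]
    host: Optional[str]
    path: str


def parse_ref(text: str) -> Ref:
    """Tiny illustrative parser: [scheme:][//host][path], nothing else."""
    scheme = None
    head, sep, tail = text.partition(":")
    if sep and head.isalpha():
        scheme, text = head.lower(), tail
    host = None
    if text.startswith("//"):
        host, slash, rest = text[2:].partition("/")
        text = slash + rest
    return Ref(scheme, host, text)


def resolve(ref: Ref, base: Ref) -> Ref:
    """Non-strict resolution: a scheme equal to the base's is treated as absent."""
    if ref.scheme is not None and ref.scheme != base.scheme:
        return ref  # different scheme: nothing is inherited from the base
    if ref.host is not None:
        return Ref(base.scheme, ref.host, ref.path)
    if ref.path == "":
        return Ref(base.scheme, base.host, base.path)
    if ref.path.startswith("/"):
        return Ref(base.scheme, base.host, ref.path)
    # Relative path: replace the last segment of the base path.
    directory = base.path.rsplit("/", 1)[0]
    return Ref(base.scheme, base.host, directory + "/" + ref.path)


http_base = parse_ref("http://example.com/a/b")
https_base = parse_ref("https://example.com/a/b")

# Same scheme: the host (and directory) come from the base.
print(resolve(parse_ref("http:foo"), http_base))
# Ref(scheme='http', host='example.com', path='/a/foo')

# Different scheme: the reference stays host-less, so it is not yet
# a valid absolute http URL.
print(resolve(parse_ref("http:foo"), https_base))
# Ref(scheme='http', host=None, path='foo')
```

A strict resolver would skip the scheme comparison and return `http:foo` unchanged in both cases. Note that a model which rejected host-less http URLs at parse time could not represent the input to `resolve` at all.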

The IETF standards have made this distinction between more and less constrained URLs before. Their name for such a more tolerant and/or relative URL is a URI-reference (RFC 3986) or an IRI-reference (RFC 3987). The WHATWG equivalent would be slightly more tolerant still: it would allow a few more code points, as well as invalid percent-escape sequences in various components, so as to remain consistent with WHATWG URLs.


-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/531#issuecomment-1030678672

Message ID: <whatwg/url/issues/531/1030678672@github.com>

Received on Saturday, 5 February 2022 18:47:27 UTC