- From: Alwin Blok <notifications@github.com>
- Date: Sun, 09 Feb 2025 06:30:12 -0800
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/531/2646328707@github.com>
I am no fan of having a separate class for *strictly* relative URLs. You end up writing a heavily type-overloaded resolve-like operation and tracing how the various combinations of relative and absolute URLs produce either absolute or strictly relative URLs again. Note that it is not even entirely clear how one would formally define a relative URL. Case in point: `http:foo` is relative in the sense that it takes the host from an http base URL if one is provided, but browsers ‘parse’ it as `http://foo/` if it is used stand-alone.

So what I do is use one class (both API-wise and for the theory) that encompasses both relative and absolute URLs. In fact this class of things is very well known and has really few issues: it is known as a (generic) URI reference. But one of the goals that this standard mentions is to abandon the distinction between URI and URL, since it is (admittedly) confusing. As a result the entire algorithm has stopped making the distinction between a generic URI (which must have a scheme, but is allowed to not have an authority as per RFC 3986), a URI reference (which may or may not have a scheme, and may or may not have an authority, among other things) and a URL.

So again, note that `http:foo` is a **valid** generic URI. But it is *not* an http URL according to the http spec, nor is it considered valid by the WHATWG. To coerce it to a URL, web browsers ‘parse it as’ `http://foo` and do this annoying branching on the presence of a base URL in the parser to decide whether it should pass through the authority parsing states or not. But you could just parse it as a generic URI reference, which means `foo` will be its pathname. Then resolve it against a base URL if one is provided, using the ‘non-strict’ transformation of references for the http schemes and the like, and the strict version for other schemes.
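
The parse-then-resolve idea can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it uses the regular expression from RFC 3986 Appendix B and the reference transformation from section 5.2, and the names `Ref`, `parse`, `resolve` and `unparse` are made up here; they are not part of any standard API or of the WHATWG algorithm.

```python
import re
from typing import NamedTuple, Optional

# RFC 3986, Appendix B (without the outer anchors; fullmatch is used instead).
_URI_REF = re.compile(
    r"(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?",
    re.DOTALL,
)

class Ref(NamedTuple):  # hypothetical container for a generic URI reference
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]

def parse(text: str) -> Ref:
    # The pattern matches any string, so this never fails.
    return Ref(*_URI_REF.fullmatch(text).groups())

def remove_dot_segments(path: str) -> str:
    # RFC 3986, section 5.2.4.
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./") or path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            out.append(path[:end])
            path = path[end:]
    return "".join(out)

def resolve(ref: Ref, base: Ref, strict: bool = True) -> Ref:
    # RFC 3986, section 5.2.2. In non-strict mode a scheme equal to the
    # base scheme is ignored, so `http:foo` behaves like the relative `foo`.
    scheme, authority, path, query, fragment = ref
    if (not strict and scheme is not None and base.scheme is not None
            and scheme.lower() == base.scheme.lower()):
        scheme = None
    if scheme is not None:
        return Ref(scheme, authority, remove_dot_segments(path), query, fragment)
    if authority is not None:
        return Ref(base.scheme, authority, remove_dot_segments(path), query, fragment)
    if path == "":
        query = base.query if query is None else query
        return Ref(base.scheme, base.authority, base.path, query, fragment)
    if not path.startswith("/"):
        # Merge with the base path (section 5.2.3).
        if base.authority is not None and base.path == "":
            path = "/" + path
        else:
            path = base.path[: base.path.rfind("/") + 1] + path
    return Ref(base.scheme, base.authority, remove_dot_segments(path), query, fragment)

def unparse(r: Ref) -> str:
    # Component recomposition, RFC 3986 section 5.3.
    out = "" if r.scheme is None else r.scheme + ":"
    out += "" if r.authority is None else "//" + r.authority
    out += r.path
    out += "" if r.query is None else "?" + r.query
    out += "" if r.fragment is None else "#" + r.fragment
    return out

base = parse("http://example.com/a/b")
print(parse("http:foo").path)                                    # foo
print(unparse(resolve(parse("http:foo"), base, strict=False)))   # http://example.com/a/foo
print(unparse(resolve(parse("http:foo"), base, strict=True)))    # http:foo
```

Note that the parser itself never looks at the base; whether `http:foo` picks up the base host is decided entirely by the `strict` flag at resolution time.
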
All of that is described in RFC 3986 in the section on ‘transforming references’, and it’s possible to make it work in a browser-compatible way with only very slight amendments. Most importantly, the coercion to an absolute http URL (which must have a host) is applied only as a separate final step. Indeed, this approach was already pointed out lucidly in 2012, when the first version of the WHATWG URL standard was introduced and heavily discussed on an IETF mailing list. But the solution was discarded, because it “was not understood”.

In any case, API-wise my solution is to introduce a URIReference class. (I considered calling it URLReference, in line with the WHATWG trying to retire the URI name, but it might create more confusion than it solves. Personally I also find the ‘reference’ part confusing, but I’ve not been able to come up with a better name.) It has two methods for combining two URIReferences: Rebase and Resolve. Rebase allows producing schemeless and otherwise relative URIReferences, including the `http:foo` example (which does not have a host, but does have a relative path, `foo`). Resolve in addition does the coercive conversion to an absolute URL (which in the theory is a proper subclass of both URI and URI reference). In the case of http, this tries to convert the first path component to an authority if it otherwise would not have one, and it also ensures that the host is a valid domain name (it could have been an opaque host value).

The whole point of all this is that URLs just have more constraints on them, specifically on the presence of the authority, on the type of the authority (e.g. opaque versus domain) and on the presence or absence of user info and ports (not allowed on file URLs). These additional constraints make it difficult to work with URLs directly; they’re too constrained. You’d want to loosen some of these constraints whilst manipulating and combining them.
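
The coercion as a separate final step could look roughly like the Python sketch below. It is not the actual API: the `URIReference` dataclass here is a stripped-down stand-in, `force_authority` is a made-up name, and the set of schemes that require a host is an assumption about what ‘http and the like’ covers. Validating the host as a domain name is left out entirely.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Assumption: the schemes for which an absolute URL must have a host.
HOST_REQUIRED = {"http", "https", "ws", "wss", "ftp"}

@dataclass(frozen=True)
class URIReference:  # hypothetical, illustrative shape only
    scheme: Optional[str] = None
    authority: Optional[str] = None
    path: str = ""

def force_authority(ref: URIReference) -> URIReference:
    """Promote the first non-empty path segment to the authority
    when the scheme requires a host and none is present."""
    if ref.scheme is None or ref.scheme.lower() not in HOST_REQUIRED:
        return ref  # other schemes keep the generic interpretation
    if ref.authority:
        return ref  # already has a non-empty authority
    segments = ref.path.split("/")
    while segments and segments[0] == "":
        segments.pop(0)  # skip leading empty segments
    if not segments:
        raise ValueError("cannot coerce to a URL: no host available")
    host, rest = segments[0], segments[1:]
    return replace(ref, authority=host, path="/" + "/".join(rest))

print(force_authority(URIReference("http", None, "foo")))
# URIReference(scheme='http', authority='foo', path='/')
```

Because this runs only after two references have been combined, the parser and the combining step never need to branch on whether a base URL is present.
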
URIReferences are both more general than URLs and less constrained, and are therefore much easier to work with. If you’re into design patterns, a URIReference functions a bit like a URL ‘builder’ class.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/531#issuecomment-2646328707
Received on Sunday, 9 February 2025 14:30:16 UTC