Re: [whatwg/url] [Editorial] Replace the term 'cannot-be-a-base' with hierarchical/non-hierarchical (#634)

> It seems to be very clear in describing what makes them different - they are URLs that cannot be a base. It seems you're looking for a definition that, rather than being based on capability, is based on some other property of the URL. 

The issue is that it does not adequately describe their capabilities. Setting the `hostname` or `username` via the URL object's setters does not appear, to a user, to invoke any behaviour related to base or relative URLs.
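To make that concrete, here is a minimal sketch of what a user sees, assuming a runtime with a WHATWG-conformant global `URL` (e.g. Node.js or a browser). The example URLs and variable names are mine, purely for illustration:

```javascript
// A cannot-be-a-base URL: the setters return early, with no error or warning.
const opaque = new URL("mailto:user@example.com");
opaque.hostname = "example.org";
opaque.username = "alice";
console.log(opaque.href); // "mailto:user@example.com" (unchanged)

// An ordinary URL: the same setters take effect.
const regular = new URL("https://example.com/a/b");
regular.hostname = "example.org";
regular.username = "alice";
console.log(regular.href); // "https://alice@example.org/a/b"
```

Nothing in either call mentions bases or relative references, yet the flag's name is the only hint as to why the first pair of assignments was ignored.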

But yes, I am looking for a definition which is based on the structure of the URL, rather than being a flag set by the parser under arbitrary conditions. FWIW, the JS model described in the standard does not even expose this flag, so users have no way of predicting whether a given URL might be "cannot be a base" (awkward grammar intentional).

> That approach hasn't really worked out well historically, which is why there's an understandable difficulty trying to see this as clearer.

Can you elaborate? More specifically, how is a user of a URL library supposed to know which operations are allowed if the syntax has no predictable relationship to the capabilities?

> That's not a correct understanding of hierarchical URLs though. Indeed, the normalization of the base URL is an entirely optional part of RFC 3986, and that normalization is where the ".." removal happens (Specifically, https://datatracker.ietf.org/doc/html/rfc3986#section-5.2.4 is an optional step). As 3986 tries to make clear, the path is actually opaque (notwithstanding that optional syntax).

I'm not sure why you keep bringing up 3986. I'm talking about this standard. This is what the WHATWG URL Standard says about a URL's path:

> A URL's path is a list of zero or more ASCII strings, usually identifying a location in hierarchical form. It is initially empty.

If you follow the parsing logic, there are two kinds of paths:

1. Opaque/non-hierarchical - belonging to cannot-be-a-base URLs. These are parsed in the ["cannot-be-a-base-URL path state"](https://url.spec.whatwg.org/#cannot-be-a-base-url-path-state).
2. Hierarchical - belonging to all non-cannot-be-a-base URLs. These are parsed in the regular ["path state"](https://url.spec.whatwg.org/#path-state).

The latter are always interpreted as hierarchical, and always normalized by this standard (even if the ".." components are percent-encoded!). In effect, the "cannot-be-a-base" flag (and the logic which sets it) determines whether the path is interpreted as hierarchical or not. That is what I'm trying to make clearer.
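A rough sketch of the difference, again assuming a WHATWG-conformant global `URL`; the `foo:` scheme and the variable names are made up for illustration:

```javascript
// Opaque path: parsed in the cannot-be-a-base-URL path state, stored as-is.
const opaquePath = new URL("foo:a/../b");
console.log(opaquePath.pathname); // "a/../b"

// Hierarchical path: parsed in the path state, ".." segments are resolved.
const listPath = new URL("foo:/a/../b");
console.log(listPath.pathname); // "/b"

// Percent-encoded dot segments are normalized too.
const encodedDots = new URL("http://example.com/a/%2e%2e/b");
console.log(encodedDots.pathname); // "/b"

// Only the hierarchical URL can serve as a base for a relative reference.
console.log(new URL("x", "foo:/a/b").href); // "foo:/a/x"

let opaqueBaseError = null;
try {
  new URL("x", "foo:a/b");
} catch (err) {
  opaqueBaseError = err; // the parser rejects the opaque-path base
}
console.log(opaqueBaseError instanceof TypeError); // true
```

The only syntactic difference between the first two inputs is the `/` right after the scheme, and that is what decides which parser state handles the path.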

https://github.com/whatwg/url/issues/634#issuecomment-908550795

Received on Monday, 30 August 2021 17:44:41 UTC