Re: URL, URI and the w3c

> On Jun 14, 2022, at 2:35 PM, Daniel Stenberg <daniel@haxx.se> wrote:
> 
> On Tue, 14 Jun 2022, Roberto Polli wrote:
> 
>> I am curious then what a specification like OAuth-Somethin which relies on both browsers and generic user agents should adopt...
> 
> There are as many answers to that as there are URL/URI parser authors (= many). URL interop is very poor these days. I say this as someone who works with this challenge on a daily basis.
> 
> The current state of URLs and URIs cannot be described as anything less than a horrible mess, a security nightmare [*] and an infected area that lots of persons will not go near due to the past experiences and personal conflicts.
> 
> [*] = https://daniel.haxx.se/blog/2022/01/10/dont-mix-url-parsers/

It would help immensely if people would just use a common terminology
of references and addresses.

What URI (RFC3986) defines is a standard naming format that uses
hierarchical name delegation to cover the entire Internet with identifiers.
URL is just another name for URI. It's uniform and restricted to what
will interoperate, like postal addresses, and, as with postal regulations,
this doesn't prevent people from using non-standard references that
can be reinterpreted into some standard form.

URI has one appendix that defines how to parse any reference into the
common components (even when the characters are not allowed in the
standard URI grammar). That's what most implementations interop upon.
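
For anyone who hasn't looked at it, the appendix in question is Appendix B
of RFC 3986, which gives a single regular expression for splitting a
reference into its five components. Below is a minimal sketch in Python.
The regular expression is the one from the RFC; the function name
split_reference and the dictionary keys are only illustrative and are not
taken from any spec or library.

```python
import re

# Regular expression from RFC 3986, Appendix B. It is deliberately
# permissive: it splits on the delimiters only and does not validate
# the characters inside each component.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref):
    """Split a reference into the five generic components.

    Hypothetical helper for illustration. A component that is absent
    is returned as None; the path is always a string (possibly empty).
    """
    m = URI_REFERENCE.match(ref)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# The example used in the RFC itself.
print(split_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related"))

# A reference containing spaces, which the URI grammar does not allow,
# still splits into components.
print(split_reference("/some path/with spaces?q=a b"))
```

Note that this only separates the components. Turning such a reference
into a valid URI (percent-encoding, resolving against a base, and so on)
is a separate step.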

The WHATWG url spec defines a set of rules for interpreting references
and placing them in a url data structure within browser memory. url != URL.
href != URL. The spec says that this is somehow replacing URI, but it isn't
even defining the same thing. The algorithms are designed to support
1997-era browser compatibility (even when there is no desire for that).

The WHATWG spec uses the same name for (last I checked) five different
concepts with five different sets of rules associated with them, each of
which is very important for browser consistency. The specification is
owned and controlled by four corporations, but isn't fully implemented
by any of them. It aspires to be implemented.

These specs could easily exist in harmony if WHATWG would stop
insisting it is defining the URL standard and instead define an HTML
href standard that makes sense for HTML processors. The deviations
are mostly due to the variances in constructing/interpreting references
that are generated via forms, javascript, etc. Well, that and the i18n
hostname processing that keeps changing in weird ways. None of
that impacts the interoperability of URIs.

There are other specs that have been implemented (XML, IRI, URN, etc.)
that also exist, with their own fundamental flaws, that have tried to fix
the perceived limitations of URI by changing the identifiers, but not
actually recognizing that references >> identifiers. Anyway, they exist,
the world has changed several times over, and the URI spec still only
deals with the interoperable standard address, not how to make every
possible reference into a valid URI.

I could write a spec that formally defines Hypertext References
(and whatever other updates are needed for 3986) for the Internet.
Actually, I started to do that and stopped to revise HTTP instead.
It's not an easy thing to do, mostly because everyone wants a little
thing done and there are a lot of people wanting (for a very long time).
The hard part is resisting temptation.

I would not do so at the WHATWG. This is not because I am somehow
antagonistic to WHATWG folks; it's because I find it unethical to take
a public standard and place it under the ownership of four companies
that have shown no interest in supporting the needs of the entire
Web/Internet, even if I like those companies, use their products, and
enjoy working with their employees to make the IETF standards better.

This is not my fault, not a fault in the IETF process, nor any attempt
to "go political" on WHATWG: it's a fact of "https://whatwg.org/policies"
which was entirely and artfully created by those companies even
though they were (and are) fully capable of participating in nonprofits
like the rest of us. Power corrupts. I am happy to work with those same
people within the IETF framework, where they can participate as
individuals and not have veto power over the resulting spec.

....Roy

Received on Wednesday, 15 June 2022 17:58:20 UTC