RE: resolving the URL mess

> Many software applications utilize universal identifiers so that systems can refer to resources residing in other systems entirely.

This kind of introduction is just confusing. We're not talking about identifiers in general, just URIs/URLs. And frankly, I think we should include the political/organizational power struggle, which seems to fuel much of the angst that gets in the way of a technical solution.


> The URI, while seeing near-universal adoption, has many subtle inconsistencies

This phrase almost contradicts itself and begs the question. Is it that RFC 3986 and RFC 3987 are unclear, imprecise, or incomplete, or is it that implementations didn't pay attention?

> in implementations that threaten the ability for different systems to refer to each other's resources, creating fragmentation and development of workarounds.

Are we trying to solve an implementation compatibility problem, or just a specification compatibility problem? Or is this a situation where implementations don't agree, but the differences are for the most part inconsequential?


> A mission statement (charter?) would follow:

> To promote the convergence of the behavior of universal identifiers across all applications,

I think this is way too ambitiously scoped. We're not interested in "all applications" in the universe, just ones that use URLs, as you start to enumerate, but of course not exhaustively. And not "the behavior of universal identifiers ...", for three reasons:

* It is scoped not to _all_ universal identifiers, just these.
* Software has behavior; a URL doesn't 'behave'. The hope is to specify some
   kinds of behavior using URLs: parsing, translating, comparing, relative resolution.
   Other behavior is specified elsewhere (like 'Fetch').
* It's too ambitious: getting implementations to converge isn't something a spec can do.
   I think there are two tasks that are feasible:
   a) document current widely deployed behavior as it is, in sufficient detail
     that liberal implementations can know how other software will operate, to the
     extent that differences matter
   b) recommend future best practices for URL creators to improve interoperability
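
To make "parsing, comparing, relative resolution" concrete, here is a minimal sketch using Python's standard urllib.parse, which follows the RFC 3986 style of processing rather than what browsers do. The normalize_for_comparison helper is hypothetical and is only one of many possible comparison rules; it assumes the authority has no userinfo.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Parsing: split a URL string into its syntactic components.
parts = urlsplit("http://example.com/blah?x=1#frag")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)


def normalize_for_comparison(url: str) -> str:
    """Hypothetical comparison rule: lowercase the scheme and host only.

    Assumes the authority carries no userinfo (which is case-sensitive).
    """
    p = urlsplit(url)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path, p.query, p.fragment))


# Comparing: two spellings that this (assumed) rule treats as the same URL.
same = normalize_for_comparison("HTTP://Example.COM/blah") == normalize_for_comparison(
    "http://example.com/blah"
)
print(same)

# Relative resolution: resolve a relative reference against a base URL.
resolved = urljoin("http://example.com/a/b/c", "../d")
print(resolved)
```

Different software may pick different rules at each of these steps, which is the kind of thing task (a) would document.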
 

>  by identifying inconsistencies and proposing resolutions, including in:
> Databases, Web browsers, other Web user-agents, XML parsers,
> a plethora of JSON Hypermedia formats like JSON Schema and Hydra, 
> Semantic Web applications and file formats like Turtle, protocols like
> HTTP and CoAP, databases, compact notations like CURIE, and more.

This kind of list can't be exhaustive. If it goes into the charter, I think it should be done more carefully than "and more": make it clear that the list just shows how widely URLs are deployed outside the web.

> Part of the problem is we need to be absolutely crystal-clear and consistent about what terms we use. 

I’m afraid we have no control over how terms are used in the world, where everyone knows what a URL is. So "absolutely crystal-clear" is way beyond us. I'm just hoping for improved clarity in WHATWG and W3C documents.

> There's many specs _about_ identifiers, but they all do different things:

You give your list; I think the one in my blog post is more complete. (I left out the work on fragment identifiers.)

> * URI: Authoritatively defined in RFC3986, ...
> * IRI: Defined in RFC3987 as a....
> * URL: The URI was created as a generalization of the URL....
> * URN: Likewise defined in terms of the URI, ...


Getting consensus on the history and characterizations of these protocol elements is very hard. I'm not sure it's possible, or necessary. I *am* sure trying to put one history and overview in the charter is a non-starter.

> Because an IRI, URI, URL, and URN all contain a scheme, they are called "absolute".

Wha? Total non-sequitur and not really accurate anyway.

> Some standards defined their own set of strings largely compatible with URIs, mostly for technical reasons. For example, RDF 1.0 for example defined "RDF URI References" due to predating RFC3986 (so named despite being absolute). RDF 1.1 now formally uses IRIs.

I don't think this is true, actually. Doesn't it use LEIRIs (the XML "Legacy Extended IRIs")?


> There's also the class of strings called URI References...

This story is confusing.

> If we need to talk about how Web browsers implement URIs (or implement it _differently_), I propose the term Web Browser Address. I might adopt the acronym WBA.

Oh please, not another term! How does this help?

> For the concept of the URI, URL, IRI, etc, where the meaning of the string
> is uniform across time and space (as opposed to document-local ids), 

The "meaning" of "http://example.com/blah" is as uniform as it can be
across all time and space, in that it's just a little bit of syntax to join together
"http" and "example.com" and "/blah".

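A small sketch of that point, using Python's urllib.parse as one (assumed) example of a generic parser: the string splits into those three pieces and joins back together, with nothing else in it.

```python
from urllib.parse import urlsplit, urlunsplit

url = "http://example.com/blah"

# Split the string into its syntactic pieces.
pieces = urlsplit(url)
print(pieces.scheme, pieces.netloc, pieces.path)

# Join the same pieces back together; query and fragment are empty here.
rebuilt = urlunsplit((pieces.scheme, pieces.netloc, pieces.path, "", ""))
print(rebuilt == url)
```
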
> I will simply use the term "identifier" or "universal identifier".

For what?

> I would propose the following deliverables:

> (1) Can we formally adopt this terminology? ...

No. Not formally or informally.


> (2) What, exactly, are the incompatibilities between implementations?

"Exactly" is impossible. And specifying exactly may not be worthwhile. And which implementations count? Which incompatibilities result in interoperability problems?

> Why do Web browsers have a different spec or implementation *at all*?

"Why" is a risky question, and also not really worth pursuing. Likely suspects are NIH, laziness, an artifact of the "browser wars" or the current "web standards king-of-the-mountain", overly helpful or shortcut engineering "race to the bottom" ....

But while that might be fun to talk about, it's mainly irrelevant. Just focus on current state and providing a path forward.


Larry
--
http://larry.masinter.net

Received on Monday, 6 October 2014 22:49:22 UTC