Re: resolving the URL mess

On Fri, Oct 3, 2014 at 5:47 PM, Larry Masinter <masinter@adobe.com> wrote:

> I think working on the problem statement is a good idea.
>
> I raised an issue trying to be neutral about who is 'right', just
> referencing (indirectly) the issues and specs.
>
> https://github.com/urispec/urispec/issues/1
>
>
I propose the following problem statement:

Many software applications utilize universal identifiers so that systems
can refer to resources residing in other systems entirely. The URI, while
seeing near-universal adoption, has many subtle inconsistencies across
implementations that threaten the ability of different systems to refer to
each other's resources, creating fragmentation and prompting the
development of workarounds.

A mission statement (charter?) would follow:

To promote the convergence of the behavior of universal identifiers across
all applications, by identifying inconsistencies and proposing resolutions,
including in: databases, Web browsers, other Web user-agents, XML parsers,
a plethora of JSON Hypermedia formats like JSON Schema and Hydra, Semantic
Web applications and file formats like Turtle, protocols like HTTP and
CoAP, compact notations like CURIE, and more.

Part of the problem is that we need to be absolutely crystal-clear and
consistent about what terms we use. There are many specs _about_
identifiers, but they all do different things:

* URI: Authoritatively defined in RFC3986, being a 7-bit ASCII string
consisting of a scheme, colon, then hier-part, then optional query and
optional fragment. This is by far the most cited and implemented
specification for network-addressable identifiers, and possibly one of the
most cited RFCs period (if not RFC2119).

* IRI: Defined in RFC3987 as a generalization of the URI that allows
Unicode characters beyond the ASCII range.

* URL: The URI was created as a generalization of the URL, though as of
RFC3986 it's defined in terms of the URI, as the subset of the URIs that
are network-addressable (i.e. those with an authority introduced by "//").
Most of the time, when people ask for a URI, they really want a URL, and
more specifically, an HTTP URL.

* URN: Likewise defined in terms of the URI, as a subset consisting only of
URIs that are not network addressable.
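
As a rough illustration of the distinctions in the list above, here is a
small Python sketch that splits identifiers into the generic RFC3986
components using the standard library's urllib.parse.urlsplit. The helper
names (components, has_authority_marker) and the example identifiers are
made up for this illustration, and the "//" check is only the loose URL/URN
reading used in this message, not a definition taken from any spec.

```python
from urllib.parse import urlsplit


def components(uri: str) -> dict:
    """Split an identifier into the five generic RFC3986 components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # urllib calls the authority "netloc"
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def has_authority_marker(uri: str) -> bool:
    """Hypothetical helper: True if "//" directly follows the scheme and colon."""
    scheme, colon, rest = uri.partition(":")
    return bool(colon) and rest.startswith("//")


# Example identifiers (assumptions chosen for illustration only).
for example in ("http://example.com/a/b?x=1#frag", "urn:isbn:0451450523"):
    print(example, components(example), has_authority_marker(example))
```

Under this reading the HTTP example counts as a URL and the urn: example
does not, but other parsers may well split edge cases differently, which is
exactly the kind of inconsistency the problem statement is about.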

Because an IRI, URI, URL, and URN all contain a scheme, they are called
"absolute".

Some standards defined their own set of strings largely compatible with
URIs, mostly for technical reasons. For example, RDF 1.0 defined "RDF URI
References" because it predated RFC3986 (so named despite being absolute).
RDF 1.1 now formally uses IRIs.

There's also the class of strings called URI References, or URIRefs for
short. They are resolved against an absolute URI, and thus said to be
"relative". The same term exists for IRIs. It tends to be called a URI
Reference even if the class in question is an IRI or URL, though this
doesn't produce any ambiguity to my knowledge.
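
To make the absolute/relative distinction concrete, here is a minimal
Python sketch using urllib.parse.urljoin, which approximates RFC3986
reference resolution. The base URI, the sample references, and the helper
names (is_absolute, resolve) are assumptions for illustration; "absolute"
here just means "has a scheme", as described above.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URI against which references are resolved.
BASE = "http://example.com/a/b/c?q=1"


def is_absolute(reference: str) -> bool:
    """Treat a reference as absolute when it carries its own scheme."""
    return bool(urlsplit(reference).scheme)


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a URI reference against an absolute base URI."""
    return urljoin(base, reference)


for ref in ("../d", "#frag", "//other.example/x", "urn:isbn:0451450523"):
    print(f"{ref!r}: absolute={is_absolute(ref)} resolved={resolve(ref)!r}")
```

A reference that is already absolute comes back unchanged; the others
borrow whatever components they lack from the base.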

If we need to talk about how Web browsers implement URIs (or implement them
_differently_), I propose the term Web Browser Address. I might adopt the
acronym WBA.

For the concept of the URI, URL, IRI, etc., where the meaning of the string
is uniform across time and space (as opposed to document-local ids), I will
simply use the term "identifier" or "universal identifier".

I would propose the following deliverables:

(1) Can we formally adopt this terminology? (An adoption does not mean we
are re-defining the term and further adding to confusion, but we'd be
saying "That over there, found in pre-existing normative text, is the
authoritative definition we refer to.") Any corrections or additions? Would
I be correct in saying this is value-free?

(2) What, exactly, are the incompatibilities between implementations? Why
do Web browsers have a different spec or implementation *at all*?

... Or at least number (2) (as I proposed earlier).

Austin.

Received on Sunday, 5 October 2014 03:07:35 UTC