on the names URI, URL, URN, LEIRI from Larry Masinter on 2014-12-09 (public-ietf-w3c@w3.org from December 2014)

From: Larry Masinter <masinter@adobe.com>
Date: Tue, 9 Dec 2014 16:59:18 +0000
To: "julian.reschke@gmx.de" <julian.reschke@gmx.de>, Bjoern Hoehrmann <derhoermi@gmx.net>, Sam Ruby <rubys@intertwingly.net>
CC: "uri@w3.org" <uri@w3.org>
Message-ID: <DM2PR0201MB0960DEDB685D66C000E17F1FC3650@DM2PR0201MB0960.namprd02.prod.outlook.>

In response to discussion on public-ietf-w3c:


> >         * Standardize on the term URL. URI and IRI are just
> >           confusing. In practice a single algorithm is used for both
> >           so keeping them distinct is not helping anyone. URL also
> >           easily wins the search result popularity contest.
>  > ...
> 
> This ignores the fact that RFC 3986 defines URI as the superset of URNs
> and URLs. (And yes, some schemes can be both).

> I understand that the browser people are not very interested in URNs,
> but many people in the IETF are. Pretending that they do not exist and
> that it makes sense to call them URLs will IMHO not work very well.

In computer science theory, the role of "identifier" can be played
by almost any string or data structure which is communicated
to "stand for" something else; the role of "identifier" can further
be described as a "location" or a "name" depending on how
much the "identifier" corresponds to information useful in
computing the location or access method for whatever  is being
identified. But these are not precisely defined roles.

In the history of the web, the terms URL, URI, URN, IRI, and
various other even more obscure terms have been used
for different constructs, to capture some distinctions:

URI = URL + URN:  that is, to split the space of identifiers
between those that are defined by a 'namespace authority'
and thus are not locators, and the rest. Even informally, it
is acknowledged that the distinction is fuzzy.

IRI vs URI: to separate those that are restricted to
sequences of a limited subset of characters (not
even all of ASCII) and those that are not, with some
ambiguity about which repertoire is or isn't allowed
(e.g., spaces), giving rise to odd constructs like LEIRI
(Legacy Extended IRIs being IRIs in which spaces
are allowed).
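
As a rough illustration of the repertoire distinction above, here is
a minimal Python sketch. The function name repertoire() and its labels
are made up for this example. It looks only at which characters
appear, not at syntax, and it simplifies heavily: it treats every
non-ASCII character as acceptable in an IRI (RFC 3987 allows only
certain ranges) and treats the space as the only extra character in
a LEIRI.

```python
import string

# Characters RFC 3986 permits in a URI: unreserved, gen-delims,
# sub-delims, and "%" (which introduces a percent-encoded octet).
URI_CHARS = set(
    string.ascii_letters + string.digits + "-._~"  # unreserved
    + ":/?#[]@"                                    # gen-delims
    + "!$&'()*+,;="                                # sub-delims
    + "%"
)


def repertoire(identifier):
    """Return the narrowest label whose character set covers the string.

    Simplifying assumptions: any non-ASCII character counts as
    IRI-legal, and a LEIRI adds only the space character.
    """
    def within(extra):
        return all(ch in URI_CHARS or extra(ch) for ch in identifier)

    if within(lambda ch: False):
        return "URI"
    if within(lambda ch: ord(ch) > 127):
        return "IRI"
    if within(lambda ch: ord(ch) > 127 or ch == " "):
        return "LEIRI"
    return "other"


print(repertoire("http://example.com/a%20b"))   # ASCII subset only
print(repertoire("http://example.com/caf\u00e9"))  # non-ASCII present
print(repertoire("http://example.com/a b"))     # literal space
```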

Relative vs. Absolute: we variously include or exclude
relative forms, to be combined with a "Base".
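
To make the relative/absolute distinction concrete, the sketch below
uses Python's standard urllib.parse module. The helper name
is_absolute() and the example.com addresses are only for illustration;
treating "has a scheme" as "absolute" is a simplification of the
RFC 3986 grammar.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(reference):
    """Treat a reference as absolute when it carries a scheme."""
    return bool(urlsplit(reference).scheme)


base = "http://example.com/a/b"

for reference in ("../c", "c?x=1", "urn:isbn:0451450523"):
    # A relative reference is resolved against the base; a reference
    # with its own scheme comes back from urljoin unchanged.
    print(reference, is_absolute(reference), urljoin(base, reference))
```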

These distinctions, while well-intentioned, have
also been confusing.  What is the name for
identifiers that start with "urn:" but contain
non-ASCII characters? If a URN is a URI, then
are we also defining IRNs (Internationalized
Resource Names)?

The question is: what name should we use for
what this document defines, and which other
constructs should also be defined in this document,
vs. leaving alone the current definitions.

The documents Sam points us to currently
define "URL" as the superset, and include
as a goal removing RFC 3986 (defining URI) and
RFC 3987 (defining IRI), but they don't yet
include a sufficient new definition of those
other terms (URI, IRI) even though they
are still in use.

I'm OK with using "URL" as the most liberal
noun, and introducing qualifiers as adjectives.

I’m OK with defining "URN" as "a kind of
URL that starts with 'urn:'", and explaining
how they're not currently generally useful
as locators, although many have had ambition
to make them so. That is, we're not claiming
they don't exist, but we are claiming that
it can make sense to say a URN is a kind of URL,
just because that's how we define it.
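
One way to picture "URL" as the single noun with qualifiers as
adjectives is a function that returns a set of tags for any string.
This is only a sketch: the function name qualifiers() and the tag
names are invented here, not taken from any specification, and the
"internationalized" test simply looks for non-ASCII characters.

```python
from urllib.parse import urlsplit


def qualifiers(url):
    """Return illustrative adjectives for a string treated as a URL."""
    tags = set()
    scheme = urlsplit(url).scheme  # urlsplit lowercases the scheme
    tags.add("absolute" if scheme else "relative")
    if scheme == "urn":
        # "A kind of URL that starts with 'urn:'".
        tags.add("URN")
    if any(ord(ch) > 127 for ch in url):
        tags.add("internationalized")
    return tags


# A "urn:" identifier with non-ASCII characters just collects two
# adjectives, so no separate noun such as "IRN" is needed.
print(sorted(qualifiers("urn:example:caf\u00e9")))
print(sorted(qualifiers("http://example.com/")))
print(sorted(qualifiers("../c")))
```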

Larry
--
http://larry.masinter.net

Received on Tuesday, 9 December 2014 17:00:09 UTC