RE: Best practices using URIs from Hammond, Tony (ELSLON) on 2003-11-13 (www-rdf-dspace@w3.org from November 2003)

From: Hammond, Tony (ELSLON) <T.Hammond@elsevier.com>
Date: Thu, 13 Nov 2003 15:01:55 -0000
To: 'Stefano Mazzocchi' <stefano@apache.org>, Nick Matsakis <matsakis@mit.edu>
Cc: SIMILE public list <www-rdf-dspace@w3.org>
Message-ID: <54A600C436EA694581B93E4BD4D4788A06B73D80@elslonexc004.eslo.co.uk>
Hi Stefano:

I hope I can add some insights into our thinking here. IMO the terms URL and
URN are not very conducive to a good understanding of URI. Rather, the
difference between an HTTP URI, say, and a URN URI is that the former has a
location-dependent resolution mechanism (i.e. a location is expressed in
terms of a network authority component), while the latter has a
location-independent resolution mechanism. A URN URI resolution mechanism is
/not/ application dependent, as you assert, but is (or should be) specified
in the URN NID template. For example, RFC 2648 specifies a URN NID for IETF
documents and proposes a resolution mechanism via HTTP using a set of Perl
scripts annexed in that document. So, both location-dependent and
location-independent URIs are fundamentally premised on resolution at the
URI scheme level. At the instance level, of course, (or even at the URN NID
level) URIs may or may not be resolvable, but that is another matter.
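
To make the distinction concrete, here is a minimal sketch in Python. It assumes that "location-dependent" can be approximated by checking whether the URI carries a network authority component; the describe() helper is a name made up for this illustration and is not part of any specification.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI and report whether it names a network authority."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        # netloc holds the authority component; it is empty for URN URIs
        "authority": parts.netloc,
        # Assumption for this sketch: an authority implies location dependence
        "location_dependent": bool(parts.netloc),
    }


http_uri = describe("http://www.iso.org/ISBN/0465026567")
urn_uri = describe("urn:isbn:0465026567")

# For a URN, the namespace identifier (NID) is the token after "urn:"
nid = "urn:isbn:0465026567".split(":", 2)[1]

print(http_uri)
print(urn_uri)
print(nid)
```

The check says nothing about whether a given URI actually resolves; it only shows where a resolution mechanism would have to come from (the authority for HTTP, the NID registration for a URN).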

The "info" URI scheme as a class of URIs specifically eschews resolution. Or
to quote from the FAQ that we are currently compiling:

===== "info" FAQ =====

	"info" is focused exclusively on supporting identity alone. As such, this

		* dramatically simplifies "info" resolution behaviours, and
		  consequently the operational expectations of an "info" URI

		* avoids overhead in new "info" namespace registrations by
		  rejecting any notion of resolution mechanisms

		* avoids overhead in the management of a resolution
		  infrastructure and maintenance of resolution targets for
		  "info" URIs

		* requires that a Namespace Authority make an independent
		  URI or URN NID application if services or other
		  functionalities are required (e.g. authority metadata)

		* does not interfere with the mantra "many URI schemes
		  considered harmful" as this is primarily concerned with the
		  need for applications to support resolution mechanisms

	Note that this stance on non-dereference is in contrast to other URI
	schemes which are generally dereferenceable, even if not all examples
	within a particular scheme are actually dereferenceable. This holds true
	for both location-dependent URIs (e.g. HTTP URIs) as well as for
	location-independent URIs (e.g. URN URIs).

===== "info" FAQ =====

The rationale behind the "info" URI scheme is to support legacy (or
otherwise non-URI) public identification systems within a URI naming
environment so that Web description technologies like RDF or the OpenURL
Framework can make use of identifiers from these naming systems as globally
unique references. The intent is to provide a lightweight, early URI
registration mechanism which will not compromise any possible future URI
registrations by the namespace authorities concerned. For example, DOI could
be registered now under "info" to allow this example URI to be used:

	info:doi/10.1234/567

In the meantime the IDF is also proceeding with an independent URI
application and at some future point in time would expect to be able to get
the "doi" namespace entered into the IANA registry, so that the same
resource could also be referred to as 

	doi:10.1234/567

Note however that the former URI confers identity only while the latter is
capable of supporting resolution behaviours, e.g. one could imagine URIs of
the form

	doi:10.1234/567?service=pdf

This is just a clumsy example to show how a "doi" URI scheme could be used
for addressing into a set of resource representations.
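
As a rough illustration of the difference, the sketch below pulls the three example URIs apart in Python. The function names are invented for this example, and the "doi" scheme with a "service" parameter is hypothetical, exactly as in the text above; nothing here reflects a registered syntax.

```python
def parse_info_uri(uri):
    """Split 'info:<namespace>/<identifier>' into (namespace, identifier)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "info":
        raise ValueError("not an info URI: %r" % uri)
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace or not identifier:
        raise ValueError("malformed info URI: %r" % uri)
    return namespace, identifier


def parse_doi_uri(uri):
    """Split a hypothetical 'doi:<identifier>?k=v' into (identifier, params)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "doi":
        raise ValueError("not a doi URI: %r" % uri)
    identifier, _, query = rest.partition("?")
    # Collect key=value pairs; an absent query yields an empty dict
    params = dict(pair.split("=", 1) for pair in query.split("&") if "=" in pair)
    return identifier, params


print(parse_info_uri("info:doi/10.1234/567"))
print(parse_doi_uri("doi:10.1234/567"))
print(parse_doi_uri("doi:10.1234/567?service=pdf"))
```

The "info" form yields a namespace and an identifier and nothing more, while the imagined "doi" form leaves room for parameters that a resolver could act on.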

Tony


-----Original Message-----
From: Stefano Mazzocchi [mailto:stefano@apache.org]
Sent: 13 November 2003 13:33
To: Nick Matsakis
Cc: Hammond, Tony (ELSLON); SIMILE public list
Subject: Re: Best practices using URIs



On 11 Nov 2003, at 19:45, Nick Matsakis wrote:

> On Tue, 11 Nov 2003, Hammond, Tony (ELSLON) wrote:
>
>> Not sure how appropriate (or relevant) this is to SIMILE, but thought I
>> would just mention the "info" URI scheme
>
> This is a great idea, as it allows unconnected people to give the same
> URIs to items, provided the items have previously been given a unique
> identifier such as a Dewey Decimal code or Library of Congress number.
> However, the URL given for NISO, http://info-uri.niso.org fails to
> resolve.  Is there a comprehensive list of proposed info-namespaces 
> about?

how would this differ from urn: ? or DOI, for that matter? DOI is 
already a namespaced URN. What's the point of having yet another 
namespace on top?

it seems to me that a URI like

  info:doi:isbn/0465026567

would be completely equivalent to a URI like

  urn:isbn:0465026567

for any use.

Note, however, that there is a general tendency in the XML world to
stay away from URIs that are not "potentially dereferenceable". I made
the mistake of creating my own URI scheme in the past and, as TBL
suggested, URNs are poor substitutes for dereferenceable URIs because any
lookup and discovery mechanism would be a poor mimic of HTTP anyway.

Keep in mind that the difference between

  urn:isbn:0465026567

  http://www.iso.org/ISBN/0465026567

even if both are treated as URIs, is that the second *could* be used as a
URL to look up and discover information about that particular resource,
while the first does *NOT* include a methodology for doing so, which is
left application dependent.

It is true that the use of dereferenceable URIs is generally harder
because it requires two contracts: one with the URI and one with the
potential addressing space of your web domain, and these must be kept
in sync, while URNs allow a completely separate management of the
identification and the discovery... but leave the discovery method
unspecified.

One could be tempted to say that using HTTP as a discovery mechanism
could be poor, as it forces a single point of distribution and this looks
like a potential bottleneck or single point of failure, but it should be
noted that HTTP transparently leverages the TCP/IP decoupling of address
identifier from actual IP address through DNS.

As www.google.com rather eminently shows (or akamai, for that matter),
it is possible to keep a single address space but distribute its
implementation massively and in a completely transparent fashion.
Looking at it from this angle, it is clear why URNs are poor substitutes
for http: based URIs.

In case the management of the URL address space cannot be monitored
directly by the group that issues the URI, it is possible to follow the
pattern used by the folks at Dublin Core and use PURL
(http://purl.org/) which allows URLs to be persistent and to be redirected
later without forcing contracts to break.

for example, asking for http://purl.org/dc/elements/1.1/ (the DC
namespace) currently redirects to
http://dublincore.org/2003/03/24/dces# which serves an RDF Schema for
DC, but since the practice of what should be on the other side of a
namespace is not yet recommended, they could, in the future, redirect
to another document without breaking any contract.
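
The indirection can be sketched with a toy in-memory table in Python. This is only an illustration of the idea: the RedirectTable class is invented here, a real PURL server answers with an HTTP redirect instead of a dictionary lookup, and the second target URL is a made-up placeholder.

```python
class RedirectTable:
    """Toy stand-in for a PURL-style service: stable URL -> current target."""

    def __init__(self):
        self._targets = {}

    def register(self, persistent_url, target_url):
        # Registering the same persistent URL again re-points it
        self._targets[persistent_url] = target_url

    def resolve(self, persistent_url):
        # A real PURL server would answer with an HTTP redirect instead
        return self._targets[persistent_url]


purl = RedirectTable()
DC_NS = "http://purl.org/dc/elements/1.1/"

purl.register(DC_NS, "http://dublincore.org/2003/03/24/dces#")
first = purl.resolve(DC_NS)

# Later the maintainers move the document; only the table entry changes.
purl.register(DC_NS, "http://example.org/dc/new-schema")  # hypothetical target
second = purl.resolve(DC_NS)

print(first)
print(second)
```

Anything that stored the persistent URL keeps working after the move, which is the contract that stays unbroken.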

I personally tend to be against PURLs and in favor of a better
management of private URL address spaces, but I understand that there
are cases where political issues tend to get in the way of purely
technical design (as I think was the case before the DC group
acquired their dublincore.org domain).

HTH

--
Stefano.
Received on Thursday, 13 November 2003 10:06:08 UTC