Minting http URIs

SYNOPSIS
How should http URIs be minted?  The TAG's httpRange-14 decision[6]
implies that two styles of http URIs could be used as identifiers: URIs
that use fragment identifiers ("hash URIs"), and URIs that provide
303-redirects to other URIs ("303-redirect URIs").  What are the
implications, pros and cons of these approaches?

Because the meaning of a hash URI is determined by the HTTP media type
returned when the URI is dereferenced[3], hash URIs should only be used
in cases where the thing to be identified actually is an information
resource or in cases where the returned media type will always be RDF
(or RDF-like media types).  303-redirect URIs are more general than hash
URIs: they can be used to identify any kind of resource, independent of
the returned HTTP media type.  This generality may come at a cost in
performance, administrative burden and hosting barriers; however, these
negatives can be eliminated using the t-d-b.org[8] URI minting strategy
that I previously proposed[5], provided you are willing to accept
somewhat longer URIs.

EXPLANATION

Assumptions:  
0. We are discussing the use of http URIs as identifiers in RDF.
1. We wish to conform to the TAG's httpRange-14 decision[6].
2. We wish to conform to the TAG's Web Architecture[7].
3. It is helpful if a URI dereferences (perhaps indirectly) to useful
information about that URI.

Because of a combination of the TAG's httpRange-14 decision[6] and the
TAG's existing Web Architecture specification, hash URIs have a more
limited range of application than 303-redirect URIs.  Their
applicability depends on both:

	- what kind of thing needs to be identified (information 
	  resources versus anything); and 

	- what HTTP media type(s) may be used to return descriptive 
	  information when the URI is dereferenced (RDF versus any 
	  media type).

If the returned HTTP media type is restricted to RDF, then a hash URI
can be used to identify anything.  However, if the returned HTTP media
type is unrestricted (e.g., if it might be HTML), then hash URIs should
only be used to identify information resources.
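The rule above can be written down as a small decision function.  This
is only a sketch: the function name is hypothetical, and treating
"RDF" as the single media type application/rdf+xml is an assumption
made for illustration (other RDF-like types could be added to the set).

```python
# Assumption: "RDF" is approximated by this set; extend it if other
# RDF-like media types are in use.
RDF_MEDIA_TYPES = frozenset({"application/rdf+xml"})


def hash_uri_is_appropriate(is_information_resource: bool,
                            possible_media_types: set) -> bool:
    """Hypothetical helper: may a hash URI be used for this thing?

    is_information_resource: whether the thing being identified is an
        information resource.
    possible_media_types: every media type that might ever be returned
        when the URI (minus its fragment) is dereferenced.
    """
    if is_information_resource:
        # Information resources may get hash URIs whatever is served.
        return True
    # Anything else needs a guarantee that only RDF will be served.
    # An empty set means "unknown", which is treated as unrestricted.
    return bool(possible_media_types) and possible_media_types <= RDF_MEDIA_TYPES
```

For example, a person described by a page that might be served as HTML
fails the test, while the same person described only in RDF/XML passes.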

Why does the TAG's httpRange-14 decision result in these limitations for
hash URIs?  Consider what would happen if a hash URI such as
http://dbooth.org/2005/dbooth/#David_Booth_the_Person were used to
identify a non-information resource, such as the author of this message
(a particular person).  

The RDF Concepts spec[1] says that if this URI reference appears in the
context of RDF, and dereferencing the part without the fragment
identifier ( http://dbooth.org/2005/dbooth/ ) yields RDF/XML, then
#David_Booth_the_Person identifies whatever the resulting RDF/XML says
it identifies; but if the returned media type is not RDF/XML, then what
it identifies may be indeterminate.  So if the returned media type
actually is RDF/XML, then there is no problem: the returned RDF/XML can
say that http://dbooth.org/2005/dbooth/#David_Booth_the_Person
identifies a person.  
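The key mechanical point is that the fragment identifier never reaches
the server: the client strips it, dereferences the remaining URI, and
then interprets the fragment itself against whatever comes back.  A
minimal sketch using Python's standard urllib.parse (the function name
is hypothetical):

```python
from urllib.parse import urldefrag, urlsplit


def request_target(hash_uri: str):
    """Return (host, path sent in the HTTP GET, fragment kept client-side)."""
    # Split off the fragment; only doc_uri is actually dereferenced.
    doc_uri, fragment = urldefrag(hash_uri)
    parts = urlsplit(doc_uri)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, path, fragment


host, path, fragment = request_target(
    "http://dbooth.org/2005/dbooth/#David_Booth_the_Person")
# host == "dbooth.org", path == "/2005/dbooth/",
# fragment == "David_Booth_the_Person"
```

Since the server only ever sees /2005/dbooth/, it cannot tailor its
answer to the fragment; the meaning of #David_Booth_the_Person rests
entirely on the media type of that one response.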

But if the returned media type is HTML (for example, if the author has
not yet figured out a way to describe the meaning of that URI in RDF,
and decides for the moment to describe it in English prose), then the
TAG's httpRange-14 decision says that http://dbooth.org/2005/dbooth/
should identify an information resource, and the text/html media type[2]
says that the fragment identifier #David_Booth_the_Person identifies an
HTML element within the document.  Thus, if HTML is served today from
that URI, then a machine dereferencing the URI will conclude that the
URI identifies an HTML element from an information resource, whereas if
tomorrow the server is changed to serving RDF, then a machine
dereferencing the URI will conclude that the URI identifies a person.
This is not good, because we would like the URI to identify the same
thing all the time.  A key concept behind URIs is that they are supposed
to be universal identifiers -- having the same meaning across all
languages and document types.  Thus, for non-information resources whose
descriptions might not always be served as RDF, hash URIs do not seem to
meet this need.
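The shift in meaning can be illustrated with a simplified model of what
a machine concludes from the returned media type.  This is a sketch,
not a conforming RDF or HTML processor: the function name and the
wording of its results are hypothetical, and only the two media types
discussed above are modelled.

```python
from urllib.parse import urldefrag


def what_hash_uri_identifies(hash_uri: str, media_type: str) -> str:
    """Simplified model: what a client would conclude the hash URI
    identifies, given the media type returned for the URI minus its
    fragment."""
    _, fragment = urldefrag(hash_uri)
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base_type = media_type.split(";", 1)[0].strip().lower()
    if base_type == "application/rdf+xml":
        # RDF Concepts [1]: the RDF/XML says what the fragment identifies.
        return f"whatever the returned RDF/XML says #{fragment} identifies"
    if base_type == "text/html":
        # text/html [2]: the fragment names part of the HTML document.
        return f"the HTML element '{fragment}' within the returned document"
    return "indeterminate"


uri = "http://dbooth.org/2005/dbooth/#David_Booth_the_Person"
today = what_hash_uri_identifies(uri, "text/html; charset=utf-8")
tomorrow = what_hash_uri_identifies(uri, "application/rdf+xml")
# today and tomorrow differ, although the URI itself never changed.
```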

303-redirect URIs are more general purpose than hash URIs: they can be
used to identify any kind of resource, independent of the returned HTTP
media type.  However, if they are used naively, 303-redirect URIs have
the following negatives:

	- Performance: Dereferencing the URI to obtain descriptive 
	  information requires an extra network access (first to 
	  the original URI and then to the redirect target);

	- Administrative overhead: It requires two URIs to be maintained 
	  in sync (the original URI and the redirect target); and

	- Hosting barriers: 303 redirects require server configuration 
	  that many authors either do not know how to set up or are not
	  permitted to change.
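The performance and configuration costs can be seen in a toy server
built on Python's standard http.server.  The paths /id/... and /doc/...
are hypothetical, chosen only for illustration; on a real host the 303
rule would normally live in web server configuration rather than in
Python code, which is exactly the hosting barrier described above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths on one server, for illustration only.
THING_PATH = "/id/David_Booth_the_Person"  # 303-redirect URI naming the person
DOC_PATH = "/doc/David_Booth_the_Person"   # information resource describing him


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == THING_PATH:
            # 303 See Other: point at a document *about* the thing
            # instead of claiming to return the thing itself.
            self.send_response(303)
            self.send_header("Location", DOC_PATH)
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == DOC_PATH:
            body = b"<p>David Booth is a person.</p>\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this demo


def dereference(host, port, path, max_hops=5):
    """Follow 303 redirects by hand; return (status, body, requests_made).

    Assumes every Location header is a path on the same host, which is
    true for this demo server but not in general.
    """
    for requests_made in range(1, max_hops + 1):
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        location = resp.getheader("Location")
        conn.close()
        if resp.status != 303:
            return resp.status, body, requests_made
        path = location
    raise RuntimeError("too many redirects")


def run_demo():
    """Start the toy server on a free local port and dereference THING_PATH."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return dereference("127.0.0.1", server.server_port, THING_PATH)
    finally:
        server.shutdown()
        server.server_close()
```

Calling run_demo() yields a 200 status with the HTML description, but
only after two requests: the extra round trip is the performance cost,
and keeping THING_PATH and DOC_PATH paired is the administrative one.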

These negatives can be eliminated (at the cost of possibly longer URIs)
if the URI minting strategy that I proposed[5] is widely adopted.  
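The details of that strategy are in [5] and [8] and are not restated
here.  The sketch below assumes, hypothetically, that a URI is minted
by prefixing "http://t-d-b.org?" to the URL of an ordinary document
describing the thing, and that the t-d-b.org server answers such a URI
with a 303 redirect to that URL.  Both function names are hypothetical.

```python
# Assumption: this prefix form is an illustrative reading of [5]/[8];
# consult the actual proposal before relying on it.
TDB_PREFIX = "http://t-d-b.org?"


def mint_tdb_uri(description_url: str) -> str:
    """Mint a 303-redirect URI for a thing from the URL of a document
    that describes it (hypothetical helper)."""
    return TDB_PREFIX + description_url


def tdb_redirect_target(tdb_uri: str) -> str:
    """The Location a t-d-b.org-style server would return in its 303
    response; a client aware of the convention could compute it locally."""
    if not tdb_uri.startswith(TDB_PREFIX):
        raise ValueError("not a t-d-b.org-style URI: " + tdb_uri)
    return tdb_uri[len(TDB_PREFIX):]


person = mint_tdb_uri("http://dbooth.org/2005/dbooth/")
# person == "http://t-d-b.org?http://dbooth.org/2005/dbooth/"
```

Under this assumption, the author only publishes a normal document (no
redirect configuration), the redirect target is carried inside the URI
itself (nothing to keep in sync), and a client that recognizes the
prefix could skip the extra request entirely.  The price is the longer
URI.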

Personally, based on the analysis above, I conclude that http URIs
should generally be minted as 303-redirect URIs -- not hash URIs --
because 303-redirect URIs can be used to identify anything independent
of the returned media type.

I suggest that the SWBP working group at least endorse the analysis and
pros/cons described above.  

References
1. RDF Concepts section on Fragment Identifiers:
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-fragID
2. text/html media type:
http://www.ietf.org/rfc/rfc2854.txt
3. URI spec:
http://www.isi.edu/in-notes/rfc3986.txt
4. David Wood's message on httpRange-14 resolution:
http://www.w3.org/2002/02/mid/6F9F6968-CB73-427A-8682-AF9AB0F2E9C2@softwarememetics.com;list=public-swbp-wg
5. David Booth's proposed URI minting strategy:
http://www.w3.org/2002/02/mid/A5EEF5A4F0F0FD4DBA33093A0B07559008911A37@tayexc18.americas.cpqcorp.net;list=public-swbp-wg
6. TAG's httpRange-14 decision:
http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
7. TAG's Web Architecture:
http://www.w3.org/TR/webarch/
8. thing-described-by.org:
http://thing-described-by.org/

David Booth

Received on Monday, 12 December 2005 09:11:25 UTC