- From: Booth, David (HP Software - Boston) <dbooth@hp.com>
- Date: Mon, 12 Dec 2005 04:10:39 -0500
- To: "David Wood" <dwood@softwarememetics.com>, <public-swbp-wg@w3.org>
SYNOPSIS

How should http URIs be minted? The TAG's httpRange-14 decision[6] implies that two styles of http URIs could be used as identifiers: URIs that use fragment identifiers ("hash URIs"), and URIs that provide 303-redirects to other URIs ("303-redirect URIs"). What are the implications, pros and cons of these approaches?

Because the meaning of a hash URI is determined by the HTTP media type returned when the URI is dereferenced[3], hash URIs should only be used in cases where the thing to be identified actually is an information resource or in cases where the returned media type will always be RDF (or RDF-like media types).

303-redirect URIs are more general than hash URIs: they can be used to identify any kind of resource, independent of the returned HTTP media type. This generality may come at a cost of performance, administrative burden and hosting barriers; however, these negatives can be eliminated using the t-d-b.org URI minting strategy that I previously proposed[5], provided you are willing to accept somewhat longer URIs.

EXPLANATION

Assumptions:

0. We are discussing the use of http URIs as identifiers in RDF.
1. We wish to conform to the TAG's httpRange-14 decision[6].
2. We wish to conform to the TAG's Web Architecture[7].
3. It is helpful if a URI dereferences (perhaps indirectly) to useful information about that URI.

Because of a combination of the TAG's httpRange-14 decision[6] and the TAG's existing Web Architecture specification[7], hash URIs have a more limited range of application than 303-redirect URIs. Their applicability depends on both:

- what kind of thing needs to be identified (information resources versus anything); and
- what HTTP media type(s) may be used to return descriptive information when the URI is dereferenced (RDF versus any media type).

If the returned HTTP media type is restricted to RDF, then a hash URI can be used to identify anything.
However, if the returned HTTP media type is unrestricted (e.g., if it might be HTML), then hash URIs should only be used to identify information resources.

Why does the TAG's httpRange-14 decision result in these limitations for hash URIs? Consider what would happen if a hash URI such as http://dbooth.org/2005/dbooth/#David_Booth_the_Person were used to identify a non-information resource, such as the author of this message (a particular person).

The RDF Concepts spec[1] says that if this URI reference appears in the context of RDF, and dereferencing the part without the fragment identifier ( http://dbooth.org/2005/dbooth/ ) yields RDF/XML, then #David_Booth_the_Person identifies whatever the resulting RDF/XML says it identifies; but if the returned media type is not RDF/XML, then what it identifies may be indeterminate.

So if the returned media type actually is RDF/XML, then there is no problem: the returned RDF/XML can say that http://dbooth.org/2005/dbooth/#David_Booth_the_Person identifies a person. But if the returned media type is HTML (for example, if the author has not yet figured out a way to describe the meaning of that URI in RDF, and decides for the moment to describe it in English prose), then the TAG's httpRange-14 decision says that http://dbooth.org/2005/dbooth/ should identify an information resource, and the text/html media type[2] says that the fragment identifier #David_Booth_the_Person identifies an HTML element within the document.

Thus, if HTML is served today from that URI, then a machine dereferencing the URI will conclude that the URI identifies an HTML element within an information resource, whereas if tomorrow the server is changed to serve RDF, then a machine dereferencing the URI will conclude that the URI identifies a person. This is not good, because we would like the URI to identify the same thing all the time.
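
To make the media-type dependence concrete, here is a minimal Python sketch of the reasoning above. It is only an illustration of the argument, not anything taken from the cited specs: the function name, the returned labels, and the choice to treat application/rdf+xml as the only RDF media type are all assumptions made for the example.

```python
from urllib.parse import urldefrag

# Labels for the three outcomes discussed in the message (hypothetical names).
DEFINED_BY_RDF = "whatever the returned RDF says it identifies"
HTML_ELEMENT = "an HTML element within the returned document"
INDETERMINATE = "indeterminate"

# Assumption: only RDF/XML counts as RDF here. The message also allows
# "RDF-like media types", which this sketch does not try to enumerate.
RDF_MEDIA_TYPES = {"application/rdf+xml"}


def interpret_hash_uri(uri: str, media_type: str) -> str:
    """Return a label for what a hash URI identifies, given the media type
    served when the part before the '#' is dereferenced."""
    base, fragment = urldefrag(uri)
    if not fragment:
        raise ValueError("not a hash URI: no fragment identifier")
    # Drop media type parameters such as "; charset=utf-8".
    essence = media_type.split(";", 1)[0].strip().lower()
    if essence in RDF_MEDIA_TYPES:
        return DEFINED_BY_RDF
    if essence == "text/html":
        return HTML_ELEMENT
    return INDETERMINATE


uri = "http://dbooth.org/2005/dbooth/#David_Booth_the_Person"
print(interpret_hash_uri(uri, "text/html"))            # HTML served today
print(interpret_hash_uri(uri, "application/rdf+xml"))  # RDF served tomorrow
```

The same URI yields two different answers depending only on what the server happens to return, which is the instability described above.
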

A key concept behind URIs is that they are supposed to be universal identifiers -- having the same meaning across all languages and document types. Thus, hash URIs do not seem to meet this need.

303-redirect URIs are more general purpose than hash URIs: they can be used to identify any kind of resource, independent of the returned HTTP media type. If used naively, negatives of 303-redirect URIs include:

- Performance: Dereferencing the URI to obtain descriptive information requires an extra network access (first to the original URI and then to the redirect target);
- Administrative overhead: Two URIs must be kept in sync (the original URI and the redirect target); and
- Hosting barriers: 303 redirects require server configuration that many authors do not know how to set up or are not permitted to change.

These negatives can be eliminated (at the cost of possibly longer URIs) if the URI minting strategy that I proposed[5] is widely adopted.

Personally, based on the analysis above, I conclude that http URIs should generally be minted as 303-redirect URIs -- not hash URIs -- because 303-redirect URIs can be used to identify anything independent of the returned media type.

I suggest that the SWBP working group at least endorse the analysis and pros/cons described above.

References

1. RDF Concepts section on Fragment Identifiers: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-fragID
2. text/html media type: http://www.ietf.org/rfc/rfc2854.txt
3. URI spec: http://www.isi.edu/in-notes/rfc3986.txt
4. David Wood's message on httpRange-14 resolution: http://www.w3.org/2002/02/mid/6F9F6968-CB73-427A-8682-AF9AB0F2E9C2@softwarememetics.com;list=public-swbp-wg
5. David Booth's proposed URI minting strategy: http://www.w3.org/2002/02/mid/A5EEF5A4F0F0FD4DBA33093A0B07559008911A37@tayexc18.americas.cpqcorp.net;list=public-swbp-wg
6. TAG's httpRange-14 decision: http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
7. TAG's Web Architecture: http://www.w3.org/TR/webarch/
8. thing-described-by.org: http://thing-described-by.org/

David Booth
Received on Monday, 12 December 2005 09:11:25 UTC