Confusion on httpRange-14 decision from Booth, David (HP Software - Boston) on 2006-02-20 (www-tag@w3.org from February 2006)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Mon, 20 Feb 2006 14:24:37 -0500
To: <www-tag@w3.org>
Message-ID: <A5EEF5A4F0F0FD4DBA33093A0B075590097B688B@tayexc18.americas.cpqcorp.net>
 To: The TAG
 From: The Semantic Web Best Practices and Deployment Working Group

SYNOPSIS:
Is it okay to have the same URI identify both a location within an HTML
document and a concept in an ontology?  What is the class of
"information resources"?  Would the TAG wish to define a URI for it?  Is
it owl:disjointWith anything?  What can be concluded if an HTTP response
code is other than 2xx, 4xx or 303?

WHY THIS MESSAGE?
The SWBP working group wishes to note some confusion around the
implications of the httpRange-14 decision[8], in the hope that future
work by the TAG will clear up this confusion.  However, the SWBP WG does
*not* wish to press the TAG for an immediate resolution.  More thought
by appropriate experts is probably needed first, and the WG seems to be
able to work around this confusion at present.

WHAT IS THE CONFUSION?
Consider the URI http://www.w3.org/People/Connolly/#me .  The 
httpRange-14 decision says that if an HTTP GET of
http://www.w3.org/People/Connolly/ returns a 2xx status, then
http://www.w3.org/People/Connolly/ is an "information resource".

The WebArch says that the meaning of the fragment identifier ("#me") is
determined by the media type that is returned.  In the case of HTML, it
identifies a location within the HTML document.  Therefore, according to
the WebArch plus the httpRange-14 decision, if the HTTP GET returns
200 OK with Content-Type text/html, then
http://www.w3.org/People/Connolly/#me identifies a location within an
HTML document.

If Dan is also using http://www.w3.org/People/Connolly/#me to identify a
foaf:Person (for example by also serving RDF from
http://www.w3.org/People/Connolly/ , via 200 OK with Content-Type
application/rdf+xml), then the same URI is being used to identify both a
foaf:Person and an "information resource".
(Note that Dan does not actually do this at present, but the scenario
described is one that people are likely to expect to be able to use.)
Is this okay?  Pat Hayes does not see this as a problem, as he has
eloquently explained[6].  However, the WebArch says that a URI should
only identify one resource, so this behavior would seem inconsistent
with the WebArch.  
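
To make the two readings concrete, here is a minimal sketch in Python.
The lookup table and the function name fragment_reading are
hypothetical and exist only for illustration; the table simply restates
the two interpretations discussed above and is not a normative registry
of fragment semantics.

```python
from urllib.parse import urldefrag

# Hypothetical table (an assumption for illustration): how each media
# type discussed in this message would read a fragment identifier.
FRAGMENT_READING = {
    "text/html": "a location within the HTML document",
    "application/rdf+xml": "whatever the RDF describes, e.g. a foaf:Person",
}

def fragment_reading(uri, content_type):
    """Return (URI without fragment, reading of the fragment or None)."""
    base, fragment = urldefrag(uri)
    if not fragment:
        # No fragment identifier, so there is nothing to interpret.
        return base, None
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";")[0].strip().lower()
    return base, FRAGMENT_READING.get(media_type, "not covered by this sketch")

uri = "http://www.w3.org/People/Connolly/#me"
for content_type in ("text/html", "application/rdf+xml"):
    print(content_type, "->", fragment_reading(uri, content_type))
```

The same URI yields two different readings depending only on the
Content-Type of the response, which is the dual use in question.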

On the other hand, the WebArch also says that if multiple media types
are served using content negotiation, then because the meaning of the
fragment identifier could differ for different media types, the URI
owner should ensure that all such meanings are "sufficiently
consistent"[1] (and hence conceptually would still only identify a
single resource).  The WebArch also says that "The representation
provider decides when definitions of fragment identifier semantics are
sufficiently consistent".

Rhetorical question: Is it reasonable to think that the use of
http://www.w3.org/People/Connolly/#me as a location within a document
would be "sufficiently consistent" with its use as a foaf:Person?  A
document seems very different from a person.  In fact, one could well
imagine the class tag:Location-Within-An-HTML-Document being
owl:disjointWith foaf:Person.
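
If such a disjointness axiom were asserted, a consumer could detect the
clash mechanically.  The following is only a sketch: the class
tag:Location-Within-An-HTML-Document is the imagined class from the
paragraph above, and the function name disjointness_clashes and the
shape of its input are assumptions made for illustration.

```python
# Imagined axiom from the text: the two classes share no members.
DISJOINT = {
    frozenset({"tag:Location-Within-An-HTML-Document", "foaf:Person"}),
}

def disjointness_clashes(types_by_uri, disjoint=DISJOINT):
    """List (uri, class pair) for every URI typed with two disjoint classes."""
    clashes = []
    for uri, types in types_by_uri.items():
        for pair in disjoint:
            # A clash occurs when both classes of a disjoint pair are asserted.
            if pair <= set(types):
                clashes.append((uri, tuple(sorted(pair))))
    return clashes

asserted = {
    "http://www.w3.org/People/Connolly/#me": [
        "tag:Location-Within-An-HTML-Document",  # reading under text/html
        "foaf:Person",                           # reading under RDF
    ],
}
print(disjointness_clashes(asserted))
```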

On the other hand, if it *is* reasonable to use
http://www.w3.org/People/Connolly/#me as both a location within a
document and as a foaf:Person, then is it also reasonable to use
http://example.org/foo as both an "information resource" and a
foaf:Person?  If not, why is a URI with a fragment identifier permitted
to simultaneously identify such different things, while a URI without a
fragment identifier is not?  If URIs are opaque, shouldn't all http URIs
be created equal in their ability to be used as identifiers of things?

This dual use of identifiers becomes yet more explicit when a technology
such as GRDDL [9] is employed.  It seems likely that authors of XHTML
documents that offer GRDDL transforms might be tempted to use XML IDs as
both HTML fragment ids and as the identifiers of other things.

Furthermore, if http://example.org/foo returns a 2xx response, does this
represent an assertion such as (N3):

	<http://example.org/foo> a tag:InformationResource .

If so, what can be concluded if the response code is other than 2xx, 4xx
or 303?  In general, what algorithm should be used to determine the
meaning of a newly discovered URI?  (See David Booth's attempt[7] at
writing down such an algorithm based on his reading of the WebArch and
httpRange-14 decision.)
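
For reference, the three cases that the httpRange-14 decision[8] does
cover can be sketched as follows.  This is an illustration only, not
the draft algorithm in [7]; the function name httprange14_conclusion
and the returned strings are hypothetical.  The final branch marks the
gap that the question above is about.

```python
def httprange14_conclusion(status):
    """Map an HTTP GET status code to what httpRange-14 lets one conclude."""
    if 200 <= status <= 299:
        # 2xx: the resource identified by the URI is an information resource.
        return "information resource"
    if status == 303:
        # 303 See Other: the resource could be any resource.
        return "any resource"
    if 400 <= status <= 499:
        # 4xx: the nature of the resource is unknown.
        return "unknown"
    # Everything else (1xx, other 3xx, 5xx) is not addressed by the decision.
    return "not covered by the decision"

for code in (200, 303, 404, 302, 500):
    print(code, "->", httprange14_conclusion(code))
```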

Again, the SWBP WG is *not* asking for a resolution at this time,
but merely reporting the confusion that we have noticed.

References
[1] WebArch on fragment identifiers and content negotiation:
http://www.w3.org/TR/webarch/#frag-coneg

[2] TimBL thoughts on RDF in HTML: 
http://www.w3.org/2002/04/htmlrdf

[3] TAG issue RDFinXHTML-35:
http://www.w3.org/2001/tag/issues.html#RDFinXHTML-35

[4] RDF/A Primer: 
http://www.w3.org/2001/sw/BestPractices/HTML/2006-01-24-rdfa-primer

[5] Alistair's issues on URI usage in RDF/A primer:
http://lists.w3.org/Archives/Public/public-swbp-wg/2006Jan/0113.html

[6] Pat Hayes comments: 
http://lists.w3.org/Archives/Public/public-swbp-wg/2006Feb/0011
http://lists.w3.org/Archives/Public/public-swbp-wg/2006Jan/0153
http://lists.w3.org/Archives/Public/public-swbp-wg/2006Jan/0139

[7] David Booth draft algorithm:
http://lists.w3.org/Archives/Public/public-swbp-wg/2006Jan/0116

[8] TAG's httpRange-14 decision: 
http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html

[9] GRDDL:
http://www.w3.org/TeamSubmission/2005/SUBM-grddl-20050516/

Sincerely,
David Booth (on behalf of the SWBP working group)
Received on Monday, 20 February 2006 19:26:49 UTC