Re: Uniform access to descriptions from Harry Halpin on 2008-03-20 (www-tag@w3.org from March 2008)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Thu, 20 Mar 2008 16:26:47 -0400
To: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>
Cc: Jonathan Rees <jar@creativecommons.org>, "www-tag@w3.org WG" <www-tag@w3.org>
Message-ID: <47E2C887.9020308@ibiblio.org>
Williams, Stuart (HP Labs, Bristol) wrote:
> Hello Harry,
>
>   
>> We also should make sure any solution
>> is *easy* to deploy over various levels and makes it perfectly clear
>> what's going on (somewhat unlike 303, which is rather hard to
>> deploy and minimalist).
>>     
>
> 303 is straight-forward and simple. If you want to use it to good effect to get agent to triple about things that aren't on the web then you can use it to good effect to do so for the things 'off-the-web' that you have chosen to give http: URI (sans frag) to.
>   
Thanks for the reply, Stuart (I address the rest in my response to
Roy). I do think 303 *might* help the particular problem brought up by
DanC [1] a while back, but it does not address the issue about
connecting authoritative representations to URIs. Both you and Jonathan
may be interested in the following pre-print of a paper by myself and
Pat called "In Defense of Ambiguity" which comes out in the IJSWIS
Journal 4(3), later this year [2]. The pre-print is here:

http://www.ibiblio.org/hhalpin/homepage/publications/indefenseofambiguity.html

I'm just going to cut and paste a bit from the paper here, which goes
over the number of ways in which 303 is insufficient to distinguish not
just between information resources and other types, but also between
access and reference, a rather simple relationship (feel free to
substitute "information resource" for "access" and "thing that isn't an
information resource" for "reference" - that's close enough for reading
purposes to get the general gist).

Furthermore, we'll go into the numerous ways that the hash solution,
while I think very useful - I use it myself - needs various standards to
be fixed a bit to work and also doesn't really address the problem of
attaching normative descriptions to resources. I do hope this helps, but
I also think it should make us in the Web community a bit nervous about
rubber-stamping any solution to both httpRange-14 and
httpDescriptions-57 quite yet.

" Pragmatically, there are problems with the TAG's suggested
redirection. It uses a distinction in how a text is delivered (an HTTP
code) to disambiguate the accessible Web page itself; a category mistake
analogous to requiring the postman to dance a jig when delivering an
official letter. Since the vast majority of names, even on the Web,
refer to things which are not accessible, this requires referring URIs
to perform an act of redirection with doubtful benefit. As shown earlier,
since the URI bears no trace of its delivery to the majority of human
Web users that do not monitor or understand HTTP status codes, no
disambiguation is achieved for the human. The TAG is correct in noticing
this solution could solve the problem of inference brought up by
Connolly (2006), but it does so in a manner that not only makes
normally harmless overloading illegal but also fails to make the
distinction between access and reference clear. The particular solution
requires the use of an arcane redirection technique that most people
actually hosting URIs are not familiar with and cannot even deploy,
since deploying 303 redirection requires access to the web server
configuration, which many users may not have. It also produces harmful
effects by misusing HTTP
codes for an alien purpose. The particular code, 303, is only valid for
HTTP 1.1 and was originally introduced to solve a completely different
problem. As put by the specification, “this method exists primarily to
allow the output of a POST-activated script to redirect the user agent
to a selected resource,” not to distinguish access and reference
(Fielding et al., 1999). The 303 status code was introduced because
the HTTP 1.0 302 status code was being handled inconsistently: most
user agents changed a redirected POST into a GET, although the
specification did not allow this. The 303 and 307 status codes in
HTTP 1.1 disambiguate the two behaviors, with a 303 telling the user
agent to retrieve the response from a different URI using GET, unlike
the 307 status code, which is a “temporary” redirection that repeats
the request with its original method. Given this history, it is
unclear why 303 is suitable for distinguishing between access and
reference. Why not just invent a new HTTP status code? The negative
effects of this redirection requirement will continue while achieving
little in return.
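
To make the deployment burden concrete, here is a minimal sketch (not
from the paper) of what the 303 approach asks of a server, using only
Python's standard library. The two paths and the names route, Handler
and serve are hypothetical, chosen purely for illustration; a real
deployment would more likely be a rewrite rule in the web server's
configuration, which is exactly the access many users lack.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths, assumed only for this illustration.
THING_PATH = "/id/eiffel-tower"   # names the tower itself (reference)
DOC_PATH = "/doc/eiffel-tower"    # a Web page about the tower (access)


def route(path):
    """Return (status, headers, body) for a GET request on `path`."""
    if path == THING_PATH:
        # 303 See Other: point the client at a document describing the thing.
        return 303, {"Location": DOC_PATH}, b""
    if path == DOC_PATH:
        body = b"<html><body>A page about the Eiffel Tower.</body></html>"
        return 200, {"Content-Type": "text/html"}, body
    return 404, {"Content-Type": "text/plain"}, b"Not Found"


class Handler(BaseHTTPRequestHandler):
    """Serve the routing table above over HTTP."""

    def do_GET(self):
        status, headers, body = route(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the sketch locally; call this by hand, it blocks until interrupted."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

Note that nothing in the 303 response itself states how the two URIs
are related; the client only learns that it should look elsewhere.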


The main alternative to using HTTP 303 is to have a fragment
identifier—the hash—attached to a URI to get redirection for free. So,
if one wanted a URI that referred to the Eiffel Tower itself without the
hassle of a 303 redirection, one would use the URI
http://www.tour-eiffel.fr/# to refer to the Eiffel Tower and the URI
http://www.tour-eiffel.fr/ to access a Web page about the Eiffel Tower.
Since browsers think the “#” URI means a fragment of a document or some
other representation, if a user tries to access via HTTP GET a “hash
URI” it will not return a “404 Not Found” status code, but instead
simply resolve to the URI before the hash. In this way machine reasoners
can keep the URI that refers to the Eiffel Tower and a Web page about
the Eiffel Tower separate, while a human can access the URI “about” the
Eiffel Tower and receive some information about it, in essence by taking
advantage of some predefined behavior in web browsers. This solution
would solve the inference problem where monuments and Web pages are
defined in OWL as disjoint. This is valid because according to the W3C
TAG's “Architecture of the Web,” using a fragment identifier technically
also identifies a separate and distinct “secondary resource” (Jacobs and
Walsh, 2004). Further, the TAG states that “primary and secondary simply
indicate that there is a relationship between the resources for the
purposes of one URI: the URI with a fragment identifier. Any resource
can be identified as a secondary resource” (Jacobs and Walsh, 2004). So,
using hash URIs has the exact same problem as 303 redirection, since it
doesn't normatively define any sort of relationship between the two
URIs, much less distinguish between access and reference.
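
The "redirection for free" comes from the client removing the fragment
before it sends the request. A small sketch of that behaviour, using
Python's standard urllib.parse.urldefrag (the function name
retrieval_target is my own, introduced only for illustration):

```python
from urllib.parse import urldefrag

THING_URI = "http://www.tour-eiffel.fr/#"   # refers to the tower itself
PAGE_URI = "http://www.tour-eiffel.fr/"     # accesses a page about it


def retrieval_target(uri):
    """Return the URI a client actually requests: the input minus its fragment."""
    return urldefrag(uri).url


# The two URIs are different identifiers as far as a reasoner is concerned...
assert THING_URI != PAGE_URI
# ...yet dereferencing either one sends the same request to the server.
assert retrieval_target(THING_URI) == retrieval_target(PAGE_URI)
```

The server never sees the fragment, so it cannot say anything about how
the hash URI relates to the page it serves.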

It appears that the W3C may very well be contradicting the relevant IETF
specification by supporting the hash URIs. The URI specification says
“the semantics of a fragment identifier are defined by the set of
representations that might result from a retrieval action on the primary
resource. The fragment's format and resolution is therefore dependent on
the media type of a potentially retrieved representation, even though
such a retrieval is only performed if the URI is dereferenced”
(Berners-Lee et al., 2005). If the media type explicitly defines what
fragment identifiers do, then the user should obey the standard of the
media type. Only “if no such representation exists, then the semantics
of the fragment are considered unknown and are effectively
unconstrained” (Berners-Lee et al., 2005). In other words, only if you
get a 404 from http://www.tour-eiffel.fr/ can
http://www.tour-eiffel.fr/# mean anything you want. However, if a Web
page with the “text/html” media type is returned by accessing the
primary (no hash) URI, then according to the HTML specification, “for
documents labeled as text/html, the fragment identifier designates the
correspondingly named element; any element may be named with the id
attribute” (Connolly, 2000). In other words, fragment identifiers should
be used for named elements in the document, not as a shortcut for
distinguishing URIs used for reference and access. This defeats the
entire purpose of using hash URIs, since the supposed benefit is that
humans can “follow-their-noses” by accessing the primary URI and thereby
access some human readable HTML about the URI. In the case where the
“application/rdf+xml” media type is returned by the accessible URI,
things are different. “In RDF, the thing identified by a URI with
fragment identifier does not necessarily bear any particular
relationship to the thing identified by the URI alone” so the hash
convention can legitimately identify anything, including non-accessible
resources (Swartz, 2004). This seems to defeat the point of returning
representations, since unlike rendered HTML, RDF/XML is much more easily
used by machines than humans. If people accessed
http://www.tour-eiffel.fr/ and received RDF/XML most would have no idea
what to do with it. It is most useful for machine processing, not
informing humans.
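
The dependence on media type can be sketched roughly as follows. This
is a simplification under stated assumptions, not an implementation of
any specification: the function fragment_meaning, the class
IdCollector and the example fragment names are hypothetical, and only
the id attribute is checked for HTML.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the values of all id attributes in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def fragment_meaning(media_type, fragment, body=""):
    """Describe what `fragment` designates under `media_type` (simplified)."""
    if media_type is None:
        # No representation exists (e.g. a 404): semantics are unconstrained.
        return "unknown and effectively unconstrained"
    if media_type == "text/html":
        collector = IdCollector()
        collector.feed(body)
        if fragment in collector.ids:
            return "the element named %r in the document" % fragment
        return "a broken fragment identifier"
    if media_type == "application/rdf+xml":
        return "anything the RDF says, including a non-accessible thing"
    return "defined by the %s media type" % media_type


page = '<html><body><h1 id="tower">Tour Eiffel</h1></body></html>'
print(fragment_meaning("text/html", "tower", page))
print(fragment_meaning("text/html", "monument", page))
print(fragment_meaning("application/rdf+xml", "tower"))
```

The same fragment string gets a different reading in each branch, which
is the difficulty the paragraph above describes.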


Strangely enough, the very idea that a media type determines the
semantics of the fragment identifier is in conflict with other
statements from the W3C. Even if one accepted that “a URI identifies one
thing,” if, by using content negotiation, both an “application/rdf+xml”
and a “text/html” media type were available for a URI, then the meaning of
the URI with fragment identifier would be interpreted two different ways
depending on the media type received, and so the URI would not identify
a single resource with a global scope. This fundamentally breaks the
orthogonality of the specifications, as a single resource can return
different kinds of representations, so how a “hash URI” can be used is
dependent on media types. The URI specification explicitly says one
should not do this, for “whatever is identified by the fragment should
be consistent across all those representations” (Berners-Lee et al.,
2005). One could imagine the hash somehow being consistent across
representations, but if the fragment identifier exists in an RDF document
and in the HTML document, the meaning of the fragment identifier will be
muddled since it will identify both a portion of a document in HTML and
possibly some non-Web accessible thing. In cases where the fragment
identifier exists in RDF and not in HTML, it will be a broken fragment
identifier for an HTML document and perhaps specified by the RDF, and so
inconsistent. If the fragment identifier is non-existent in both the RDF
and HTML documents, in RDF the fragment identifier can identify a
non-Web accessible resource but not so in the HTML document, where it
will just be a broken fragment identifier for a particular document.
Regardless, there needs to be a mechanism in HTML for saying that either
the given use of a fragment identifier is for non-Web accessible things,
or that fragment identifiers that are not given by the HTML
representation can be anything, including non-Web accessible things. So,
this use of fragment identifiers, while convenient and much more
practical than 303 redirection, is as far from “a URI identifies one
thing” as one can get. One can assume that at some point the W3C will
fix the relevant specifications to be more in line with their proposed
solutions, but the hash URI is no panacea for distinguishing access and
reference. While easier for users to deploy than 303 redirection, it
still does not distinguish access and reference any better than 303
redirection.
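
The case analysis above can be laid out as a small table. The sketch
below is only an illustration of the argument: the functions interpret
and cases are hypothetical, the wording of each reading is my
paraphrase, and the combination "defined in HTML but not in RDF" is
not discussed in the text, so its entry is an assumption filled in by
analogy with the other cases.

```python
from itertools import product


def interpret(in_html, in_rdf):
    """Map each negotiated media type to the fragment's reading (simplified)."""
    return {
        "text/html": (
            "a portion of the HTML document"
            if in_html
            else "a broken fragment identifier"
        ),
        "application/rdf+xml": (
            "whatever the RDF specifies, possibly a non-Web accessible thing"
            if in_rdf
            else "unconstrained, possibly a non-Web accessible thing"
        ),
    }


def cases():
    """Enumerate every combination of 'fragment defined in HTML / in RDF'."""
    return {(h, r): interpret(h, r) for h, r in product([True, False], repeat=2)}


for (in_html, in_rdf), readings in cases().items():
    print("in HTML: %-5s in RDF: %-5s" % (in_html, in_rdf))
    for media_type, reading in readings.items():
        print("    %-20s -> %s" % (media_type, reading))
```

In none of the four rows do the two media types agree on what the
fragment identifies, which is why content negotiation and hash URIs sit
uneasily with "a URI identifies one thing".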

References:

Berners-Lee, T., Fielding, R., and Masinter, L. (2005). IETF RFC 3986
Uniform Resource Identifier (URI): Generic Syntax.
http://www.ietf.org/rfc/rfc3986.txt.

Connolly, D. (2000). IETF RFC (Informational) 2854 The 'text/html' Media
Type. http://www.ietf.org/rfc/rfc2854.txt.

Connolly, D. (2006). A Pragmatic Theory of Reference for the Web.
Proceedings of the Identity, Reference, and the Web (IRW2006) Workshop
at the World Wide Web Conference (WWW2006). Edinburgh, United Kingdom.
May 22nd 2006.

Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach,
P., and Berners-Lee, T. (1999). IETF RFC 2616 Hypertext Transfer
Protocol – HTTP/1.1. http://www.ietf.org/rfc/rfc2616.txt.

Jacobs, I. and Walsh, N. (2004). Architecture of the World Wide Web. W3C
Recommendation. http://www.w3.org/TR/webarch/.



Swartz, A. (2004). IETF RFC 3870 application/rdf+xml Media Type
Registration. http://www.ietf.org/rfc/rfc3870.txt.

[1] http://www.w3.org/2006/04/irw65/urisym.html
[2] http://www.ibiblio.org/hhalpin/homepage/publications/indefenseofambiguity.html

> Does 303 guarantee to get you to triple? No - but then you have probably provided very little help to anyone interested in the URI you deployed.
>
> Hard to deploy? Well, yes and no depending on the server software you are using and your access priviledges. That's a pragmatic problem induced by the design of servers and the admin policies under which they operate. It's not a problem of Architecture.
>
> Regard
>
> Stuart
> --
> Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN
> Registered No: 690597 England
>
>   


-- 
		-harry

Harry Halpin,  University of Edinburgh 
http://www.ibiblio.org/hhalpin 6B522426
Received on Thursday, 20 March 2008 20:29:28 UTC