- From: Harry Halpin <hhalpin@ibiblio.org>
- Date: Thu, 20 Mar 2008 16:26:47 -0400
- To: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>
- Cc: Jonathan Rees <jar@creativecommons.org>, "www-tag@w3.org WG" <www-tag@w3.org>
Williams, Stuart (HP Labs, Bristol) wrote:
> Hello Harry,
>
>> We also should make sure any solution
>> is *easy* to deploy over various levels and makes it perfectly clear
>> what's going on (somewhat unlike 303, which is rather hard to
>> deploy and minimalist).
>
> 303 is straight-forward and simple. If you want to use it to good effect to get agent to triple about things that aren't on the web then you can use it to good effect to do so for the things 'off-the-web' that you have chosen to give http: URI (sans frag) to.

Thanks for the reply, Stuart (the rest I talk about in my response to Roy). I do think 303 *might* help with the particular problem brought up by DanC [1] a while back, but it does not address the issue of connecting authoritative representations to URIs.

Both you and Jonathan may be interested in the following pre-print of a paper by myself and Pat called "In Defense of Ambiguity", which comes out in the IJSWIS Journal 4(3) later this year [2]. The pre-print is here:

http://www.ibiblio.org/hhalpin/homepage/publications/indefenseofambiguity.html

I'm just going to cut and paste a bit from the paper here. It covers the ways in which 303 is insufficient to distinguish not just information resources from other kinds of resources, but even the rather simple relationship between access and reference (feel free to substitute "information resource" for "access" and "thing that isn't an information resource" for "reference" - that's close enough for reading purposes to get the general gist). Furthermore, we'll go into the numerous ways that the hash solution, while I think very useful (I use it myself), needs various standards to be fixed a bit to work and also doesn't really address the problem of attaching normative descriptions to resources.

I do hope this helps, but I also think it should make us in the Web community a bit nervous about rubber-stamping any solution to both httpRange-14 and httpDescriptions-57 quite yet.
"Pragmatically, there are problems with the TAG's suggested redirection. It uses a distinction in how a text is delivered (an HTTP status code) to distinguish the accessible Web page itself from the thing it refers to: a category mistake analogous to requiring the postman to dance a jig when delivering an official letter. Since the vast majority of names, even on the Web, refer to things which are not accessible, this requires URIs used for reference to perform an act of redirection of doubtful benefit. As shown earlier, since the URI bears no trace of its delivery to the majority of human Web users, who do not monitor or understand HTTP status codes, no disambiguation is achieved for the human. The TAG is correct in noticing that this solution could solve the problem of inference brought up by Connolly (2006), but it does so in a manner that not only makes normally harmless overloading illegal but also does not make the distinction between access and reference clear. The solution requires an arcane redirection technique that most people actually hosting URIs are not familiar with and cannot even deploy, since deploying 303 redirection requires a level of access to the Web server that many users do not have. It also produces harmful effects by misusing HTTP status codes for an alien purpose. The particular code, 303, exists only in HTTP 1.1 and was originally introduced to solve a completely different problem. As the specification puts it, “this method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource,” not to distinguish access and reference (Fielding et al., 1999). The 303 status code was introduced because user agents handled the HTTP 1.0 302 status code inconsistently: the specification did not allow a client to change the request method when following a 302, yet most user agents answered it by issuing a GET to the new location (Fielding et al., 1999).
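To make concrete what "deploying 303 redirection" asks of a publisher, here is a minimal sketch using Python's standard-library http.server. The paths /id/eiffel-tower (a URI used to refer to the tower itself) and /doc/eiffel-tower (a page about it) are hypothetical and chosen only for illustration; a real site would more likely configure this in its production Web server than write a handler. The point it shows is that the publisher must control the status code and Location header the server emits, which is exactly the access many people hosting URIs lack.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths (assumptions for illustration): one URI used to refer
# to the Eiffel Tower itself, one URI for an HTML page about it.
SEE_OTHER = {"/id/eiffel-tower": "/doc/eiffel-tower"}
PAGE = b"<html><body><p>A page about the Eiffel Tower.</p></body></html>"


class SeeOtherHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in SEE_OTHER:
            # 303 See Other: no representation is served here; the client
            # is told to GET the URI in the Location header instead.
            self.send_response(303)
            self.send_header("Location", SEE_OTHER[self.path])
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path in SEE_OTHER.values():
            # The page about the tower is served normally with 200 OK.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(port=8000):
    """Create (but do not start) a localhost server; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), SeeOtherHandler)
```

Running make_server().serve_forever() and requesting http://127.0.0.1:8000/id/eiffel-tower with a client that does not follow redirects shows the 303 and its Location header; a browser follows the redirect automatically and displays the /doc/ page.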
In HTTP 1.1, the 303 status code tells the client to retrieve the response from a different URI using GET, while the 307 status code tells it to repeat the same request at a temporary location and to keep using the original URI for future requests. Given this history, it is unclear why 303 is suitable for distinguishing between access and reference. Why not just invent a new HTTP status code? The negative effects of this redirection requirement will continue while achieving little in return.

The main alternative to HTTP 303 is to attach a fragment identifier (the hash) to a URI to get redirection for free. So, if one wanted a URI that referred to the Eiffel Tower itself without the hassle of a 303 redirection, one would use the URI http://www.tour-eiffel.fr/# to refer to the Eiffel Tower and the URI http://www.tour-eiffel.fr/ to access a Web page about the Eiffel Tower. Since browsers treat a “#” URI as identifying a fragment of a document or some other representation, they drop the fragment before sending the HTTP GET, so a user accessing a “hash URI” will not receive a “404 Not Found” status code; the request simply resolves to the URI before the hash. In this way machine reasoners can keep the URI that refers to the Eiffel Tower and the Web page about the Eiffel Tower separate, while a human can access the URI “about” the Eiffel Tower and receive some information about it, essentially by taking advantage of predefined behavior in Web browsers. This solution would solve the inference problem in which monuments and Web pages are defined in OWL as disjoint. It is valid because, according to the W3C TAG's “Architecture of the World Wide Web,” a URI with a fragment identifier technically identifies a separate and distinct “secondary resource” (Jacobs and Walsh, 2004). Further, the TAG states that “primary and secondary simply indicate that there is a relationship between the resources for the purposes of one URI: the URI with a fragment identifier.
Any resource can be identified as a secondary resource” (Jacobs and Walsh, 2004). So using hash URIs has exactly the same problem as 303 redirection: it does not normatively define any relationship between the two URIs, much less distinguish between access and reference.

It appears that the W3C may well be contradicting the relevant IETF specification by endorsing hash URIs. The URI specification says “the semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on the media type of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced” (Berners-Lee et al., 2005). If the media type explicitly defines what fragment identifiers do, then agents should follow the specification of that media type. Only “if no such representation exists, then the semantics of the fragment are considered unknown and are effectively unconstrained” (Berners-Lee et al., 2005). In other words, only if you get a 404 from http://www.tour-eiffel.fr/ can http://www.tour-eiffel.fr/# mean anything you want. However, if accessing the primary (hashless) URI returns a Web page with the “text/html” media type, then according to the text/html media type registration, “for documents labeled as text/html, the fragment identifier designates the correspondingly named element; any element may be named with the id attribute” (Connolly, 2000). In other words, fragment identifiers should be used for named elements in the document, not as a shortcut for distinguishing URIs used for reference from URIs used for access. This defeats the entire purpose of using hash URIs: their supposed benefit is that humans can “follow their noses” to some human-readable HTML by accessing the primary URI, yet serving HTML is precisely what constrains the fragment to name an element of that page.
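The client-side behavior this argument relies on can be sketched with Python's standard library. urllib.parse.urldefrag splits off the fragment, which an HTTP client never sends to the server, so http://www.tour-eiffel.fr/# and http://www.tour-eiffel.fr/ produce the same GET request. The helper names split_for_get and html_fragment_resolves are made up for this example, and the HTML check is a simplification that only looks at id attributes (it ignores the older name attribute), following the RFC 2854 wording quoted above.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


def split_for_get(uri):
    """Return (URI actually sent in the HTTP GET, fragment kept by the client)."""
    result = urldefrag(uri)
    return result.url, result.fragment


class _IdCollector(HTMLParser):
    """Collect every id attribute value in an HTML document (simplified)."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.add(value)


def html_fragment_resolves(html, fragment):
    """True if a text/html representation has an element named by the fragment."""
    parser = _IdCollector()
    parser.feed(html)
    parser.close()
    return fragment in parser.ids
```

With a page containing <h1 id="history">, html_fragment_resolves(page, "history") is True while html_fragment_resolves(page, "tower") is False, which is the "broken fragment identifier" case: under text/html, a hash URI meant to name the tower names nothing in the page.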
In the case where the accessible URI returns the “application/rdf+xml” media type, things are different. “In RDF, the thing identified by a URI with fragment identifier does not necessarily bear any particular relationship to the thing identified by the URI alone,” so the hash convention can legitimately identify anything, including resources that are not accessible (Schwartz, 2004). This seems to defeat the point of returning representations, since unlike rendered HTML, RDF/XML is much more easily used by machines than by humans. If people accessed http://www.tour-eiffel.fr/ and received RDF/XML, most would have no idea what to do with it; it is useful for machine processing, not for informing humans.

Strangely enough, the very idea that a media type determines the semantics of the fragment identifier conflicts with other statements from the W3C. Even if one accepted that “a URI identifies one thing,” if content negotiation made both an “application/rdf+xml” and a “text/html” representation available for a URI, then the URI with a fragment identifier would be interpreted in two different ways depending on the media type received, and so it would not identify a single resource with global scope. This fundamentally breaks the orthogonality of the specifications: a single resource can return different kinds of representations, so how a “hash URI” can be used depends on media types. The URI specification explicitly says one should not do this, for “whatever is identified by the fragment should be consistent across all of those representations” (Berners-Lee et al., 2005). One could imagine the hash somehow being consistent across representations, but if the fragment identifier exists in both an RDF document and an HTML document, its meaning will be muddled, since it will identify both a portion of a document in HTML and possibly some thing that is not accessible on the Web.
In cases where the fragment identifier exists in the RDF but not in the HTML, it will be a broken fragment identifier in the HTML document while possibly identifying something described by the RDF, so the two representations are inconsistent. If the fragment identifier exists in neither document, in RDF it can still identify a resource that is not accessible on the Web, but in HTML it will just be a broken fragment identifier for a particular document. Either way, HTML needs a mechanism for saying either that a given fragment identifier is used for things that are not accessible on the Web, or that fragment identifiers not defined by the HTML representation can identify anything, including such things. So this use of fragment identifiers, while convenient and much more practical than 303 redirection, is as far from “a URI identifies one thing” as one can get. Presumably the W3C will at some point fix the relevant specifications to bring them more in line with its proposed solutions, but the hash URI is no panacea. While easier for users to deploy than 303 redirection, it does not distinguish access from reference any better than 303 redirection does.

References:
Berners-Lee, T., Fielding, R., and Masinter, L. (2005). IETF RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. http://www.ietf.org/rfc/rfc3986.txt.
Connolly, D. (2000). IETF RFC (Informational) 2854 The 'text/html' Media Type. http://www.ietf.org/rfc/rfc2854.txt.
Connolly, D. (2006). A Pragmatic Theory of Reference for the Web. Proceedings of the Identity, Reference, and the Web (IRW2006) Workshop at the World Wide Web Conference (WWW2006). Edinburgh, United Kingdom. May 22nd 2006.
Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and Berners-Lee, T. (1999). IETF RFC 2616 Hypertext Transfer Protocol – HTTP/1.1. http://www.ietf.org/rfc/rfc2616.txt.
Jacobs, I. and Walsh, N. (2004). Architecture of the World Wide Web. W3C Recommendation. http://www.w3.org/TR/webarch/.
Schwartz, A. (2004). IETF RFC 3870 application/rdf+xml Media Type Registration. http://www.ietf.org/rfc/rfc3870.txt.

[1] http://www.w3.org/2006/04/irw65/urisym.html
[2] http://www.ibiblio.org/hhalpin/homepage/publications/indefenseofambiguity.html

> Does 303 guarantee to get you to triple? No - but then you have probably provided very little help to anyone interested in the URI you deployed.
>
> Hard to deploy? Well, yes and no depending on the server software you are using and your access privileges. That's a pragmatic problem induced by the design of servers and the admin policies under which they operate. It's not a problem of Architecture.
>
> Regards
>
> Stuart
> --
> Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN
> Registered No: 690597 England

--
-harry

Harry Halpin, University of Edinburgh
http://www.ibiblio.org/hhalpin 6B522426
Received on Thursday, 20 March 2008 20:29:28 UTC