- From: Tore Eriksson <tore.eriksson@gmail.com>
- Date: Thu, 29 Nov 2007 05:40:01 +0900
- To: www-tag@w3.org
Hello everybody,

Like most people reading this list, I have been contemplating the issues involved in the definition of information resources and the consequences of httpRange-14. It might be a bit impertinent to post my thoughts to the list, especially as this is my first contribution, but since I don't keep a blog I hope for your consideration in this matter.

A given representation received through the HTTP protocol could involve a number of resources (used in a broad sense) that one would like to make statements about:

A - The topic of the representation
B - The textual or pictorial content (a.k.a. "document" or "information resource" or "conceptual work")
C - The bitstream itself (a.k.a. "document instance")

I would like to argue that C can be excluded from this discussion, since there is no consistent way of addressing this entity through a URI, and thus it is not a first-class member of the Semantic Web. (I guess you could use the ETag and a b-node.) The primary way to make statements about C is through the HTTP response headers of the specific transaction.

As for the other hypothetical resources A and B, they need to be denoted with different URIs, since using the same URI for different resources leads to semantic ambiguity and is not allowed. I will write the URIs as <A> and <B> for simplicity.

The perma-discussion that is httpRange-14 has, in my opinion, two parts:

(1) Can one serve a representation of A without giving the representation a corresponding information resource B?
(2) How do you find <B> when you have <A>?

The answer of httpRange-14 to (2) is to do a 303 redirect from <A> to <B>. By requiring a redirect, it also disallows responding directly with a 200 on <A>, thus making the creation of <B> compulsory and consequently answering (1) with a NO. Let's leave the discussion concerning (1) for below. I would first like to focus on a practical problem with this proposal and its consequences.
Quoting from RFC2616 [1] on redirection:

>>
10.3 Redirection 3xx

This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.
...
14.30 Location

...For 3xx responses, the location SHOULD indicate the server's preferred URI for automatic redirection to the resource.
<<

In summary, a 303 response may be followed automatically by the user agent, with no way for the user to detect this step. This behavior is implemented in many current clients, some of them used by SW tools. Even Tim Berners-Lee seems to have encountered it during the Taverna project, and he claims that this is a bug in the specification of XMLHTTPRequest [2]. However, another way of seeing it is that the bug is in httpRange-14, since depending on non-automatic redirection clashes with the underlying specifications and current practice. Since you can't reliably tell from the HTTP response that there exists a resource denoted by <B>, redirection is not a practical solution to (2), and the negative answer to (1) is not enforced either.

Another way of looking at the problem is from the representation side. httpRange-14 disallows serving the representation directly as a 200 response and advocates redirecting to <B>, thus implying that the problem is one of a direct relation between the two resources. However, the relationship between A and B is not direct, but conditional on serving the representation. Thus the natural place to put this information is as meta-data on the representation itself, which is constrained to the HTTP response headers. As has been pointed out by Mark Nottingham [3], the logical header to use is Content-Location.
Quoting from RFC2616 once more:

>>
14.14 Content-Location

The Content-Location entity-header field MAY be used to supply the resource location for the entity enclosed in the message when that entity is accessible from a location separate from the requested resource's URI. A server SHOULD provide a Content-Location for the variant corresponding to the response entity; especially in the case where a resource has multiple entities associated with it, and those entities actually have separate locations by which they might be individually accessed, the server SHOULD provide a Content-Location for the particular variant which is returned.
...
The Content-Location value is not a replacement for the original requested URI; it is only a statement of the location of the resource corresponding to this particular entity at the time of the request. Future requests MAY specify the Content-Location URI as the request-URI if the desire is to identify the source of that particular entity.
<<

By adding the header Content-Location: <B> to the response that is reached through a 303 redirect from <A>, we will be able to find the information resource even when faced with automatic redirects in user agents. This resolves (2), but what happens to (1)? Since the response code after redirection is a 200, the de facto result is that, from the user's point of view, a representation is served directly from <A>. This means that we can scrap the redirection part of httpRange-14 and only worry about the Content-Location header. If the header is set, its value is the URL denoting the content B. It doesn't matter whether you used redirection or served the representation straight from <A>. This makes HTTP redirection orthogonal to the Semantic Web, which I think is a good thing. What is left is meta-data on the representation, which already has a very clear semantic meaning as described in RFC2616 14.14.
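The point that the client sees the same thing with or without redirection can be illustrated with a small sketch in Python. This is only an illustration under stated assumptions: the in-memory "sites", the example.org URIs, and the helper names fetch and content_resource are all hypothetical and are not taken from RFC2616 or httpRange-14. The fetch function models a user agent that follows redirects silently and hands back only the final response.

```python
from typing import Dict, Optional, Tuple

# Hypothetical URIs standing in for <A> (the topic) and <B> (the content).
URI_A = "http://example.org/A"
URI_B = "http://example.org/B"

# A response is modelled as (status code, headers); no network is involved.
Response = Tuple[int, Dict[str, str]]

# Assumed setup 1: <A> answers 303 and the redirect target carries Content-Location.
REDIRECTING_SITE: Dict[str, Response] = {
    URI_A: (303, {"Location": URI_B}),
    URI_B: (200, {"Content-Location": URI_B}),
}

# Assumed setup 2: <A> answers 200 directly and carries Content-Location.
DIRECT_SITE: Dict[str, Response] = {
    URI_A: (200, {"Content-Location": URI_B}),
}


def fetch(site: Dict[str, Response], uri: str, max_hops: int = 5) -> Response:
    """Follow redirects silently and return only the final response."""
    for _ in range(max_hops):
        status, headers = site[uri]
        if status in (301, 302, 303, 307) and "Location" in headers:
            # The intermediate 3xx response is discarded, so the caller never sees it.
            uri = headers["Location"]
            continue
        return status, headers
    raise RuntimeError("too many redirects")


def content_resource(headers: Dict[str, str]) -> Optional[str]:
    """Return the URI of the content resource, or None if the server gave none."""
    return headers.get("Content-Location")


status_redirect, headers_redirect = fetch(REDIRECTING_SITE, URI_A)
status_direct, headers_direct = fetch(DIRECT_SITE, URI_A)
```

In both setups the caller ends up with a 200 response whose Content-Location names <B>, and a server that omits the header simply yields None.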
Going back to question (1), by making the Content-Location header compulsory for abstract resources (the complement of the class of "information resource"), it is possible to answer this with a NO, as intended in httpRange-14. My personal opinion is that this should be avoided. In my understanding, the main reason for introducing the concept of an information resource was a very practical one: to be able to reason about the content of a web page regardless of what kind of resource the URI denotes. I guess that one reason for this polarization was the early adoption of RDF vocabularies for people on the one hand (FOAF) and web documents on the other (DC). However, just as trying to separate humanity into distinct races is easy when you look at the extremes but becomes very blurry and self-contradictory in the middle [4], I don't think this simplification has any basis in reality, and Xiaoshu Wang has recently discussed this issue in detail [5]. By making the Content-Location header optional, people who don't want to identify the content resource B can avoid doing so, and I think this is their prerogative and not something that should be enforced.

I am aware that there has been a long discussion about this issue by the very people involved in creating the web, and that the points I have made have probably been considered before. I would be very interested in hearing where my arguments have gone astray.

Tore Eriksson

[1] <http://www.ietf.org/rfc/rfc2616.txt>
[2] <http://lists.w3.org/Archives/Public/public-webapi/2006Sep/0002.html>
[3] <http://lists.w3.org/Archives/Public/public-webapi/2006Sep/0005.html>
[4] <http://www.genome.org/cgi/content/abstract/14/9/1679>
[5] <http://dfdf.inesc-id.pt/tr/web-arch>
Received on Wednesday, 28 November 2007 22:15:36 UTC