httpRange-14: Consequences of redirection from Tore Eriksson on 2007-11-28 (www-tag@w3.org from November 2007)

From: Tore Eriksson <tore.eriksson@gmail.com>
Date: Thu, 29 Nov 2007 05:40:01 +0900
To: www-tag@w3.org
Message-ID: <17c5ffbf0711281240p5e4a082epc5a2d0ddbf3d1e20@mail.gmail.com>
Hello everybody,

Like most people reading this list, I have been contemplating the issues
involved in the definition of information resources and the
consequences of httpRange-14. It might be a bit impertinent to post my
thoughts to the list, especially as this is my first contribution, but
since I don't keep a blog I hope for your consideration in this
matter.

A given representation received through the HTTP protocol could be
associated with a number of resources (used in a broad sense) that one
would like to make statements about:

A - The topic of the representation
B - The textual or pictorial content (a.k.a. "document" or
"information resource" or "conceptual work")
C - The bitstream itself (a.k.a. "document instance")

I would like to argue that C can be excluded from this discussion,
since there is no consistent way of addressing this entity through a
URI, and thus it is not a first-class member of the Semantic Web. (I
guess you could use the ETag and a blank node.) The primary way to make
statements about C is through the HTTP response headers of this
specific transaction.

As for the other hypothetical resources A and B, they need to be
denoted by different URIs, since using the same URI for different
resources leads to semantic ambiguity and is not allowed. I will
write the URIs as <A> and <B> for simplicity. The perma-discussion
that is httpRange-14 has in my opinion two parts:

(1) Can one serve a representation of A without giving the
representation a corresponding information resource B?
(2) How do you find <B> when you have <A>?

The answer of httpRange-14 to (2) is to do a 303 redirect from <A> to
<B>. By requiring a redirect, it also disallows responding directly
with a 200 on <A>, thus making the creation of <B> compulsory and
consequently answering (1) with a NO. Let's leave the discussion
of (1) for later. I would first like to focus on a practical
problem with this proposal and the consequences thereof. Quoting from
RFC2616 [1] on redirection:

>>
10.3 Redirection 3xx

 This class of status code indicates that further action needs to be
 taken by the user agent in order to fulfill the request.  The action
 required MAY be carried out by the user agent without interaction
 with the user if and only if the method used in the second request is
 GET or HEAD.
 ...

14.30 Location

 ...For 3xx responses, the location SHOULD indicate the
 server's preferred URI for automatic redirection to the resource.
<<

In summary, a 303 response may be followed automatically by the user
agent, with no way for the user to detect this step. This behavior is
implemented in many current clients, some of them used by SW tools.
Even Tim Berners-Lee seems to have encountered it during the Taverna
project, and he claims that this is a bug in the specification of
XMLHttpRequest [2]. However, another way of seeing it is that the bug
is in httpRange-14, since depending on non-automatic redirection
clashes with the underlying specifications and current practice. Since
you can't reliably tell from the HTTP response that there exists a
resource denoted by <B>, redirection is not a practical solution to
(2) and the negative answer to (1) is also not enforced.
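
The effect described above can be reproduced with a small Python sketch.
It is an illustration only and not part of the original message: the
paths /A and /B, the handler class, and the helper fetch_topic are all
hypothetical names, and a throwaway local server stands in for a real
one. A client that follows redirects by default asks for /A and simply
reports a 200, so the 303 step is not visible in the status it returns.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

DOCUMENT = b"a document about A\n"


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /A is the topic, /B the document about it."""

    def do_GET(self):
        if self.path == "/A":
            # Answer for the topic with a 303 pointing at the document.
            self.send_response(303)
            self.send_header("Location", "http://%s/B" % self.headers["Host"])
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/B":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(DOCUMENT)))
            self.end_headers()
            self.wfile.write(DOCUMENT)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def fetch_topic():
    """GET /A with a redirect-following client; return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/A" % server.server_address[1]
        # An empty ProxyHandler keeps the request on the local machine.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            # The 303 has already been followed by the time we get here.
            return response.status, response.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = fetch_topic()
    print(status, body)
```

Python's urllib happens to expose the final URL on the response object,
so a caller could still notice the redirect there; the concern in the
message is about client APIs that do not offer such a hook.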

Another way of looking at the problem is from the representation side.
httpRange-14 disallows serving the representation directly as a 200
response and advocates redirecting to <B>, thus implying that this
problem is one of a direct relation between the two resources.
However, the relationship between A and B is not direct, but
conditional on serving the representation. Thus the natural place to
put this information is as meta-data on the representation itself,
which is constrained to the HTTP response headers. As has been pointed
out by Mark Nottingham [3], the logical header to use is
Content-Location. Quoting from RFC2616 once more:

>>
14.14 Content-Location

  The Content-Location entity-header field MAY be used to supply the
  resource location for the entity enclosed in the message when that
  entity is accessible from a location separate from the requested
  resource's URI. A server SHOULD provide a Content-Location for the
  variant corresponding to the response entity; especially in the case
  where a resource has multiple entities associated with it, and those
  entities actually have separate locations by which they might be
  individually accessed, the server SHOULD provide a Content-Location
  for the particular variant which is returned.
  ...
  The Content-Location value is not a replacement for the original
  requested URI; it is only a statement of the location of the resource
  corresponding to this particular entity at the time of the request.
  Future requests MAY specify the Content-Location URI as the request-
  URI if the desire is to identify the source of that particular
  entity.
<<

By adding the header Content-Location: <B> to the response that
follows a 303 redirect from <A>, we will be able to find the
information resource even when faced with automatic redirects in user
agents. This resolves (2), but what happens to (1)? Since the response
code after redirection is a 200, the de facto result is that a
representation is served directly from <A> from the point of view of
the user.

This means that we can scrap the redirection part from httpRange-14,
and only worry about the Content-Location header. If the header is set,
its value is the URL denoting the content B. It doesn't matter whether
you used redirection or served the representation straight from <A>.
This makes HTTP redirection orthogonal to the Semantic Web, which I
think is a good thing. What is left is meta-data on the
representation, which already has a very clear semantic meaning as
described in RFC2616 14.14.
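
As a rough illustration of this point, here is a minimal Python sketch
under stated assumptions; it is not part of the original message. The
paths /A-direct, /A-redirect and /B, the handler class, and the helpers
locate_content and compare_routes are hypothetical names. One route
answers with a 200 straight away and the other goes through a 303, and
in both cases the client reads <B> from Content-Location on the final
response.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin

DOCUMENT = b"a document about A\n"


class ContentLocationHandler(BaseHTTPRequestHandler):
    """Hypothetical server offering A both directly and via a 303."""

    def _serve_document(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Name the content resource B on the representation itself.
        self.send_header("Content-Location", "/B")
        self.send_header("Content-Length", str(len(DOCUMENT)))
        self.end_headers()
        self.wfile.write(DOCUMENT)

    def do_GET(self):
        if self.path in ("/A-direct", "/B"):
            self._serve_document()
        elif self.path == "/A-redirect":
            self.send_response(303)
            self.send_header("Location", "http://%s/B" % self.headers["Host"])
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def locate_content(opener, url):
    """Return the absolute URI given by Content-Location, or None."""
    with opener.open(url) as response:
        location = response.headers.get("Content-Location")
        if location is None:
            return None
        # Resolve a relative value against the URL finally requested.
        return urljoin(response.url, location)


def compare_routes():
    """Return the content URI found via the direct and the 303 route."""
    server = HTTPServer(("127.0.0.1", 0), ContentLocationHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = "http://127.0.0.1:%d" % server.server_address[1]
        # An empty ProxyHandler keeps the requests on the local machine.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        direct = locate_content(opener, base + "/A-direct")
        redirected = locate_content(opener, base + "/A-redirect")
        return direct, redirected
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(compare_routes())
```

In this sketch the client code never inspects a status code or a
redirect chain; it only looks at the header on the representation it
finally received.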

Going back to question (1), by making the Content-Location header
compulsory for abstract resources (the complement of the class of
"information resource") it is possible to answer this with a NO as
intended in httpRange-14. My personal opinion is that this should be
avoided. In my understanding, the main reason for introducing the
concept of an information resource was a very practical one: to be
able to reason about the content of a web-page regardless of what kind
of resource the URI denotes. I guess that one reason for this
polarization was the early adoption of RDF vocabularies for people on
the one hand (FOAF) and web documents on the other (DC). However, just
as trying to separate humanity into distinct races is easy when you
look at the extremes but becomes very blurry and self-contradictory in
the middle [4], I don't think this simplification has any basis in
reality. Xiaoshu Wang has recently discussed this issue in detail [5]. By
making the Content-Location header optional, people who don't want to
identify the content resource B can avoid doing so, and I think this
is their prerogative and not something that should be enforced.

I am aware that there has been a long discussion about this issue by
the very people involved in creating the web, and that the points I
made have probably been considered before. I would be very interested
in hearing where my arguments have gone astray.

Tore Eriksson

[1] <http://www.ietf.org/rfc/rfc2616.txt>
[2] <http://lists.w3.org/Archives/Public/public-webapi/2006Sep/0002.html>
[3] <http://lists.w3.org/Archives/Public/public-webapi/2006Sep/0005.html>
[4] <http://www.genome.org/cgi/content/abstract/14/9/1679>
[5] <http://dfdf.inesc-id.pt/tr/web-arch>
Received on Wednesday, 28 November 2007 22:15:36 UTC