RE: httpRange-14: Consequences of redirection from Williams, Stuart (HP Labs, Bristol) on 2007-11-29 (www-tag@w3.org from November 2007)

From: Williams, Stuart (HP Labs, Bristol) <skw@hp.com>
Date: Thu, 29 Nov 2007 15:16:26 +0000
To: Tore Eriksson <tore.eriksson@gmail.com>
Cc: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <9674EA156DA93A4F855379AABDA4A5C604FC841C5E@G5W0277.americas.hpqcorp.net>

Hello Tore,

A general comment that I'd make is that I think you may be confusing content negotiation with redirection.

The Content-Location header is used in content negotiation to indicate the URI of a specific variant of the resource. For example, the W3C logo is identified/denoted by the URI http://www.w3.org/Icons/w3c_home. An attempt to access it (recorded in the wget debug log below) returns a .png representation of the logo and a Content-Location: header which indicates where that particular variant of the logo may be obtained from in the future. A subsequent attempt to retrieve a JPEG variant reveals that only .png and .gif variants are available.

Anyway, AIUI, Content-Location: as used in content negotiation provides a way to identify a more specific variant of a generic resource; a way of identifying a resource that provides access to some specific subset of the representations available from the generic resource.

        http://www.w3.org/Icons/w3c_home        denotes the W3C Icon (a particular graphic/image)
        http://www.w3.org/Icons/w3c_home.png    denotes a specific variant of the W3C Icon which provides only image/png representations

Both URIs denote resources which stand in the variantOf(w3c_home.png, w3c_home) relation.
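The relative value Content-Location: w3c_home.png in the log below is interpreted relative to the Request-URI (RFC 2616, section 14.14). A minimal Python sketch, using only urllib.parse.urljoin from the standard library, of how a client could recover the absolute URI of the variant; the function name variant_uri is illustrative only:

```python
from urllib.parse import urljoin


def variant_uri(request_uri: str, content_location: str) -> str:
    """Resolve a (possibly relative) Content-Location value against the Request-URI."""
    return urljoin(request_uri, content_location)


generic = "http://www.w3.org/Icons/w3c_home"
specific = variant_uri(generic, "w3c_home.png")
print(specific)  # http://www.w3.org/Icons/w3c_home.png
```

An absolute Content-Location value passes through urljoin unchanged, so the same call covers both cases.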

The TAG's 303 advice is for a different situation, where the denoted resource defies representation but is amenable to description or depiction. In that case the advice is to arrange for access attempts to result in a redirection to a resource which, amongst other things, provides information *about* the original referent. Here the relation between the resources is that one contains a description/depiction of the other (provided that you organise for that to be the case). That last caveat is there because it is not in general true that following a 303 will lead to a description/depiction of the original referent; rather, it is a mechanism that *can* be usefully employed to do so. 303 is a protocol-level redirection, while #'ed URIs are a client-side redirection (because #<frag> is stripped from the URI exchanged with the server).
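That the #'d case is client-side can be seen from what a client actually sends: the fragment is split off before the request is made, so the server only ever sees the URI of the containing document. A minimal Python sketch with the standard library's urllib.parse.urldefrag; the URI and the helper name split_for_request are hypothetical:

```python
from urllib.parse import urldefrag


def split_for_request(uri: str) -> tuple[str, str]:
    """Return (URI sent to the server, fragment kept by the client)."""
    url, fragment = urldefrag(uri)
    return url, fragment


sent, kept = split_for_request("http://example.org/weather/oaxaca#today")
print("sent to server:", sent)    # http://example.org/weather/oaxaca
print("kept client-side:", kept)  # today
```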




        $ wget -d  http://www.w3.org/Icons/w3c_home

...
        ---request begin---
        GET http://www.w3.org/Icons/w3c_home HTTP/1.0
        User-Agent: Wget/1.10.2
        Accept: */*
        Host: www.w3.org

        ---request end---
        ---response begin---
        HTTP/1.0 200 OK
        Date: Thu, 29 Nov 2007 13:54:57 GMT
        Server: Apache/2
        Content-Location: w3c_home.png
        Vary: negotiate,accept
        TCN: choice
        Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
        ETag: "4195514757840;43bbb9edc3f40"
        Accept-Ranges: bytes
        Content-Length: 1936
        Cache-Control: max-age=2592000
        Expires: Sat, 29 Dec 2007 13:54:57 GMT
        P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
        Content-Type: image/png; qs=0.7
        ---response end---
        200 OK
        Length: 1,936 (1.9K) [image/png]

        Closed fd 3
        13:54:09 (25.03 MB/s) - `w3c_home.2' saved [1936/1936]

        $ wget -d --header=Accept:image/jpeg http://www.w3.org/Icons/w3c_home
        Setting --header (header) to Accept:image/jpeg
        ---request begin---
        GET http://www.w3.org/Icons/w3c_home HTTP/1.0
        User-Agent: Wget/1.10.2
        Accept: image/jpeg
        Host: www.w3.org

        ---request end---
        ---response begin---
        HTTP/1.0 406 Not Acceptable
        Date: Thu, 29 Nov 2007 13:54:10 GMT
        Server: Apache/2
        Alternates: {"w3c_home.png" 0.7 {type image/png} {length 1936}}, {"w3c_home.gif" 0.5 {type image/gif} {length 1865}}
        Vary: negotiate,accept
        TCN: list
        Content-Length: 428
        Content-Type: text/html; charset=iso-8859-1
        ---response end---
        406 Not Acceptable
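The 406 response lists the available variants in its Alternates: header (Transparent Content Negotiation, RFC 2295), which is how one can see that only PNG and GIF variants exist. A minimal Python sketch that extracts the variant URIs, source qualities, media types and lengths from the header value shown above; it assumes the simple shape seen in this log and is not a complete RFC 2295 parser:

```python
import re

# One variant description of the simple shape seen in the log:
#   {"uri" source-quality {attribute value} ...}
# Simplified sketch: not a complete RFC 2295 parser.
VARIANT = re.compile(
    r'\{"(?P<uri>[^"]+)"\s+(?P<q>[0-9.]+)(?P<attrs>(?:\s*\{[^{}]*\})*)\s*\}'
)
ATTRIBUTE = re.compile(r'\{(\w+)\s+([^{}]+)\}')


def parse_alternates(value: str) -> list[dict]:
    """Extract uri, source quality, type and length from an Alternates: value."""
    variants = []
    for match in VARIANT.finditer(value):
        attrs = dict(ATTRIBUTE.findall(match.group("attrs")))
        variants.append({
            "uri": match.group("uri"),
            "q": float(match.group("q")),
            "type": attrs.get("type"),
            "length": int(attrs["length"]) if "length" in attrs else None,
        })
    return variants


# Header value copied from the 406 response above.
ALTERNATES = ('{"w3c_home.png" 0.7 {type image/png} {length 1936}}, '
              '{"w3c_home.gif" 0.5 {type image/gif} {length 1865}}')

variants = parse_alternates(ALTERNATES)
for v in variants:
    print(v["uri"], v["q"], v["type"], v["length"])
# No image/jpeg variant is listed, which is why Accept: image/jpeg got a 406.
print(any(v["type"] == "image/jpeg" for v in variants))
```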



> -----Original Message-----
> From: www-tag-request@w3.org [mailto:www-tag-request@w3.org]
> On Behalf Of Tore Eriksson
> Sent: 28 November 2007 20:40
> To: www-tag@w3.org
> Subject: httpRange-14: Consequences of redirection
>
>
> Hello everybody,
>
> Like most people reading this list, I have been contemplating
> the issues involved in the definition of information
> resources and the consequences of httpRange-14. It might be a
> bit impertinent to post my thoughts to the list, especially
> as this is my first contribution, but since I don't keep a
> blog I hope for your consideration in this matter.
>
> For a given representation received through the HTTP
> protocol, it could possibly contain a number of resources
> (used in a broad sense) that one would like to make statements about:
>
> A - The topic of the representation

Can we try to stay within the bounds of: the resource (or thing) of which the representation is a representation (e.g. the daily Oaxaca weather report); representations (an HTML or PostScript... rendering of the daily Oaxaca weather report); and one or more subjects (if any) that the resource may be *about* (e.g. today's weather in Oaxaca, Oaxaca, rain, wind, snow... in the Oaxaca area)?

> B - The textual or pictorial content (a.k.a. "document" or
> "information resource" or "conceptual work")
> C - The bitstream itself (a.k.a. "document instance")

So...
        A would be "today's weather in Oaxaca" ie a/the subject of a daily weather report;

        B would be "today's daily weather *report* for Oaxaca" (from one of possibly many sources).
        B might be conceived of as a particular variant of a generic weather report (RDF, HTML,
          PDF, JPEG, GIF...) or as a generic resource whose information content is a weather report.

        C would be either a particular occurrence of a message transferring B (a 'token') as a bitstream
        or the 'type' of all messages that carry that particular bitstream.

> I would like to argue that C can be excluded from this
> discussion, since there is no consistent way of addressing
> this entity through a URI, and thus it is not a first-class
> member of the Semantic Web. (I guess you could use the E-tag
> and a b-node.) The primary way to make statements about C is
> through the HTTP response headers of this specific transaction.

I agree that it is at best hard to assign URIs to 'C'-like things. *If* you want to be able to refer to a particular bitstream and retrieve 'copies of it' at some point in the future, you would have to set up a resource which then served invariant representations of the particular representation it was intended to denote. Modulo the type/token distinction mentioned above, the W3C's pattern of publishing both current-version and specific-version URIs for its TR reports does something like this (treating representations as a type rather than as a token).

> As for the other hypothetical resources A and B, they need to
> be denoted with different URIs as using the same URI for
> different resources will lead to semantic ambiguity and is
> not allowed. I will write the URIs as <A> and <B> for
> simplicity. The perma-discussion that is httpRange-14 has in
> my opinion two parts:
>
> (1) Can one serve a representation of A without giving the
> representation a corresponding information resource B?
> (2) How do you find <B> when you have <A>?

So continuing with the Oaxaca example here. I'd argue that "today's weather in Oaxaca" defies representation; however it can be described (in the form of a weather report - forecast or post-hoc). So, A is conceptual and without representation. B is a(n) (information) resource which describes "today's weather in Oaxaca" (possibly amongst other things). So a redirect (whether protocol induced (303) or local client side (#'d URI)) from <A> to <B> is appropriate.

wrt 1) *IF* A has representations... then serve them from <A> with a 200 OK response!
    2) Use #'d URIs or protocol redirection to map from <A> to <B>.
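The two answers above amount to a simple server-side policy: respond 200 only for resources that actually have representations, and answer a request for a resource without representations with a 303 pointing at a resource that describes it. A minimal Python sketch of that dispatch logic, assuming hypothetical paths standing in for <A> and <B>; it is not a complete server:

```python
from http import HTTPStatus

# Hypothetical paths, for illustration only:
A = "/weather/oaxaca/today"   # <A>: today's weather in Oaxaca (no representations)
B = "/reports/oaxaca/today"   # <B>: today's weather report for Oaxaca

# Resources that do have representations: path -> (media type, body).
REPRESENTATIONS = {
    B: ("text/html", b"<p>Daily weather report for Oaxaca</p>"),
}

# Resources without representations, mapped to a resource that describes them.
DESCRIBED_BY = {A: B}


def respond(path: str) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on `path`."""
    if path in REPRESENTATIONS:
        media_type, body = REPRESENTATIONS[path]
        return HTTPStatus.OK, {"Content-Type": media_type}, body
    if path in DESCRIBED_BY:
        # No representation of A exists, so send the client to a description of it.
        return HTTPStatus.SEE_OTHER, {"Location": DESCRIBED_BY[path]}, b""
    return HTTPStatus.NOT_FOUND, {}, b""


for path in (A, B):
    status, headers, _ = respond(path)
    print(path, int(status), headers)
```

The #'d alternative needs no server logic at all: a URI such as <B>#today reaches the server as plain <B>.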

> The answer of httpRange-14 to (2) is to do a 303 redirect
> from <A> to <B>.

Or use #'d URI.

> By requiring a redirect, it also disallows
> responding directly with a 200 on <A> thus making the
> creation of <B> compulsory and consequently answering (1)
> with a NO.

Well... either A in fact has no representations OR by its very nature defies representation, so... 200 would be entirely wrong! *IF* A in fact has representations (i.e. they are indeed representations of A (i.e. "today's weather in Oaxaca") rather than representations of something else (e.g. "a daily Oaxaca weather report for today")), then respond with 200 and a representation.

> Let's leave the discussion concerning (1) for
> below. I would first like to focus on a practical problem
> with this proposal and the consequences thereof. Quoting from
> RFC2616 [1] on redirection:
>
> >>
> 10.3 Redirection 3xx
>
>  This class of status code indicates that further action
> needs to be taken by the user agent in order to fulfill the
> request. The action required MAY be carried out by the user
> agent without interaction with the user if and only if the
> method used in the second request is GET or HEAD.
>  ...
>
> 14.30 Location
>
>  ...For 3xx responses, the location SHOULD indicate the
> server's preferred URI for automatic redirection to the resource.
> <<
>
> In summary, a 303 response may be automatically redirected by
> the user agent with no way for the user to detect this step.
> This behavior is implemented in many current clients, some of
> them used by SW tools.
> Even Tim Berners-Lee seems to have encountered it during the
> Taverna project, and he claims that this is a bug in the
> specification of XMLHTTPRequest [2]. However, another way of
> seeing it is that the bug is in httpRange-14, since
> depending on non-automatic redirection clashes with the
> underlying specifications and current practice. Since you
> can't reliably tell from the HTTP response that there exists
> a resource denoted by <B>, redirection is not a practical solution to
> (2) and the negative answer on (1) is also not enforced.
>
> Another way of looking at the problem is from the representation side.
> httpRange-14 disallows serving the representation directly as
> a 200 response and advocates redirecting to <B>, thus
> implying that this problem is one of a direct relation
> between the two resources.
> However, the relationship between A and B is not direct, but
> conditional on serving the representation. Thus the natural
> place to put this information is as meta-data on the
> representation itself, which is constrained to the HTTP
> response headers. As has been pointed out by Mark Nottingham
> [3], the logical header to use is Content-Location. Quoting
> from RFC2616 once more:
>
> >>
> 14.14 Content-Location
>
>   The Content-Location entity-header field MAY be used to supply the
>   resource location for the entity enclosed in the message when that
>   entity is accessible from a location separate from the requested
>   resource's URI. A server SHOULD provide a Content-Location for the
>   variant corresponding to the response entity; especially in the case
>   where a resource has multiple entities associated with it, and those
>   entities actually have separate locations by which they might be
>   individually accessed, the server SHOULD provide a Content-Location
>   for the particular variant which is returned.
>   ...
>   The Content-Location value is not a replacement for the original
>   requested URI; it is only a statement of the location of the resource
>   corresponding to this particular entity at the time of the request.
>   Future requests MAY specify the Content-Location URI as the request-
>   URI if the desire is to identify the source of that particular
>   entity.
> <<
>
> By adding the header Content-Location: <B> to the response to
> a 303 redirect from <A>, we will be able to find the
> information resource even when faced with automatic redirects
> in user agents.

This is where I begin to see confusion between content negotiation and redirection.

Why would Content-Location: (which would be bogus because it refers to the content of the specific response - which has none) be better than the Location: header strongly suggested for use with 3xx responses?

> This resolves
> (2) but what happens to (1)? Since the redirected response
> code is a 200, the de facto result is that a representation
> is served directly from <A> from the point of view of the user.

Well, the user (or User Agent) SHOULD be aware that the redirection has occurred - that the bits they end up with didn't come from the resource they originally referenced - and SHOULD have an indication of where the bits they got came from.

> This means that we can scrap the redirection part from
> httpRange-14, and only worry about the Content-Location
> header.

I think you are confused and that Content-Location: serves a different purpose.

I suppose the target of the redirection could make a self-reference with Content-Location:, which would be a way of not having to remember the intermediate redirection target.

> If the header is set its value is the URL denoting
> the content B. It doesn't matter whether you used redirection
> or served the representation straight from <A>.

Content-Location: makes a claim about the relation between the resource referenced by the URI received on the request line by the server and the resource referenced by the URI in the header, and it is saying that the latter is a variant of the former. <B> variantOf <B> doesn't seem hugely useful (except as a pragmatic means to be able to forget what you asked for, or to discover that you've been given the answer to a different question than the one you asked). Content-Location: says nothing about the relation between <A> and <B>.
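The point can be made concrete by writing down the claim a Content-Location: header makes. A minimal Python sketch, assuming hypothetical example.org URIs for <A> and <B> and using the informal 'variantOf' label from above; a relative value is resolved against the Request-URI, as RFC 2616 specifies:

```python
from urllib.parse import urljoin

# Hypothetical URIs standing in for <A> and <B>.
A = "http://example.org/weather/oaxaca/today"
B = "http://example.org/reports/oaxaca/today"


def content_location_claim(request_uri: str, content_location: str) -> tuple[str, str, str]:
    """The claim a Content-Location: header makes, as (variant, relation, generic).

    The header is interpreted against the URI on the request line of the
    request that produced the response, not against any earlier request.
    """
    variant = urljoin(request_uri, content_location)
    return (variant, "variantOf", request_uri)


# Content negotiation: the claim relates two distinct resources.
print(content_location_claim("http://www.w3.org/Icons/w3c_home", "w3c_home.png"))

# After GET <A> -> 303 -> GET <B> -> 200 with a self-referential Content-Location:,
# the request line of the 200 carries <B>, so the claim only relates <B> to itself.
claim = content_location_claim(B, B)
print(claim)
print(A in claim)  # False: nothing is said about <A>
```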

> This makes HTTP redirection orthogonal to the Semantic Web,
> which I think is a good thing. What is left is meta-data on
> the representation, which already has a very clear semantic
> meaning as described in RFC2616 14.14.
>
> Going back to question (1), by making the Content-Location
> header compulsory for abstract resources (the complement of
> the class of "information resource") it is possible to answer
> this with a NO as intended in httpRange-14. My personal
> opinion is that this should be avoided. In my understanding,
> the main reason for introducing the concept of an information
> resource was a very practical one: to be able to reason about
> the content of a web-page regardless of what kind of resource
> the URI denotes. I guess that one reason for this
> polarization was the early adoption of RDF vocabularies for
> people on one hand (FOAF) and web documents on the other
> (DC). However, as trying to separate humanity in distinct
> races is easy when you look at the extremes but becomes very
> blurry and self-contradictory in the middle [4], I don't
> think this simplification has any basis in reality, and
> Xiaoshu Wang has recently discussed this issue in detail [5].
> By making the Content-Location header optional, people who
> don't want to identify the content resource B can avoid doing
> so, and I think this is their prerogative and not something
> that should be enforced.

Maybe I am being intentionally blind: I fail to see what using Content-Location: instead of, or as well as, Location: buys you. It certainly conveys no more information, and it risks confusion between redirection and content negotiation.

I say intentionally, because I can see that a self-referential Content-Location: accompanying the final 200 response is a way to 'sneak' back to the User/UserAgent the information that some HTTP client library failed to preserve from the redirection - but FWIW, IMO it is the HTTP client library which is at fault: the redirection should be visible to the library client.
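That a client library can follow a 303 automatically and still expose it is easy to check with Python's standard library: urllib follows the redirect, but the response's url attribute gives the URI the bits actually came from, and a redirect handler can record each hop. A minimal, self-contained sketch; the local server and the paths /A and /B are hypothetical stand-ins for <A> and <B>, and proxies are disabled so the request stays on the loopback interface:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    """Hypothetical server: /A has no representations, /B describes it."""

    def do_GET(self):
        if self.path == "/A":
            # Redirect to a resource that describes A.
            self.send_response(303)
            self.send_header("Location", "/B")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/B":
            body = b"a description of A\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows redirects as usual, but records each (status, target) hop."""

    def __init__(self):
        super().__init__()
        self.redirects = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.redirects.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def start_server():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def fetch(url):
    recorder = RecordingRedirectHandler()
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), recorder)
    with opener.open(url) as resp:
        # resp.url is the URI the bits actually came from, not the one asked for.
        return resp.status, resp.url, recorder.redirects, resp.read()


if __name__ == "__main__":
    server, base = start_server()
    status, final_url, redirects, body = fetch(base + "/A")
    print("asked for:", base + "/A")
    print("got", status, "from:", final_url)
    print("redirect hops:", redirects)
    server.shutdown()
    server.server_close()
```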

> I am aware that there has been a long discussion about this
> issue by the very people involved in creating the web, and
> that the points I made have probably been considered before.
> I would be very interested in hearing where my arguments have
> gone astray.
>
> Tore Eriksson
>
> [1] <http://www.ietf.org/rfc/rfc2616.txt>
> [2] <http://lists.w3.org/Archives/Public/public-webapi/2006Sep/0002.html>
> [3] <http://lists.w3.org/Archives/Public/public-webapi/2006Sep/0005.html>
> [4] <http://www.genome.org/cgi/content/abstract/14/9/1679>
> [5] <http://dfdf.inesc-id.pt/tr/web-arch>

regards

Stuart Williams
--
Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Thursday, 29 November 2007 15:21:03 UTC