Re: httpRange-14: Consequences of redirection from Tore Eriksson on 2007-11-30 (www-tag@w3.org from November 2007)

From: Tore Eriksson <tore.eriksson@gmail.com>
Date: Fri, 30 Nov 2007 23:18:52 +0900
To: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>
Cc: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <17c5ffbf0711300618m7ba3edd5q4745e7af7ac8bf96@mail.gmail.com>
Hi again Stuart,

[While writing this it seems that you have responded to my
response... I am probably repeating myself here and ignoring some
comments you have already made, but I hope you don't mind me sending it
anyway.]

I would like to respond to some more things that I didn't have time
for in my previous reply. But first I have to apologize to a bunch of
people who have said approximately the same thing as me years before:

Mark Baker - http://lists.w3.org/Archives/Public/www-tag/2002Jul/0084
Dan Connolly - http://lists.w3.org/Archives/Public/www-tag/2002Mar/0188
Jonathan Borden - http://lists.w3.org/Archives/Public/www-tag/2002Mar/0260
Aaron Swartz - http://lists.w3.org/Archives/Public/www-tag/2002Jul/0319

Of course this selection might not represent the current opinion of
these persons, and I am sorry to have put you up there if that is the
case. However, since there are a few that have (or maybe had) the same
intuitive/gut feeling on this topic as me, I feel more secure in my
belief that this is how it should work:

I hesitate to argue about this since other people have said the same
thing, more eloquently. My main concern in my first mail was to point
out the problem with hidden redirects and httpRange-14. Please ignore
the following if you have had enough of the discussion of whether you
can serve representations for non-"information resources".

Stuart Williams said:
> The Content-Location header is used in content negotiation to indicate the URI of a specific variant of the resource. For example, the W3C logo is identified/denoted by the URI http://www.w3.org/Icons/w3c_home. An attempt to access it (recorded in the wget debug log below) returns a .png representation of the logo and a Content-Location: which indicates where that particular variant of the logo may be obtained from in the future. A subsequent attempt to retrieve a jpeg variant reveals that only .png and .gif variants are available.
>
> Anyway, AIUI, Content-Location: as used in content negotiation provides a way to identify a more specific variant of a generic resource; a way of identifying a resource that provides access to some specific subset of the representations available from the generic reference.
>
>         http://www.w3.org/Icons/w3c_home        denotes the W3C Icon (a particular graphic/image)
>         http://www.w3.org/Icons/w3c_home.png    denotes a specific variant of the W3C Icon which provides only image/png representations
>
> Both URIs denote resources which stand in a variantOf(w3c_home.png, w3c_home) relation.
>

If you don't mind I would like to restate this using another image:

<http://www.w3.org/Icons/SW/sw-horz-w3c>
   denotes the Semantic Web Icon (a particular resource that has
rdf:type foaf:Image?)
<http://www.w3.org/Icons/SW/sw-horz-w3c.png>
   denotes a resource that is a specific variant of the Semantic Web
Icon which provides only image/png representations

Can you really say that these resources stand in a
variantOf(sw-horz-w3c.png, sw-horz-w3c) relation? When reading
<http://www.w3.org/2007/10/sw-logos.html>, it seems (to me) as if
<http://www.w3.org/Icons/SW/sw-horz-w3c> is a non-information resource
that, apart from the actual depiction, also has other characteristics,
like:

<http://www.w3.org/Icons/SW/sw-horz-w3c>
    cc:license <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>;
    cc:license <http://www.w3.org/2007/10/sw-logos.html#LogosWithW3C>;
    rdfs:label "W3C Semantic Web Logo"@en.

Since these are not a part of the PNG representation (they are in the
metadata of the SVG one though), their relation is not variantOf but
rather that one "contains a description[n]/depiction of the other" –
your description of the relationship found in a 303 redirect. Further,
<http://www.w3.org/Icons/SW/sw-horz-w3c> does not "defy"
representation; it is just that the representation might be
incomplete, due to limitations on what can be described by the MIME
type. According to some, this then makes
<http://www.w3.org/Icons/SW/sw-horz-w3c> a non-"information resource",
but I do not see anything to gain from this. But maybe the WWW should
do a 303 redirect here, what do you think? To rephrase: the
question is how generic can a generic resource be until it becomes an
information resource? Tim Berners-Lee talks about generic "electronic
resources" in [1], but this restriction seems, as a concept, equivalent
to and as arbitrary as the concept of an "information resource".

As for my example:
> > For a given representation received through the HTTP
> > protocol, it could possibly contain a number of resources
> > (used in a broad sense) that one would like to make statements about:
> >
> > A - The topic of the representation
>
> Can we try to stay within the bounds of resource (or thing) which the representation is a representation of (eg. the daily Oaxaca weather report); representations (an HTML or PostScript... rendering of the daily Oaxaca weather report); and one or more subjects (if any) that the resource may be *about* (eg. today's weather in Oaxaca, Oaxaca, rain, wind, snow... in the Oaxaca area)?
>
> > B - The textual or pictorial content (a.k.a. "document" or
> > "information resource" or "conceptual work")
> > C - The bitstream itself (a.k.a. "document instance")
>
> So...
>         A would be "today's weather in Oaxaca" ie a/the subject of a daily weather report;
>
>         B would be "today's daily weather *report* for Oaxaca" (from one of possibly many sources).
>         B might be conceived of as a particular variant of a generic weather report (RDF, HTML,
>           PDF, JPEG, GIF...) or as a generic resource whose information content is a weather report.
>
>         C would be either a particular occurrence of a message transferring B (a 'token') as bitstream
>         or the 'type' of all messages that carry that particular bit stream.
>

We agree here:
A - "today's weather in Oaxaca"
B - "today's daily weather *report* for Oaxaca" – as a generic resource

I would like to change your C into
C - a particular occurrence of a message transferring a representation
of both A and B as a bit stream

Further:
> > (1) Can one serve a representation of A without giving the
> > representation a corresponding information resource B?
> > (2) How do you find <B> when you have <A>?
>
> So continuing with the Oaxaca example here. I'd argue that "today's weather in Oaxaca" defies representation; however it can be described (in the form of a weather report - forecast or post-hoc). So, A is conceptual and without representation. B is a(n) (information) resource which describes "today's weather in Oaxaca" (possibly amongst other things). So a redirect (whether protocol induced (303) or local client side (#'d URI)) from <A> to <B> is appropriate.
>
> wrt 1) *IF* A has representations... then serve them from <A> with a 200 OK response!
>     2) Use #'d URIs or protocol redirection map from <A> to <B>
>
> > The answer of httpRange-14 to (2) is to do a 303 redirect
> > from <A> to <B>.
>
> Or use #'d URI.
>

Sure. But I wanted to talk about problems with HTTP redirects, so I
hope you excuse me for ignoring this alternative.

> > By requiring a redirect, it also disallows
> > responding directly with a 200 on <A> thus making the
> > creation of <B> compulsory and consequently answering (1)
> > with a NO.
>
> Well... either A in fact has no representations OR by its very nature defies representation so... 200 would be entirely wrong! *IF* A in fact has representations (ie. they are indeed representations of A (ie. "today's weather in Oaxaca") rather than representations of something else (eg. "a daily Oaxaca weather report for today")) then respond with 200 and a representation.
>

What people are arguing about is this part that says "def[ying]
representation". All representations are limited by MIME type, but
that does not mean that serving the representation is pointless. In
this case, I would say that "today's weather in Oaxaca" has a
representation, and that the representation overlaps with the one for
"a daily Oaxaca weather report for today".

Ignoring this relationship is possible, but if I send as a reply to a
GET on <A> an RDF representation of today's weather in Oaxaca, I can't
add the property cc:license, since this is about the _information
resource_ B. To solve this I can redirect to <B> _or_ (in my opinion)
add <B> as a Content-Location. Maybe slightly irrelevant, but adding
<B> as a Content-Location sets the base URI accordingly, and I can add
all RDF in one go (in Turtle):

<A> w:maxTemperature "28"^^w:centigrade.
<A> w:minTemperature "12"^^w:centigrade.
<> cc:license <>.

(Since the base URI is mapped to <B>.)
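
To make the base URI point concrete, here is a minimal sketch in
Python. The two example.org URIs are placeholders I made up to stand
in for <A> and <B>, and base_uri() is a hypothetical helper, not part
of any library. It assumes my reading of RFC 2616 section 14.14: that
Content-Location, when present, defines the base URI for the entity,
and that a relative value is interpreted against the request URI.

```python
from urllib.parse import urljoin

# Hypothetical URIs standing in for <A> and <B>; not from any real service.
A = "http://example.org/weather/oaxaca"
B = "http://example.org/weather/oaxaca/report"


def base_uri(request_uri, headers):
    """Return the base URI for relative references in the entity.

    Assumption: Content-Location, if present, supplies the base URI;
    otherwise the requested URI is used.
    """
    content_location = headers.get("Content-Location")
    if content_location is None:
        return request_uri
    # A relative Content-Location is resolved against the request URI.
    return urljoin(request_uri, content_location)


# The empty reference <> in Turtle resolves to the base URI itself.
with_header = urljoin(base_uri(A, {"Content-Location": B}), "")
without_header = urljoin(base_uri(A, {}), "")

print(with_header)
print(without_header)
```

With the header, <> ends up meaning <B>, so the licence statement
lands on the report; without it, <> would mean <A> and the licence
would be asserted about the weather itself.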

> > By adding the header Content-Location: <B> to the response to
> > a 303 redirect from <A>, we will be able to find the
> > information resource even when faced with automatic redirects
> > in user agents.
>
> This is where I begin to see confusion between content negotiation and redirection.
>
> Why would Content-Location: (which would be bogus because it refers to the content of the specific response - which has none) be better than the Location: header strongly suggested for use with 3xx responses?
>

My intention was of course that the Content-Location header is sent
along with the final 200 response. This is of course redundant
information if you know that it was redirected, but it is still
correct. My point was that since you might not see the 303 response
and its Location header you need to propagate this information to the
final response.
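
A rough sketch of what I mean, again in Python. Everything here is
simulated: the example.org URIs, the canned EXCHANGE table and the
two function names are my own inventions for illustration, and
nothing touches the network. The first function mimics a client
library that follows the 303 silently; the second shows that the
caller can still find <B> from the final 200 response alone.

```python
from urllib.parse import urljoin

# Hypothetical URIs and a canned request/response table (no network I/O).
A = "http://example.org/weather/oaxaca"
B = "http://example.org/weather/oaxaca/report"

EXCHANGE = {
    A: (303, {"Location": B}),
    B: (200, {"Content-Location": B}),
}


def fetch_hiding_redirects(uri, max_hops=5):
    """Mimic a client library that follows redirects silently.

    Only the final status and headers are returned; the 303 and its
    Location header are never shown to the caller.
    """
    for _ in range(max_hops):
        status, headers = EXCHANGE[uri]
        if status == 303:
            uri = urljoin(uri, headers["Location"])
            continue
        return status, headers
    raise RuntimeError("too many redirects")


def served_from(request_uri, headers):
    """URI the caller should attribute the representation to."""
    location = headers.get("Content-Location")
    return urljoin(request_uri, location) if location else request_uri


status, headers = fetch_hiding_redirects(A)
print(status, served_from(A, headers))
```

The caller asked for <A> and only ever saw a 200, yet
served_from() still yields <B>. If the server left out the header,
the caller would have no choice but to attribute the bits to <A>.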

> > This resolves
> > (2) but what happens to (1)? Since the redirected response
> > code is a 200, the de facto result is that a representation
> > is served directly from <A> from the point of the user.
>
> Well the user (or User Agent) SHOULD be aware that the redirection has occurred - that the bits they end up with didn't come from the resource they originally referenced; and SHOULD have an indication of where the bits they got came from.
>

Yes, but a SHOULD on the user agent is not a MUST, is it?

> > This means that we can scrap the redirection part from
> > httpRange-14, and only worry about the Content-Location
> > header.
>
> I think you are confused and that Content-Location: serves a different purpose.
>
> I suppose target of the redirection could make a self reference with Content-Location: which would be a way of not having to remember the intermediate redirection target.
>

That was my intention. Sorry for my befuddled language.

> > If the header is set its value is the URL denoting
> > the content B. It doesn't matter whether you used redirection
> > or served the representation straight from <A>.
>
> Content-Location: makes a claim about the relation between the resource referenced by the URI received on the request line by the server and the resource referenced by the URI in the header, and it is saying that the latter is a variant of the former. <B> variantOf <B>  doesn't seem hugely useful (except as a pragmatic means to be able to forget what you asked for, or to discover that you've been given the answer to a different question than the one you asked). Content-Location: says nothing of the relation between <A> and <B>.
>

As I said above, if the redirect is hidden the URI in the request is
different from the one in the Content-Location, thus saying that <B>
variantOf <A>. If not, it becomes <B> variantOf <B>, which at least is
not wrong. As for saying nothing about the relation between them,
neither does 303 redirection by itself as you said in your caveat ("it
is not in general true that following a 303 will lead to a
de[s]cription/depiction of the original referent - but is a mechanism
that *can* be usefully employed to do so"), but httpRange-14 adds a
possible claim on the relation. I am just arguing for a similar
reinterpretation of Content-Location.

> Maybe being intentionally blind: I fail to see what using Content-Location: instead or as well as Location: buys you. It certainly conveys no more information and it risks confusion between redirection and content negotiation.
>
> I say intentionally, because I can see that a self-referential Content-Location: accompanying the final 200 response is a way to 'sneak' the information that some http client library failed to preserve from the redirection back to User/UserAgent - but FWIW IMO it is the http client library which is at fault - the redirection should be visible to the library client.
>

The client library is not at fault. A SHOULD still doesn't make a MUST.
You could argue for changing the HTTP specification, but is this a step
you are willing to take?

The whole debate about information resources has, in my opinion, been
most thoroughly argued by Roy Fielding in [2]. In it he notes, though,
that "Content-Location is not a sufficient fix for this problem simply
because the resource provider has no desire to use it." However, using
a redirect for the same purpose poses essentially the same dilemma:
getting resource providers to adhere to a specific convention. I only
want to argue that using Content-Location is a better solution
technically, and I think persuading people to use either solution will
be difficult. However, in the Linked Data movement there are a few
large players that seem to follow what the TAG recommends, and
this is a big opportunity. Just because Content-Location didn't catch
on before doesn't mean it won't this time.

Regards, Tore

[1] <http://www.w3.org/DesignIssues/Generic>
[2] <http://lists.w3.org/Archives/Public/www-tag/2002Aug/0000>
Received on Friday, 30 November 2007 14:19:07 UTC