Re: Which URI should be persistent when redirects are used? from Tim Berners-Lee on 2007-10-01 (www-tag@w3.org from October 2007)

From: Tim Berners-Lee <timbl@w3.org>
Date: Mon, 1 Oct 2007 00:26:23 -0400
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: wangxiao@musc.edu, Misha Wolf <Misha.Wolf@reuters.com>, W3C-TAG <www-tag@w3.org>, semantic-web-ig list <semantic-web-ig.list@reuters.com>
Message-Id: <6911F4E5-354C-44AC-92C4-8D9181B463A2@w3.org>
On 2007-09-30, at 00:06, Alan Ruttenberg wrote:

>
> Richard, I am concerned, in this question, with content
> negotiation, although some of the same questions arise with
> redirection. I am also not concerned with the http browser
> activities. My concern is the Semantic Web, and that the sort of
> answers and definitions which are offered for the traditional web
> do not seem to work, or at least are not understandable to me in
> the context of the semantic web.
>
> Xiaoshu,
>
> You wrote:
>> Content negotiation doesn't change the URI. The server returns
>> different representations for a particular request depending on
>> the MIME type and q score, but all the representations are under the
>> same URI.
>
> Let's examine the following situation:
>
> 3 URIs
>   http://example.com/depict/alan
>   http://example.com/depict/alan.jpg
>   http://example.com/depict/alan.png
>
> http://example.com/depict/alan does content negotiation and,
> depending on whether the agent wants jpeg or png, redirects to one
> of the two other URIs. The bits for the jpg and the png are each in a
> file on the server's file system.
>
> The question is what http://example.com/depict/alan is.

"http://example.com/depict/alan" is a URI identifying a "generic
resource".
I wrote about this years ago in <http://www.w3.org/DesignIssues/Generic>,
IIRC (I am on a plane).

The <http://example.com/depict/alan> resource is generic in that it
isn't specified to the level of what content type is returned.
Genericity of a resource is not always about content type; it can also
be with respect to version, or to the natural language used.
Most URIs on the web are generic in one or more directions.
There are not always individual URIs available for the specific  
resources, but often there are.
Generic resources are a valuable concept, as most of the time
we don't want to refer to just a specific version in a specific format
and a specific language.
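The setup in the question, a generic URI that negotiates on the Accept header and then redirects to one of the specific URIs, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not how any particular server does it: the variant table, the choice of a 302 redirect, and the simplified Accept parsing (only the q parameter is honoured) are all assumptions.

```python
# Hypothetical variant table: media type -> URI of the specific resource.
VARIANTS = {
    "image/jpeg": "http://example.com/depict/alan.jpg",
    "image/png": "http://example.com/depict/alan.png",
}

def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs (simplified)."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges

def quality(media_type, ranges):
    """q-value the client gives media_type; the most specific range wins."""
    main = media_type.split("/", 1)[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q

def negotiate(accept="*/*"):
    """Return (status, headers) for a GET on the generic URI."""
    ranges = parse_accept(accept)
    # max() keeps the first of equally scored candidates, so the
    # table order (JPEG first) breaks ties.
    media_type = max(VARIANTS, key=lambda mt: quality(mt, ranges))
    if quality(media_type, ranges) <= 0:
        return 406, {"Vary": "Accept"}
    # Vary tells caches that the redirect target depends on Accept.
    return 302, {"Location": VARIANTS[media_type], "Vary": "Accept"}
```

For example, `negotiate("image/jpeg;q=0.5, image/png;q=0.9")` redirects to alan.png, while `image/*` scores both variants equally and the server's own ordering picks alan.jpg.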


> By my understanding of your instruction, a web server should, in some
> circumstances, when asked for the resource identified by
> http://example.com/depict/alan, sometimes return the bits from the jpg
> document, and sometimes return the bits from the png document.
> These two different sets of bits are both "identified" by the same
> URI, http://example.com/depict/alan, and each of them should be
> considered a representation of http://example.com/depict/alan.
>
> Similarly, when the web server is asked for the resource identified  
> by http://example.com/depict/alan.jpg, it should return the bits
> from the jpg document and we say that the resource returned is a  
> representation of http://example.com/depict/alan.jpg

Well, we don't use those words that way. A resource is not returned.
A Representation is returned.
A representation is a structure of a) the HTTP headers, which include
one such as Content-type: image/jpeg, and b) the bits of the picture.
(This is the term used in Roy Fielding's PhD thesis and in the TAG's
Architecture of the World Wide Web, AWWW.)
The word 'representation' here is used in a technical sense, like  
'packet' in IP, or 'internet message' in SMTP or 'completed return'  
in the IRS 1040 filing instructions.
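To make that "structure" concrete, here is a minimal Python sketch of a representation as header metadata plus the bits of the body. The class and method names are illustrative assumptions, not taken from Fielding's thesis, AWWW, or any HTTP library.

```python
from dataclasses import dataclass, field

@dataclass
class Representation:
    """An HTTP representation: headers (metadata) plus the body bits."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""

    def header(self, name):
        """Look up a header; HTTP header names are case-insensitive."""
        wanted = name.lower()
        for key, value in self.headers.items():
            if key.lower() == wanted:
                return value
        return None

    @property
    def content_type(self):
        """Media type without parameters, e.g. 'image/jpeg', or None."""
        value = self.header("Content-Type")
        if value is None:
            return None
        return value.split(";", 1)[0].strip().lower()
```

Header lookup ignores case because HTTP field names are case-insensitive, so `Content-type` as written above and `Content-Type` name the same header.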



> Now, I step back, and move into a language which I understand  
> better. I consider the jpg and png files to be documents in the
> traditional and easier sense - they are a series of bits. They
> won't change. I consider a "copy" any other document that has  
> exactly the same series of bits.

These are specific documents: non-generic resources. Any
Representation sent back for them will always have the same bits.
OK. Strictly, I wouldn't say the document *is* the bits. The
document is still a picture, a very specific one. The Representation
still needs the headers as well as the bits, as you can't render the
picture, in general, without knowing what format it is in. The
architecture is such, anyway, that you always send a Representation.
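Because a specific resource such as alan.jpg always comes back with the same bits and the same Content-Type, a fingerprint of that pair should stay constant across fetches. The sketch below combines Alan's notion of a "copy" (identical bits) with the point that the format belongs to the representation too; representations are assumed to be plain (content_type, body) pairs, and hashing with SHA-256 is a choice made here for illustration.

```python
import hashlib

def fingerprint(content_type, body):
    """Pair the media type (parameters dropped, lower-cased) with a digest of the bits."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type, hashlib.sha256(body).hexdigest()

def same_representation(a, b):
    """True if two (content_type, body) pairs have the same format and identical bits."""
    return fingerprint(*a) == fingerprint(*b)
```

A match says the two representations agree; it does not by itself say that the URIs they were fetched from identify the same resource.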

> I am thinking that I would like URIs to identify each document.
> Naively, perhaps, I choose http://example.com/depict/alan.jpg, and  
> http://example.com/depict/alan.png. If anyone asks me what I mean  
> by resource in this case I will say: "By resource, I mean document,  
> in the sense described".

Ok

> If they ask me what I mean by representation in this case, I will  
> say: "I don't know. Ask Xiaoshu".

Hope that is clear now.

> From this mindset, I will at some time later encounter
> http://example.com/depict/alan. Upon accessing it, I get a series of bits.

And metadata. You get a Representation of the picture in the HTTP sense.

> Upon examining the bits, I find that they are the same set of bits  
> as the document. I say, oh, http://example.com/depict/alan  
> identifies the same document as http://example.com/depict/alan.  
> Conclusion: http://example.com/depict/alan is an alias for
> http://example.com/depict/alan [.jpg]

Well, you have got matching representation bits (and content-type)
for each.
This does not mean they are the same resource.
One, the generic one, may well be sent with a "Vary: Accept" header,
meaning that the result you get for it can vary. It may also have
a "Content-Location: alan.jpg" header to let you know a URI you can
later use if you want to refer to the specific resource. So the HTTP
representation in this case allows you to build a pretty good picture
of what is going on.
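A client can read the Vary and Content-Location headers to build that picture. The sketch below assumes the response headers arrive as a plain dict; Content-Location is resolved against the request URI with `urljoin`, and the `looks_generic` flag is only a heuristic, since a server is not obliged to send either header.

```python
from urllib.parse import urljoin

def describe_response(request_uri, headers):
    """Summarize what a response's headers say about the resource at request_uri.

    Heuristic only: servers are not required to send Vary or Content-Location.
    """
    # HTTP field names are case-insensitive, so normalize them first.
    fields = {name.lower(): value for name, value in headers.items()}
    # Vary lists the request headers the returned representation depends on.
    varies_on = [h.strip().lower()
                 for h in fields.get("vary", "").split(",") if h.strip()]
    # Content-Location may be relative; resolve it against the request URI.
    location = fields.get("content-location")
    specific_uri = urljoin(request_uri, location.strip()) if location else None
    looks_generic = bool(varies_on) or (
        specific_uri is not None and specific_uri != request_uri)
    return {"varies_on": varies_on,
            "specific_uri": specific_uri,
            "looks_generic": looks_generic}
```

For a GET on http://example.com/depict/alan answered with `Vary: Accept` and `Content-Location: alan.jpg`, this reports that the response varies on Accept and that the specific resource is http://example.com/depict/alan.jpg.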

> Some time later I access http://example.com/depict/alan with a  
> different agent. Upon accessing it, I get a series of bits. Upon  
> examining the bits, I find that they are the same set of bits as  
> the document. I say, oh, http://example.com/depict/alan identifies  
> the same document as http://example.com/depict/alan.png. Conclusion  
> http://example.com/depict/alan is an alias for
> http://example.com/depict/alan
>

Now you have a picture of the generic resource and two different  
specific ones.



> But wait, it is worse, I now have a URI that seems to break the  
> rules, and identifies two documents
>

No, if you understand generic documents, then you see that different
URIs are useful for the generic and specific versions, and all is well.

In general, what you are getting at is: is this a good thing, and what
should the Semantic Web do when referencing the document? The answer
is: almost always refer to it using the generic URI.
Most of the things you say about it, like licensing, or what it
depicts, are true of the image as a generic thing, independent of
what image formats the server might have available now or in the
future. When you embed the image in a hypertext page, you use the
generic URI, so that browsers with different capabilities will still
work. This is standard practice for the images on the W3C site, for
example.
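Following that advice in RDF means attaching statements such as licensing and depiction to the generic URI. The sketch writes them as N-Triples by hand to stay dependency-free; the FOAF `depicts` and Dublin Core `license` properties are one reasonable vocabulary choice (not prescribed in this message), and the person and license URIs in the usage are hypothetical placeholders.

```python
GENERIC = "http://example.com/depict/alan"
FOAF_DEPICTS = "http://xmlns.com/foaf/0.1/depicts"
DCT_LICENSE = "http://purl.org/dc/terms/license"

def iri(term):
    """Wrap an IRI for N-Triples, rejecting characters an IRIREF may not contain."""
    if any(c in term for c in '<>" '):
        raise ValueError(f"not a valid IRI for N-Triples: {term!r}")
    return f"<{term}>"

def ntriple(subject, predicate, obj):
    """Serialize one triple whose three terms are all IRIs."""
    return f"{iri(subject)} {iri(predicate)} {iri(obj)} ."

def describe_image(image_uri, depicted, license_uri):
    """Statements about the image, made on whatever URI is passed in."""
    return "\n".join([
        ntriple(image_uri, FOAF_DEPICTS, depicted),
        ntriple(image_uri, DCT_LICENSE, license_uri),
    ])
```

Calling `describe_image(GENERIC, "http://example.com/people/alan#me", "http://example.com/licenses/photo")` keeps both statements true whichever formats the server offers now or later.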

If you actually want to store the relationship between the various
generic and specific resources in RDF, then there is a little
ontology I made that you might find useful:
<http://www.w3.org/2006/gen/ont>. (There are also specific resources,
.n3 and .rdf, but please refer to it using the generic URI :-)

An example of its use is in <http://www.w3.org/2007/ont/meta>.

Tim
Received on Monday, 1 October 2007 14:45:40 UTC