
Re: RDF Update Feeds + URI time travel on HTTP-level

From: Herbert Van de Sompel <hvdsomp@gmail.com>
Date: Mon, 23 Nov 2009 21:02:37 -0700
Cc: "Michael L. Nelson" <mln@cs.odu.edu>, Robert Sanderson <azaroth42@gmail.com>
Message-Id: <F7AC7062-F8CB-4389-A371-B6CBB33B1AED@gmail.com>
To: Erik Hetzner <erik.hetzner@ucop.edu>, Linked Data community <public-lod@w3.org>
On Nov 23, 2009, at 4:59 PM, Erik Hetzner wrote:
> At Mon, 23 Nov 2009 00:40:33 -0500,
> Mark Baker wrote:
>>
>> On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell <ansell.peter@gmail.com 
>> > wrote:
>>> It should be up to resource creators to determine when the nature  
>>> of a
>>> resource changes across time. A web architecture that requires every
>>> single edit to have a different identifier is a large hassle and
>>> likely won't catch on if people find that they can work fine with a
>>> system that evolves constantly using semi-constant identifiers,  
>>> rather
>>> than through a series of mandatory time based checkpoints.
>>
>> You seem to have read more into my argument than was there, and
>> created a strawman; I agree with the above.
>>
>> My claim is simply that all HTTP requests, no matter the headers, are
>> requests upon the current state of the resource identified by the
>> Request-URI, and therefore, a request for a representation of the
>> state of "Resource X at time T" needs to be directed at the URI for
>> "Resource X at time T", not "Resource X".
>
> I think this is a very compelling argument.

Actually, I don't think it is.  The issue was also brought up (in a
significantly more tentative manner) in Pete Johnston's blog entry on
eFoundations (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html).
Tomorrow, we will post a response that will try to show that the
"current state" issue is - as far as we can see - not quite as
"written in stone" as suggested above in the specs that matter in this
case, i.e. the Architecture of the World Wide Web and RFC 2616. Both are
interestingly vague about this.


>
> On the other hand, there is, nothing I can see that prevents one URI
> from representing another URI as it changes through time. This is
> already the case with, e.g.,
> <http://web.archive.org/web/*/http://example.org>, which represents
> the URI <http://example.org> at all times. So this URI could, perhaps,
> be a target for X-Accept-Datetime headers.

That is actually what we do in Memento (see our paper at
http://arxiv.org/abs/0911.1112), and we recognize two cases here:

(1) If the web server does not keep track of its own archival
versions, then we must rely on archival versions that are stored
elsewhere, i.e. in Web Archives. In this case, the original server that
receives the request can redirect the client to a resource like the
one you mention above, i.e. a resource that stands for archived
versions of another resource. Note that this is a simple redirect
like the ones that happen all the time on the Web; it is not itself
part of a datetime content negotiation flow, but rather a redirect
that occurs because the server has detected an X-Accept-Datetime
header. Now, we don't want to overload the existing
<http://web.archive.org/web/*/http://example.org> as you suggest, but
rather choose to introduce a special-purpose resource that we call a
TimeGate, <http://web.archive.org/web/timegate/http://example.org>.
And we indeed introduce this resource as the target for datetime
content negotiation.
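To make case (1) concrete, here is a minimal sketch in Python of the
server-side decision it describes. The handler function, its names, and
the exact TimeGate URI pattern are illustrative assumptions, not the
Memento implementation itself:

```python
# Hypothetical sketch of case (1): a server that keeps no archive of its own
# detects the X-Accept-Datetime request header and issues a plain 302
# redirect to an external TimeGate that stands for the archived versions
# of the requested resource.

def handle_request(request_uri, headers):
    """Return a (status, response_headers) pair for an incoming request."""
    if "X-Accept-Datetime" in headers:
        # A simple redirect, not itself part of a conneg flow: the server
        # merely noticed the header and points the client at a TimeGate.
        timegate = "http://web.archive.org/web/timegate/" + request_uri
        return 302, {"Location": timegate}
    # Business as usual: serve the current state of the resource.
    return 200, {"Content-Type": "text/html"}

status, resp = handle_request(
    "http://example.org",
    {"X-Accept-Datetime": "Mon, 12 Oct 2009 00:00:00 GMT"})
# status is 302 and resp["Location"] points at the TimeGate
```

The datetime content negotiation proper then happens against the
TimeGate, not against the original resource.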

(2) If the web server does keep track of its own archival versions
(think CMS), then it can handle requests for old versions "locally",
as it has all the information required to do so. In this case, we
could also introduce a special-purpose, distinct TimeGate on this
server, and have the original resource redirect to it. That would make
this case in essence the same as (1) above. This, however, seemed like
a bit of overkill, and we felt that the original resource and the
TimeGate could coincide, meaning that datetime content negotiation
occurs directly against the original resource. The URI that
represents the resource as it evolves over time is then the URI of the
resource itself: it stands for past and present versions. The present
version is delivered (200 OK) from that URI itself (business as
usual); archived versions are delivered from other resources via
content negotiation (302 with a Location different than the original URI).
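Case (2) can be sketched the same way. In this illustrative Python
fragment the version store, the URI pattern, and the selection rule
(most recent version at or before the requested datetime) are all
assumptions made for the example:

```python
# Hypothetical sketch of case (2): a CMS-style server that tracks its own
# versions lets the original resource double as the TimeGate. The current
# version is a 200 from the original URI; archived versions are 302s to
# distinct version URIs.
from datetime import datetime

# datetime of creation -> URI of the archived version (illustrative data)
VERSIONS = {
    datetime(2009, 10, 12): "http://oakland.example.org/20091012/weather",
    datetime(2009, 11, 1): "http://oakland.example.org/20091101/weather",
}

def negotiate_datetime(accept_datetime=None):
    """Serve the current state, or redirect to the version in force at
    the requested datetime."""
    if accept_datetime is None:
        # Business as usual: current version from the URI itself.
        return 200, {}
    # Most recent version at or before the requested datetime.
    candidates = [t for t in VERSIONS if t <= accept_datetime]
    if not candidates:
        return 404, {}  # nothing archived that early (illustrative choice)
    best = max(candidates)
    return 302, {"Location": VERSIONS[best]}

status, resp = negotiate_datetime(datetime(2009, 10, 20))
# 302 with a Location different than the original URI
```

The point of the sketch is only that the negotiation happens directly
against the original resource, with no separate TimeGate URI involved.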

In both (1) and (2) the original resource plays a role in the
framework, either because it redirects to an external TimeGate that
performs the datetime content negotiation, or because it performs the
datetime content negotiation itself. And we actually think it is
quite essential that the original resource is involved. Its URI is
the name by which the resource has been known as it evolved over
time, so it makes sense to be able to use that URI to try and get to
its past versions. And by "get", I don't mean search for them, but
rather use the network to get there. After all, we all go by the
same name irrespective of the day you talk to us. Or we have the
same Linked Data URI irrespective of the day it is dereferenced. Why
would we suddenly need a new URI when we want to see what the LoD
description for any of us was, say, a year ago? Why should this same
URI not help us get to prior versions?



>
> There is something else that I find problematic about the Memento
> proposal. Archival versions of a web page are too important to hide
> inside HTTP headers.
>
> To take the canonical example, if I am viewing
> <http://oakland.example.org/weather>, I don't want the fact that I am
> viewing historical weather information to be hidden in the request
> headers.
>

It is not. The _request_ for prior versions is in a request header.
The response will come from a URI different than
<http://oakland.example.org/weather>, e.g.
<http://oakland.example.org/20091012/weather> or
<http://web.archive.org/web/20091012/http://oakland.example.org/weather>,
and there will be a response header provided by the server that
delivers this response (X-Archive-Interval) that informs the client
unambiguously that the response _is_ an archived version. This info
can be leveraged by the client to give the archived version the
first-class-citizen status it deserves.
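As a small illustration of the client side of this, a browser or agent
only has to look for that response header. The header name comes from
the discussion above; the interval value and helper function are made
up for the example:

```python
# Illustrative client-side check: detect from the response headers that
# what came back is an archived version rather than the current state.

def is_archived_response(response_headers):
    """True when the server flagged the response as an archived version."""
    return "X-Archive-Interval" in response_headers

archived = {
    "X-Archive-Interval": "Mon, 12 Oct 2009 08:00:00 GMT - "
                          "Tue, 13 Oct 2009 08:00:00 GMT",
    "Content-Type": "text/html",
}
current = {"Content-Type": "text/html"}
# A client can use this flag to present the page explicitly as a past
# version, e.g. with a banner showing the archival interval.
```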

> Furthermore, I am viewing resource X as it appeared at time T1, I
> should *not* be able to copy that URI and send it to a friend, or use
> it as a reference in a document, only to have them see the URI as it
> appears at time T2.
>

You will not. You would copy the URI
<http://oakland.example.org/20091012/weather> or
<http://web.archive.org/web/20091012/http://oakland.example.org/weather>.
I think the misconception in this discussion is that the archived
version is _delivered_ by the original URI. It is not. The archived
version is _requested_ via the original URI, and it is _delivered_ by
a resource at another URI. As is the case with all content negotiation.


> I think that those of us in the web archiving community [1] would very
> much appreciate a serious look by the web architecture community into
> the problem of web archiving. The problem of representing and
> resolving the tuple <URI, time> is a question which has not yet been
> adequately dealt with.

I hope that with Memento we have provided a significant contribution
towards addressing that question. I think our paper at
http://arxiv.org/abs/0911.1112 describes the proposed solution in
quite some detail, and addresses many of the concerns raised in the
discussion on this list so far. And, as indicated before, there are
also the slides in case there is not enough time to read the paper
(http://www.slideshare.net/hvdsomp/memento-time-travel-for-the-web).

Greetings

Herbert Van de Sompel



>
> best,
> Erik Hetzner
>
> 1. Those unfamiliar with web archives are encouraged to visit
> <http://web.archive.org/>, <http://www.archive-it.org/>,
> <http://www.vefsafn.is/>, <http://webarchives.cdlib.org/>, ...
> ;; Erik Hetzner, California Digital Library
> ;; gnupg key id: 1024D/01DB07E3

==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
Received on Tuesday, 24 November 2009 04:03:23 UTC
