- From: Herbert Van de Sompel <hvdsomp@gmail.com>
- Date: Mon, 23 Nov 2009 21:02:37 -0700
- To: Erik Hetzner <erik.hetzner@ucop.edu>, Linked Data community <public-lod@w3.org>
- Cc: "Michael L. Nelson" <mln@cs.odu.edu>, Robert Sanderson <azaroth42@gmail.com>
- Message-Id: <F7AC7062-F8CB-4389-A371-B6CBB33B1AED@gmail.com>
On Nov 23, 2009, at 4:59 PM, Erik Hetzner wrote:

> At Mon, 23 Nov 2009 00:40:33 -0500,
> Mark Baker wrote:
>>
>> On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell <ansell.peter@gmail.com> wrote:
>>> It should be up to resource creators to determine when the nature of a
>>> resource changes across time. A web architecture that requires every
>>> single edit to have a different identifier is a large hassle and
>>> likely won't catch on if people find that they can work fine with a
>>> system that evolves constantly using semi-constant identifiers, rather
>>> than through a series of mandatory time based checkpoints.
>>
>> You seem to have read more into my argument than was there, and
>> created a strawman; I agree with the above.
>>
>> My claim is simply that all HTTP requests, no matter the headers, are
>> requests upon the current state of the resource identified by the
>> Request-URI, and therefore, a request for a representation of the
>> state of "Resource X at time T" needs to be directed at the URI for
>> "Resource X at time T", not "Resource X".
>
> I think this is a very compelling argument.

Actually, I don't think it is. The issue was also brought up (in a significantly more tentative manner) in Pete Johnston's blog entry on eFoundations (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html). Tomorrow, we will post a response that will try to show that the "current state" issue is, as far as we can see, not quite as "written in stone" in the specs that matter in this case, i.e. Architecture of the World Wide Web and RFC 2616, as suggested above. Both are interestingly vague about this.

> On the other hand, there is, nothing I can see that prevents one URI
> from representing another URI as it changes through time. This is
> already the case with, e.g.,
> <http://web.archive.org/web/*/http://example.org>, which represents
> the URI <http://example.org> at all times.
> So this URI could, perhaps,
> be a target for X-Accept-Datetime headers.

That is actually what we do in Memento (see our paper http://arxiv.org/abs/0911.1112), and we recognize two cases here:

(1) If the web server does not keep track of its own archival versions, then we must rely on archival versions that are stored elsewhere, i.e. in Web Archives. In this case, the original server that receives the request can redirect the client to a resource like the one you mention above, i.e. a resource that stands for archived versions of another resource. Note that this redirect is a simple redirect like the ones that happen all the time on the Web. It is not a redirect that is part of a datetime content negotiation flow, but rather a redirect that occurs because the server has detected an X-Accept-Datetime header. Now, we don't want to overload the existing <http://web.archive.org/web/*/http://example.org> as you suggest, but rather choose to introduce a special-purpose resource that we call a TimeGate <http://web.archive.org/web/timegate/http://example.org>. And we indeed introduce this resource as a target for datetime content negotiation.

(2) If the web server does keep track of its own archival versions (think CMS), then it can handle requests for old versions "locally" as it has all the information that is required to do so. In this case, we could also introduce a special-purpose, distinct TimeGate on this server, and have the original resource redirect to it. That would make this case in essence the same as (1) above. This, however, seemed like a bit of overkill and we felt that the original resource and the TimeGate could coincide, meaning that datetime content negotiation occurs directly against the original resource. In other words, the URI that represents the resource as it evolves over time is the URI of the resource itself. It stands for past and present versions.
The present version is delivered (200 OK) from that URI itself (business as usual); archived versions are delivered from other resources via content negotiation (302 with a Location different from the original URI).

In both (1) and (2) the original resource plays a role in the framework, either because it redirects to an external TimeGate that performs the datetime content negotiation, or because it performs the datetime content negotiation itself. And we actually think that it is quite essential that this original resource is involved. It is the URI of the original resource by which the resource has been known as it evolved over time. It makes sense to be able to use that URI to try and get to its past versions. And by "get", I don't mean search for it, but rather use the network to get there. After all, we all go by the same name irrespective of the day you talk to us. Or we have the same Linked Data URI irrespective of the day it is dereferenced. Why would we suddenly need a new URI when we want to see what the LoD description for any of us was, say, a year ago? Why must we prevent this same URI from helping us get to prior versions?

> There is something else that I find problematic about the Memento
> proposal. Archival versions of a web page are too important to hide
> inside HTTP headers.
>
> To take the canonical example, if I am viewing
> <http://oakland.example.org/weather>, I don’t want the fact that I am
> viewing historical weather information to be hidden in the request
> headers.

It is not. The _request_ for prior versions is in a request header. The response will come from a URI different from <http://oakland.example.org/weather>, e.g. <http://oakland.example.org/20091012/weather> or <http://web.archive.org/web/20091012/http://oakland.example.org/weather>, and there will be a response header provided by the server that delivers this response (X-Archive-Interval) that informs the client unambiguously that the response _is_ an archived version.
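
Purely as an illustration (this is not code from the Memento paper), the decision the original resource makes in cases (1) and (2) could be sketched in Python roughly as follows. Several things here are assumptions: the function name negotiate, the VERSIONS table and its 20091101 URI, the HTTP-date format for the X-Accept-Datetime value, the use of 302 for the plain redirect to an external TimeGate, and the rule for picking a version (most recent capture at or before the requested datetime, otherwise the oldest). Delivery of the archived version itself, including the X-Archive-Interval response header, is not modeled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# TimeGate URI pattern taken from the message; used when the server keeps no versions.
TIMEGATE_PREFIX = "http://web.archive.org/web/timegate/"

# Hypothetical example data: capture time -> URI of the archived version.
VERSIONS = {
    datetime(2009, 10, 12, tzinfo=timezone.utc): "http://oakland.example.org/20091012/weather",
    datetime(2009, 11, 1, tzinfo=timezone.utc): "http://oakland.example.org/20091101/weather",
}


def negotiate(original_uri, request_headers, own_versions=None):
    """Return (status, response_headers) for a GET on the original URI."""
    wanted_raw = request_headers.get("X-Accept-Datetime")
    if wanted_raw is None:
        # Business as usual: the present version comes from the original URI.
        return 200, {}

    if not own_versions:
        # Case (1): no local archive, so redirect to an external TimeGate,
        # which then performs the datetime content negotiation.
        # Assumption: a 302 is used for this plain redirect.
        return 302, {"Location": TIMEGATE_PREFIX + original_uri}

    # Case (2): the original resource acts as its own TimeGate.
    wanted = parsedate_to_datetime(wanted_raw)
    if wanted.tzinfo is None:
        wanted = wanted.replace(tzinfo=timezone.utc)

    # Assumed selection rule: latest capture at or before the requested time,
    # falling back to the oldest capture if the request predates all of them.
    earlier = [t for t in own_versions if t <= wanted]
    chosen = max(earlier) if earlier else min(own_versions)
    return 302, {"Location": own_versions[chosen]}
```

For example, calling negotiate("http://oakland.example.org/weather", {"X-Accept-Datetime": "Tue, 20 Oct 2009 12:00:00 GMT"}, VERSIONS) yields a 302 whose Location is the 20091012 URI, while the same call without own_versions redirects to the external TimeGate. In both cases the client ends up at a URI other than the original one.
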
The client can use the X-Archive-Interval header to give the archived version the position of first class citizen it deserves.

> Furthermore, I am viewing resource X as it appeared at time T1, I
> should *not* be able to copy that URI and send it to a friend, or use
> it as a reference in a document, only to have them see the URI as it
> appears at time T2.

You will not. You would copy the URI <http://oakland.example.org/20091012/weather> or <http://web.archive.org/web/20091012/http://oakland.example.org/weather>. I think the misconception in this discussion is that the archived version is _delivered_ by the original URI. It is not. The archived version is _requested_ via the original URI, and it is _delivered_ by a resource at another URI, as is the case with all content negotiation.

> I think that those of us in the web archiving community [1] would very
> much appreciate a serious look by the web architecture community into
> the problem of web archiving. The problem of representing and
> resolving the tuple <URI, time> is a question which has not yet been
> adequately dealt with.

I hope that with Memento we have provided a significant contribution towards addressing that question. I think our paper at http://arxiv.org/abs/0911.1112 describes the proposed solution in considerable detail, and addresses many of the concerns raised in the discussion on this list so far. And, as indicated before, there are also the slides in case there is not enough time to read the paper (http://www.slideshare.net/hvdsomp/memento-time-travel-for-the-web).

Greetings

Herbert Van de Sompel

> best,
> Erik Hetzner
>
> 1. Those unfamiliar with web archives are encouraged to visit
> <http://web.archive.org/>, <http://www.archive-it.org/>,
> <http://www.vefsafn.is/>, <http://webarchives.cdlib.org/>, ...
> ;; Erik Hetzner, California Digital Library
> ;; gnupg key id: 1024D/01DB07E3

==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
Received on Tuesday, 24 November 2009 04:03:23 UTC