- From: Erik Hetzner <erik.hetzner@ucop.edu>
- Date: Thu, 26 Nov 2009 11:11:02 -0800
- To: public-lod@w3.org
- Cc: Michael Nelson <mln@cs.odu.edu>, Herbert Van de Sompel <hvdsomp@gmail.com>, Robert Sanderson <azaroth42@gmail.com>
- Message-ID: <P-IRC-EXBE01sBk31TZ0000359a@EX.UCOP.EDU>
At Wed, 25 Nov 2009 00:21:04 -0500, Michael Nelson wrote:
> Hi Erik,
>
> Thanks for your response. I'm just going to cherry pick a few bits from
> it:
>
> > As an aside, which may or may not be related to Memento, do you think
> > there is a useful distinction to be made between web archives which
> > preserve the actual bytestream of an HTTP response made at a certain
> > time (e.g., the Internet Archive) and CMSs that preserve the general
> > content, but allow headers, advertisements, and so on to change (e.g.,
> > Wikipedia).
> >
> > To see what I mean, visit:
> >
> > http://en.wikipedia.org/w/index.php?title=World_Wide_Web&oldid=9419736
> >
> > and then:
> >
> > http://web.archive.org/web/20050213030130/en.wikipedia.org/wiki/World_Wide_Web
> >
> > I am not sure what the relationship is between these two resources.
>
> I'm not 100% sure either. I think this is a difficult problem in web
> archiving in general. The wikipedia link with current content substituted
> is not exactly the 2005 version, but the IA version isn't really what a
> user would have seen in 2005 either (at least in terms of presentation).
>
> And:
>
> http://web.archive.org/web/20080103014411/http://www.cnn.com/
>
> for example gives me at least a pop-up ad that is relative to today, not
> Jan 2008 (there may be better examples where "today's" content is
> in-lined, but the point remains the same).

I can’t find the popup, but the point is well taken. The problem of what I
call ‘breaking out’ of archived web content is a very real one when
archived web sites are displayed without browser support, using URI
‘rewriting’ and other tricks. The possibility of coming up with a solution
for this problem is one reason why I am very excited about this
discussion.

Still, I think the intention of IA is different from that of Wikipedia’s
previous versions.
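(To make the ‘breaking out’ problem concrete: archived pages still contain links to the live web, so replay tools rewrite them to stay inside the archive. Below is a toy sketch of that rewriting, using a hypothetical snapshot prefix and a regex over href/src attributes only; a real replay tool must also handle relative URIs, CSS url(...) references, and JavaScript-generated links — which is exactly where breakout tends to happen.)

```python
import re

# Hypothetical archive snapshot prefix, for illustration only.
ARCHIVE_PREFIX = "http://web.archive.org/web/20050213030130/"

def rewrite_links(html: str) -> str:
    # Rewrite absolute http:// links in href/src attributes so they
    # resolve inside the archive instead of "breaking out" to the
    # live web.
    return re.sub(
        r'(href|src)="(http://[^"]+)"',
        lambda m: f'{m.group(1)}="{ARCHIVE_PREFIX}{m.group(2)}"',
        html,
    )

page = '<a href="http://en.wikipedia.org/wiki/World_Wide_Web">WWW</a>'
print(rewrite_links(page))
```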
IA attempts to capture and replay the web exactly as it was, while
Wikipedia presents its essential content in the same way while surrounding
it with the latest tools. While either approach would be helpful to
somebody researching the history of a Wikipedia article or to somebody
looking for a previous version, only IA’s approach preserves the
advertisements, etc., which can be very helpful for researchers.

There is the further issue that IA’s copy is held by a third party and is
in some ways more trustworthy. Whether sites can generally be trusted to
maintain accurate archives of their own content is a question that has
already been answered, in my opinion. (The answer is: they can’t.) See,
e.g., [1].

> As an aside, the Zoetrope (http://doi.acm.org/10.1145/1498759.1498837)
> took an entirely different approach to this problem in their archives
> (see pp. 246-247). They basically took DOM dumps from the client and
> saved them, rather than a crawler-based URI approach.

Thanks for the pointer.

> > My confusion on this issue stems, I believe, from a longstanding
> > confusion that I have had with the 302 Found response.
> >
> > My understanding of 302 Found has always been that, if I visit R and
> > receive a 302 Found with Location R', my browser should continue to
> > consider R the canonical version and use it for all further requests.
> > If I bookmark R' after having been redirected to R, it is in fact R
> > which should be bookmarked, and not R'. If I use my browser to send
> > that link to a friend, my browser should send R, not R'. I believe
> > that this is the meaning given to 302 Found in [3].
> >
> > I am aware that browsers do not implement what I consider to be the
> > correct behavior here, but it is the way that I understand the
> > definition of 302 Found.
> >
> > Perhaps somebody could help me out by clarifying this for me?
>
> Firefox will attempt to do the right thing, but it depends on the client
> maintaining state about the original URI.
> If you dereference R, then get 302'd to R', a reload in Firefox will be
> on R and not R'.

I hadn’t noticed this before, thank you for pointing it out.

> Obviously, if you email or share or probably even bookmark R', then this
> client-side state will be lost and 3rd party reloads will be relative to
> R' (in fact, that might be what you *want* to occur). But at least within
> a session, Firefox (and possibly other browsers) will reload wrt to the
> original URI.
>
> Although it is not explicit in the current paper or presentation, we're
> planning on some method for having R' "point" back to R to facilitate
> Memento-aware clients to know the original URI. We're not sure
> syntactically how it should be done (a value in the "Alternates" response
> header maybe?), but semantically we want R' to point to R. This

I think your email got cut off there.

In any case, in the context of actual existing implementations of 302, I
think Memento is doing the correct thing. That is, redirection from R to
the appropriate content (R') based on conneg makes sense to me, for
Memento, if what the user can bookmark and see is the conneg’ed URI (R').

My belief (see [2] and especially [3]) is that properly behaving clients
should bookmark R, not R'. I think that this could be problematic for
Memento, because the X-Accept-DateTime header could be lost with the
bookmarking, as I mentioned in my previous message. But I think I may be
beating a dead horse, because obviously clients in the real world behave
by bookmarking and displaying R', not R.

best,
Erik Hetzner

1. <http://www.clinecenter.uiuc.edu/airbrushing_history/>
2. <http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html>
3. <http://www.w3.org/DesignIssues/UserAgent.html>
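(A self-contained demonstration of the R vs R' point above: a toy local server 302s the generic URI "/R" to the specific URI "/R-prime" — both names hypothetical, and this is not Memento's actual implementation — and the client ends up holding R', not R, unless it tracks the original URI itself.)

```python
import http.server
import threading
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/R":
            # 302 Found: redirect the generic URI R to a specific R'.
            self.send_response(302)
            self.send_header("Location", "/R-prime")
            self.end_headers()
        else:
            body = b"archived representation"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

resp = urllib.request.urlopen(f"http://127.0.0.1:{port}/R")
final_uri = resp.geturl()  # the URI the client ends up holding
body = resp.read()
server.shutdown()

# urllib, like most clients, follows the 302 transparently: the URI it
# reports is R', not R. Anything bookmarked or shared from here on is R',
# and any request context (such as an X-Accept-DateTime header) is gone.
print(final_uri)
```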
Received on Thursday, 26 November 2009 19:12:09 UTC