- From: Michael Nelson <mln@cs.odu.edu>
- Date: Wed, 25 Nov 2009 00:21:04 -0500
- To: Erik Hetzner <erik.hetzner@ucop.edu>
- CC: Herbert Van de Sompel <hvdsomp@gmail.com>, Linked Data community <public-lod@w3.org>, Robert Sanderson <azaroth42@gmail.com>
Hi Erik,

Thanks for your response. I'm just going to cherry pick a few bits from it:

> As an aside, which may or may not be related to Memento, do you think
> there is a useful distinction to be made between web archives which
> preserve the actual bytestream of an HTTP response made at a certain
> time (e.g., the Internet Archive) and CMSs that preserve the general
> content, but allow headers, advertisements, and so on to change (e.g.,
> Wikipedia).
>
> To see what I mean, visit:
>
> http://en.wikipedia.org/w/index.php?title=World_Wide_Web&oldid=9419736
>
> and then:
>
> http://web.archive.org/web/20050213030130/en.wikipedia.org/wiki/World_Wide_Web
>
> I am not sure what the relationship is between these two resources.

I'm not 100% sure either. I think this is a difficult problem in web
archiving in general. The Wikipedia link with current content substituted
is not exactly the 2005 version, but the IA version isn't really what a
user would have seen in 2005 either (at least in terms of presentation).
And:

http://web.archive.org/web/20080103014411/http://www.cnn.com/

for example gives me at least a pop-up ad that is relative to today, not
Jan 2008 (there may be better examples where "today's" content is
in-lined, but the point remains the same).

As an aside, Zoetrope (http://doi.acm.org/10.1145/1498759.1498837) took an
entirely different approach to this problem in its archives (see pp.
246-247). They basically took DOM dumps from the client and saved them,
rather than using a crawler-based URI approach.

> My confusion on this issue stems, I believe, from a longstanding
> confusion that I have had with the 302 Found response.
>
> My understanding of 302 Found has always been that, if I visit R and
> receive a 302 Found with Location R', my browser should continue to
> consider R the canonical version and use it for all further requests.
> If I bookmark R' after having been redirected to R, it is in fact R
> which should be bookmarked, and not R'. If I use my browser to send
> that link to a friend, my browser should send R, not R'. I believe
> that this is the meaning given to 302 Found in [3].
>
> I am aware that browsers do not implement what I consider to be the
> correct behavior here, but it is the way that I understand the
> definition of 302 Found.
>
> Perhaps somebody could help me out by clarifying this for me?

Firefox will attempt to do the right thing, but it depends on the client
maintaining state about the original URI. If you dereference R, then get
302'd to R', a reload in Firefox will be on R and not R'. Obviously, if
you email or share or probably even bookmark R', then this client-side
state will be lost and third-party reloads will be relative to R' (in
fact, that might be what you *want* to occur). But at least within a
session, Firefox (and possibly other browsers) will reload with respect
to the original URI.

Although it is not explicit in the current paper or presentation, we're
planning on some method for having R' "point" back to R, so that
Memento-aware clients can learn the original URI. We're not sure
syntactically how it should be done (a value in the "Alternates" response
header maybe?), but semantically we want R' to point to R.

Regards,

Michael

>
> best,
> Erik Hetzner
>

----
Michael L. Nelson mln@cs.odu.edu http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)
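
A minimal sketch of the reload behaviour described above for a 302 from R to R'. It is a toy model, not Firefox's implementation: the `Tab` class, the `RESPONSES` table, and the example.org URIs are all hypothetical names invented for illustration, and no real network requests are made.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory "web": URI -> (status, Location target or body).
R = "http://example.org/R"
R_PRIME = "http://example.org/R-prime"
RESPONSES = {
    R: (302, R_PRIME),                      # R answers 302 Found, Location: R'
    R_PRIME: (200, "representation of R'"),
}


@dataclass
class Tab:
    """A browser tab that remembers which URI it was originally asked for."""
    original_uri: str = ""
    final_uri: str = ""
    requests: list = field(default_factory=list)  # every URI actually requested

    def _fetch(self, uri, max_hops=5):
        for _ in range(max_hops):
            self.requests.append(uri)
            status, value = RESPONSES[uri]
            if status == 302:
                uri = value  # follow Location for this request only
                continue
            return uri, value
        raise RuntimeError("too many redirects")

    def navigate(self, uri):
        self.original_uri = uri  # client-side state about the original URI
        self.final_uri, body = self._fetch(uri)
        return body

    def reload(self):
        # A reload goes back to the original URI (R), not the redirect target.
        self.final_uri, body = self._fetch(self.original_uri)
        return body

    def share_link(self):
        # Sharing the URI the tab ended up at drops the client-side state,
        # so a third party only ever sees R'.
        return self.final_uri


tab = Tab()
tab.navigate(R)
tab.reload()
print(tab.requests)      # R, R', then R, R' again on reload
print(tab.share_link())  # R'
```

The point of the sketch is only that the link from R' back to R lives in the client's memory; once R' is shared, nothing in the response itself says where it came from.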
Received on Wednesday, 25 November 2009 05:23:59 UTC