Re: RDF Update Feeds + URI time travel on HTTP-level from Michael Nelson on 2009-11-25 (public-lod@w3.org from November 2009)

From: Michael Nelson <mln@cs.odu.edu>
Date: Wed, 25 Nov 2009 00:21:04 -0500
To: Erik Hetzner <erik.hetzner@ucop.edu>
CC: Herbert Van de Sompel <hvdsomp@gmail.com>, Linked Data community <public-lod@w3.org>, Robert Sanderson <azaroth42@gmail.com>
Message-ID: <alpine.GSO.1.10.0911242322550.16061@vega.cs.odu.edu>

Hi Erik,

Thanks for your response.  I'm just going to cherry pick a few bits from 
it:

> As an aside, which may or may not be related to Memento, do you think
> there is a useful distinction to be made between web archives which
> preserve the actual bytestream of an HTTP response made at a certain
> time (e.g., the Internet Archive) and CMSs that preserve the general
> content, but allow headers, advertisements, and so on to change (e.g.,
> Wikipedia).
>
> To see what I mean, visit:
>
> http://en.wikipedia.org/w/index.php?title=World_Wide_Web&oldid=9419736
>
> and then:
>
> http://web.archive.org/web/20050213030130/en.wikipedia.org/wiki/World_Wide_Web
>
> I am not sure what the relationship is between these two resources.

I'm not 100% sure either.  I think this is a difficult problem in web 
archiving in general.  The Wikipedia link, with current content substituted, 
is not exactly the 2005 version, but the IA version isn't really what a 
user would have seen in 2005 either (at least in terms of presentation).

And:

http://web.archive.org/web/20080103014411/http://www.cnn.com/

for example gives me at least a pop-up ad that is relative to today, not 
Jan 2008 (there may be better examples where "today's" content is 
in-lined, but the point remains the same).

As an aside, the Zoetrope project (http://doi.acm.org/10.1145/1498759.1498837) 
took an entirely different approach to this problem in its archives (see 
pp. 246-247).  They basically took DOM dumps from the client and saved 
them, rather than using a crawler-based URI approach.

> My confusion on this issue stems, I believe, from a longstanding
> confusion that I have had with the 302 Found response.
>
> My understanding of 302 Found has always been that, if I visit R and
> receive a 302 Found with Location R', my browser should continue to
> consider R the canonical version and use it for all further requests.
> If I bookmark R' after having been redirected to R, it is in fact R
> which should be bookmarked, and not R'. If I use my browser to send
> that link to a friend, my browser should send R, not R'. I believe
> that this is the meaning given to 302 Found in [3].
>
> I am aware that browsers do not implement what I consider to be the
> correct behavior here, but it is the way that I understand the
> definition of 302 Found.
>
> Perhaps somebody could help me out by clarifying this for me?

Firefox will attempt to do the right thing, but it depends on the client 
maintaining state about the original URI.  If you dereference R, then get 
302'd to R', a reload in Firefox will be on R and not R'.

Obviously, if you email or share or probably even bookmark R', then this 
client-side state will be lost and third-party reloads will be relative to 
R' (in fact, that might be what you *want* to occur).  But at least within 
a session, Firefox (and possibly other browsers) will reload with respect 
to the original URI.
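
To make the session-state point concrete, here is a minimal Python sketch 
of a client that remembers the original URI across a 302.  It is only an 
illustration under stated assumptions: the REDIRECTS table stands in for a 
real server, and the Session class and the example.org URIs are 
hypothetical names, not how Firefox is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical redirect table standing in for a server:
# requesting R yields "302 Found" with Location R'.
REDIRECTS = {
    "http://example.org/R": (302, "http://example.org/R-prime"),
}


@dataclass
class Session:
    """Toy client that keeps the originally requested URI as session state."""
    original_uri: Optional[str] = None   # what the user asked for (R)
    displayed_uri: Optional[str] = None  # what ended up being shown (R')

    def dereference(self, uri: str) -> str:
        # Remember the requested URI, then follow at most one 302.
        self.original_uri = uri
        status, location = REDIRECTS.get(uri, (200, uri))
        self.displayed_uri = location if status == 302 else uri
        return self.displayed_uri

    def reload(self) -> str:
        # A reload goes back to the original URI (R), not the target (R').
        if self.original_uri is None:
            raise ValueError("nothing has been dereferenced yet")
        return self.dereference(self.original_uri)

    def share(self) -> Optional[str]:
        # Only the displayed URI leaves the session; the link to R is lost.
        return self.displayed_uri
```

A second Session that dereferences the value returned by share() has R' as 
its original_uri, so its reloads are relative to R' and never touch R.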

Although it is not explicit in the current paper or presentation, we're 
planning on some method for having R' "point" back to R so that 
Memento-aware clients can learn the original URI.  We're not sure 
syntactically how it should be done (a value in the "Alternates" response 
header maybe?), but semantically we want R' to point to R.  This

regards,

Michael

>
> best,
> Erik Hetzner
>

----
Michael L. Nelson mln@cs.odu.edu http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)

Received on Wednesday, 25 November 2009 05:23:59 UTC