Re: New version of Memento I-D from Herbert van de Sompel on 2012-01-09 (www-tag@w3.org from January 2012)

From: Herbert van de Sompel <hvdsomp@gmail.com>
Date: Mon, 9 Jan 2012 14:41:12 -0700
To: Jonathan A Rees <rees@mumble.net>
Cc: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <CAOywMHfGXuTAfuN_zjYxo3sAKDosqZ4gkPdrwOoOuSL9rdHaTw@mail.gmail.com>
On Sun, Jan 8, 2012 at 5:08 PM, Jonathan A Rees <rees@mumble.net> wrote:

> On Sat, Dec 24, 2011 at 8:48 AM, Herbert Van de Sompel
> <hvdsomp@gmail.com> wrote:
> > Also, I would like to suggest that Memento fits in the scope of Goal (3)
> > of the Persistence of Identifiers work:
> >
> > http://www.w3.org/2001/tag/products/persistence.html
>


Thanks for your thoughts, Jonathan. I insert some comments, below.


>
> This is an interesting idea - I do see Memento fitting into the
> broader aim of "persistent reference" because URI together with date
> establish a sort of reference (to versions of documents that don't
> change more frequently than the archives snapshot them).


Generally speaking, I think the combination of a URI and a time can be
regarded as indicating the state a URI-identified resource had at the
specified time. And the Memento framework as it currently stands can help
with obtaining a representation of that state, as long as the resources are
identified by HTTP URIs. How temporally accurate that representation is
depends, among other things, on the temporal granularity at which the
resource was "archived". There is a whole spectrum in this regard, ranging
between:


(a) Observations of a resource are archived: the resource is "passively"
archived, e.g. a web server relies on a web archive to take snapshots of
its resources. (This is the case you refer to.)

(b) The history of the resource is archived: the web server is "actively"
involved in archiving its resources, as is the case, for example, with
version control systems (CMS etc.) and transactional web archives. In this
case it is known exactly which version was operational at which time, and
there are no gaps in coverage.
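
To make the difference between (a) and (b) concrete, here is a minimal
sketch in Python. It is an illustration only, not part of the Memento I-D:
the function names and the sample datetimes are made up, and the
"nearest in time" choice in case (a) is an assumption about archive policy.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def closest_snapshot(snapshots, target):
    """Case (a): return the archived observation nearest in time to target.

    The resource may have changed between the snapshot and target, so the
    result is only an approximation of the state at target.
    """
    return min(snapshots, key=lambda ts: abs(ts - target))


def version_at(version_starts, target):
    """Case (b): return the start time of the version in effect at target.

    version_starts must be sorted in ascending order. Returns None when
    target predates the first recorded version.
    """
    i = bisect_right(version_starts, target)
    return version_starts[i - 1] if i else None


# Hypothetical archive holdings for one resource.
holdings = [
    datetime(2011, 1, 1, tzinfo=timezone.utc),
    datetime(2011, 7, 1, tzinfo=timezone.utc),
    datetime(2012, 1, 1, tzinfo=timezone.utc),
]
wanted = datetime(2011, 12, 24, tzinfo=timezone.utc)

nearest = closest_snapshot(holdings, wanted)  # the 2012-01-01 observation
in_effect = version_at(holdings, wanted)      # the 2011-07-01 version
```

With the same list of datetimes, case (a) returns the observation made a
week after the requested time, while case (b) returns the version that was
actually operational at that time.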


Memento has been demonstrated across that range of situations: there are
working examples of the Memento framework in conjunction with web archives
(e.g. IA), wikis (cf. the MediaWiki Memento plug-in <
http://www.mediawiki.org/wiki/Extension:Memento>), transactional archives
(cf. the experimental <http://theresourcedepot.org>), and Linked Data
archives (cf. the DBpedia archive described in
<http://arxiv.org/abs/1003.3661>).



> But it seems
> to me to have the goal of remediation or workaround, in the situation
> where persistent identifiers are *not* available. That is, it aims to
> use archives in order to make it *unnecessary* to solve the persistent
> identifier problem.


This is a rather accurate observation. My interest in working in the realm
of persistent referencing rather than persistent identification is probably
a reflection of being worn out by poking at the latter over the past 10
years (mostly in the realm of digital libraries and scholarly
communication). I guess I have started to wonder when persistent
identifiers actually *are* available.


Anyhow, with regard to persistent referencing, I have explored some issues
that involve time and scholarly citations (including citations with e.g.
DOIs) in the presentation "Time Travel for the Scholarly Web" that I
recently gave at the STM Innovations Seminar in London. It can be found at <
http://river-valley.tv/time-travel-for-the-scholarly-web/>.



> Therefore it does not really fall in the scope of
> a persistent *identifiers* project, only of a persistent *references*
> project.


Understood. Still, I am thinking the problems are rather closely related,
and, given how hard the persistent identification problem is, maybe
ignoring it and addressing persistent referencing becomes an appealing
strategy? ;-)



> I certainly consider it to be in the solution space for the
> latter, although it is disturbing that there is no syntax for the kind
> of reference one might want to do (the date is supplied implicitly).
>

Memento indeed does not provide such syntax. However, Larry Masinter's DURI
proposal does, and DURIs that carry an HTTP URI could be dereferenced via
Memento. The Memento I-D references DURI. The DURI I-D does not reference
Memento.


Also, scholarly references to "regular" web resources typically carry an
observation datetime that could be used. Scholarly citations to
DOI-identified resources don't, and I think they should, especially now that
CrossRef has started recommending the use of the http://dx.doi.org/... form
of DOIs in citations. (Again, see the aforementioned presentation.)


>
> The exception would be the special case of unchanging and unreplaced
> documents, whose URIs under Memento become persistent URIs, even if
> they were not originally designed to be such.


Correct. This is described explicitly in
<http://www.mementoweb.org/guide/rfc/ID/#HTTP_OriginalResource_server_memento>



> (This would rely on
> "tombstoning" i.e. persistent delivery of 404s for lost documents.) I
> imagine Memento's discovery protocol would lead you to archived copies
> even if the document goes missing from its original location and even
> if there is no known appropriate date (right?).
>

The latter is correct. Without a date, it would lead to the most recent
archived version. With a date it should lead to a matching version. And
Memento would also do this for cases in which e.g. the domain is no longer
active.
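
As a rough sketch of the two cases on the client side: in the Memento I-D
the desired datetime is conveyed in an Accept-Datetime request header, using
the same date format as other HTTP date headers. The Python below only
builds the request headers. The function name is hypothetical, and which
TimeGate URI the request is sent to is deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def timegate_headers(when=None):
    """Build request headers for datetime negotiation.

    when: a timezone-aware datetime, or None. With None, no Accept-Datetime
    header is sent, which corresponds to the "without a date" case above.
    """
    headers = {}
    if when is not None:
        # Convert to UTC first: format_datetime(..., usegmt=True) only
        # accepts datetimes whose tzinfo is timezone.utc.
        as_utc = when.astimezone(timezone.utc)
        headers["Accept-Datetime"] = format_datetime(as_utc, usegmt=True)
    return headers


no_date = timegate_headers()
with_date = timegate_headers(datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc))
```

Here no_date is an empty dict, and with_date carries the header value
"Thu, 31 May 2007 20:35:00 GMT".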

Greetings

Herbert


>
> Jonathan
>
> > Greetings
> >
> > Herbert Van de Sompel
> > http://public.lanl.gov/herbertv/
>



-- 
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/

==
Received on Monday, 9 January 2012 21:41:44 UTC