- From: Herbert van de Sompel <hvdsomp@gmail.com>
- Date: Mon, 30 Sep 2013 12:04:57 -0600
- To: Larry Masinter <masinter@adobe.com>
- Cc: "ashok.malhotra@oracle.com" <ashok.malhotra@oracle.com>, Karl Dubost <karl@la-grange.net>, Mark Nottingham <mnot@mnot.net>, "www-tag@w3.org WG" <www-tag@w3.org>, Herbert van de Sompel <hvdsomp@gmail.com>
On Sat, Sep 28, 2013 at 10:35 PM, Larry Masinter <masinter@adobe.com> wrote:

> to solve "link rot" you have to solve "storage rot":
>
> having a reliable archive for insuring permanent access to referenced material is both
> * necessary: without access to some representation of the material, the persistent pointer is useless
> * sufficient: any permanent way of accessing material must of necessity have an index system for identifying the material preserved.
>

For many web resources, prior versions are available from a variety of sources, not just one source:

* crawl-based archives such as the Internet Archive, and national web archives such as the British Library, the UK National Archives, and the Icelandic web archives,
* subscription-based web archives such as Archive-It,
* on-demand web archives such as archive.is and perma.cc,
* transactional web archives, cf. SiteStory, http://mementoweb.github.io/SiteStory/,
* content management systems with time-based versioning such as Wikipedia and all MediaWiki installations.

All these sources of resource versions have their own index, which in essence contains the following information per resource version:

- URI-R of the original resource
- URI-M of the versioned resource
- version datetime

The Memento protocol specifies an interoperable approach to interact with those indexes. It consists of two components:

- Datetime negotiation with a TimeGate for an original resource: Given the URI-R of the original resource and a preferred datetime, return a URI-M for a versioned resource that is temporally close to the preferred datetime. Note that for a CMS the exact version that was active at the preferred datetime will be returned. For web archives, how close the returned version is to the preferred datetime depends on the coverage of the archive for the original resource.
- List of all versions of the original resource via a TimeMap for an original resource: Such a list details the URI-R of the original resource and, for each resource version, the URI-M of the versioned resource as well as its version datetime.

Some resources/servers express a preference for a certain archive. For example, DBpedia provides an HTTP Link pointing at the DBpedia Archive. Most resources don't, in which case a Memento client will decide itself which archive to interact with. The Memento extension for Chrome that will be pre-released today allows a user to set a preference for a default web archive.

Admittedly, these sources of prior resource versions do not cover all prior versions of all resources. But there's a significant body of prior resource versions out there. For example, the Internet Archive is said to currently contain 335 billion archived web resources. To put it differently, there's a significant body of URIs out there for which machine-actionable temporal information added to a link, as proposed in the document I shared, would be useful rather than useless.

Hence, it would be nice to see a discussion that is more about the aspect of the reference rot problem that is addressed in the document I shared, and less about those aspects that the document has no proposal for and for which it relies on ongoing international efforts pertaining to web archiving.

Cheers

Herbert

--
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
==
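Editorial illustration (not part of the original message): the per-version index and the two protocol components described in the message can be sketched in a few lines of Python. This is only a sketch under stated assumptions. The names `VersionRecord`, `timegate`, `timemap`, and every URI in `INDEX` are hypothetical, the index is an in-memory list rather than a real archive, and "temporally close" is read here as "smallest absolute time difference", which is one possible policy and not necessarily what any given archive or CMS implements.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class VersionRecord:
    """One index entry: the three fields listed in the message."""
    uri_r: str                  # URI-R of the original resource
    uri_m: str                  # URI-M of the versioned resource
    version_datetime: datetime  # datetime of that version


def timegate(index, uri_r, preferred):
    """Return the URI-M whose version datetime is closest to `preferred`.

    Assumption: closeness is the smallest absolute time difference.
    Returns None when the index holds no version of `uri_r`.
    """
    candidates = [rec for rec in index if rec.uri_r == uri_r]
    if not candidates:
        return None
    closest = min(candidates,
                  key=lambda rec: abs(rec.version_datetime - preferred))
    return closest.uri_m


def timemap(index, uri_r):
    """Return (URI-M, version datetime) pairs for `uri_r`, oldest first."""
    versions = sorted((rec for rec in index if rec.uri_r == uri_r),
                      key=lambda rec: rec.version_datetime)
    return [(rec.uri_m, rec.version_datetime) for rec in versions]


# Hypothetical index contents; none of these URIs refer to real archives.
PAGE = "http://example.org/page"
INDEX = [
    VersionRecord(PAGE, "http://archive.example/2013/" + PAGE,
                  datetime(2013, 9, 1, tzinfo=timezone.utc)),
    VersionRecord(PAGE, "http://archive.example/2009/" + PAGE,
                  datetime(2009, 5, 1, tzinfo=timezone.utc)),
    VersionRecord(PAGE, "http://archive.example/2011/" + PAGE,
                  datetime(2011, 11, 15, tzinfo=timezone.utc)),
]

if __name__ == "__main__":
    wanted = datetime(2012, 1, 1, tzinfo=timezone.utc)
    print(timegate(INDEX, PAGE, wanted))
    for uri_m, when in timemap(INDEX, PAGE):
        print(when.isoformat(), uri_m)
```

A CMS that must return the exact version active at the preferred datetime would instead pick the latest record whose version datetime is not after `preferred`; the nearest-in-time rule above is closer to the web-archive case, where the result depends on how well the archive covers the resource.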
Received on Monday, 30 September 2013 18:05:28 UTC