Web data preservation

Dear all,

This seems to be like the flu: it comes back periodically :-). Let's have another go.


* Web data preservation
In our context it might be more appropriate to use this term, to differentiate it from traditional data preservation such as LTANS and similar work.

"Long-Term Archive Service Requirements"  (LTANS)
http://tools.ietf.org/html/rfc4810

"A System for Long-Term Document Preservation"
http://larry.masinter.net/0603-archiving.pdf


* Main aspects
Two main preservation aspects:
- URI      : how to find the data
- Resource : the data itself


* URI
This is addressed in COMURI
http://dragoman.org/comuri.html

In particular, ultrapersistency:
"Ultrapersistent URI covers the full life-cycle: original site, archiving into archival sites, and offline data."
http://dragoman.org/comuri.html#dfn-ultrapersistent-uri

Preserving URIs for the long term, say 25 or 50 years, is not trivial; preserving the data itself is far harder.
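
As a rough illustration only (this is not taken from COMURI), the life-cycle in the quote above could be read as a fallback order: try the original site, then an archival site, then offline data. The sketch below assumes that reading; the function name, the stage labels and the example.org URI are hypothetical, and plain dictionaries stand in for real sites and offline stores.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a URI and returns the data, or None if it has none.
Fetcher = Callable[[str], Optional[bytes]]


def resolve(uri: str,
            fetchers: Sequence[Tuple[str, Fetcher]]) -> Optional[Tuple[str, bytes]]:
    """Try each life-cycle stage in order; return (stage, data) for the first hit."""
    for stage, fetch in fetchers:
        data = fetch(uri)
        if data is not None:
            return stage, data
    return None  # the URI could not be resolved at any stage


# Hypothetical stores: the original site no longer serves the resource,
# but an archival site and an offline copy still do.
original = {}
archive = {"http://example.org/doc": b"archived copy"}
offline = {"http://example.org/doc": b"offline copy"}

fetchers = [
    ("original", original.get),
    ("archival", archive.get),
    ("offline", offline.get),
]

result = resolve("http://example.org/doc", fetchers)
print(result)  # ('archival', b'archived copy')
```

The point of the sketch is only that the same URI keeps working while the place that answers for it changes over time.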


* Resource
Web data has some characteristics that differ from "traditional data". For example, traditional data tended to be static, whereas much web data (resources) is dynamic.

"Web Data" (this was previously in COMURI)
http://dragoman.org/webdata.html

This Editor's Draft captures some of the nature of web data: ultrapersistency, archival, packing, granularity, online/offline, static/dynamic, multilinguality, etc.


* Monolithic and separate specifications
One should try to break the specifications into separate specifications; among other things, they are then easier to manage and the tasks are easier to allocate. But if the consensus is towards monolithic specifications, I will contribute to the appropriate sections.


This message is in response to several previous messages.
 
Regards
Tomas

Received on Tuesday, 14 April 2015 14:58:35 UTC