- From: Herbert Van de Sompel <hvdsomp@gmail.com>
- Date: Thu, 19 Mar 2015 03:39:00 -0600
- To: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
- Cc: Herbert Van de Sompel <hvdsomp@gmail.com>
Dear all,

Many thanks for your enthusiastic comments. Below, I respond to some of your comments/questions.

Greetings

Herbert Van de Sompel

==

* I can only express my excitement that interest is expressed in reading the Memento protocol spec, RFC 7089. A handy HTML version is available at [1]. For a gentle introduction, see [2].

* Regarding Ghislain's remark about "storing" versions: The Memento protocol has nothing to say about criteria used for deciding when a resource is effectively a new version. The Memento protocol comes into play once temporal resource versions have been created, irrespective of the underlying approach used to create them. Typical cases are:

(a) In web archiving, a temporal version (snapshot) is created after a robot crawled a page and the resulting resources (the page and its embedded/linked resources) are ingested in a web archive. The crawling date will be the Memento-Datetime. It is the date time of the observation of the crawled web resources. Returning the "best" Memento for a specified datetime is typically done on the basis of the smallest delta between the specified datetime and a Memento-Datetime value. This is the approach in all web archives including Internet Archive.

(b) In CMS, software versioning systems, etc., a temporal version is created subject to technical and editorial policies. The datetime of a new temporal version becomes its Memento-Datetime. With CMS etc., one typically knows the history of resource versions, i.e. one knows the temporal interval in which they were the "live" versions. Because this history is known, returning the "best" Memento for a specified datetime is typically done by returning the version that was operational in the interval that includes the specified datetime, i.e. the version that is closest in the past to the specified datetime is returned. This approach is used e.g. in the Memento extensions for MediaWiki [3][4].

* Regarding Ghislain's remark about URI syntax for Mementos: The Memento protocol does not require any special URI syntax for Mementos as everything is (according to REST and HATEOAS principles) based on HTTP headers, typed links, negotiation. However, the syntax style exemplified by <http://dbpedia.mementodepot.org/memento/20100316/http://dbpedia.org/page/DJ_Shadow> is rather widely used/supported by web archives although definitely not uniformly. The API associated with our Time Travel service <http://bit.ly/webtimetravel> also supports the syntax. But CMS etc. definitely do not use it.

* Steven says: "Your note addresses archiving published data, but I also ask how an organization can assume best practices in publication if they do not yet have policies to retain that which is not yet decided to be published?" : I guess retention is a bit of a different beast and typically subject to a range of policies. There's also the question whether everything that is decided to be retained is also public/published. Let's just say that, if an organization decides to retain resources in the public eye (i.e. publish them), the data principles apply. If the organization would already apply the data principles internally, prior to publishing, chances are high they would be in a better position to adhere to the principles when they publish.

* Antoine says: "So in practice for the document I would be very happy to say that the versioned vocabulary could be published following the methods that are applied to the data itself. And count on the data versioning section to refer on Memento.": That would be an approach. But, as you mention, a lot of vocabularies are used in data that are not controlled/published by the publisher of the data. If data and vocabulary use a different approach for handling versions, interoperability decreases.

* Lewis says: "… the DWBP WG is taking a data centric view of data versioning meaning that a protocol which defines the data version would be more part of the BP relating to Follow REST principles when designing APIs. I think we need to be aware of the differences between something like Memento (a specification and protocol for accessing resources) and best practice of publishing versioning information alongside dataset which are to be published on to the Web.": This is a very good point, and goes straight to the two possible perspectives one can take on Memento in the context of this discussion:

- The Memento protocol, RFC 7089, is actually a RESTful "API" to access temporal resource versions. API between quotes because it's actually not an API, it's just a straightforward extension of HTTP with datetime negotiation, a feature that Tim Berners-Lee suggested ages ago [5] but was never specified. The protocol offers TimeGates (datetime negotiation to access a single temporal version) and TimeMaps (access to a temporal resource version history) as version access mechanisms. Obviously, instead of having a multitude of APIs to access temporal versions and version information, I would much prefer a world in which this were uniformly done using the Memento protocol ;-) The uniform "API" exists and our experience with the TimeGate server [6] shows that it is typically straightforward to implement Memento support in cases where a bespoke version API exists.

- Aspects of the Memento protocol can be used to publish resource version information without actually fully implementing the protocol. This is the bit that I shared initially and that is described in [7].

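As a rough illustration of the points above, here is a minimal Python sketch. It is not taken from RFC 7089 or from any particular archive or CMS implementation; the function names (accept_datetime, closest_memento, memento_in_effect) and the sample datetimes are made up for this example. It shows how a datetime could be formatted for the Accept-Datetime request header used in datetime negotiation, and how the two "best Memento" selection strategies described under (a) and (b) can return different versions for the same request. Falling back to the earliest version when the requested datetime predates all versions is an assumption of this sketch.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime(dt):
    """Format a datetime as an RFC 1123 date (GMT) for an Accept-Datetime header."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def closest_memento(memento_datetimes, requested):
    """Strategy (a), web archives: smallest absolute delta to the requested datetime."""
    return min(memento_datetimes, key=lambda m: abs(m - requested))


def memento_in_effect(memento_datetimes, requested):
    """Strategy (b), CMS-style: the version that was live at the requested
    datetime, i.e. the latest version created at or before it."""
    ordered = sorted(memento_datetimes)
    i = bisect_right(ordered, requested)
    # Assumption of this sketch: a request that predates every version
    # falls back to the earliest known version.
    return ordered[max(i - 1, 0)]


# Hypothetical version history (Memento-Datetime values) and request.
versions = [
    datetime(2010, 1, 1, tzinfo=timezone.utc),
    datetime(2010, 3, 1, tzinfo=timezone.utc),
    datetime(2010, 6, 1, tzinfo=timezone.utc),
]
requested = datetime(2010, 4, 20, tzinfo=timezone.utc)

print("Accept-Datetime:", accept_datetime(requested))
print("closest:  ", closest_memento(versions, requested).date())
print("in effect:", memento_in_effect(versions, requested).date())
```

With these sample datetimes, strategy (a) picks the 1 June snapshot (42 days away, versus 50 days for 1 March), while strategy (b) picks the 1 March version, which was the live one on 20 April.
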
[1] http://mementoweb.org/guide/rfc/
[2] http://mementoweb.org/guide/quick-intro/
[3] http://www.mediawiki.org/wiki/Extension:Memento
[4] http://www.mediawiki.org/wiki/Extension:MementoHeaders
[5] http://www.w3.org/DesignIssues/Generic.html
[6] https://github.com/mementoweb/timegate
[7] http://mementoweb.org/guide/howto/

On Wed, Mar 18, 2015 at 4:19 PM, Mcgibbney, Lewis J (398M) <Lewis.J.Mcgibbney@jpl.nasa.gov> wrote:

> Hi Herbert,
>
>>
>>(1) vocabulary versioning
>>
>>The Memento-related comments I made about Data Versioning apply
>>equally to Vocabulary Versioning. All approaches described in
>><http://mementoweb.org/guide/howto/> apply to data and vocabulary. As
>>a matter of fact, when implementing Memento protocol support for both
>>data and vocabularies used in data, temporal versions of the data can
>>automatically be aligned with the temporally correct version of the
>>used vocabulary.
>>
>
> Right now the Best Practices document classifies Data Versioning as a
> part-of/child-component within the Metadata parent topic.
> This can be seen within the taxonomy provided within the BP document ToC
> [0].
> To me there is a distinction to be made here which indicates that your
> Memento-related comments do not necessarily apply equally to both Data
> versioning and Vocab versioning. The Mementos themselves, e.g. the
> instances of archived versions of web resources, could provide a
> Memento-Datetime which may be different from that published within and
> unique to the dataset.
> We need complete and utter clarification on this topic, however AFAICT the
> DWBP WG is taking a data centric view of data versioning meaning that a
> protocol which defines the data version would be more part of the BP
> relating to "Follow REST principles when designing APIs" [1].
> I think we need to be aware of the differences between something like
> Memento (a specification and protocol for accessing resources) and best
> practice of publishing versioning information alongside dataset which are
> to be published on to the Web.
> Thank you very much for your comments Herbert.
> Working Group… is it worth visiting some aspects of the data versioning
> commentary and use cases at one of the forthcoming meetings?
> Thanks
>
> [0] http://w3c.github.io/dwbp/bp.html#h-toc
> [1] http://w3c.github.io/dwbp/bp.html#BulkAccess2
> [2]
>

--
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/

==
Received on Thursday, 19 March 2015 09:39:28 UTC