Re: vocabulary versioning and preservation

Hello Herbert,

Thank you very much for your feedback. Your suggestion is already covered 
in the Data on the Web Best Practices document [1].

In Best Practice 8: Provide versioning information, the "Possible 
Approach to Implementation" section says: "Use Memento to relate the 
different versions to each other. An example of this is DBpedia which has 
undergone several releases since its first publication and always uses 
the same URI for its resources. Every resource is de-referenced to the 
most up to date description available for it along with a link to 
preserved descriptions using the Memento protocol ([RFC7089]) for the 
Memento gateway of DBpedia."

Kind regards,
Caroline

[1] http://w3c.github.io/dwbp/bp.html#dataVersioning

On 19/03/15 06:39, Herbert Van de Sompel wrote:
> Dear all,
>
> Many thanks for your enthusiastic comments. Below, I respond to some
> of your comments/questions.
>
> Greetings
>
> Herbert Van de Sompel
>
> ==
>
> * I can only express my excitement that interest is expressed in
> reading the Memento protocol spec, RFC 7089. A handy HTML version is
> available at [1]. For a gentle introduction, see [2].
>
> * Regarding Ghislain's remark about "storing" versions:
>
> The Memento protocol has nothing to say about criteria used for
> deciding when a resource is effectively a new version. The Memento
> protocol comes into play once temporal resource versions have been
> created, irrespective of the underlying approach used to create them.
> Typical cases are:
>
> (a) In web archiving, a temporal version (snapshot) is created after a
> robot crawled a page and the resulting resources (the page and its
> embedded/linked resources) are ingested in a web archive. The crawling
> date will be the Memento-Datetime. It is the date time of the
> observation of the crawled web resources. Returning the "best" Memento
> for a specified datetime is typically done on the basis of the
> smallest delta between the specified datetime and a Memento-Datetime
> value. This is the approach in all web archives including Internet
> Archive.
>
> (b) In CMS, software versioning systems, etc., a temporal version is
> created subject to technical and editorial policies. The datetime of a
> new temporal version becomes its Memento-Datetime. With CMS etc., one
> typically knows the history of resource versions, i.e. one knows the
> temporal interval in which they were the "live" versions.  Because
> this history is known, returning the "best" Memento for a specified
> datetime is typically done by returning the version that was
> operational in the interval that includes the specified datetime, i.e.
> the version that is closest in the past to the specified datetime is
> returned. This approach is used e.g. in the Memento extensions for
> MediaWiki [3][4].
>
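To make the two selection strategies in (a) and (b) concrete, here is a
minimal Python sketch. The function names and sample datetimes are my
own and purely illustrative; they are not taken from RFC 7089 or from
any archive's code. Falling back to the earliest version when the
request predates every version is an assumption of the sketch.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def closest_memento(memento_datetimes, requested):
    """Case (a): pick the Memento-Datetime with the smallest delta."""
    return min(memento_datetimes, key=lambda dt: abs(dt - requested))


def version_in_effect(memento_datetimes, requested):
    """Case (b): pick the version that was live at the requested datetime."""
    ordered = sorted(memento_datetimes)
    # Number of versions created at or before the requested datetime.
    index = bisect_right(ordered, requested)
    # Assumption: a request older than every version gets the first one.
    return ordered[max(index - 1, 0)]


# Hypothetical version history, for illustration only.
january = datetime(2015, 1, 1, tzinfo=timezone.utc)
march = datetime(2015, 3, 1, tzinfo=timezone.utc)
requested = datetime(2015, 2, 20, tzinfo=timezone.utc)

nearest = closest_memento([january, march], requested)
live = version_in_effect([january, march], requested)
```

For this request the two strategies disagree: the nearest snapshot is
the March one, while the version that was live on 20 February is the
January one.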
> * Regarding Ghislain's remark about URI syntax for Mementos:
>
> The Memento protocol does not require any special URI syntax for
> Mementos as everything is (according to REST and HATEOAS principles)
> based on HTTP headers, typed links, negotiation. However, the syntax
> style exemplified by
> <http://dbpedia.mementodepot.org/memento/20100316/http://dbpedia.org/page/DJ_Shadow>
> is rather widely used/supported by web archives although definitely
> not uniformly. The API associated with our Time Travel service
> <http://bit.ly/webtimetravel> also supports the syntax. But CMS etc.
> definitely do not use it.
>
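As a small illustration of that syntax style, the sketch below builds a
URI of the form prefix/date/original-URI. The helper name is
hypothetical, and the eight-digit date simply mirrors the DBpedia
example above; as Herbert says, the protocol itself does not require
any of this, and other archives may use a different layout.

```python
from datetime import date


def memento_style_uri(prefix, day, original_uri):
    """Join an archive prefix, a YYYYMMDD date and the original URI."""
    return f"{prefix.rstrip('/')}/{day.strftime('%Y%m%d')}/{original_uri}"


uri = memento_style_uri(
    "http://dbpedia.mementodepot.org/memento",
    date(2010, 3, 16),
    "http://dbpedia.org/page/DJ_Shadow",
)
```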
> * Steven says: "Your note addresses archiving published data, but I
> also ask how an organization can assume best practices in publication
> if they do not yet have policies to retain that which is not yet
> decided to be published?" :
>
> I guess retention is a bit of a different beast and typically subject
> to a range of policies. There's also the question whether everything
> that is decided to be retained is also public/published. Let's just
> say that, if an organization decides to retain resources in the public
> eye (i.e. publish them), the data principles apply. If the
> organization would already apply the data principles internally, prior
> to publishing, chances are high they would be in a better position to
> adhere to the principles when they publish.
>
> * Antoine says: "So in practice for the document I would be very happy
> to say that the versioned vocabulary could be published following the
> methods that are applied to the data itself. And count on the data
> versioning section to refer on Memento.":
>
> That would be an approach. But, as you mention, a lot of vocabularies
> are used in data that are not controlled/published by the publisher of
> the data. If data and vocabulary use a different approach for handling
> versions, interoperability decreases.
>
> * Lewis says: "… the DWBP WG is taking a data centric view of data
> versioning meaning that a protocol which defines the data version
> would be more part of the BP relating to Follow REST principles when
> designing APIs. I think we need to be aware of the differences between
> something like Memento (a specification and protocol for accessing
> resources) and best practice of publishing versioning information
> alongside datasets which are to be published on the Web.":
>
> This is a very good point, and goes straight to the two possible
> perspectives one can take on Memento in the context of this
> discussion:
>
> - The Memento protocol, RFC 7089, is actually a RESTful "API" to
> access temporal resource versions. API between quotes because it's
> actually not an API, it's just a straightforward extension of HTTP
> with datetime negotiation, a feature that Tim Berners-Lee suggested
> ages ago [5] but was never specified. The protocol offers TimeGates
> (datetime negotiation to access a single temporal version) and
> TimeMaps (access to a temporal resource version history) as version
> access mechanisms. Obviously, instead of having a multitude of APIs to
> access temporal versions and version information, I would much prefer
> a world in which this were uniformly done using the Memento protocol
> ;-) The uniform "API" exists and our experience with the TimeGate
> server [6] shows that it is typically straightforward to implement
> Memento support in cases where a bespoke version API exists.
>
> - Aspects of the Memento protocol can be used to publish resource
> version information without actually fully implementing the protocol.
> This is the bit that I shared initially and that is described in [7].
>
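For readers new to datetime negotiation, the client side amounts to
sending an Accept-Datetime header to a TimeGate. The sketch below only
builds that header; the function name is my own, and no request is
actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_headers(requested):
    """Build the request header used for datetime negotiation."""
    # The header value is an HTTP-style date expressed in GMT.
    as_utc = requested.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


headers = accept_datetime_headers(datetime(2010, 3, 16, tzinfo=timezone.utc))
```

A client would send these headers in a GET or HEAD request to a
TimeGate, which then points it to the Memento it selected; the Memento
itself carries a Memento-Datetime response header.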
> [1] http://mementoweb.org/guide/rfc/
> [2] http://mementoweb.org/guide/quick-intro/
> [3] http://www.mediawiki.org/wiki/Extension:Memento
> [4] http://www.mediawiki.org/wiki/Extension:MementoHeaders
> [5] http://www.w3.org/DesignIssues/Generic.html
> [6] https://github.com/mementoweb/timegate
> [7] http://mementoweb.org/guide/howto/
>
> On Wed, Mar 18, 2015 at 4:19 PM, Mcgibbney, Lewis J (398M)
> <Lewis.J.Mcgibbney@jpl.nasa.gov> wrote:
>> Hi Herbert,
>>
>>> (1) vocabulary versioning
>>>
>>> The Memento-related comments I made about Data Versioning apply
>>> equally to Vocabulary Versioning. All approaches described in
>>> <http://mementoweb.org/guide/howto/> apply to data and vocabulary. As
>>> a matter of fact, when implementing Memento protocol support for both
>>> data and vocabularies used in data, temporal versions of the data can
>>> automatically be aligned with the temporally correct version of the
>>> used vocabulary.
>>>
>> Right now the Best Practices document classifies Data Versioning as a
>> part-of/child-component within the Metadata parent topic.
>> This can be seen within the taxonomy provided within the BP document ToC
>> [0].
>> To me there is a distinction to be made here which indicates that your
>> Memento-related comments do not necessarily apply equally to both Data
>> versioning and Vocab versioning. The Mementos themselves, e.g. the
>> instances of archived versions of web resources, could provide a
>> Memento-Datetime which may be different from that published within and
>> unique to the dataset.
>> We need complete and utter clarification on this topic, however AFAICT the
>> DWBP WG is taking a data centric view of data versioning meaning that a
>> protocol which defines the data version would be more part of the BP
>> relating to "Follow REST principles when designing APIs" [1].
>> I think we need to be aware of the differences between something like
>> Memento (a specification and protocol for accessing resources) and best
>> practice of publishing versioning information alongside datasets which are
>> to be published on the Web.
>> Thank you very much for your comments Herbert.
>> Working Group… is it worth visiting some aspects of the data versioning
>> commentary and use cases at one of the forthcoming meetings?
>> Thanks
>>
>> [0] http://w3c.github.io/dwbp/bp.html#h-toc
>> [1] http://w3c.github.io/dwbp/bp.html#BulkAccess2
>> [2]
>>
>
>

Received on Monday, 18 May 2015 20:56:28 UTC