Re: Versioning

Dear all, first, sorry for once again not being able to attend the call, due to a national holiday here in the Netherlands. This discussion on versioning interests me a lot. I have been struggling with DCAT's lack of support for versioning and have been considering a conceptual model that could tackle it. For me, the issue here is reproducibility, in the sense that, given a dataset identifier, we can expect the same set of data items to be retrieved.

In my opinion, conceptually, for the evolution cases Nandana mentioned, there is an entity named Dataset, and periodically there are Releases of this Dataset; i.e., a Release is a concrete instance of a Dataset. A Release has its own properties, including the timestamp of the release and a version number.

Of course, there are types of datasets that cannot be released in this sense, i.e., retrieving the same dataset multiple times does not guarantee that the same set of data items will be retrieved. As Makx mentioned, this is the case for continuously changing datasets such as those from information systems, e.g., electronic health record data from hospitals. In this latter case, we could only rely on the creation or last-modified timestamps of individual records.
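To make the distinction concrete, here is a minimal sketch of the model in Python. The class names (`Dataset`, `Release`, `Record`), the `"<dataset>/<version>"` identifier scheme and the helper functions are all hypothetical; none of them are DCAT terms. A Release pins an immutable set of data items to a version and a timestamp, so resolving the same release identifier always returns the same items. A continuously changing source, by contrast, can only be filtered by per-record last-modified timestamps.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Release:
    """A concrete, immutable instance of a Dataset (hypothetical model)."""
    dataset_id: str
    version: str
    released_at: datetime
    items: frozenset  # fixed set of data items; cannot change after release

    @property
    def release_id(self) -> str:
        # Hypothetical identifier scheme: "<dataset>/<version>"
        return f"{self.dataset_id}/{self.version}"


@dataclass
class Dataset:
    """The abstract Dataset, which accumulates Releases over time."""
    dataset_id: str
    releases: dict = field(default_factory=dict)

    def release(self, version: str, items, released_at: datetime) -> Release:
        # A version number may be used only once, so it always denotes one item set.
        if version in self.releases:
            raise ValueError(f"version {version} already released")
        rel = Release(self.dataset_id, version, released_at, frozenset(items))
        self.releases[version] = rel
        return rel

    def resolve(self, release_id: str) -> frozenset:
        # Reproducibility: the same release identifier always yields the same items.
        dataset_id, _, version = release_id.rpartition("/")
        if dataset_id != self.dataset_id:
            raise KeyError(release_id)
        return self.releases[version].items


@dataclass
class Record:
    """A record in a continuously changing source (e.g. EHR data)."""
    record_id: str
    value: str
    last_modified: datetime


def records_modified_since(records, since: datetime):
    # With no releases, per-record timestamps are the only handle on change.
    return [r for r in records if r.last_modified >= since]


# Example: two releases of the same Dataset remain independently resolvable.
budget = Dataset("budget")
budget.release("1.0", ["a", "b"], datetime(2017, 1, 1, tzinfo=timezone.utc))
budget.release("1.1", ["a", "b", "c"], datetime(2017, 2, 1, tzinfo=timezone.utc))
print(sorted(budget.resolve("budget/1.0")))  # ['a', 'b']
```

Making `Release` frozen and storing its items as a `frozenset` is one way to state the reproducibility guarantee directly in the model. A real catalogue would more likely express this through identifiers and policy than through language-level immutability.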

Does this make sense to you?


Luiz Olavo Bonino
CTO FAIR Data

Dutch Techcentre for Life Sciences
Visiting address: Catharijnesingel 54 | 3511 GC Utrecht
Postal address: Postbus 19245 | 3501 DE Utrecht

E-mail: luiz.bonino@dtls.nl
Mobile: +31 6 24 61 9131
Skype: luizolavobonino
Website: www.dtls.nl
> On 5 Jun 2017, at 20:49, Karen Coyle <kcoyle@kcoyle.net> wrote:
> 
> Makx,
> 
> Thank you. I think that going deeper into the various meanings of
> versioning through additional use cases is a great idea. We can then
> discuss those as a group. (This reminds me of the publication patterns
> for serial publications - and like those it may be hard to cover every
> case.)
> 
> One aspect of versioning that may or may not be relevant but that I see
> in my field is "updates in place" - that is, databases or datasets in
> which updated records are included in the dataset, but there is no
> replacement of the entire dataset (although that can usually be
> requested). These require a call for "updates since ...", and there may
> not be any regularity to the update schedule. These types of datasets
> also require three types of updates: new, replace, delete.
> 
> Does anyone else have this case, and if so, are you able to create a use
> case for it?
> 
> Thanks,
> kc
> 
> On 6/5/17 9:44 AM, Makx Dekkers wrote:
>> Apologies for my slow reaction in the discussion today in the call on
>> the versioning use case,
>> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#Dataset_Versioning_Information.
>> I was struggling with my connection and just managed to note in IRC that
>> I didn’t agree with the use case. Disagreeing is not the right word, but
>> I felt that we may need to discuss first what we mean by ‘version’,
>> because in my work over the years I have engaged in discussions where
>> people didn’t have the same opinion on what we were talking about.
>>
>> As I see it, there may be various types of ‘versioning’ relationships
>> between datasets. For example:
>>
>>  * Evolution: for example, a dataset that is published with
>>    year-to-date information; every week or month, new, recent data is
>>    appended to the existing data.
>>  * Replacement: for example, existing data was wrong in some way, and a
>>    new dataset is published that replaces the old data.
>>  * Snapshots: for example, continuously changing data like the state of
>>    traffic or weather maps with hourly snapshots.
>>  * Time series: for example, annual budget data.
>>  * Conversion: for example, data that is transformed from one
>>    coordinate system to another, or from one set of units to another;
>>    similar to translation of textual resources.
>>  * Lower/higher granularity: for example, maps in different scales,
>>    images in different resolutions, compression like MP3 versus CD
>>    sound, and summaries of large amounts of data. 
>>
>> In my mind, the use case
>> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#Dataset_Versioning_Information
>> is a useful placeholder for a number of more specific cases that might
>> have different requirements. I am pretty sure that some of those
>> requirements could be satisfied by some explanatory text in the DCAT
>> specification; some others might need the addition of other properties (or
>> even classes?) to DCAT.
>>
>> I am planning to write some of this up in separate use cases over the
>> next few weeks.
>>
>> Makx.
> 
> -- 
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234 (Signal)
> skype: kcoylenet/+1-510-984-3600
> 

Received on Monday, 5 June 2017 19:11:53 UTC