Re: Versioning

Hi Phil, in my opinion there are two different aspects here. The first, as you mentioned, is the process of deciding whether to create a new release/version (to some, even a minor correction to an error in a dataset is enough to call the result a new version; for others, it takes a much larger-scale change). I completely agree that this is very subjective and context-dependent, and that it is not our role to intervene; here we could only provide guidelines, best practices, etc.

The second aspect is, once the decision has been made, how to represent the release/version in a commonly agreed way. My argument here is that currently each publisher conveys the release/version in its own way, e.g., as part of the URL, as a field in the data record, or as metadata, making it difficult for data users to determine the correct releases/versions to use.
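
To make the heterogeneity concrete, here is a purely hypothetical sketch in Python; the URLs and field names are invented and do not come from any real catalogue:

# Hypothetical examples of three conventions a data user might run into
# for the very same release; none of these URLs or field names are real.

# 1. Version encoded in the download URL
url = "https://example.org/datasets/air-quality/v2.1/data.csv"

# 2. Version carried as a field inside each data record
record = {"station": "NL-UT-01", "value": 42.0, "dataset_version": "2.1"}

# 3. Version expressed only in the catalogue metadata
metadata = {"title": "Air quality measurements", "version": "2.1"}

# A consumer who wants to pick the correct release has to know, per
# publisher, which of these conventions applies.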

In software engineering, build and dependency management tools like Maven have defined a common way to express software components’ versions, which is what allows proper dependency management.
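
As an illustration (not a proposal), Maven-style "major.minor.patch" identifiers can be parsed and ordered mechanically, which is what makes automated dependency resolution possible. A minimal Python sketch, where parse_version is just an illustrative helper and the version strings are invented:

# Minimal sketch: parsing and ordering "major.minor.patch" identifiers
# so that a consumer can tell that 3.10.2 is newer than 3.4.1 without
# human interpretation.

def parse_version(identifier):
    """Split a 'major.minor.patch' string into comparable integers."""
    major, minor, patch = (int(part) for part in identifier.split("."))
    return (major, minor, patch)

releases = ["3.4.0", "3.10.2", "3.4.1"]
print(max(releases, key=parse_version))  # -> 3.10.2 (numeric order, not string order)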

Best,


Luiz Olavo Bonino
CTO FAIR Data

Dutch Techcentre for Life Sciences
Visiting address: Catharijnesingel 54 | 3511 GC Utrecht
Postal address: Postbus 19245 | 3501 DE Utrecht

E-mail: luiz.bonino@dtls.nl
Mobile: +31 6 24 61 9131
Skype: luizolavobonino
Website: www.dtls.nl
> On 6 Jun 2017, at 09:51, Phil Archer <phila@w3.org> wrote:
> 
> Taking a standards design angle on this, we have a concept, versioning, that is understood in different ways by different people and in different circumstances. To some, even a minor correction to an error in a dataset is enough to call the result a new version. For others, it takes a much larger-scale change. Both positions can readily be justified; neither is right or wrong.
> 
> Do we think that 10 people will all agree on what is and isn't a version? I suggest not. And so the temptation is then to say, OK, we'll have a means of differentiating between "this is really still version X but we've corrected a few errors", and "this is a whole new version" with maybe several other types in between. OK - but then the problem becomes one of usability. Will users actually bother to choose the right term? Information managers care about metadata far more than anyone else.
> 
> I would think in terms of:
> 
> - a very basic model of versioning;
> - possible qualification/descriptions of how one version differs from another (which may be machine or human-only readable);
> - replaces/replacedBy relationships;
> - leaving it to profiles to provide more detailed semantics where necessary;
> - Following DWBP, especially
> -- https://www.w3.org/TR/dwbp/#dataVersioning
> -- https://www.w3.org/TR/dwbp/#VersionIdentifiers
> 
> My 2 cents
> 
> Phil
> 
> On 05/06/2017 23:56, Rob Atkinson wrote:
>> +1 for articulating version Use Cases in more detail. We will consolidate,
>> and the current "high level" versioning UC is a placeholder for analysis of
>> requirements, for which we will need more detail. For now, add new Use
>> Cases with detail and links specific to each case.
>> 
>> Don't assume we need to handle this by, for example, extending DCAT or
>> adopting a specific vocabulary.
>> 
>> I see several options:
>> 1) do nothing
>> 2) Add something to DCAT and hope it handles the versioning problem in
>> general
>> 3) Identify or create one or more versioning conceptual models and
>> associated vocabularies and recommend adoption
>> 4) provide guidance around use of microformats in existing or new DCAT
>> elements (e.g., define the semantics of 3.4.0-style version identifiers)
>> 5) provide guidance on how dataset relationships express version evolution
>> 6) other
>> 
>> The only reason to go into this (solution space) in any detail now is to
>> identify where external constraints should bleed into requirements - for
>> example a need to be semantically and/or syntactically compatible with
>> established version negotiation protocols (either at HTTP level or
>> something like OAI). UCs should express these interactions.
>> 
>> Could the people who have looked into this deeply please ensure any prior
>> work that can and should be referenced is called out in related Use Cases.
>> 
>> Rob Atkinson
>> 
>> 
>> 
>> On Tue, 6 Jun 2017 at 05:11 Luiz Olavo Bonino <luiz.bonino@dtls.nl> wrote:
>> 
>>> Dear all, first of all, sorry for once again not being able to attend the call
>>> due to a national holiday here in the Netherlands. This discussion on versioning
>>> interests me a lot. I have been struggling with DCAT's lack of support for
>>> this and have been considering a conceptual model that could tackle it. For
>>> me the issue here is reproducibility, in the sense that, given a dataset
>>> identifier, we can expect the same set of data items to be retrieved.
>>> 
>>> In my opinion, conceptually, for the evolution cases Nandana mentioned,
>>> there is an entity named Dataset and, periodically, there are Releases of
>>> this dataset, i.e., a Release is a concrete instance of a Dataset. A
>>> Release has some properties, including the timestamp of the release and a
>>> version number.
>>> 
>>> Of course, there are types of datasets that do not lend themselves to being
>>> released, i.e., retrieving the same dataset multiple times does not
>>> guarantee that the same set of data items will be retrieved. As Makx
>>> mentioned, these include continuously changing datasets like those from
>>> information systems, e.g., electronic health record data from hospitals. In
>>> this latter case we could only rely on the created or last-modified
>>> timestamp of individual records.
>>> 
>>> Does this make sense to you?
>>> 
>>> 
>>> Luiz Olavo Bonino
>>> CTO FAIR Data
>>> 
>>> Dutch Techcentre for Life Sciences
>>> Visiting address: Catharijnesingel 54 | 3511 GC Utrecht
>>> Postal address: Postbus 19245 | 3501 DE Utrecht
>>> 
>>> E-mail: luiz.bonino@dtls.nl
>>> Mobile: +31 6 24 61 9131
>>> Skype: luizolavobonino
>>> Website: www.dtls.nl
>>> 
>>> On 5 Jun 2017, at 20:49, Karen Coyle <kcoyle@kcoyle.net> wrote:
>>> 
>>> Makx,
>>> 
>>> Thank you. I think that going deeper into the various meanings of
>>> versioning through additional use cases is a great idea. We can then
>>> discuss those as a group. (This reminds me of the publication patterns
>>> for serial publications - and like those it may be hard to cover every
>>> case.)
>>> 
>>> One aspect of versioning that may or may not be relevant but that I see
>>> in my field is "updates in place" - that is, databases or datasets in
>>> which updated records are included in the dataset, but there is no
>>> replacement of the entire dataset (although that can usually be
>>> requested). These require a call for "updates since ...", and there may
>>> not be any regularity to the update schedule. These types of datasets
>>> also require three types of updates: new, replace, delete.
>>> 
>>> Does anyone else have this case, and if so, are you able to create a use
>>> case for it?
>>> 
>>> Thanks,
>>> kc
>>> 
>>> On 6/5/17 9:44 AM, Makx Dekkers wrote:
>>> 
>>> Apologies for my slow reaction in the discussion during today's call on
>>> the versioning use case,
>>> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#Dataset_Versioning_Information.
>>> I was struggling with my connection and just managed to note in IRC that
>>> I didn’t agree with the use case. ‘Disagreeing’ is not quite the right word:
>>> I felt that we maybe need to discuss first what we mean by ‘version’,
>>> because in my work over the years I have engaged in discussions where
>>> people didn’t have the same understanding of what we were talking about.
>>> 
>>> As I see it, there may be various types of ‘versioning’ relationships
>>> between datasets. For example:
>>> 
>>> * Evolution: for example, a dataset that is published with
>>>   year-to-date information; every week or month, new, recent data is
>>>   appended to the existing data.
>>> * Replacement: for example, existing data was wrong in some way, and a
>>>   new dataset is published that replaces the old data.
>>> * Snapshots: for example, continuously changing data like the state of
>>>   traffic or weather maps with hourly snapshots.
>>> * Time series: for example, annual budget data.
>>> * Conversion: for example, data that is transformed from one
>>>   coordinate system to another, or from one set of units to another;
>>>   similar to translation of textual resources.
>>> * Lower/higher granularity: for example, maps in different scales,
>>>   images in different resolutions, compression like MP3 versus CD
>>>   sound, and summaries of large amounts of data.
>>> 
>>> In my mind, the use case
>>> 
>>> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#Dataset_Versioning_Information
>>> is a useful placeholder for a number of more specific cases that might
>>> have different requirements. I am pretty sure that some of those
>>> requirements could be satisfied by some explanatory text in the DCAT
>>> specification; others might need the addition of further properties (or
>>> even classes?) to DCAT.
>>> 
>>> I am planning to write some of this up in separate use cases over the
>>> next few weeks.
>>> 
>>> Makx.
>>> 
>>> --
>>> Karen Coyle
>>> kcoyle@kcoyle.net http://kcoyle.net
>>> m: 1-510-435-8234 (Signal)
>>> skype: kcoylenet/+1-510-984-3600
>>> 
>> 
> 
> -- 
> 
> 
> Phil Archer
> Data Strategist, W3C
> http://www.w3.org/
> 
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1

Received on Tuesday, 6 June 2017 08:09:12 UTC