Re: Versioning from Phil Archer on 2017-06-06 (public-dxwg-wg@w3.org from June 2017)

From: Phil Archer <phila@w3.org>
Date: Tue, 6 Jun 2017 08:51:51 +0100
To: public-dxwg-wg@w3.org
Message-ID: <258631b6-c608-9f07-eac0-1fda1fde690a@w3.org>
Taking a standards design angle on this, we have a concept, versioning, 
that is understood in different ways by different people and in 
different circumstances. To some, even a minor correction to an error in 
a dataset is enough to call the result a new version. For others, it 
needs a much larger-scale change. Both positions can readily be 
justified; neither is right or wrong.

Do we think that 10 people will all agree on what is and isn't a 
version? I suggest not. And so the temptation is then to say, OK, we'll 
have a means of differentiating between "this is really still version X 
but we've corrected a few errors", and "this is a whole new version" 
with maybe several other types in between. OK - but then the problem 
becomes one of usability. Will users actually bother to choose the right 
term? Information managers care about metadata far more than anyone else.

I would think in terms of:

- a very basic model of versioning;
- possible qualifications/descriptions of how one version differs from 
another (which may be machine-readable or human-readable only);
- replaces/replacedBy relationships;
- leaving it to profiles to provide more detailed semantics where necessary;
- following DWBP, especially:
-- https://www.w3.org/TR/dwbp/#dataVersioning
-- https://www.w3.org/TR/dwbp/#VersionIdentifiers
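To make the "very basic model" idea concrete, here is a minimal Python sketch under stated assumptions: a hypothetical `DatasetVersion` record with a free-text version label, an optional description of how it differs from its predecessor, and replaces/replacedBy links. The class, field and function names and the example.org identifiers are illustrative only; they are not taken from DCAT, DWBP or any other vocabulary.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetVersion:
    """One version of a dataset (hypothetical model, not a DCAT or DWBP class)."""
    identifier: str        # e.g. a URI; the identifier scheme is an assumption
    version_label: str     # free-text label such as "1.1"; no semantics imposed
    change_note: str = ""  # optional description of how this version differs
    # Links are excluded from repr/eq so the two-way cycle cannot recurse forever.
    replaces: Optional[DatasetVersion] = field(default=None, repr=False, compare=False)
    replaced_by: Optional[DatasetVersion] = field(default=None, repr=False, compare=False)


def publish_successor(previous: DatasetVersion, identifier: str,
                      version_label: str, change_note: str = "") -> DatasetVersion:
    """Create a new version and set both the replaces and replacedBy links."""
    if previous.replaced_by is not None:
        raise ValueError(f"{previous.identifier} has already been replaced")
    successor = DatasetVersion(identifier, version_label, change_note,
                               replaces=previous)
    previous.replaced_by = successor
    return successor


def history(latest: DatasetVersion) -> List[str]:
    """Follow the replaces chain from the latest version back to the first."""
    labels: List[str] = []
    current: Optional[DatasetVersion] = latest
    while current is not None:
        labels.append(current.version_label)
        current = current.replaces
    return labels


# Usage: a small correction is still recorded as a successor version.
v1 = DatasetVersion("http://example.org/dataset/1", "1.0")
v2 = publish_successor(v1, "http://example.org/dataset/2", "1.1",
                       change_note="Corrected a few errors")
print(history(v2))  # ['1.1', '1.0']
```

The label deliberately carries no meaning of its own; distinctions such as "minor correction" versus "whole new version" would be left to profiles, in line with the list above.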

My 2 cents

Phil

On 05/06/2017 23:56, Rob Atkinson wrote:
> +1 for articulating versioning Use Cases in more detail. We will consolidate;
> the current "high level" versioning UC is a placeholder for an analysis of
> requirements, for which we will need more detail. For now, add new Use
> Cases with the detail and links relevant to each specific case.
>
> Don't assume we need to handle this by, for example, extending DCAT or
> adopting a specific vocabulary.
>
> I see several options:
> 1) do nothing
> 2) Add something to DCAT and hope it handles the versioning problem in
> general
> 3) Identify or create one or more versioning conceptual models and
> associated vocabularies and recommend adoption
> 4) provide guidance around the use of microformats in existing or new DCAT
> elements (e.g. define the semantics of 3.4.0-style version identifiers)
> 5) provide guidance on how dataset relationships express version evolution
> 6) other
>
> The only reason to go into this (solution space) in any detail now is to
> identify where external constraints should bleed into requirements - for
> example a need to be semantically and/or syntactically compatible with
> established version negotiation protocols (either at HTTP level or
> something like OAI). UCs should express these interactions.
>
> Could the people who have looked into this deeply please ensure any prior
> work that can and should be referenced is called out in related Use Cases.
>
> Rob Atkinson
>
> On Tue, 6 Jun 2017 at 05:11 Luiz Olavo Bonino <luiz.bonino@dtls.nl> wrote:
>
>> Dear all, first, sorry for again not being able to attend the call, due to a
>> national holiday here in the Netherlands. This discussion on versioning
>> interests me a lot. I have been struggling with DCAT's lack of support for
>> this and have been considering a conceptual model that could tackle it. For
>> me, the issue here is reproducibility, in the sense that given a dataset
>> identifier, we can expect the same set of data items to be retrieved.
>>
>> In my opinion, conceptually, for the evolution cases Nandana mentioned,
>> there is an entity named Dataset and periodically, there are Releases of
>> this dataset, i.e., a Release is a concrete instance of a Dataset. This
>> Release has some properties, including timestamp of the release and version
>> number.
>>
>> Of course, there are types of datasets that cannot be released in this
>> way, i.e., retrieving the same dataset multiple times does not
>> guarantee that the same set of data items will be retrieved. As Makx
>> mentioned, this applies to continuously changing datasets such as those
>> from information systems, e.g., electronic health record data from
>> hospitals. In this latter case we could only rely on the creation or
>> last-modified timestamps of individual records.
>>
>> Does this make sense to you?
>>
>> *Luiz Olavo Bonino*
>> CTO FAIR Data
>>
>> Dutch Techcentre for Life Sciences
>> *Visiting address:* Catharijnesingel 54 | 3511 GC Utrecht
>> *Postal address:* Postbus 19245 | 3501 DE Utrecht
>>
>> E-mail: luiz.bonino@dtls.nl
>> Mobile: +31 6 24 61 9131
>> Skype: luizolavobonino
>> Website: www.dtls.nl
>>
>> On 5 Jun 2017, at 20:49, Karen Coyle <kcoyle@kcoyle.net> wrote:
>>
>> Makx,
>>
>> Thank you. I think that going deeper into the various meanings of
>> versioning through additional use cases is a great idea. We can then
>> discuss those as a group. (This reminds me of the publication patterns
>> for serial publications - and like those it may be hard to cover every
>> case.)
>>
>> One aspect of versioning that may or may not be relevant but that I see
>> in my field is "updates in place" - that is, databases or datasets in
>> which updated records are included in the dataset, but there is no
>> replacement of the entire dataset (although that can usually be
>> requested). These require a call for "updates since ...", and there may
>> not be any regularity to the update schedule. These types of datasets
>> also require three types of updates: new, replace, delete.
>>
>> Does anyone else have this case, and if so, are you able to create a use
>> case for it?
>>
>> Thanks,
>> kc
>>
>> On 6/5/17 9:44 AM, Makx Dekkers wrote:
>>
>> Apologies for my slow reaction during the discussion in today's call on
>> the versioning use case:
>>
>> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#Dataset_Versioning_Information
>>
>> I was struggling with my connection and only managed to note in IRC that
>> I didn’t agree with the use case. ‘Disagree’ is not the right word, but
>> I felt that we may need to discuss first what we mean by ‘version’,
>> because in my work over the years I have been in discussions where
>> people didn’t share the same understanding of what we were talking about.
>>
>> As I see it, there may be various types of ‘versioning’ relationships
>> between datasets. For example:
>>
>>  * Evolution: for example, a dataset that is published with
>>    year-to-date information; every week or month, new, recent data is
>>    appended to the existing data.
>>  * Replacement: for example, existing data was wrong in some way, and a
>>    new dataset is published that replaces the old data.
>>  * Snapshots: for example, continuously changing data like the state of
>>    traffic or weather maps with hourly snapshots.
>>  * Time series: for example, annual budget data.
>>  * Conversion: for example, data that is transformed from one
>>    coordinate system to another, or from one set of units to another;
>>    similar to translation of textual resources.
>>  * Lower/higher granularity: for example, maps in different scales,
>>    images in different resolutions, compression like MP3 versus CD
>>    sound, and summaries of large amounts of data.
>>
>> In my mind, the use case
>>
>> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#Dataset_Versioning_Information
>> is a useful placeholder for a number of more specific cases that might
>> have different requirements. I am pretty sure that some of those
>> requirements could be satisfied by some explanatory text in the DCAT
>> specification; some others might need addition of other properties (or
>> even classes?) to DCAT.
>>
>> I am planning to write some of this up in separate use cases over the
>> next few weeks.
>>
>> Makx.
>>
>> --
>> Karen Coyle
>> kcoyle@kcoyle.net http://kcoyle.net
>> m: 1-510-435-8234 (Signal)
>> skype: kcoylenet/+1-510-984-3600
>>
>

-- 


Phil Archer
Data Strategist, W3C
http://www.w3.org/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Tuesday, 6 June 2017 07:51:41 UTC