Re: Relating versions and UC47 (Define update method)

All, thanks for this discussion. It is very useful. The DCAT-AP link is
well worth a look. We haven't discussed what documentation might
accompany DCAT 1.1 yet, but perhaps it would be helpful to consider
that. It would give us a "parking lot" where we could place complex
questions that we are stumbling on. Further on we could look at "parked"
issues and see if they have a place in one of our deliverables or need
some separate documentation.

It appears that although "versioning" was the obvious lack in DCAT 1.0,
it also is a particularly difficult space.

1. Does it make sense to continue this work now, or should we tackle
some other requirements?

2. Would a small "version sub-group" help? The sub-group could come back
to the full group with a proposal. Does anyone want to volunteer for that?

kc

On 10/10/17 1:14 AM, andrea.perego@ec.europa.eu wrote:
> I also agree with Simon and Rob that we cannot be prescriptive about
> what a "version" is and how it is identified.
>
> Restating Simon's point, I think we are dealing with a notion that, like
> that of "dataset", is used with different meanings by different
> communities, each of which knows exactly what a "version" is. Moreover,
> what counts as a "version" is also very much related to the data
> management policy / workflow in place, and this affects how different
> versions of a dataset are modelled.
>
> It might be useful to have a look at the discussion on this topic
> carried out in the DCAT-AP WG, which highlighted quite a few different
> perspectives, and where reaching an agreement turned out to be quite
> problematic. This issue was further discussed during the work on the
> implementation guidelines of DCAT-AP, and the result was not a
> definition of what is or is not a version, but rather an explanation of
> different possible ways of modelling it, based on implementation
> evidence. The summary is available here:
>
> https://joinup.ec.europa.eu/release/dcat-ap-how-model-dataset-series
>
> As you can read there, we have examples where different versions of a
> dataset are modelled with distributions, or as different datasets in a
> series, possibly in combination with a statement saying which is the
> previous / next version (by using dct:hasVersion / dct:isVersionOf,
> respectively). We also have to consider cases where datasets are
> updated (on a regular or irregular basis) but the old versions are not
> maintained (this frequently happens, e.g., for datasets updated daily).
>
> I think the lesson learnt in DCAT-AP is that what users are looking for is:
>
> 1. Having guidance on how to model dataset versions (i.e., with
> different datasets, different distributions, etc.), based on evidence
> from similar use cases / domains. This requirement mainly applies to
> communities where the notion of dataset "version" is not established /
> clearly defined.
>
> 2. Having clear information on which are the relevant terms (classes,
> properties) in DCAT, and on how to use them. This requirement applies to
> all users.
>
> About point (2), I take this opportunity to note a few relevant terms,
> including some that I'm not sure have been mentioned so far in our
> discussion:
>
> - dct:modified [1] and dct:accrualPeriodicity [2]: These properties
> provide implicit information about a dataset version, especially when
> combined with the issue and/or creation date, and can be used also
> when old versions are not maintained.
>
> - About the issue raised by Rob about previous/next/current version,
> dct:hasVersion [3] and dct:isVersionOf [4] are actually meant to model
> exactly previous / next versions. Moreover, there is also adms:prev [5]
> and adms:next [6], plus adms:last [7] for the latest version (@Rob, I'm
> not sure if with "current" version you actually mean this).
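
As an illustration of the adms:prev / adms:last links mentioned above, here is a minimal Python sketch of how a client might walk a version chain. The "ex:" dataset identifiers and the helper names are hypothetical, and the triples are plain tuples rather than a real RDF store; it only shows the idea and is not a DCAT recommendation.

```python
# Hypothetical triples (subject, predicate, object). The "ex:" identifiers
# are made up; only the adms: property names come from ADMS.
TRIPLES = [
    ("ex:ds-v2", "adms:prev", "ex:ds-v1"),
    ("ex:ds-v3", "adms:prev", "ex:ds-v2"),
    ("ex:ds-v1", "adms:last", "ex:ds-v3"),
    ("ex:ds-v2", "adms:last", "ex:ds-v3"),
]


def previous_versions(start, triples):
    """Follow adms:prev links from `start` back to the oldest known version."""
    prev = {s: o for s, p, o in triples if p == "adms:prev"}
    chain, seen = [], {start}
    current = start
    # Stop when there is no earlier version, or when a link would loop back.
    while current in prev and prev[current] not in seen:
        current = prev[current]
        seen.add(current)
        chain.append(current)
    return chain


def latest_version(dataset, triples):
    """Return the adms:last target for `dataset`, or `dataset` itself if none is stated."""
    for s, p, o in triples:
        if s == dataset and p == "adms:last":
            return o
    return dataset


print(previous_versions("ex:ds-v3", TRIPLES))
print(latest_version("ex:ds-v1", TRIPLES))
```

Note that this only works when the publisher keeps the old versions and states the links; for datasets whose old versions are not maintained, dct:modified is all a client has to go on.
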
>
> Cheers,
>
> Andrea
>
> ----
> [1] http://dublincore.org/documents/dcmi-terms/#terms-modified
> [2] http://dublincore.org/documents/dcmi-terms/#terms-accrualPeriodicity
> [3] http://dublincore.org/documents/dcmi-terms/#terms-hasVersion
> [4] http://dublincore.org/documents/dcmi-terms/#terms-isVersionOf
> [5] https://www.w3.org/TR/vocab-adms/#adms-prev
> [6] https://www.w3.org/TR/vocab-adms/#adms-next
> [7] https://www.w3.org/TR/vocab-adms/#adms-last
>
> ----
> Andrea Perego, Ph.D.
> Scientific / Technical Project Officer
> European Commission DG JRC
> Directorate B - Growth and Innovation
> Unit B6 - Digital Economy
> Via E. Fermi, 2749 - TP 262
> 21027 Ispra VA, Italy
>
> https://ec.europa.eu/jrc/
>
> ----
> The views expressed are purely those of the writer and may
> not in any circumstances be regarded as stating an official
> position of the European Commission.
>
> *From:*Rob Atkinson [mailto:rob@metalinkage.com.au]
> *Sent:* Tuesday, October 10, 2017 6:19 AM
> *To:* Simon.Cox@csiro.au; kcoyle@kcoyle.net; public-dxwg-wg@w3.org
> *Subject:* Re: Relating versions and UC47 (Define update method)
>
> +1. We cannot be prescriptive about what constitutes a version, nor how
> a version identifier is represented.
>
> What we can be prescriptive about is how versions are identified, i.e.
> the names of the DCAT properties that refer to versions of a DCAT Dataset
> description, of the dataset described by this description, and of a
> DCAT Distribution.
>
> We can also require that identifiers are lexically comparable, so that
> if A is lexically > B then the version denoted by A is later than the
> version denoted by B (and if A = B then the version is the same).
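
To make the lexical-comparison requirement concrete, here is a minimal Python sketch. The identifiers and the helper names `is_later` and `pad` are hypothetical, and nothing here is defined by DCAT. It shows that plain string comparison works for ISO 8601 dates but not for dotted version numbers unless the components are zero-padded.

```python
def is_later(a: str, b: str) -> bool:
    """Return True if identifier `a` sorts after `b` under plain string comparison."""
    return a > b


def pad(version: str, width: int = 4) -> str:
    """Zero-pad each dot-separated component so lexical order matches numeric order.

    Assumes every component is a non-negative integer of at most `width` digits.
    """
    return ".".join(part.zfill(width) for part in version.split("."))


# ISO 8601 dates compare correctly as plain strings.
assert is_later("2017-10-10", "2017-09-27")

# Dotted numbers do not: "1.10" sorts before "1.9" lexically.
assert not is_later("1.10", "1.9")

# After zero-padding, lexical order agrees with the intended order.
assert is_later(pad("1.10"), pad("1.9"))
```

So a "lexically comparable" rule would constrain how publishers write identifiers, not just which property carries them.
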
>
> If a version designator is a URI, it could dereference to a "model" -
> however DCAT profiles could use third party vocabularies to define
> properties for such models, and have a simple string property in DCAT
> core.
>
> We probably need special properties in DCAT to handle
> "previous/next/current version" problems.
>
> Which leaves open whether we need another special property to indicate
> the type of version, and a set of defined literals for common cases.
>
> Any statistics about change should be through a dereferenceable version
> model, defined by the application domain.
>
> <descends into solution space...>
>
> IMHO it's important we have one consistent pattern for these types of
> situations where we promote some special semantics to dcat properties,
> but also want to use dcat Classes to act as subjects for discovery of
> domain-specific properties.
>
> The pattern seems to be a combination of simple DataProperties for DCAT
> core properties, and extension points using defined ObjectProperties
> whose type is controlled by domain profiles. Such ObjectProperties may
> be canonically defined in DCAT, or in external vocabularies also defined
> by domain profiles. Do we want a simple pattern:
>
> dcat:prop a owl:DatatypeProperty
>
> dcat:propLink a owl:ObjectProperty
>
> Rob Atkinson
>
> On Tue, 10 Oct 2017 at 14:31 <Simon.Cox@csiro.au
> <mailto:Simon.Cox@csiro.au>> wrote:
> 
>     I'm trying to not get sucked into the versioning discussion, but
>     feel the need to draw attention to this work from the Research Data
>     Alliance, who two years ago developed guidelines on a very closely
>     related topic - citation of dynamic datasets - i.e. how to identify
>     a particular state of a dataset that is being continuously updated.
>     The main link is here
>     https://www.rd-alliance.org/group/data-citation-wg/outcomes/data-citation-recommendation.html 
>     and there is a longer paper here:
>     https://www.rd-alliance.org/system/files/documents/TCDL-RDA-Guidelines_160411.pdf
> 
>     Seems to me that the notion of 'version' is usually a publisher's
>     choice to assign a memorable identifier to a product, which may have
>     many more intermediate changes from the last 'version'. Version
>     control systems talk about 'tags' and 'releases' which are usually
>     along a more-or-less continuous development path. Criteria for
>     versions will vary depending on the application. There is no way we
>     can be prescriptive on this, except for the requirement for
>     transparency from the publisher, so perhaps the focus should be on a
>     framework for enabling a publisher to describe their criteria, with
>     the various concerns that apply.
> 
>     The key concern of the RDA work was to support the retrieval of any
>     previous state (though not necessarily instantaneously).
> 
>     Simon
> 
>     -----Original Message-----
>     From: Karen Coyle [mailto:kcoyle@kcoyle.net <mailto:kcoyle@kcoyle.net>]
>     Sent: Wednesday, 27 September, 2017 03:39
>     To: public-dxwg-wg@w3.org <mailto:public-dxwg-wg@w3.org>
>     Subject: Relating versions and UC47 (Define update method)
> 
>     Here's a (much) more coherent statement of something I started to
>     say during the meeting yesterday but didn't have my thoughts together.
> 
>     I created use case 47[1] because I felt that there is an unspoken
>     assumption behind the discussion of "versions" - which is that each
>     version is a complete replacement for the previous one(s). That is
>     how I read the statement about the version delta: "indicating the
>     "type" of change (addition/removal/update of data etc.)"[2] The
>     implied subject of that is a single dataset that has been changed.
>     If that is the case, then we can use "version" in that way. However,
>     there are other situations that are not captured by that definition
>     but that will arise in practice.
> 
>     The example I gave in use case 47 is one in which there is a master
>     dataset, and additions and changes to that dataset are issued
>     in transaction files. A transaction file will have a newer date (or
>     some other sequential numbering), but it is not a "version" of the
>     master file; instead, it must be applied to the master file to
>     create a new master file.
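
The master/transaction pattern described above can be sketched in a few lines of Python. The record layout (a dict keyed by record id) and the operation names "add", "update" and "delete" are assumptions made for the illustration; use case 47 does not specify a transaction format.

```python
def apply_transactions(master: dict, transactions: list) -> dict:
    """Apply a transaction file to a master dataset and return the new master.

    `master` maps record ids to records. `transactions` is a list of
    (operation, record_id, record) tuples; this layout is hypothetical.
    The original master is left untouched.
    """
    new_master = dict(master)
    for operation, record_id, record in transactions:
        if operation in ("add", "update"):
            new_master[record_id] = record
        elif operation == "delete":
            new_master.pop(record_id, None)
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return new_master


master = {"r1": "alpha", "r2": "beta"}
transaction_file = [
    ("update", "r2", "beta-revised"),
    ("add", "r3", "gamma"),
    ("delete", "r1", None),
]

new_master = apply_transactions(master, transaction_file)
print(new_master)
```

The transaction file here is newer than the master but is clearly not a replacement for it: on its own it contains only three operations, and the new master exists only after the two are combined.
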
> 
>     This is only one kind of update. There are also sequential datasets
>     that may or may not be stand-alone. That is analogous to the issues
>     of a serial publication. This may include periodic datasets like
>     census information - each new census provides new information, but
>     would we call a later census file a version of an earlier one?
> 
>     Use case 44 [3] (Identification of versioned datasets and subsets)
>     is also related to this question because it addresses the part/whole
>     relationship between datasets. Use case 32 [4] (Relationships between
>     datasets) has elements of this question as well, although it
>     emphasizes the type of derivation or part/whole relationship.
> 
>     It may be best to make a clear separation between versions of a
>     dataset and related datasets that are not one-to-one replacements
>     for another.
>     If nothing else, our definition of versions needs to make clear what
>     types of relationships are included in the declaration that one
>     dataset is a version of another. This is what I mainly find to be
>     missing.
> 
>     kc
>     [1] https://w3c.github.io/dxwg/ucr/#ID47
>     [2]
>     https://lists.w3.org/Archives/Public/public-dxwg-wg/2017Sep/0051.html
>     [3] https://w3c.github.io/dxwg/ucr/#ID44
>     [4] https://w3c.github.io/dxwg/ucr/#ID32
> 
> 
>     --
>     Karen Coyle
>     kcoyle@kcoyle.net <mailto:kcoyle@kcoyle.net> http://kcoyle.net
>     m: 1-510-435-8234 (Signal)
>     skype: kcoylenet/+1-510-984-3600 <tel:+1%20510-984-3600>
> 

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234 (Signal)
skype: kcoylenet/+1-510-984-3600

Received on Tuesday, 10 October 2017 15:48:57 UTC