- From: Karen Coyle <kcoyle@kcoyle.net>
- Date: Tue, 10 Oct 2017 08:48:29 -0700
- To: andrea.perego@ec.europa.eu, rob@metalinkage.com.au, Simon.Cox@csiro.au
- Cc: public-dxwg-wg@w3.org, makx@makxdekkers.com
All, thanks for this discussion. It is very useful. The DCAT-AP link is
well worth a look.

We haven't discussed what documentation might accompany DCAT 1.1 yet,
but perhaps it would be helpful to consider that. It would give us a
"parking lot" where we could place complex questions that we are
stumbling on. Further on we could look at "parked" issues and see if
they have a place in one of our deliverables or need some separate
documentation.

It appears that although "versioning" was the obvious lack in DCAT 1.0,
it also is a particularly difficult space.

1. Does it make sense to continue this work now, or should we tackle
some other requirements?
2. Would a small "version sub-group" help? The latter could come back
to the full group with a proposal. Does anyone want to volunteer for that?

kc

On 10/10/17 1:14 AM, andrea.perego@ec.europa.eu wrote:
> I also agree with Simon and Rob that we cannot be prescriptive about
> what a "version" is and how it is identified.
>
> Restating Simon's point, I think we are dealing with a notion – as the
> one of "dataset" – which is used with different meanings by different
> communities - and they know exactly what a "version" is. Moreover, what
> a "version" is also very much related to the data management policy /
> workflow in place. And this affects how different versions of a dataset
> are modelled.
>
> It might be useful to have a look at the discussion on this topic
> carried out in the DCAT-AP WG, that highlighted quite a few different
> perspectives – and coming up with an agreement turned out to be quite
> problematic. This issue was further discussed during the work on the
> implementation guidelines of DCAT-AP, and the result was not to define
> what is or is not a version, but rather an explanation of different
> possible ways of modelling it, based on implementation evidence.
> The summary is available here:
>
> https://joinup.ec.europa.eu/release/dcat-ap-how-model-dataset-series
>
> As you can read there, we have examples where different versions of a
> dataset are modelled with distributions, or as different datasets in a
> series, possibly in combination with a statement saying which is the
> previous / next version (by using dct:hasVersion / dct:isVersionOf,
> respectively). And we also have to consider cases when datasets are
> updated (on a regular or irregular basis) but the old versions are not
> maintained (this frequently happens, e.g., for datasets updated daily).
>
> I think the lesson learnt in DCAT-AP is that what users are looking for is:
>
> 1. Having guidance on how to model dataset versions (i.e., with
> different datasets, different distributions, etc.), based on evidence
> from similar use cases / domains. This requirement mainly applies to
> communities where the notion of dataset "version" is not established /
> clearly defined.
>
> 2. Having clear information on which are the relevant terms (classes,
> properties) in DCAT, and on how to use them. This requirement applies to
> all users.
>
> About point (2), I take this opportunity to add a note here - also about
> some of them that I'm not sure have been mentioned so far in our discussion:
>
> - dct:modified [1] and dct:accrualPeriodicity [2]: These properties
> provide implicit information about a dataset version – especially when
> combined with the issue and/or creation date –, that can be used also
> when old versions are not maintained.
>
> - About the issue raised by Rob about previous/next/current version,
> dct:hasVersion [3] and dct:isVersionOf [4] are actually meant to model
> exactly previous / next versions. Moreover, there is also adms:prev [5]
> and adms:next [6], plus adms:last [7] for the latest version (@Rob, I'm
> not sure if with "current" version you actually mean this).
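As an illustration only (it is not taken from any message in this
thread), the adms:prev / adms:next / adms:last linking described above
can be sketched as plain subject-predicate-object triples. The dataset
URIs are hypothetical, and leaving adms:last off the latest version
itself is an assumption of this sketch, not something ADMS or DCAT is
known to require.

```python
# Hypothetical dataset URIs, invented for this illustration only.
VERSIONS = [
    "http://example.org/dataset/air-quality/v1",
    "http://example.org/dataset/air-quality/v2",
    "http://example.org/dataset/air-quality/v3",
]

# Namespace of the ADMS vocabulary.
ADMS = "http://www.w3.org/ns/adms#"


def version_chain_triples(versions):
    """Link an ordered list of version URIs (oldest first).

    Returns (subject, predicate, object) tuples using adms:prev,
    adms:next and adms:last.
    """
    if not versions:
        return []
    triples = []
    latest = versions[-1]
    for i, current in enumerate(versions):
        if i > 0:
            triples.append((current, ADMS + "prev", versions[i - 1]))
        if i < len(versions) - 1:
            triples.append((current, ADMS + "next", versions[i + 1]))
        # Assumption: only older versions point at the latest one.
        if current != latest:
            triples.append((current, ADMS + "last", latest))
    return triples


# Print the chain in an N-Triples-like form.
for s, p, o in version_chain_triples(VERSIONS):
    print(f"<{s}> <{p}> <{o}> .")
```

A publisher that does not keep old versions would have nothing to link
to, which is the case where dct:modified and dct:accrualPeriodicity
carry the only version information.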
>
> Cheers,
>
> Andrea
>
> ----
> [1] http://dublincore.org/documents/dcmi-terms/#terms-modified
> [2] http://dublincore.org/documents/dcmi-terms/#terms-accrualPeriodicity
> [3] http://dublincore.org/documents/dcmi-terms/#terms-hasVersion
> [4] http://dublincore.org/documents/dcmi-terms/#terms-isVersionOf
> [5] https://www.w3.org/TR/vocab-adms/#adms-prev
> [6] https://www.w3.org/TR/vocab-adms/#adms-next
> [7] https://www.w3.org/TR/vocab-adms/#adms-last
>
> ----
> Andrea Perego, Ph.D.
> Scientific / Technical Project Officer
> European Commission DG JRC
> Directorate B - Growth and Innovation
> Unit B6 - Digital Economy
> Via E. Fermi, 2749 - TP 262
> 21027 Ispra VA, Italy
>
> https://ec.europa.eu/jrc/
>
> ----
> The views expressed are purely those of the writer and may
> not in any circumstances be regarded as stating an official
> position of the European Commission.
>
> *From:* Rob Atkinson [mailto:rob@metalinkage.com.au]
> *Sent:* Tuesday, October 10, 2017 6:19 AM
> *To:* Simon.Cox@csiro.au; kcoyle@kcoyle.net; public-dxwg-wg@w3.org
> *Subject:* Re: Relating versions and UC47 (Define update method)
>
> +1 We cannot be prescriptive about what constitutes a version, nor how
> a version identifier is represented.
>
> What we can be prescriptive about are how versions are identified - i.e.
> the name of DCAT properties that refer to versions of a DCAT Dataset
> description, the dataset described by this description and version of
> DCAT Distribution.
>
> We can also require that identifiers are lexically comparable, so that
> if A is lexically > B then the version denoted by A is later than the
> version denoted by B. (and if A = B then version is the same)
>
> If a version designator is a URI, it could dereference to a "model" -
> however DCAT profiles could use third party vocabularies to define
> properties for such models, and have a simple string property in DCAT
> core.
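To make the "lexically comparable" suggestion above concrete, here is a
minimal sketch (an illustration, not a proposal made by anyone in this
thread). It assumes "lexically" means ordinary string ordering; the
helper names is_later and pad are hypothetical. It shows that ISO 8601
dates satisfy the rule as they are, while bare integers only do so once
they are zero-padded to a fixed width.

```python
def is_later(a: str, b: str) -> bool:
    """True if identifier a sorts after identifier b in plain string order."""
    return a > b


def pad(number: int, width: int = 4) -> str:
    """Render a version number as a fixed-width, zero-padded string."""
    return f"{number:0{width}d}"


# ISO 8601 dates order correctly as plain strings.
assert is_later("2017-10-10", "2017-09-27")

# Unpadded integers do not: "10" sorts before "9" because "1" < "9".
assert not is_later("10", "9")

# Zero-padding to a fixed width restores the intended order.
assert is_later(pad(10), pad(9))

# Equal identifiers denote the same version.
assert pad(7) == pad(7)
```

A requirement of this kind would therefore constrain the form of the
identifier (for example fixed-width numbers or ISO dates), not just the
existence of a version property.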
>
> We probably need special properties in DCAT to handle
> "previous/next/current version" problems.
>
> Which leaves open whether we need another special property to indicate
> the type of version, and a set of defined literals for common cases.
>
> Any statistics about change should be through a dereferenceable version
> model, defined by the application domain.
>
> <descends into solution space...>
>
> IMHO it's important we have one consistent pattern for these types of
> situations where we promote some special semantics to dcat properties,
> but also want to use dcat Classes to act as subjects for discovery of
> domain-specific properties.
>
> The pattern seems to be a combination of simple DataProperties for DCAT
> core properties, and extension points using defined ObjectProperties
> whose type is controlled by domain profiles. Such ObjectProperties may
> be canonically defined in DCAT, or external vocabularies also defined by
> domain profiles. Do we want a simple pattern:
>
> dcat:prop a owl:DataProperty
>
> dcat:propLink a owl:ObjectProperty
>
> Rob Atkinson
>
> On Tue, 10 Oct 2017 at 14:31 <Simon.Cox@csiro.au> wrote:
>
> I'm trying to not get sucked into the versioning discussion, but
> feel the need to draw attention to this work from Research Data
> Alliance, who two years ago developed guidelines on a very closely
> related topic - citation of dynamic datasets - i.e. how to identify
> a particular state of a dataset that is being continuously updated.
> The main link is here
> https://www.rd-alliance.org/group/data-citation-wg/outcomes/data-citation-recommendation.html
> and there is a longer paper here:
> https://www.rd-alliance.org/system/files/documents/TCDL-RDA-Guidelines_160411.pdf
>
> Seems to me that the notion of 'version' is usually a publisher's
> choice to assign a memorable identifier to a product, which may have
> many more intermediate changes from the last 'version'. Version
> control systems talk about 'tags' and 'releases' which are usually
> along a more-or-less continuous development path. Criteria for
> versions will vary depending on the application. There is no way we
> can be prescriptive on this, except for the requirement for
> transparency from the publisher, so perhaps the focus should be on a
> framework for enabling a publisher to describe their criteria, with
> the various concerns that apply.
>
> The key concern of the RDA work was to support the retrieval of any
> previous state (though not necessarily instantaneously).
>
> Simon
>
> -----Original Message-----
> From: Karen Coyle [mailto:kcoyle@kcoyle.net]
> Sent: Wednesday, 27 September, 2017 03:39
> To: public-dxwg-wg@w3.org
> Subject: Relating versions and UC47 (Define update method)
>
> Here's a (much) more coherent statement of something I started to
> say during the meeting yesterday but didn't have my thoughts together.
>
> I created use case 47[1] because I felt that there is an unspoken
> assumption behind the discussion of "versions" - which is that each
> version is a complete replacement for the previous one(s). That is
> how I read the statement about the version delta: "indicating the
> "type" of change (addition/removal/update of data etc.)"[2] The
> implied subject of that is a single dataset that has been changed.
> If that is the case, then we can use "version" in that way.
> However, there are other situations that are not captured by that
> definition but that will arise in practice.
>
> The example I gave in use case 47 is one in which there is a master
> dataset, and that additions and changes to that dataset are issued
> in transaction files. A transaction file will have a newer date (or
> some other sequential numbering), but it is not a "version" of the
> master file; instead, it must be applied to the master file to
> create a new master file.
>
> This is only one kind of update. There are also sequential datasets
> that may or may not be stand-alone. That is analogous to the issues
> of a serial publication. This may include periodic datasets like
> census information - each new census provides new information, but
> would we call a later census file a version of an earlier one?
>
> Use case 44 [3] (Identification of versioned datasets and subsets)
> is also related to this question because it addresses the part/whole
> relationship between datasets. Use case 32 [4] (Relationships between
> datasets) has elements of this question as well, although it
> emphasizes the type of derivation or part/whole relationship.
>
> It may be best to make a clear separation between versions of a
> dataset and related datasets that are not one-to-one replacements
> for another.
> If nothing else, our definition of versions needs to make clear what
> types of relationships are included in the declaration that one
> dataset is a version of another. This is what I mainly find to be
> missing.
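The master/transaction-file case from use case 47 can be sketched as
follows (an illustration only, not part of any message in this thread).
The record keys, the three operation names (taken from the
"addition/removal/update" wording quoted above) and the function
apply_transactions are all assumptions made for the example. The point
it shows is that the transaction file is a delta, not a replacement: it
yields a new master only when applied to the previous one.

```python
def apply_transactions(master, transactions):
    """Apply a list of (operation, key, value) changes to a master dataset.

    Returns a new dict; the original master is left unmodified, so both
    the old and the new master remain available.
    """
    new_master = dict(master)
    for operation, key, value in transactions:
        if operation in ("add", "update"):
            new_master[key] = value
        elif operation == "remove":
            new_master.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return new_master


# Hypothetical master file and transaction file.
master_v1 = {"r1": "alpha", "r2": "beta"}
transaction_file = [
    ("update", "r2", "beta-revised"),
    ("add", "r3", "gamma"),
    ("remove", "r1", None),
]

master_v2 = apply_transactions(master_v1, transaction_file)
print(master_v2)
```

Here master_v2 could reasonably be called a version of master_v1, while
transaction_file on its own is a related dataset of a different kind.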
>
> kc
> [1] https://w3c.github.io/dxwg/ucr/#ID47
> [2] https://lists.w3.org/Archives/Public/public-dxwg-wg/2017Sep/0051.html
> [3] https://w3c.github.io/dxwg/ucr/#ID44
> [4] https://w3c.github.io/dxwg/ucr/#ID32
>
> --
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234 (Signal)
> skype: kcoylenet/+1-510-984-3600

--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234 (Signal)
skype: kcoylenet/+1-510-984-3600
Received on Tuesday, 10 October 2017 15:48:57 UTC