
Re: Relating versions and UC47 (Define update method)

From: Rob Atkinson <rob@metalinkage.com.au>
Date: Tue, 10 Oct 2017 22:02:55 +0000
Message-ID: <CACfF9LzD3QZGmpM6ZaY2wcGwZtWFp0ufgadM6YJmb5n+ACaQQQ@mail.gmail.com>
To: kcoyle@kcoyle.net, andrea.perego@ec.europa.eu, rob@metalinkage.com.au, Simon.Cox@csiro.au
Cc: public-dxwg-wg@w3.org, makx@makxdekkers.com

Another way to look at versioning is via profile... i.e. a profile can
define a versioning approach for the profile.

Of course, there are many domains that might want to share a versioning
approach....

Given these requirements... we can dive into potential solution space for
the purposes of comparison with existing requirements for profiles:

So if profiles are basically constraint sets, and each DCAT profile
potentially inherits many such constraint sets, then either we declare all
the constraints per DCAT object (painful) or we have transitive inheritance
(declare a profile which in turn declares its inherited constraints). These
approaches are not mutually exclusive, but I think that if we require a
profile to be available in a fully expressed form (all transitive
properties declared), then dereferencing a profile identifier should be
enough to understand what type of versioning is being used.
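To make the "fully expressed form" idea concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory registry in which each profile lists the profiles it inherits from and the constraints it declares directly; the identifiers (ex:dcat-core, ex:versioning-semver, ex:domain-profile) and the "versioning" constraint key are illustrative assumptions, not DCAT terms. Resolving the transitive closure gathers every inherited constraint, so a client can read the versioning approach from one result without walking the hierarchy itself.

```python
from typing import Dict, List, Set

# Hypothetical profile registry (illustrative identifiers, not DCAT terms):
# each profile lists the profiles it inherits from and the constraints it
# declares directly, as key -> value pairs.
PROFILES: Dict[str, dict] = {
    "ex:dcat-core": {"inherits": [], "constraints": {"dct:title": "required"}},
    "ex:versioning-semver": {
        "inherits": [],
        "constraints": {"versioning": "semantic-versioning"},
    },
    "ex:domain-profile": {
        "inherits": ["ex:dcat-core", "ex:versioning-semver"],
        "constraints": {"dct:license": "required"},
    },
}


def fully_expressed(profile_id: str, registry: Dict[str, dict]) -> Dict[str, str]:
    """Return all constraints of profile_id, including transitively inherited ones.

    Parents are merged first, in declaration order, so a constraint declared
    directly by a profile overrides the same key inherited from an ancestor.
    """
    visited: Set[str] = set()

    def collect(pid: str) -> Dict[str, str]:
        if pid in visited:  # shared ancestor or cycle: already merged once
            return {}
        visited.add(pid)
        merged: Dict[str, str] = {}
        parents: List[str] = registry[pid]["inherits"]
        for parent in parents:
            merged.update(collect(parent))
        merged.update(registry[pid]["constraints"])
        return merged

    return collect(profile_id)


# Resolving one profile identifier is enough to see its versioning approach.
print(fully_expressed("ex:domain-profile", PROFILES)["versioning"])
```

Both approaches Rob mentions stay compatible with this shape: a profile may still restate inherited constraints explicitly, and the merge simply keeps the most specific value.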

So, profiles can define a smallish number of common versioning models and
re-use these in different application domains.

So I think we have the "rich model" of versioning delegated to domains
covered, and we can focus on those few simple cases.  Having a requirement
for lexical ordering and identity comparison of version identifiers might
be one implication.
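The lexical-ordering rule (A lexically > B means A is the later version; A = B means the same version) can be checked mechanically. A minimal Python sketch, assuming version identifiers are plain strings and that "lexical" means Python's default character-by-character (code point) string comparison; the thread does not fix a collation, and the function names are hypothetical. It also shows why the rule constrains how identifiers are written: ISO 8601 dates and zero-padded numbers sort correctly, but unpadded dotted numbers such as 1.9 and 1.10 do not.

```python
from typing import List


def is_later(a: str, b: str) -> bool:
    """Under the proposed rule, version a is later than b iff a > b lexically."""
    return a > b


def is_same_version(a: str, b: str) -> bool:
    """Identity comparison: equal identifiers denote the same version."""
    return a == b


def lexically_ordered(ids_oldest_first: List[str]) -> bool:
    """Check that identifiers listed oldest-to-newest satisfy the rule,
    i.e. each one is strictly lexically greater than its predecessor."""
    return all(
        is_later(newer, older)
        for older, newer in zip(ids_oldest_first, ids_oldest_first[1:])
    )


# ISO 8601 dates and zero-padded numbers sort in release order...
print(lexically_ordered(["2017-09-27", "2017-10-10", "2018-01-02"]))  # True
print(lexically_ordered(["v008", "v009", "v010"]))  # True
# ...but unpadded dotted numbers do not: "1.10" < "1.9" character by character.
print(lexically_ordered(["1.8", "1.9", "1.10"]))  # False
```

Under these assumptions, a profile adopting the rule would also need to fix the identifier format (for example ISO 8601 dates or fixed-width numbers), since dotted release numbers need a dedicated comparator rather than plain string ordering.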

Rob


On Wed, 11 Oct 2017 at 02:48 Karen Coyle <kcoyle@kcoyle.net> wrote:

> All, thanks for this discussion. It is very useful. The DCAT-AP link is
> well worth a look. We haven't discussed what documentation might
> accompany DCAT 1.1 yet, but perhaps it would be helpful to consider
> that. It would give us a "parking lot" where we could place complex
> questions that we are stumbling on. Further on we could look at "parked"
> issues and see if they have a place in one of our deliverables or need
> some separate documentation.
>
> It appears that although "versioning" was the obvious gap in DCAT 1.0,
> it is also a particularly difficult space.
>
> 1. Does it make sense to continue this work now, or should we tackle
> some other requirements?
>
> 2. Would a small "version sub-group" help? That group could come back
> to the full group with a proposal. Does anyone want to volunteer for that?
>
> kc
>
> On 10/10/17 1:14 AM, andrea.perego@ec.europa.eu wrote:
> > I also agree with Simon and Rob that we cannot be prescriptive about
> > what a "version" is and how it is identified.
> >
> > Restating Simon's point, I think we are dealing with a notion (like
> > that of "dataset") which is used with different meanings by different
> > communities, and each community knows exactly what a "version" is.
> > Moreover, what counts as a "version" is also closely tied to the data
> > management policy / workflow in place, and this affects how different
> > versions of a dataset are modelled.
> >
> > It might be useful to have a look at the discussion on this topic
> > carried out in the DCAT-AP WG, which highlighted quite a few different
> > perspectives; coming up with an agreement turned out to be quite
> > problematic. This issue was further discussed during the work on the
> > implementation guidelines of DCAT-AP, and the result was not a
> > definition of what is or is not a version, but rather an explanation of
> > different possible ways of modelling it, based on implementation
> > evidence. The summary is available here:
> >
> > https://joinup.ec.europa.eu/release/dcat-ap-how-model-dataset-series
> >
> > As you can read there, we have examples where different versions of a
> > dataset are modelled with distributions, or as different datasets in a
> > series, possibly in combination with a statement saying which is the
> > previous / next version (by using dct:hasVersion / dct:isVersionOf,
> > respectively). We also have to consider cases where datasets are
> > updated (on a regular or irregular basis) but the old versions are not
> > maintained (this frequently happens, e.g., for datasets updated daily).
> >
> > I think the lesson learnt in DCAT-AP is that what users are looking for is:
> >
> > 1. Having guidance on how to model dataset versions (i.e., with
> > different datasets, different distributions, etc.), based on evidence
> > from similar use cases / domains. This requirement mainly applies to
> > communities where the notion of dataset "version" is not established /
> > clearly defined.
> >
> > 2. Having clear information on which are the relevant terms (classes,
> > properties) in DCAT, and on how to use them. This requirement applies to
> > all users.
> >
> > About point (2), I take this opportunity to add a note here on some
> > relevant terms, including some that I'm not sure have been mentioned so
> > far in our discussion:
> >
> > - dct:modified [1] and dct:accrualPeriodicity [2]: These properties
> > provide implicit information about a dataset version (especially when
> > combined with the issue and/or creation date), which can be used even
> > when old versions are not maintained.
> >
> > - About the issue raised by Rob about previous/next/current version,
> > dct:hasVersion [3] and dct:isVersionOf [4] are actually meant to model
> > exactly previous / next versions. Moreover, there are also adms:prev [5]
> > and adms:next [6], plus adms:last [7] for the latest version (@Rob, I'm
> > not sure whether by "current" version you actually mean this).
> >
> > Cheers,
> >
> > Andrea
> >
> > ----
> > [1] http://dublincore.org/documents/dcmi-terms/#terms-modified
> > [2] http://dublincore.org/documents/dcmi-terms/#terms-accrualPeriodicity
> > [3] http://dublincore.org/documents/dcmi-terms/#terms-hasVersion
> > [4] http://dublincore.org/documents/dcmi-terms/#terms-isVersionOf
> > [5] https://www.w3.org/TR/vocab-adms/#adms-prev
> > [6] https://www.w3.org/TR/vocab-adms/#adms-next
> > [7] https://www.w3.org/TR/vocab-adms/#adms-last
> >
> > ----
> > Andrea Perego, Ph.D.
> > Scientific / Technical Project Officer
> > European Commission DG JRC
> > Directorate B - Growth and Innovation
> > Unit B6 - Digital Economy
> > Via E. Fermi, 2749 - TP 262
> > 21027 Ispra VA, Italy
> >
> > https://ec.europa.eu/jrc/
> >
> > ----
> > The views expressed are purely those of the writer and may
> > not in any circumstances be regarded as stating an official
> > position of the European Commission.
> >
> > From: Rob Atkinson [mailto:rob@metalinkage.com.au]
> > Sent: Tuesday, October 10, 2017 6:19 AM
> > To: Simon.Cox@csiro.au; kcoyle@kcoyle.net; public-dxwg-wg@w3.org
> > Subject: Re: Relating versions and UC47 (Define update method)
> >
> > +1  We cannot be prescriptive about what constitutes a version, nor how
> > a version identifier is represented.
> >
> > What we can be prescriptive about is how versions are identified, i.e.
> > the names of the DCAT properties that refer to versions of a DCAT Dataset
> > description, of the dataset described by this description, and of a
> > DCAT Distribution.
> >
> > We can also require that identifiers are lexically comparable, so that
> > if A is lexically > B then the version denoted by A is later than the
> > version denoted by B (and if A = B then the version is the same).
> >
> > If a version designator is a URI, it could dereference to a "model";
> > however, DCAT profiles could use third-party vocabularies to define
> > properties for such models, with a simple string property in DCAT
> > core.
> >
> > We probably need special properties in DCAT to handle
> > "previous/next/current version" problems.
> >
> > This leaves open whether we need another special property to indicate
> > the type of version, and a set of defined literals for common cases.
> >
> > Any statistics about change should be provided through a dereferenceable
> > version model, defined by the application domain.
> >
> > <descends into solution space...>
> >
> > IMHO it's important that we have one consistent pattern for these types
> > of situations, where we promote some special semantics to DCAT properties
> > but also want to use DCAT classes to act as subjects for discovery of
> > domain-specific properties.
> >
> > The pattern seems to be a combination of simple DataProperties for DCAT
> > core properties, and extension points using defined ObjectProperties
> > whose type is controlled by domain profiles. Such ObjectProperties may
> > be canonically defined in DCAT or in external vocabularies also defined
> > by domain profiles. Do we want a simple pattern:
> >
> > dcat:prop a owl:DatatypeProperty .
> >
> > dcat:propLink a owl:ObjectProperty .
> >
> > Rob Atkinson
> >
> > On Tue, 10 Oct 2017 at 14:31 <Simon.Cox@csiro.au> wrote:
> >
> >     I'm trying not to get sucked into the versioning discussion, but
> >     feel the need to draw attention to this work from the Research Data
> >     Alliance, who two years ago developed guidelines on a very closely
> >     related topic, citation of dynamic datasets, i.e. how to identify
> >     a particular state of a dataset that is being continuously updated.
> >     The main link is here:
> >     https://www.rd-alliance.org/group/data-citation-wg/outcomes/data-citation-recommendation.html
> >     and there is a longer paper here:
> >     https://www.rd-alliance.org/system/files/documents/TCDL-RDA-Guidelines_160411.pdf
> >
> >     Seems to me that the notion of 'version' is usually a publisher's
> >     choice to assign a memorable identifier to a product, which may have
> >     many more intermediate changes from the last 'version'. Version
> >     control systems talk about 'tags' and 'releases' which are usually
> >     along a more-or-less continuous development path. Criteria for
> >     versions will vary depending on the application. There is no way we
> >     can be prescriptive on this, except for the requirement for
> >     transparency from the publisher, so perhaps the focus should be on a
> >     framework for enabling a publisher to describe their criteria, with
> >     the various concerns that apply.
> >
> >     The key concern of the RDA work was to support the retrieval of any
> >     previous state (though not necessarily instantaneously).
> >
> >     Simon
> >
> >     -----Original Message-----
> >     From: Karen Coyle [mailto:kcoyle@kcoyle.net]
> >     Sent: Wednesday, 27 September, 2017 03:39
> >     To: public-dxwg-wg@w3.org
> >     Subject: Relating versions and UC47 (Define update method)
> >
> >     Here's a (much) more coherent statement of something I started to
> >     say during the meeting yesterday, before I had my thoughts together.
> >
> >     I created use case 47 [1] because I felt that there is an unspoken
> >     assumption behind the discussion of "versions": that each
> >     version is a complete replacement for the previous one(s). That is
> >     how I read the statement about the version delta: "indicating the
> >     "type" of change (addition/removal/update of data etc.)" [2]. The
> >     implied subject of that is a single dataset that has been changed.
> >     If that is the case, then we can use "version" in that way. However,
> >     there are other situations that are not captured by that definition
> >     but that will arise in practice.
> >
> >     The example I gave in use case 47 is one in which there is a master
> >     dataset, and additions and changes to that dataset are issued
> >     in transaction files. A transaction file will have a newer date (or
> >     some other sequential numbering), but it is not a "version" of the
> >     master file; instead, it must be applied to the master file to
> >     create a new master file.
> >
> >     This is only one kind of update. There are also sequential datasets
> >     that may or may not be stand-alone. These are analogous to the issues
> >     of a serial publication. This may include periodic datasets like
> >     census information - each new census provides new information, but
> >     would we call a later census file a version of an earlier one?
> >
> >     Use case 44 [3] (Identification of versioned datasets and subsets)
> >     is also related to this question because it addresses the part/whole
> >     relationship between datasets. Use case 32 [4] (Relationships between
> >     datasets) has elements of this question as well, although it
> >     emphasizes the type of derivation or part/whole relationship.
> >
> >     It may be best to make a clear separation between versions of a
> >     dataset and related datasets that are not one-to-one replacements
> >     for another.
> >     If nothing else, our definition of versions needs to make clear what
> >     types of relationships are included in the declaration that one
> >     dataset is a version of another. This is what I mainly find to be
> >     missing.
> >
> >     kc
> >     [1] https://w3c.github.io/dxwg/ucr/#ID47
> >     [2] https://lists.w3.org/Archives/Public/public-dxwg-wg/2017Sep/0051.html
> >     [3] https://w3c.github.io/dxwg/ucr/#ID44
> >     [4] https://w3c.github.io/dxwg/ucr/#ID32
> >
> >
> >     --
> >     Karen Coyle
> >     kcoyle@kcoyle.net http://kcoyle.net
> >     m: 1-510-435-8234 (Signal)
> >     skype: kcoylenet/+1-510-984-3600
> >
>
> --
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234 (Signal)
> skype: kcoylenet/+1-510-984-3600
>
>
Received on Tuesday, 10 October 2017 22:04:23 UTC
