RE: Comments on ID50

Rob,

You are of course right: fine-grained semantics around changes is really hard. 

My question really was how much of these fine-grained semantics can we expect to get in real life? Creating and maintaining metadata is expensive, unless there are good automated tools that can do those things. I am certainly not arguing that it is not a good idea to propose solutions to exposing changes, but we need to be realistic. 

Maybe we could think of expressing more coarse-grained semantics as a first step, e.g. adding information about the type of change – a property dcat:changeType to go with version information? This could then be expressed using a concept scheme (externally defined) with things like ‘data/observations added’, ‘data/observations corrected’, ‘compressed’, ‘encrypted’, ‘units converted’, etc.?
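
A minimal Python sketch of this coarse-grained option, for illustration only. It assumes a small externally defined concept scheme; the base IRI `http://example.org/change-types#` and the concept names are hypothetical placeholders, and `dcat:changeType` is the property proposed here, not an existing DCAT term. Each new version simply lists one or more change types next to its version string.

```python
from enum import Enum

# Hypothetical external concept scheme: base IRI and concept names are placeholders.
SCHEME = "http://example.org/change-types#"


class ChangeType(Enum):
    OBSERVATIONS_ADDED = "observationsAdded"
    OBSERVATIONS_CORRECTED = "observationsCorrected"
    COMPRESSED = "compressed"
    ENCRYPTED = "encrypted"
    UNITS_CONVERTED = "unitsConverted"

    @property
    def iri(self) -> str:
        """Full IRI of the concept in the (hypothetical) scheme."""
        return SCHEME + self.value


def dataset_turtle(subject: str, version: str, changes: list) -> str:
    """Render a dataset description using the proposed, non-standard dcat:changeType."""
    objects = ", ".join(f"<{c.iri}>" for c in changes)
    return (
        f"{subject} a dcat:Dataset ;\n"
        f'  dcat:Version "{version}" ;\n'
        f"  dcat:changeType {objects} .\n"
    )


print(dataset_turtle("my:dataset", "1.3.2",
                     [ChangeType.OBSERVATIONS_ADDED, ChangeType.COMPRESSED]))
```

The point of keeping the list coarse is that a publishing tool only has to pick from a handful of concepts, which is far cheaper to produce than a property-by-property account of what changed.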

Makx.

From: Rob Atkinson [mailto:rob@metalinkage.com.au] 
Sent: 05 September 2017 15:04
To: Makx Dekkers <mail@makxdekkers.com>; Rob Atkinson <rob@metalinkage.com.au>; Jaroslav Pullmann <jaroslav.pullmann@fit.fraunhofer.de>; Peter.Winstanley@gov.scot; public-dxwg-wg@w3.org
Subject: Re: Comments on ID50

The point is that fine-grained semantics around version change is problematic. BTW, the example does not relate oldDistribution to newDistribution (they are not replacements), so the labels were poorly chosen. And I think you have indeed identified the correct interpretation of my example: there has been a change in X, and you now need to go to the version mechanisms and compare to find out what that change was. This was just an example, however, not a concrete proposal for a solution.

Rob

On Tue, 5 Sep 2017 at 19:54 Makx Dekkers <mail@makxdekkers.com> wrote:

Rob,

The example seems to indicate that you make an assertion in the metadata for the Dataset about a change in metadata for the Distribution:

my:dataset a dcat:Dataset ;
  dcat:Version "1.3.2" ;
  dcat:distribution my:oldDistribution ;
  dcat:distribution my:newDistribution ;
  dcat:versionChangesOn dcat:distribution, rdfs:comment ;
  rdfs:comment "XML with DTDs soooo old school so I added JSON-LD" .

It seems to me that you are trying to say something like “in comparison to the metadata of my:oldDistribution, there is now a (new/modified?) rdfs:comment in the metadata for Distribution my:newDistribution”. However, in the RDF of the example, there is no way that you can explicitly say that the dcat:versionChangesOn applies to the metadata of newDistribution as compared to oldDistribution. That is even more complicated if you have more than two distributions, where different relationships may exist between different pairs.

I think it would be clearer in such a case to make assertions in the metadata for the distributions, e.g.

my:newDistribution a dcat:Distribution ;
  dct:isVersionOf my:oldDistribution ;
  dcat:versionChangesOn rdfs:comment ;
  rdfs:comment "XML with DTDs soooo old school so I added JSON-LD" .

But also, taking one step back: how realistic is the use case that would require such fine-grained information about changes? I am not questioning that “it would be nice to have” but I wonder (a) would metadata creators (whether human or software) put in the effort to create such detailed information – given that it is already hard to get even a minimum set of reliable metadata for lots of datasets – and (b) what would metadata reusers (again human or software) do with the information? Do we know of any existing tools that could create or consume information at this level of detail – or even tools that are known to be stuck in the absence of this kind of detail?

And if there is really a need to expose fine-grained change, why not use PROV-O, which would be the more general way to do this?

Makx.

From: Rob Atkinson [mailto:rob@metalinkage.com.au]
Sent: 05 September 2017 03:56
To: Jaroslav Pullmann <jaroslav.pullmann@fit.fraunhofer.de>; Peter.Winstanley@gov.scot; public-dxwg-wg@w3.org
Subject: Re: Comments on ID50

I think fine-grained semantics of change is going to be very very hard to nail down as a cross-community standard.

*** Warning - you are entering solution space without life-support ***

How about a simpler idea: record the properties that change between versions of a metadata record? For example, knowing that the distribution description has changed tells you what you need to know.

dcat:versionChangesOn
  a rdf:Property ;
  rdfs:domain dcat:Dataset ;
  rdfs:range rdf:Property .

e.g.

my:dataset a dcat:Dataset ;
  dcat:Version "1.3.1" ;
  dcat:distribution my:oldDistribution ;
  rdfs:comment "love XML with DTDs as the only valid distribution" .

becomes (without trying to solve the co-existence of versions; assume it's something we can negotiate by graph or something):

my:dataset a dcat:Dataset ;
  dcat:Version "1.3.2" ;
  dcat:distribution my:oldDistribution ;
  dcat:distribution my:newDistribution ;
  dcat:versionChangesOn dcat:distribution, rdfs:comment ;
  rdfs:comment "XML with DTDs soooo old school so I added JSON-LD" .

Thus the semantics of the change partly comes from the property that is recorded as changed (and this can be any property from any third-party specialised vocabulary). Attaching a version-change model as well is then just a special case of attaching fine-grained semantics using a specific third-party vocabulary.
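
To make the idea concrete, here is a rough Python sketch of how a publishing tool might fill in `dcat:versionChangesOn` automatically by comparing two versions of a record. It is an illustration under assumptions, not part of any proposal: records are plain dictionaries mapping CURIE strings to sets of values (no RDF library), and the version property is ignored by default on the assumption that it always changes.

```python
# Each version of a metadata record: property CURIE -> set of value strings.
Record = dict[str, frozenset[str]]

# Assumption: the version string always differs between versions, so skip it.
DEFAULT_IGNORE = frozenset({"dcat:Version"})


def changed_properties(old: Record, new: Record,
                       ignore: frozenset = DEFAULT_IGNORE) -> list[str]:
    """Return, sorted, the properties whose value sets differ between two versions."""
    props = (old.keys() | new.keys()) - ignore
    return sorted(p for p in props
                  if old.get(p, frozenset()) != new.get(p, frozenset()))


v131: Record = {
    "dcat:Version": frozenset({"1.3.1"}),
    "dcat:distribution": frozenset({"my:oldDistribution"}),
    "rdfs:comment": frozenset({"love XML with DTDs as the only valid distribution"}),
}
v132: Record = {
    "dcat:Version": frozenset({"1.3.2"}),
    "dcat:distribution": frozenset({"my:oldDistribution", "my:newDistribution"}),
    "rdfs:comment": frozenset({"XML with DTDs soooo old school so I added JSON-LD"}),
}

print(changed_properties(v131, v132))  # ['dcat:distribution', 'rdfs:comment']
```

The result matches the object list of `dcat:versionChangesOn` in the 1.3.2 example. It says *which* properties changed, not *how*, so a consumer still has to compare the two versions to learn what actually happened.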

Variations would be a set of more specific changes:

 dcat:versionAddPropertyValue , dcat:versionDeletePropertyValue , dcat:versionCorrectPropertyValue, dcat:versionUpdatePropertyValue
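
The add/delete variants can be derived mechanically from two versions of a record, as in this Python sketch (assumptions: records are plain dictionaries of CURIE strings rather than RDF graphs, and the property names are the ones proposed above, not existing DCAT terms). Correction versus update is deliberately not attempted: a replaced value shows up as one addition plus one deletion, and telling a correction from an update requires knowledge of intent that a diff does not have.

```python
# Each version of a record: property CURIE -> set of value strings (no RDF library).
Record = dict[str, frozenset[str]]


def classify_changes(old: Record, new: Record) -> list[tuple[str, str]]:
    """Return (proposed change property, changed property) pairs from set differences."""
    result = []
    for prop in sorted(old.keys() | new.keys()):
        before = old.get(prop, frozenset())
        after = new.get(prop, frozenset())
        if after - before:   # some value appears only in the new version
            result.append(("dcat:versionAddPropertyValue", prop))
        if before - after:   # some value appears only in the old version
            result.append(("dcat:versionDeletePropertyValue", prop))
    return result


old = {"dcat:distribution": frozenset({"my:oldDistribution"}),
       "rdfs:comment": frozenset({"love XML with DTDs as the only valid distribution"})}
new = {"dcat:distribution": frozenset({"my:oldDistribution", "my:newDistribution"}),
       "rdfs:comment": frozenset({"XML with DTDs soooo old school so I added JSON-LD"})}

for change in classify_changes(old, new):
    print(change)
```

For the 1.3.1 and 1.3.2 records this reports an addition on `dcat:distribution` and both an addition and a deletion on `rdfs:comment`.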

or by reifying and annotating each change:

my:dataset dcat:versionChangesOn [ dcat:changedProperty dcat:distribution ; dcat:changeType dcat:Addition ; rdfs:comment "updated to comply with DWBP guidelines" ] .
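
A small Python sketch of how a tool might serialise such reified change records as Turtle, one blank node per change. The terms `dcat:changedProperty`, `dcat:changeType` and `dcat:Addition` are only the proposal above, and the `Change` class and helper functions are hypothetical names for illustration; the literal escaping covers backslashes, double quotes and newlines, which is enough for short Turtle string literals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    changed_property: str           # CURIE of the DCAT property, e.g. "dcat:distribution"
    change_type: str                # proposed term, e.g. "dcat:Addition" (not in DCAT)
    comment: Optional[str] = None   # optional free-text note on the change


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslash, quote and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def reified_turtle(subject: str, changes: list) -> str:
    """Render each change as a blank-node object of the proposed dcat:versionChangesOn."""
    nodes = []
    for c in changes:
        parts = [f"dcat:changedProperty {c.changed_property}",
                 f"dcat:changeType {c.change_type}"]
        if c.comment is not None:
            parts.append(f"rdfs:comment {turtle_literal(c.comment)}")
        nodes.append("[ " + " ; ".join(parts) + " ]")
    return f"{subject} dcat:versionChangesOn " + " ,\n    ".join(nodes) + " .\n"


print(reified_turtle("my:dataset", [
    Change("dcat:distribution", "dcat:Addition",
           "updated to comply with DWBP guidelines"),
]))
```

Several changes to the same record become a comma-separated list of blank nodes on a single `dcat:versionChangesOn` statement.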

Big benefit: we only have to worry about classifying changes to the DCAT metadata, not all possible changes to datasets, yet we still have a canonical means to address them.

Rob

On Tue, 5 Sep 2017 at 01:45 Jaroslav Pullmann <jaroslav.pullmann@fit.fraunhofer.de> wrote:

   Dear Peter, dear all

    As I said, the use case of distinguishing the type of Dataset/Distribution update makes sense to me,
    e.g. when deciding whether to notify clients of "substantial" changes or not.
    Your UC description contains examples of changes to the content (deduplication) and to the Distribution
    (compression); the latter is not related to the content, i.e. the Dataset part.

    Would you consider generalising this to a "typology of change" that applies to any DCAT resource
    (Catalog/Dataset/Distribution), where "change to information content" is one of the change types?

    Each type then has to be further specified with regard to the dimensions it affects (and only those), e.g. removing
    5 past years from a data series affects the temporal coverage, resulting in a new coverage range, but does not influence the semantics.
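
A rough Python sketch of "specifying a change by the dimensions it affects", for illustration only. The dimension names (`temporalCoverage`, `theme`), the value `ex:AirQuality` and the 2000 to 2017 coverage are hypothetical, and "removing 5 past years" is assumed here to mean cutting the earliest years of the series.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalCoverage:
    start_year: int
    end_year: int


def drop_earliest_years(cov: TemporalCoverage, years: int) -> TemporalCoverage:
    """Cut the first `years` years from a series; only the start of the range moves."""
    new_start = cov.start_year + years
    if new_start > cov.end_year:
        raise ValueError("this would remove the whole coverage")
    return TemporalCoverage(new_start, cov.end_year)


def affected_dimensions(old: dict, new: dict) -> set:
    """Names of the descriptive dimensions whose values differ between versions."""
    return {dim for dim in old.keys() | new.keys() if old.get(dim) != new.get(dim)}


# Hypothetical dataset description: dimension names and values are illustrative.
before = {"temporalCoverage": TemporalCoverage(2000, 2017),
          "theme": "ex:AirQuality"}
after = {**before,
         "temporalCoverage": drop_earliest_years(before["temporalCoverage"], 5)}

print(after["temporalCoverage"])           # TemporalCoverage(start_year=2005, end_year=2017)
print(affected_dimensions(before, after))  # {'temporalCoverage'}
```

Only the temporal coverage is reported as affected; the theme, standing in for the semantics, is untouched.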

    We might consider subclassing prov:Activity to model and define some generic types of change; examples
    given in the PROV-O document include "processing", "transforming", "modifying" and "relocating".
    In the case of "altering" the information content of a Dataset, we should define the conditions under which this
    happens and, conversely, which Dataset updates do not lead to a change of this category.

    Best regards
   Jaroslav

--
Jaroslav Pullmann
Fraunhofer Institute for Applied Information Technology FIT
User-Centered Ubiquitous Computing
Schloss Birlinghoven | D-53757 Sankt Augustin | Germany
Phone: +49-2241-143620 | Fax: +49-2241-142146

Received on Tuesday, 5 September 2017 13:30:57 UTC