
Re: [dxwg] Added a section to deal with quality and started some guidance for r…

From: Riccardo Albertoni via GitHub <sysbot+gh@w3.org>
Date: Wed, 06 Jun 2018 14:42:26 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issue_comment.created-395093599-1528296145-sysbot+gh@w3.org>
> The examples you have developed all appear to attach quality information to DCAT resources with information that is external to the DCAT resource itself - with the URI for the dataset description as the object of an axiom. So this information would not actually be in the dcat:Catalog.

Dear @dr-shorthair, 
I am not sure I fully understand your remark, which seems to propose an additional requirement on DCAT self-containment that is not explicit in issue #57. In any case, I sense that it implies more than one desideratum, which I list below:

1.  to have DCAT elements not only as objects of axioms;
2.  to have quality statements collected into a single container X;
3.  to have X expressible as a native DCAT element.

As to 1, a dcat:Dataset or dcat:Distribution is connected to its Measurements or Annotations through the properties `dqv:hasQualityMeasurement` and `dqv:hasQualityAnnotation`, so the DCAT resource appears as the subject of those statements.

As to 2, it is possible to collect all kinds of quality information into :myQualityMetadata. :myQualityMetadata is an instance of dqv:QualityMetadata and collects everything in the same graph, or in the same Turtle file.
We can relate a dcat:Dataset/Distribution to :myQualityMetadata by saying
`:busStopInGenoa dqv:hasQualityMetadata :myQualityMetadata`

For example, assuming :myQualityMetadata is serialized in TriG, we can write the following:

:myQualityMetadata a dqv:QualityMetadata .

GRAPH :myQualityMetadata {
    # Quality annotation: user feedback attached to the dataset
    :busStopInGenoa a dcat:Dataset ;
        dqv:hasQualityAnnotation :qualityNote .

    :qualityNote
        a dqv:UserQualityFeedback ;
        oa:hasTarget :busStopInGenoa ;
        oa:hasBody :textBody ;
        oa:motivatedBy dqv:qualityAssessment ;
        prov:wasAttributedTo :consumer1 ;
        prov:generatedAtTime "2018-05-27T02:52:02Z"^^xsd:dateTime ;
        dqv:inDimension ldqd:completeness .

    :textBody a oa:TextualBody ;
        rdf:value "Incomplete dataset: it contains only 20500 out of 30000 existing bus stops" ;
        dc:language "en" ;
        dc:format "text/plain" .

    # Quality measurement: a completeness value computed on the dataset
    :busStopInGenoa
        dqv:hasQualityMeasurement :myMeasurement .

    :myMeasurement
        a dqv:QualityMeasurement ;
        dqv:computedOn :busStopInGenoa ;
        dqv:isMeasurementOf :completenessWRTExpectedNumberOfEntities ;
        dqv:value "0.6833333"^^xsd:decimal ;
        prov:wasAttributedTo :myQualityChecker ;
        prov:generatedAtTime "2018-05-27T02:52:02Z"^^xsd:dateTime ;
        prov:wasGeneratedBy :myQualityChecking .

    # Metric used by the measurement above
    :completenessWRTExpectedNumberOfEntities
        a dqv:Metric ;
        skos:definition "It returns the degree of completeness as the ratio between the actual number of entities included in the dataset and the declared expected number of entities."@en ;
        dqv:expectedDataType xsd:decimal ;
        dqv:inDimension ldqd:completeness .
}
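
The example above omits prefix declarations. As a minimal sketch, it could be preceded by something like the following. The default namespace is a placeholder of my choosing, and `dc:` is assumed to be Dublin Core Elements 1.1, as used for annotation bodies in the Web Annotation model; the other namespaces are the usual W3C ones.

```turtle
# Placeholder default namespace (assumption, not from the original message)
@prefix :     <http://example.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix ldqd: <http://www.w3.org/2016/05/ldqd#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
# Assumed to be DC Elements 1.1
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
```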

As to 3, I do not see any reason why we cannot define a dcat:Distribution or a dcat:Dataset for cataloguing the data quality information serialized in :myQualityMetadata.
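
One possible reading of point 3, as a rough sketch only: the quality metadata graph is itself typed as a dcat:Dataset, given a distribution, and listed in the catalog next to the dataset it assesses. The names :myQualityMetadataTrig and :myCatalog, the title, the download URL, and the default namespace are hypothetical and do not come from the examples discussed so far.

```turtle
# Placeholder default namespace (assumption)
@prefix :     <http://example.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .

# The quality metadata container is also described as a DCAT dataset.
:myQualityMetadata a dqv:QualityMetadata, dcat:Dataset ;
    dct:title "Quality metadata for busStopInGenoa"@en ;
    dcat:distribution :myQualityMetadataTrig .

# Hypothetical distribution pointing at a serialization of the graph.
:myQualityMetadataTrig a dcat:Distribution ;
    dcat:downloadURL <http://example.org/quality/myQualityMetadata.trig> .

# Existing DQV link from the assessed dataset to its quality metadata.
:busStopInGenoa a dcat:Dataset ;
    dqv:hasQualityMetadata :myQualityMetadata .

# The catalog lists both the assessed dataset and its quality metadata.
:myCatalog a dcat:Catalog ;
    dcat:dataset :busStopInGenoa, :myQualityMetadata .
```

With a description along these lines, the quality information would be reachable from within the dcat:Catalog rather than only from external statements.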

We already have dqv:hasQualityMetadata connecting a dcat:Dataset/Distribution to a dqv:QualityMetadata.
Another issue is whether or not we want more explicit ways to say that dcat:Dataset Y contains the quality data of dcat:Dataset X, or that Y has been derived from X in a quality assessment activity.
I tend to consider this a separate issue, which might be influenced by the solutions chosen to solve Qualified forms [RQF] #79 and Provenance information [RPIF] #76, and by the restructuring of the DCAT core elements.

GitHub Notification of comment by riccardoAlbertoni
Please view or discuss this issue at https://github.com/w3c/dxwg/pull/245#issuecomment-395093599 using your GitHub account
Received on Wednesday, 6 June 2018 14:42:29 UTC
