Re: Data Quality Vocab for SDW from Andrea Perego on 2016-05-05 (public-sdw-comments@w3.org from May 2016)

From: Andrea Perego <andrea.perego@jrc.ec.europa.eu>
Date: Thu, 05 May 2016 12:31:24 +0200
To: Antoine Isaac <aisaac@few.vu.nl>
Cc: Riccardo Albertoni <albertoni@ge.imati.cnr.it>, "Heaven, Rachel E." <reh@bgs.ac.uk>, Phil Archer <phila@w3.org>, Linda van den Brink <l.vandenbrink@geonovum.nl>, "public-sdw-comments@w3.org" <public-sdw-comments@w3.org>
Message-id: <a0879cc8-258c-74a8-c7e7-6dd50af82bd4@jrc.ec.europa.eu>

Hi, Antoine.

On 05/05/2016 11:44, Antoine Isaac wrote:
> Hi Andrea,
>
> Thanks!
> Riccardo has added the example about angular distance in the latest draft:
> http://w3c.github.io/dwbp/vocab-dqg.html#ExpressDatasetAccuracyPrecision
>
> Is it ok with you?

Thanks, Antoine & Riccardo! +1 from me.

Just a comment on the first example in that section:


:myDatasetPrecision a dqv:QualityMeasurement ;
    dqv:isMeasurementOf :spatialResolutionAsDistanceInMetres ;
    dqv:value "1000"^^xsd:decimal ;
    sdmx-attribute:unitMeasure <http://www.wurvoc.org/vocabularies/om-1.8/metre>
    .

:spatialResolutionAsDistanceInMetres  a  dqv:Metric;
     skos:definition "Spatial resolution of a dataset expressed as distance"@en ;
     dqv:expectedDataType xsd:decimal ;
     dqv:inDimension dqv:precision
     .


@Riccardo, as per our email conversation [1], I think "InMetres" should 
be dropped from the "name" of the dqv:Metric instance. So, 
:spatialResolutionAsDistanceInMetres should be just 
:spatialResolutionAsDistance:


:myDatasetPrecision a dqv:QualityMeasurement ;
    dqv:isMeasurementOf :spatialResolutionAsDistance ;
    dqv:value "1000"^^xsd:decimal ;
    sdmx-attribute:unitMeasure <http://www.wurvoc.org/vocabularies/om-1.8/metre>
    .

:spatialResolutionAsDistance  a  dqv:Metric;
     skos:definition "Spatial resolution of a dataset expressed as distance"@en ;
     dqv:expectedDataType xsd:decimal ;
     dqv:inDimension dqv:precision
     .
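
For the snippet above to parse as standalone Turtle, it would need prefix declarations along these lines. This is only a sketch: the default ":" namespace is a hypothetical example.org placeholder, and the other lines assume the usual namespaces for these prefixes.

```turtle
# Assumed prefix declarations; ":" is a hypothetical placeholder namespace.
@prefix :               <http://example.org/> .
@prefix dqv:            <http://www.w3.org/ns/dqv#> .
@prefix skos:           <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:            <http://www.w3.org/2001/XMLSchema#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
```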



> About non-numerical values. Your example is spot on, thanks for making
> the connection (it seems that you manage to fit these EARL tests into any
> discussion, now ;-) )

:)

> I would be strongly against option 2. Too complex.
> Right now our design choices for DQV clearly indicate that #3 (the
> one with test results as annotations) is the preferred option. But still
> it is useful to know whether you would prefer to have #1 (allow
> dqv:value to be used with anything, even if one would lose
> interoperability with other vocabularies and tools, i.e. the ones based
> on Data Cube).

Thanks, Antoine. Actually, at the moment I don't have any strong use 
cases that couldn't be addressed with option (3), which includes the 
"link" (prov:wasDerivedFrom) between the quality measurement and the 
quality annotation.
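
A minimal sketch of what option (3) might look like, for the EARL "inapplicable" case. Everything in the ":" namespace (the dataset, the metric, the resource names) is hypothetical, the prefixes are assumed to be the usual ones, and the direction of prov:wasDerivedFrom (measurement derived from annotation) is an assumption, as it has not been fixed in this thread.

```turtle
@prefix :     <http://example.org/> .   # hypothetical namespace
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix earl: <http://www.w3.org/ns/earl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# The measurement keeps a literal value, as expected by the boolean metric.
:myDatasetConformance a dqv:QualityMeasurement ;
    dqv:computedOn :myDataset ;
    dqv:isMeasurementOf :conformanceToSomeSpec ;   # hypothetical metric
    dqv:value "false"^^xsd:boolean ;
    prov:wasDerivedFrom :myDatasetConformanceNote  # direction is an assumption
    .

# The annotation carries the non-literal outcome that explains the value.
:myDatasetConformanceNote a dqv:QualityAnnotation ;
    oa:hasTarget :myDataset ;
    oa:hasBody :myTestResult ;
    oa:motivatedBy dqv:qualityAssessment
    .

:myTestResult a earl:TestResult ;
    earl:outcome earl:inapplicable
    .
```

Here dqv:value stays a literal, so compatibility with Data Cube is not affected, while the EARL outcome remains reachable from the measurement.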

Cheers,

Andrea

----
[1]https://lists.w3.org/Archives/Public/public-sdw-comments/2016Apr/0002.html


> On 01/05/16 22:56, Andrea Perego wrote:
>> Hi, Antoine.
>>
>> On 28/04/2016 18:01, Antoine Isaac wrote:
>>> [snip]
>>>
>>>>> :myDatasetPrecisionAS a dqv:QualityMeasurement ;
>>>>>      dqv:isMeasurementOf :spatialResolutionAsAngularDistance ;
>>>>>      dqv:value "[a fraction of degree]"^^xsd:decimal
>>>>>      .
>>>>
>>>> I see that "degree" is one of the units of measure listed in
>>>> wurvoc.org, so the example above might be re-written as follows:
>>>>
>>>> :myDatasetPrecisionAS a dqv:QualityMeasurement ;
>>>>       dqv:isMeasurementOf :spatialResolutionAsAngularDistance ;
>>>>       dqv:value "[a decimal degree]"^^xsd:decimal ;
>>>>       sdmx-attribute:unitMeasure
>>>> <http://www.wurvoc.org/vocabularies/om-1.8/degree> .
>>>>
>>>> Does this make sense?
>>>
>>> Absolutely!
>>> Thanks for spotting the unit of measure.
>>> I'm very much tempted to add this next to the already present example.
>>
>> +1 from me :)
>>
>>>>> :spatialResolutionAsALevelOfDetail a dqv:Metric;
>>>>>      skos:definition "Spatial resolution of a dataset expressed as level of detail"@en ;
>>>>>      dqv:inDimension dqv:precision
>>>>>      .
>>>>> :myDatasetPrecisionLoD a dqv:QualityMeasurement ;
>>>>>      dqv:isMeasurementOf :spatialResolutionAsALevelOfDetail ;
>>>>>      dqv:value X
>>>>>      .
>>>>>
>>>>> Note that in the last example, X could be a string, as you suggest by
>>>>> using gco:CharacterString. It could also be an instance of
>>>>> skos:Concept that denotes a level of detail (with a prefLabel that
>>>>> corresponds to the string one would have used in the first approach).
>>>>> The latter is a borderline case, where the value strengthens the
>>>>> temptation to use QualityAnnotation, as the observation is not really
>>>>> a (numerical) measure, but something more conceptual (and possibly
>>>>> derived from a numerical observation).
>>>>
>>>> Thanks, Antoine. This indeed clarifies the intended use of dqv:value.
>>>>
>>>> So, the range is not formally restricted to a literal (as in daq:value
>>>> [1]), but this property is meant to be used with a "quantity" that
>>>> can be expressed in different ways (a number, free text, a URI reference).
>>>>
>>>> Is this correct?
>>>
>>> This is the tricky point. At this stage I'm not sure, and this is what
>>> my confusing paragraph was trying to express.
>>> At the beginning we were strongly convinced that dqv:value should work
>>> with literals, and that Annotation should be used for the non-literal
>>> quality assessment. I think this may be a condition to keep direct
>>> compatibility with DataCube, which we're very keen on.
>>> But in the meantime many people have expressed the wish to have
>>> 'measures' where the value space is made of resources.
>>>
>>> Do you have any opinion on this matter?
>>
>> I don't know if this is in scope or relevant, but I was just thinking
>> of the case where the quality measurement fails, for some reason, to
>> evaluate a given metric. (This links to the other thread concerning
>> how to express conformance levels [1]).
>>
>> Let's suppose that the expected datatype is a boolean. To express this
>> situation, "true" / "false" would not be enough. As an example, I'm
>> using the case of EARL (discussed in another thread [2]), where the
>> "outcome values" of a test are the following [3]:
>>
>> earl:passed
>>    Passed - the subject passed the test.
>> earl:failed
>>    Failed - the subject failed the test.
>> earl:cantTell
>>    Cannot tell - it is unclear if the subject passed or failed the test.
>> earl:inapplicable
>>    Inapplicable - the test is not applicable to the subject.
>> earl:untested
>>    Untested - the test has not been carried out.
>>
>> As far as I can see, this scenario could be addressed in three
>> possible ways:
>>
>> 1. Allowing dqv:value to be used not only with literals.
>>
>> 2. Adding another property to the quality measurement, which can be
>> used to provide additional information on the measurement value. So,
>> supposing that the metric was "inapplicable" to that specific
>> resource, you would have dqv:value "false"^^xsd:boolean, plus a
>> statement saying "why". However, this might not cover the EARL cases
>> "can't tell" or "untested" - unless you deal with this by using values
>> expressing three-valued logic (+1 = true, -1 = false, 0 = unknown).
>>
>> 3. Using a quality annotation to provide such additional information
>> on the measurement value, and link the quality measurement with the
>> annotation via prov:wasDerivedFrom.
>>
>>
>> Andrea
>>
>> ----
>> [1]http://lists.w3.org/Archives/Public/public-dwbp-wg/2016Mar/0035.html
>> [2]http://lists.w3.org/Archives/Public/public-dwbp-wg/2016Jan/0008.html
>> [3]https://www.w3.org/TR/EARL10/#OutcomeValue

-- 
Andrea Perego, Ph.D.
Scientific / Technical Project Officer
European Commission DG JRC
Institute for Environment & Sustainability
Unit H06 - Digital Earth & Reference Data
Via E. Fermi, 2749 - TP 262
21027 Ispra VA, Italy

https://ec.europa.eu/jrc/
Received on Thursday, 5 May 2016 10:34:31 UTC