Re: Data Quality Vocab for SDW from Antoine Isaac on 2016-05-05 (public-sdw-comments@w3.org from May 2016)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Thu, 5 May 2016 11:44:40 +0200
To: Andrea Perego <andrea.perego@jrc.ec.europa.eu>
CC: Riccardo Albertoni <albertoni@ge.imati.cnr.it>, "Heaven, Rachel E." <reh@bgs.ac.uk>, Phil Archer <phila@w3.org>, Linda van den Brink <l.vandenbrink@geonovum.nl>, "public-sdw-comments@w3.org" <public-sdw-comments@w3.org>
Message-ID: <572B1608.6040906@few.vu.nl>
Hi Andrea,

Thanks!
Riccardo has added the example about angular distance in the latest draft:
http://w3c.github.io/dwbp/vocab-dqg.html#ExpressDatasetAccuracyPrecision

Is it ok with you?

About non-numerical values: your example is spot on, thanks for making the connection (it seems that you manage to fit these EARL tests into any discussion now ;-) ).
I would be strongly against option 2: it is too complex.
Right now our design choices for DQV clearly indicate that #3 (the one with test results as annotations) is the preferred option. Still, it would be useful to know whether you would prefer #1 (allowing dqv:value to be used with anything, even if one would lose interoperability with other vocabularies and tools, i.e. the ones based on Data Cube).
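
To make the comparison concrete, here is a rough Turtle sketch of #1 and #3 for the "inapplicable" case. It is only an illustration under assumptions: the example.org namespace, :myDataset, :someBooleanMetric and the other local names are hypothetical, the use of an earl:TestResult as annotation body is just one possible modelling, and the direction of prov:wasDerivedFrom is a guess at what option 3 intends.

```turtle
@prefix :     <http://example.org/> .        # hypothetical namespace
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix earl: <http://www.w3.org/ns/earl#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Option 1: dqv:value points directly at a resource (an EARL outcome value).
# Simple, but the value is no longer a literal.
:measurementOption1 a dqv:QualityMeasurement ;
    dqv:computedOn :myDataset ;
    dqv:isMeasurementOf :someBooleanMetric ;
    dqv:value earl:inapplicable
    .

# Option 3: the test outcome is carried by a quality annotation...
:testAnnotation a dqv:QualityAnnotation ;
    oa:hasTarget :myDataset ;
    oa:hasBody :testResult ;
    oa:motivatedBy dqv:qualityAssessment
    .

:testResult a earl:TestResult ;
    earl:outcome earl:inapplicable
    .

# ...and the measurement keeps a literal value and links to the annotation.
:measurementOption3 a dqv:QualityMeasurement ;
    dqv:computedOn :myDataset ;
    dqv:isMeasurementOf :someBooleanMetric ;
    dqv:value "false"^^xsd:boolean ;
    prov:wasDerivedFrom :testAnnotation
    .
```

With #3 the dqv:value stays a literal, which is the part that matters for Data Cube compatibility; with #1 a consumer has to be ready for a resource in that position.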

Cheers,

Antoine
On 01/05/16 22:56, Andrea Perego wrote:
> Hi, Antoine.
>
> On 28/04/2016 18:01, Antoine Isaac wrote:
>> [snip]
>>
>>>> :myDatasetPrecisionAS a dqv:QualityMeasurement ;
>>>>      dqv:isMeasurementOf :spatialResolutionAsAngularDistance ;
>>>>      dqv:value "[a fraction of degree]"^^xsd:decimal
>>>>      .
>>>
>>> I see that "degree" is one of the units of measure listed in
>>> wurvoc.org, so the example above might be re-written as follows:
>>>
>>> :myDatasetPrecisionAS a dqv:QualityMeasurement ;
>>>       dqv:isMeasurementOf :spatialResolutionAsAngularDistance ;
>>>       dqv:value "[a decimal degree]"^^xsd:decimal ;
>>>       sdmx-attribute:unitMeasure
>>> <http://www.wurvoc.org/vocabularies/om-1.8/degree> .
>>>
>>> Does this make sense?
>>
>> Absolutely!
>> Thanks for spotting the unit of measure.
>> I'm very much tempted to add this next to the already present example.
>
> +1 from me :)
>
>>>> :spatialResolutionAsALevelOfDetail a dqv:Metric;
>>>>      skos:definition "Spatial resolution of a dataset expressed as level
>>>> of detail"@en ;
>>>>      dqv:inDimension dqv:precision
>>>>      .
>>>> :myDatasetPrecisionLoD a dqv:QualityMeasurement ;
>>>>      dqv:isMeasurementOf :spatialResolutionAsALevelOfDetail ;
>>>>      dqv:value X
>>>>      .
>>>>
>>>> Note that in the last example, X could be a string as you suggest by
>>>> using gco:CharacterString. It could also be an instance of skos:Concept
>>>> that denotes a level of detail (and this has a prefLabel that
>>>> corresponds to the string one would have expressed in the first way of
>>>> tackling the requirement). In the latter case we're in a borderline
>>>> situation where the kind of value strengthens the temptation to use
>>>> QualityAnnotation, as the observation is not really a (numerical)
>>>> measure, but something more conceptual (and possibly derived from a
>>>> numerical observation).
>>>
>>> Thanks, Antoine. This indeed clarifies the intended use of dqv:value.
>>>
>>> So, the range is not formally restricted to a literal (as in daq:value
>>> [1]), but this property is meant to be used with a "quantity", that
>>> can be expressed in different ways (a number, free text, a URI reference).
>>>
>>> Is this correct?
>>
>> This is the tricky point. At this stage I'm not sure, and this is what
>> my confusing paragraph was trying to express.
>> At the beginning we were strongly convinced that dqv:value should work
>> with literals, and that Annotation should be used for the non-literal
>> quality assessment. I think this may be a condition to keep direct
>> compatibility with DataCube, which we're very keen on.
>> But in the meantime many people have expressed the wish to have
>> 'measures' where the value space is made of resources.
>>
>> Do you have any opinion on this matter?
>
> I don't know if this is in scope or relevant, but I was just thinking of the case when the quality measurement fails for some reason to evaluate a given metric. (This links to the other thread concerning how to express conformance levels [1]).
>
> Let's suppose that the expected datatype is a boolean. If you wanted to express this situation, "true" / "false" would not be enough. I'm using as an example the case of EARL (discussed in another thread [2]), where the "outcome values" of a test are the following [3]:
>
> earl:passed
>    Passed - the subject passed the test.
> earl:failed
>    Failed - the subject failed the test.
> earl:cantTell
>    Cannot tell - it is unclear if the subject passed or failed the test.
> earl:inapplicable
>    Inapplicable - the test is not applicable to the subject.
> earl:untested
>    Untested - the test has not been carried out.
>
> As far as I can see, this scenario could be addressed in three possible ways:
>
> 1. Allowing dqv:value to be used not only with literals.
>
> 2. Adding another property to the quality measurement, which can be used to provide additional information on the measurement value. So, supposing that the metric was "inapplicable" to that specific resource, you would have dqv:value "false"^^xsd:boolean, plus a statement saying "why". However, this might not cover the EARL cases "can't tell" or "untested" - unless you deal with this by using values expressing three-valued logic (+1 = true, -1 = false, 0 = unknown).
>
> 3. Using a quality annotation to provide such additional information on the measurement value, and link the quality measurement with the annotation via prov:wasDerivedFrom.
>
>
> Andrea
>
> ----
> [1] http://lists.w3.org/Archives/Public/public-dwbp-wg/2016Mar/0035.html
> [2] http://lists.w3.org/Archives/Public/public-dwbp-wg/2016Jan/0008.html
> [3] https://www.w3.org/TR/EARL10/#OutcomeValue
Received on Thursday, 5 May 2016 09:45:10 UTC