Re: Data Quality Vocab for SDW

Hi, Antoine.

On 28/04/2016 18:01, Antoine Isaac wrote:
> [snip]
>
>>> :myDatasetPrecisionAS a dqv:QualityMeasurement ;
>>>      dqv:isMeasurementOf :spatialResolutionAsAngularDistance ;
>>>      dqv:value "[a fraction of degree]"^^xsd:decimal
>>>      .
>>
>> I see that "degree" is one of the units of measure listed in
>> wurvoc.org, so the example above might be re-written as follows:
>>
>> :myDatasetPrecisionAS a dqv:QualityMeasurement ;
>>       dqv:isMeasurementOf :spatialResolutionAsAngularDistance ;
>>       dqv:value "[a decimal degree]"^^xsd:decimal ;
>>       sdmx-attribute:unitMeasure
>> <http://www.wurvoc.org/vocabularies/om-1.8/degree> .
>>
>> Does this make sense?
>
> Absolutely!
> Thanks for spotting the unit of measure.
> I'm very much tempted to add this next to the already present example.

+1 from me :)

>>> :spatialResolutionAsALevelOfDetail a dqv:Metric;
>>>      skos:definition "Spatial resolution of a dataset expressed as level
>>> of detail"@en ;
>>>      dqv:inDimension dqv:precision
>>>      .
>>> :myDatasetPrecisionLoD a dqv:QualityMeasurement ;
>>>      dqv:isMeasurementOf :spatialResolutionAsALevelOfDetail ;
>>>      dqv:value X
>>>      .
>>>
>>> Note that in the last example, X could be a string as you suggest by
>>> using gco:CharacterString. It could also be an instance of skos:Concept
>>> that denotes a level of detail (and this has a prefLabel that
>>> corresponds to the string one would have expressed in the first way of
>>> tackling the requirement). In the latter case then we're in a borderline
>>> case where the value would strengthen the temptation to use
>>> QualityAnnotation, as the observation is not really a (numerical)
>>> measure, but something more conceptual (and possibly derived from a
>>> numerical observation).
>>
>> Thanks, Antoine. This indeed clarifies the intended use of dqv:value.
>>
>> So, the range is not formally restricted to a literal (as in daq:value
>> [1]), but this property is meant to be used with a "quantity", that
>> can be expressed in different ways (a number, free text, a URI reference).
>>
>> Is this correct?
>
> This is the tricky point. At this stage I'm not sure, and this is what
> my confusing paragraph was trying to express.
> At the beginning we were strongly convinced that dqv:value should work
> with literals, and that Annotation should be used for the non-literal
> quality assessment. I think this may be a condition to keep direct
> compatibility with DataCube, which we're very keen on.
> But in the meantime many people have expressed the will to have
> 'measures' where the value space is made of resources.
>
> Do you have any opinion on this matter?

I don't know if this is in scope or relevant, but I was just thinking of
the case where the quality measurement fails, for some reason, to
evaluate a given metric. (This links to the other thread on how to
express conformance levels [1].)

Let's suppose that the expected datatype is a boolean. To express this
situation, "true" / "false" would not be enough. Take as an example
EARL (discussed in another thread [2]), where the "outcome values" of a
test are the following [3]:

earl:passed
   Passed - the subject passed the test.
earl:failed
   Failed - the subject failed the test.
earl:cantTell
   Cannot tell - it is unclear if the subject passed or failed the test.
earl:inapplicable
   Inapplicable - the test is not applicable to the subject.
earl:untested
   Untested - the test has not been carried out.

As far as I can see, this scenario could be addressed in three possible 
ways:

1. Allowing dqv:value to be used not only with literals.

2. Adding another property to the quality measurement, which can be used 
to provide additional information on the measurement value. So, 
supposing that the metric was "inapplicable" to that specific resource, 
you would have dqv:value "false"^^xsd:boolean, plus a statement saying 
"why". However, this might not cover the EARL cases "can't tell" or 
"untested" - unless you deal with this by using values expressing 
three-valued logic (+1 = true, -1 = false, 0 = unknown).

3. Using a quality annotation to provide such additional information on 
the measurement value, and link the quality measurement with the 
annotation via prov:wasDerivedFrom.
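
To make the comparison concrete, here is a rough Turtle sketch of the
three options for the "inapplicable" case. It is only an illustration:
:someBooleanMetric, :myDataset, the resource names and the :valueStatus
property are hypothetical, the annotation is assumed to use the Open
Annotation properties oa:hasTarget / oa:hasBody, and the direction of
prov:wasDerivedFrom (measurement derived from annotation) is just one
possible reading.

@prefix :     <http://example.org/> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix earl: <http://www.w3.org/ns/earl#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Option 1: dqv:value points to a resource instead of a literal.
:measurement1 a dqv:QualityMeasurement ;
     dqv:isMeasurementOf :someBooleanMetric ;
     dqv:value earl:inapplicable
     .

# Option 2: literal value plus an extra property explaining it.
# :valueStatus is a made-up property, not part of DQV.
:measurement2 a dqv:QualityMeasurement ;
     dqv:isMeasurementOf :someBooleanMetric ;
     dqv:value "false"^^xsd:boolean ;
     :valueStatus earl:inapplicable
     .

# Option 3: literal value, with the explanation carried by a quality
# annotation linked via prov:wasDerivedFrom.
:measurement3 a dqv:QualityMeasurement ;
     dqv:isMeasurementOf :someBooleanMetric ;
     dqv:value "false"^^xsd:boolean ;
     prov:wasDerivedFrom :annotation3
     .
:annotation3 a dqv:QualityAnnotation ;
     oa:hasTarget :myDataset ;
     oa:hasBody earl:inapplicable
     .

Only option 1 changes what dqv:value may point to; options 2 and 3 keep
the value a literal and move the explanation elsewhere.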


Andrea

----
[1] http://lists.w3.org/Archives/Public/public-dwbp-wg/2016Mar/0035.html
[2] http://lists.w3.org/Archives/Public/public-dwbp-wg/2016Jan/0008.html
[3] https://www.w3.org/TR/EARL10/#OutcomeValue

Received on Sunday, 1 May 2016 20:57:15 UTC