Re: DQV, ISO 19115/19157 and GeoDCAT-AP - representing conformance levels from Antoine Isaac on 2016-03-08 (public-dwbp-wg@w3.org from March 2016)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Tue, 8 Mar 2016 10:18:17 +0100
To: Andrea Perego <andrea.perego@jrc.ec.europa.eu>
CC: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Message-ID: <56DE98D9.5020004@few.vu.nl>

Hi Andrea,

And now for the hardest part.

On 1/11/16 9:00 AM, Andrea Perego wrote:
>> 3. We have discussed your suggestion of introducing "not evaluated" and
>> "not conformant". We are convinced that adding "not
>> conformant" would be very useful [5].
>> However, if we implement it, are you aware of a property and a value
>> (URI) to build such a statement in RDF?
>
> None I'm aware of.
>
> About this, I think the issue here is about the ability not only to express whether data are conformant or not with a given (quality) standard, but also how the conformance test has been done, by whom, and when.
>
> Besides this being the approach used in ISO, this information is important for a number of reasons.
>
> One of them is that, in many cases, quality control is not something that is done once, but needs to be carried out on a regular basis. This happens, e.g., for datasets that are regularly updated (dynamic data).
>
> Another one is about the ability to control and verify how the quality check has been carried out, for instance in order to be able to reproduce it. Providing all the information needed to "reproduce an experiment" is common practice in scientific publications, and the same principles can be applied to data. And here we are also talking about transparency.
>
> A final example is that, in some cases, the final conformance result may be related to conformance tests carried out against more than one criterion - i.e., the final conformance result is determined by the aggregation of multiple conformance tests, each concerning a specific criterion. The EARL specification illustrates this in its examples, which refer to WCAG, where conformance depends on a set of criteria ("general techniques") to be checked.
>
>
> Based on that, I see two levels for the specification of data quality:
>
>
> 1. The first is about dataset filtering in the discovery phase, where I might just want to get the datasets conformant with a given (quality) standard. In such cases, properties expressing just whether data are conformant (dct:conformsTo) or not (??) can do the job.
>
>
> 2. The second concerns two classes of actors:
>
> (a) Those who manage data. This is about the ability to record the details of conformance tests and results, to use them in the data management workflow, and to expose the final results to users, possibly in an aggregated form (i.e., as in scenario #1).
>
> (b) Users who would like to contribute feedback/reviews on data quality. Note that such users can also be third parties who are engaged directly by data custodians (e.g., this is the case for quality certificates, or peer reviews of data).
>
> For these cases, EARL might be the appropriate tool in a SemWeb / LD context. Or, at least, EARL provides the vocabulary to specify the relevant information. This includes "outcome values" of test results that are not limited to conformant / not conformant (appropriate in scenario #1). Notably, EARL supports 5 possible outcome values (see http://www.w3.org/TR/EARL10-Schema/#OutcomeValue) - quoting:
>
> [[
> earl:passed
>    Passed - the subject passed the test.
> earl:failed
>    Failed - the subject failed the test.
> earl:cantTell
>    Cannot tell - it is unclear if the subject passed or failed the test.
> earl:inapplicable
>    Inapplicable - the test is not applicable to the subject.
> earl:untested
>    Untested - the test has not been carried out.
> ]]
>
> In this scenario, values like "not evaluated" (earl:untested) are also relevant, since they can be used internally to plan the quality checks.
>
>
> Based on what I see, DQV addresses all these users' scenarios, but it's unclear to me if it is able to encode conformance test results with the level of detail described above.
>
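
To make the level of detail discussed above a bit more concrete, here is a rough Python sketch of an EARL-style conformance record (who tested what, against which criterion, when, with which of the five outcome values) and of one possible way to roll several per-criterion outcomes up into a single result. The class and function names, the example URIs, and above all the aggregation rule are assumptions made for illustration; neither EARL nor DQV prescribes them.

```python
from dataclasses import dataclass
from enum import Enum

# Prefix declarations for the emitted Turtle snippet.
PREFIXES = (
    "@prefix earl: <http://www.w3.org/ns/earl#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


class Outcome(Enum):
    """The five EARL outcome values quoted above."""
    PASSED = "passed"
    FAILED = "failed"
    CANT_TELL = "cantTell"
    INAPPLICABLE = "inapplicable"
    UNTESTED = "untested"


@dataclass
class Assertion:
    """Hypothetical record of one conformance test."""
    dataset: str      # URI of the dataset that was tested
    criterion: str    # URI of the standard or criterion tested against
    outcome: Outcome  # result of the test
    asserted_by: str  # URI of whoever ran the test
    date: str         # ISO 8601 date on which the test was run


def to_turtle(a: Assertion) -> str:
    """Serialise the record as an earl:Assertion in Turtle."""
    return (
        "[] a earl:Assertion ;\n"
        f"  earl:assertedBy <{a.asserted_by}> ;\n"
        f"  earl:subject <{a.dataset}> ;\n"
        f"  earl:test <{a.criterion}> ;\n"
        "  earl:result [ a earl:TestResult ;\n"
        f"    earl:outcome earl:{a.outcome.value} ;\n"
        f'    dct:date "{a.date}"^^xsd:date ] .\n'
    )


def aggregate(outcomes):
    """Combine per-criterion outcomes into one overall outcome.

    Assumed rule (not taken from any specification): inapplicable tests
    are ignored, any failure fails the whole, then untested and cantTell
    take precedence over passed.
    """
    relevant = [o for o in outcomes if o is not Outcome.INAPPLICABLE]
    if not relevant:
        return Outcome.INAPPLICABLE
    for blocking in (Outcome.FAILED, Outcome.UNTESTED, Outcome.CANT_TELL):
        if blocking in relevant:
            return blocking
    return Outcome.PASSED


if __name__ == "__main__":
    example = Assertion(
        dataset="http://example.org/dataset/1",       # placeholder URI
        criterion="http://example.org/standard/x",    # placeholder URI
        outcome=Outcome.FAILED,
        asserted_by="http://example.org/agent/checker",  # placeholder URI
        date="2016-03-08",
    )
    print(PREFIXES + to_turtle(example))
```

Under this assumed rule, a dataset with one failed criterion is reported as failed overall even if other criteria are still untested, which matches the simple conformant / not conformant view of scenario #1, while the individual records keep the detail needed for scenario #2.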


I'm tempted to start discussing these matters in depth, but I'm afraid I have more general questions first, especially now that some time has passed and new things have appeared.

- What is GeoDCAT-AP really using for degrees of conformance? Annex II.14 mentions that the INSPIRE Registry maintains a URI set for them, and points to section 6. But the link given there,
http://inspire.ec.europa.eu/codelist/DegreeOfConformity , does not work.

- We are about to add examples regarding quality policies to DQV (draft at [1]). Do you think this is closely related to the issues you raised here? Should we unify the patterns? At this stage I'd rather avoid the extra work, but I do have to check with you.

Cheers,

Antoine

[1] https://www.w3.org/community/odrl/wiki/W3C_Data_on_the_Web_Best_Practices_-_Data_Quality_Policy
Received on Tuesday, 8 March 2016 09:18:48 UTC