Data Quality and Granularity vocabulary - preliminary report from Antoine Isaac on 2015-03-29 (public-dwbp-wg@w3.org from March 2015)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Sun, 29 Mar 2015 16:53:52 +0200
To: Public DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <55181200.5040606@few.vu.nl>

Dear all,

Here's a report on the work for the Quality and Granularity (Q&G) vocabulary.

We have started extracting requirements from the best practices:
https://www.w3.org/2013/dwbp/wiki/Requirements_From_FPWD_BP

There are three categories: "infrastructure" requirements, requirements on the process of designing and publishing the Q&G voc, and finally requirements that tell us which kind of information needs the Q&G voc should answer.

The last category is the most important for us now, as it will dictate which classes and properties we should have in the Q&G voc.
However, we feel we have little material to work on. Riccardo has done great work identifying 'competency questions' (which match the idea of 'concrete requirements' in our schedule at [1]). But he had to be very 'creative' - most of the questions come from him, not from the best practices in the WD.

A second stream of work is the extraction of anything relevant for Q&G from our document on Use Cases and Requirements:
https://www.w3.org/2013/dwbp/wiki/Quality_Requirements_From_UCR

Two main results here:
- trying to assess which requirements should be in scope for the Q&G work
https://www.w3.org/2013/dwbp/wiki/Requirements_In_Scope_For_Quality

- extracting the relevant Q&G stuff from the descriptions of Use Cases
https://www.w3.org/2013/dwbp/wiki/Quality_Aspects_In_Use_Cases

This work first raises a scoping issue. The UCR WD lists a handful of requirements as relevant for Q&G. But the owners of use cases tend to relate Q&G to a much broader set of aspects. The biggest question here is whether the Q&G voc should serve to express how well a dataset implements some of the best practices in our BP WD. If yes, then the Q&G voc will have to cover a wide set of competency questions.

The second issue is whether the Q&G voc should enable expressing specific quality metrics. It is clear that the Q&G voc will provide a framework to express metrics for data quality along specific quality dimensions (e.g. 'completeness'), and to exchange the results of these metrics for datasets. But this does not mean that the Q&G vocabulary should itself define the specific metrics (e.g. 'precision/recall of statements').
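
To make the distinction between the framework and the specific metrics more concrete, here is a minimal sketch in Python. It is purely illustrative: the class names, the helper function, the example dataset URI and the value 0.8 are all made up for this example and are not proposed terms or results for the Q&G voc. Attaching 'recall of statements' to the 'completeness' dimension is also just an assumption. The three classes stand for what the framework would provide, while the 'recall' object stands for a specific metric that someone else could define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityDimension:
    """A quality dimension, e.g. 'completeness' (hypothetical class name)."""
    name: str


@dataclass(frozen=True)
class QualityMetric:
    """A specific metric, attached to exactly one dimension (hypothetical)."""
    name: str
    dimension: QualityDimension


@dataclass(frozen=True)
class QualityMeasurement:
    """The result of applying one metric to one dataset (hypothetical)."""
    dataset: str
    metric: QualityMetric
    value: float


def measurements_for(measurements, dataset, dimension):
    """Return the measurements recorded for a dataset along one dimension."""
    return [
        m for m in measurements
        if m.dataset == dataset and m.metric.dimension == dimension
    ]


# The framework part stops at the classes above. Everything below is a
# specific metric and a result that a publisher or consumer would supply.
completeness = QualityDimension("completeness")
# Assumption: recall of statements is treated here as a completeness metric.
recall = QualityMetric("recall of statements", completeness)
# Placeholder dataset URI and placeholder value, not real data.
results = [QualityMeasurement("http://example.org/dataset/1", recall, 0.8)]
```

With this split, a competency question such as "which completeness results exist for this dataset?" can be answered by measurements_for(results, "http://example.org/dataset/1", completeness) without the framework itself knowing what 'recall of statements' means.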

In fact, the analysis of the Use Cases confirms the observation we made for the BPs: we do not have much material to define competency questions for the Q&G voc.

So we have this idea of coming back to the use case owners with a questionnaire trying to extract more specific quality aspects than the ones we have in the Use Case WD.
The first (partial) draft is at
https://www.w3.org/2013/dwbp/wiki/QualityQuestionnaire

We are still following the original schedule [1] and we will send another report before this week's telecon.
However, I expect the report will be very similar to this mail, unless the group gives us some substantial feedback in the meantime on how to move forward!

Best,

Antoine, on behalf of Riccardo, Deirdre and Christophe.

[1] https://www.w3.org/2013/dwbp/wiki/Data_quality_schedule

Received on Sunday, 29 March 2015 14:54:22 UTC