Re: Data Quality Vocabulary - feedback welcome!

From: Bailer, Werner <werner.bailer@joanneum.at>
Date: Wed, 28 Oct 2015 17:12:06 +0000
To: "public-dwbp-comments@w3.org" <public-dwbp-comments@w3.org>
CC: "Russegger, Silvia" <silvia.russegger@joanneum.at>, "Orgel, Thomas" <Thomas.Orgel@joanneum.at>, Höffernig, Martin <Martin.Hoeffernig@joanneum.at>, "Ehgarter, Stephan" <Stephan.Ehgarter@joanneum.at>
Message-ID: <bb44136f8c9d48f0a62f62d880c4908b@RZJMBX2.jr1.local>
Dear editors of the DQV spec,

Many thanks for sharing your working draft. We have been using a recent draft version of the vocabulary in a project [1] that integrates and enriches metadata of cultural heritage objects in order to provide recommendations to users, contextualised by their current work. We assess metadata quality both on the records as received from providers and after mapping them to a common data model.

We found that DQV covers most of our requirements and integrates smoothly with W3C PROV, which is already part of our data model. We also found the proposed DQV compact and intuitive to use.

There are, however, a few observations from our use of DQV that we would like to share with the group.

1. daq:metric
a. multiple values of one metric

We found that for some metrics a single output value may not be sufficient. This applies in particular to statistics (which are listed as a dimension in the draft spec). For example, one may want to express the mean, minimum or maximum of a metric over a dataset, or provide both an absolute and a relative (normalized) value for the same metric. This could of course be done by defining multiple metrics, but then a mechanism would be needed to group or link them, or to express their dependencies.

In the EBU, the working group on quality control [2] has defined a data model for the somewhat related problem of describing the quality of audiovisual content (so far with XML serialisations only, not RDF). This model supports multiple output values, which can be typed. In DQV, this could for example be achieved by allowing multiple values and defining subproperties of daq:value.
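To illustrate the subproperty approach, here is a minimal Python sketch that only builds a Turtle string with the standard library. The ex: namespace and the subproperties ex:meanValue, ex:minValue and ex:maxValue are hypothetical; the daq: namespace URI is assumed to be that of the daQ vocabulary the draft builds on, and the metric, dataset and score values are made up.

```python
# Sketch: one quality measurement carrying several typed values, expressed
# as hypothetical subproperties of daq:value. Only the standard library is used.

DAQ = "http://purl.org/eis/vocab/daq#"  # assumed daQ namespace URI
EX = "http://example.org/quality#"  # hypothetical namespace


def turtle_measurement(measure_id, metric, dataset, stats):
    """Return Turtle declaring one subproperty of daq:value per statistic,
    followed by a single measurement that uses all of them.

    stats maps a local property name (e.g. "meanValue") to a float.
    """
    lines = [
        f"@prefix daq: <{DAQ}> .",
        f"@prefix ex: <{EX}> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    # Declaring each statistic as a subproperty lets a consumer that only
    # knows daq:value still find every value via RDFS inference.
    for name in stats:
        lines.append(f"ex:{name} rdfs:subPropertyOf daq:value .")
    lines += [
        "",
        f"ex:{measure_id} a daq:QualityMeasure ;",
        f"    daq:metric ex:{metric} ;",
        f"    daq:computedOn ex:{dataset} ;",
    ]
    values = [f'    ex:{name} "{value}"^^xsd:double' for name, value in stats.items()]
    lines.append(" ;\n".join(values) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical per-record completeness scores for one dataset.
    scores = [0.5, 0.75, 1.0]
    stats = {
        "meanValue": sum(scores) / len(scores),
        "minValue": min(scores),
        "maxValue": max(scores),
    }
    print(turtle_measurement("measure1", "fieldCompleteness", "recordSet1", stats))
```

Because every value hangs off one measurement node, the grouping problem of defining separate metrics does not arise, while generic DQV consumers still see plain daq:value statements.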

b. parameters

Some metrics may require input parameters. For example, recent publications on metadata quality use weights or target values in their metrics. For quality measurement descriptions to be self-contained, the values of such parameters would need to be included in the description of the metric.
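As a minimal sketch of what a self-contained, parameterised measurement could look like, the Python below computes a hypothetical weighted completeness metric and keeps the weights and target value next to the result. The metric, its formula, the field names and the Measurement structure are illustrative assumptions, not taken from the draft or from the publications mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    """A measured value stored together with the parameters used to compute it."""
    metric: str
    value: float
    parameters: dict = field(default_factory=dict)


def weighted_completeness(record, weights, target=1.0):
    """Hypothetical metric: weighted share of non-empty fields in a record,
    divided by a target level and capped at 1.0.

    weights maps field name -> weight; target is the completeness level
    regarded as fully satisfactory (0 < target <= 1).
    """
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    filled = sum(w for name, w in weights.items() if record.get(name))
    value = min((filled / total) / target, 1.0)
    # The parameters travel with the value, so the description is
    # self-contained: a reader can see which weights and target were used.
    return Measurement(
        metric="weightedCompleteness",
        value=value,
        parameters={"weights": dict(weights), "target": target},
    )


if __name__ == "__main__":
    record = {"title": "Vase", "creator": "", "date": "1890"}
    weights = {"title": 3, "creator": 2, "date": 1}
    m = weighted_completeness(record, weights, target=0.8)
    print(m.metric, round(m.value, 3), m.parameters)
```

Two measurements of the same metric are only comparable if their parameters match, which is why the parameter values are returned with the value rather than kept implicit.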

c. daq:expectedDataType

This property from DAQ is defined with range xsd:anySimpleType. While it seems useful to define the expected data type for a metric, a simple type may be too narrow: in many cases a metric will be computed on a data record or a subgraph.

2. Dimensions and categories

The proposed dimensions seem quite high-level, so it is difficult to think of more general categories that would group them. Conversely, in some cases a level between dimensions and metrics seems desirable. For example, we are assessing mapping quality. These metrics fall into the accuracy dimension (i.e., does the output of the mapping process represent the object less accurately?) and form a specific group within it. Adding to the confusion between levels, the note in section 7.3 (Processability) currently says "Level on the 5-star scale", which sounds more like a metric than a dimension (there could of course be metrics aggregating results from other metrics; daq:requires could be used to express such a dependency).
We are not sure there is a strong need for categories; we would rather propose allowing dimensions to be nested over multiple levels to support grouping.
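A minimal Python sketch of nested dimensions follows. The parent links stand in for a hypothetical property (for instance ex:subDimensionOf) that the draft does not define, and all dimension and metric names are made up for illustration; the point is that grouping falls out of walking the hierarchy, without a separate category level.

```python
# Sketch of nested dimensions. The parent links stand for a hypothetical
# property (e.g. ex:subDimensionOf); the DQV draft defines no such property.
# All dimension and metric names below are made up for illustration.

PARENT_DIMENSION = {
    "accuracy": None,
    "mappingAccuracy": "accuracy",
}

METRIC_DIMENSION = {
    "lostFieldsRatio": "mappingAccuracy",
    "valueMismatchRatio": "mappingAccuracy",
    "typoRate": "accuracy",
}


def dimension_path(metric):
    """Return the dimensions above a metric, from the most general down."""
    path = []
    dim = METRIC_DIMENSION[metric]
    while dim is not None:
        if dim in path:
            raise ValueError(f"cycle in dimension hierarchy at {dim}")
        path.append(dim)
        dim = PARENT_DIMENSION[dim]
    return list(reversed(path))


def metrics_under(dimension):
    """Return all metrics grouped, directly or indirectly, under a dimension."""
    return sorted(m for m in METRIC_DIMENSION if dimension in dimension_path(m))


if __name__ == "__main__":
    print(dimension_path("lostFieldsRatio"))
    print(metrics_under("accuracy"))
    print(metrics_under("mappingAccuracy"))
```

With this structure, the mapping-quality metrics form their own group under accuracy, while metrics attached directly to accuracy remain reachable from the more general dimension.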

You may have come across these points already while working on the model, and there may be good reasons for not adopting our suggestions. In any case, we are happy to continue the discussion and provide feedback on future iterations of the model.

Best regards,
Werner

[1] http://www.eexcess.eu
[2] https://tech.ebu.ch/groups/qc

-----Original Message-----
From: Discussion list for Europeana Technical Developments [mailto:EUROPEANA-TECH@LIST.ECOMPASS.NL] On Behalf Of Antoine Isaac
Sent: Friday, 14 August 2015 12:06
To: EUROPEANA-TECH@LIST.ECOMPASS.NL
Subject: Data Quality Vocabulary - feedback welcome!

Dear all,

(apologies for possible cross-posting as this request is sent to a couple of lists with overlapping interests)

The W3C Data on the Web Best Practices Working Group has recently published a first draft for a Data Quality Vocabulary (DQV):
http://www.w3.org/TR/2015/WD-vocab-dqv-20150625/

Our goal is to facilitate the publication of data on the quality of datasets published on the Web. We hope such information will be made widely available, so that data re-users can be better guided in their selection of data sources.

Many people in the Europeana community have worked on metadata quality, defining criteria that guide the creation and exchange of data in order to facilitate later access and re-use of digital objects.

This new W3C work is not meant to reproduce all this work. Rather, we have focused on a basic framework that allows people to express the quality observations relevant to them, using specific vocabularies defined as DQV extensions or refinements. We believe such a common framework would make varied quality assessments more comparable, and more usable.

This draft is still open for comments, from editorial remarks to deeper criticism. The group is very keen on receiving any form of 'feedback from the market', before developing this vocabulary further.

Comments can be either posted here, or sent to the dedicated W3C mailing list public-dwbp-comments@w3.org.
The three co-editors (cc'ed) are also willing to receive personal emails, should you not be sure that your feedback is fit for open publication!

Kind regards,

Antoine Isaac
--
R&D Manager, Europeana.eu
Received on Wednesday, 28 October 2015 17:12:37 UTC