RE: Data quality and requirements - discussion for F2F?

Following this discussion, it occurred to me that we could also look at
who will use any statements about quality, and what for.

On one hand, there is quality-related information that is for human
consumption, e.g. things like the information provided at
http://www.legislation.gov.uk/help#aboutChangesToLeg and other FAQ items on
that page. Such information can be used by humans to take decisions about
whether they want to use the data. 

 

On the other hand, precise metrics may be used by programs to pre-select
collections of data, but in that case we need to understand a little better
what kinds of programs or applications would consume the metrics, and for
what purpose.

 

It seems to me that the human-focused information is a little easier to
define (e.g. using legislation.gov.uk as a starting point). We could start
by defining a small set of properties for it (either as text or using some
controlled vocabulary) and look at the metrics later, on the basis of
existing applications that use quality metrics in practice. I agree that
metrics are not that easy to define, and probably also complex to use.

 

Makx

 

From: Debattista, Jeremy [mailto:Jeremy.Debattista@iais-extern.fraunhofer.de]
Sent: Thursday, 30 October 2014 11:11
To: Bart van Leeuwen
CC: Public DWBP WG; Antoine Isaac
Subject: Re: Data quality and requirements - discussion for F2F?

Hi Bart, Antoine 

 

I agree with both of you that defining a vocabulary based on metrics is
hard. From my work on data quality, I realised that different domains, use
cases etc might require different metrics. Of course, there are those
metrics that would be suitable for most of the use cases. What I found
useful was to define how quality metadata should be represented at an
abstract level [1]. Then based on this abstract ontology, we defined a
number of quality metrics [2], some of which might be similar to those
extracted from the DWBP use cases. On the whole, my opinion is that we have
to provide a pragmatic solution that is suitable for everyone within the
community, i.e. in the future other interested parties should be able to
define quality metrics that interoperate easily with other defined quality
metrics.

 

I would gladly join the F2F discussion remotely, as long as it is not after
10pm (CET) :).

 

Cheers,

Jer

 

 

[1]
https://raw.githubusercontent.com/EIS-Bonn/Luzzu/master/luzzu-semantics/src/main/resources/vocabularies/daq/daq.trig

[2]
https://raw.githubusercontent.com/diachron/quality/luzzu-integration/src/main/resources/vocabularies/dqm/dqm.trig

On 29 Oct 2014, at 17:17, Bart van Leeuwen <bart_van_leeuwen@netage.nl> wrote:

Hi Antoine, 

Last night I had a conversation with Bernadette on this topic, which turned
into a nice discussion.
I agree with you that the Quality vocabulary is rather hard to define if we
focus on metrics.

I hope we have a good amount of time during the F2F to discuss it.

Met Vriendelijke Groet / With Kind Regards
Bart van Leeuwen 

##############################################################
# twitter: @semanticfire 
# netage.nl
# http://netage.nl
# Enschedepad 76
# 1324 GJ Almere
# The Netherlands
# tel. +31(0)36-5347479
############################################################## 



From:        Antoine Isaac <aisaac@few.vu.nl>
To:        Public DWBP WG <public-dwbp-wg@w3.org>
Date:        29-10-2014 17:07
Subject:        Data quality and requirements - discussion for F2F?


Dear all,

As preparation for the F2F discussions on vocabularies, I have checked the
latest version of the UCR document [1]. The progress that has been made on
describing use cases and identifying requirements is impressive.
In particular, it is great to see the categorization of requirements, which
identifies the requirements most important for our vocabulary work,
including the category on quality and granularity [2].

Yet I am still not sure about the scoping of the quality vocabulary. I have
looked at all the requirements, and one could say that many of them could
impact the scope of a vocabulary used to document quality. Some thoughts are
on a new wiki page [3]. I admittedly played devil's advocate there, i.e. I
was very liberal when judging whether a requirement could impact quality and
granularity. But when looking at what the various UCs have to say about
quality, I wonder whether I am the only one confused! I have compiled a list
of quotes from the UC descriptions [3], which shows that, considering all
contributors, a very wide definition of quality is still in order.

My wish for the F2F discussion would be that the group spend some time going
through the requirements and discuss whether they should be in scope of the
vocabulary.
Or, to put it in other words, decide whether the vocabulary should include
elements for documenting whether a dataset meets the considered
requirements, i.e. whether there is metadata that lets data re-users
understand the performance of the dataset against the requirements the group
has identified.

As a reminder, all kinds of pointers for the quality work are gathered at
[4], including a first vocabulary design by Phil.

Best regards,

Antoine

[1] http://www.w3.org/TR/2014/WD-dwbp-ucr-20141014/
[2] http://www.w3.org/TR/dwbp-ucr/#requirements-for-quality-and-granularity-description-vocabulary
[3] https://www.w3.org/2013/dwbp/wiki/UCRs_and_Quality
[4] https://www.w3.org/2013/dwbp/wiki/Data_quality_notes

Received on Thursday, 30 October 2014 11:59:33 UTC