Re: Data Q&G vocabulary - report and questions for F2F

Dear all,

Let me share some of my thoughts, hoping they might contribute to
the discussion.

1) Antoine has mentioned the following two scope issues:

"Quality Vocabulary to express dataset compliance to Best practices" vs
"Quality vocabulary to express metrics for data quality"

I think both are in scope and should be addressed.
I might change my mind after a proper discussion, but in my opinion:

- the latter, "Quality vocabulary to express metrics for data quality",
should be addressed by providing an RDF vocabulary, the so-called "Quality
Vocabulary". I think the quality vocabulary should be produced by revising
and extending Jeremy's DAQ ontology [1], which has been mentioned by Carlos
and others, and by specializing some existing W3C ontologies. For example,
starting from DAQ and existing W3C vocabularies, we might
(a) double-check that any kind of quality metric can be easily represented
and that the Quality Vocabulary can be adopted as a means to exchange
quality results;
(b) extend the vocabulary so that it covers the competency questions
derived from the requirement analysis (e.g., my list of CQs from the BP
document [2], once the list has been properly revised by the group);
(c) include other quality representations besides metric results. Don't
get me wrong, I am a big supporter of metrics; in my own research activity
I am trying to define new metrics for linkset quality (e.g., [3]). But I
suspect not all providers want to deal with metrics. Might they need to
document quality in a less "machine-oriented" way, for example by providing
guided descriptions of known issues? Here it would be of great help to get
a list of approaches followed in the literature or by people in the group,
especially for "non linked data" open datasets. Carlos has already sent
some. Are there any others, besides those included in [4], that the group
considers relevant examples?
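
As a rough illustration of points (a) and (c), here is a minimal Python
sketch in which a metric result and a free-text known-issue note sit side by
side as quality statements about the same dataset. It is only a sketch under
stated assumptions: every class, field, and "ex:" predicate name below is
hypothetical and is not a term from DAQ or from any W3C vocabulary, and the
example IRIs and values are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class MetricResult:
    """A value computed by some quality metric (hypothetical structure)."""
    dataset: str  # IRI of the dataset being described
    metric: str   # IRI identifying the metric, wherever it is defined
    value: float


@dataclass(frozen=True)
class KnownIssueNote:
    """A human-oriented description of a known issue (hypothetical structure)."""
    dataset: str
    text: str


QualityStatement = Union[MetricResult, KnownIssueNote]


def statements_for(dataset: str,
                   statements: List[QualityStatement]) -> List[QualityStatement]:
    """Return every quality statement, metric-based or not, about one dataset."""
    return [s for s in statements if s.dataset == dataset]


def to_triples(statement: QualityStatement, node_id: str) -> List[Tuple]:
    """Flatten a statement into (subject, predicate, object) tuples.

    The "ex:" predicates are placeholders, not real vocabulary terms.
    """
    triples: List[Tuple] = [(statement.dataset, "ex:hasQualityStatement", node_id)]
    if isinstance(statement, MetricResult):
        triples.append((node_id, "ex:metric", statement.metric))
        triples.append((node_id, "ex:value", statement.value))
    else:
        triples.append((node_id, "ex:issueDescription", statement.text))
    return triples


# Made-up example data, for illustration only.
statements: List[QualityStatement] = [
    MetricResult("http://example.org/dataset/1",
                 "http://example.org/metric/completeness", 0.87),
    KnownIssueNote("http://example.org/dataset/1",
                   "Coordinates before 2010 are approximate."),
    MetricResult("http://example.org/dataset/2",
                 "http://example.org/metric/completeness", 0.5),
]
```

The point of the sketch is only that a consumer asking "what is known about
the quality of this dataset?" gets both kinds of statement back from the same
query, so providers who do not want to compute metrics can still document
quality.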

- I think the former, "Quality Vocabulary to express dataset compliance
to Best practices", should first be addressed in the best practice
document, for example by defining a set of levels/profiles for compliance
(see the discussion on 5 stars; I tend to endorse Phil's proposal) and by
defining a procedure to evaluate compliance (perhaps later we might take
advantage of SHACL (Shapes Constraint Language) if it serves the goal).
Of course, later on, statements of compliance with a certain level/profile
of the best practices might be one of the other "quality representations"
to put beside metric results.
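
To make the level/profile idea a bit more concrete, here is a minimal Python
sketch of evaluating which profile a dataset reaches. The profile names and
best-practice identifiers are invented for illustration (they are not taken
from the BP document or from Phil's proposal), and the rule that profiles are
cumulative is my own assumption.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical profiles, ordered from least to most demanding.
# Each entry lists the best practices it adds on top of the previous profile.
PROFILES: List[Tuple[str, Set[str]]] = [
    ("basic", {"BP-metadata", "BP-license"}),
    ("intermediate", {"BP-machine-readable"}),
    ("advanced", {"BP-versioning", "BP-quality-info"}),
]


def highest_profile(satisfied: Set[str]) -> Optional[str]:
    """Return the most demanding profile whose cumulative requirements are met.

    Returns None when even the first profile is not fully satisfied.
    """
    achieved: Optional[str] = None
    required: Set[str] = set()
    for name, extra in PROFILES:
        required |= extra              # requirements accumulate across profiles
        if not required <= satisfied:  # some required best practice is missing
            break
        achieved = name
    return achieved
```

A statement such as "dataset X reaches the intermediate profile" could then
be published as one more quality representation beside metric results.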

2) Concerning which quality dimensions to consider: it is certainly
interesting to know which of the possible quality dimensions are most
appealing to the group. At the same time, I suspect plenty of effort is
going to be spent on defining quality measures in the coming years, and the
set of dimensions/metrics might change a lot in the near and not so near
future. So, in my opinion, at least for the moment, we should leave the
taxonomy of dimensions and metrics out of the core quality vocabulary and
provide it as a sort of non-normative example taxonomy, perhaps in a
separate namespace.

I wonder whether there are objections or radically different views in the
group about these points.

Regards,
Riccardo

[1] http://butterbur04.iai.uni-bonn.de/ontologies/daq/daq
[2] https://www.w3.org/2013/dwbp/wiki/Requirements_From_FPWD_BP
[3] Riccardo Albertoni, Asunción Gómez-Pérez: Assessing linkset quality for
complementing third-party datasets. EDBT/ICDT Workshops 2013: 52-59
[4] https://www.w3.org/2013/dwbp/wiki/Data_quality_notes#Links.2C_related_work

On 6 April 2015 at 11:22, Carlos Iglesias <contact@carlosiglesias.es> wrote:

> Good. I'm also adding the Dataset Quality Vocabulary (daQ) as a reference:
> http://butterbur04.iai.uni-bonn.de/ontologies/daq/daq
>
> Best,
>  CI.
>
> On 4 April 2015 at 18:37, Antoine Isaac <aisaac@few.vu.nl> wrote:
>
>> Hi Carlos,
>>
>> Thanks a lot for the links!
>> I've been collecting a list at
>> https://www.w3.org/2013/dwbp/wiki/Data_quality_notes#Links.2C_related_work
>> I've added those of yours that were not there (all but one!)
>>
>> We should certainly study all this at some point.
>> For the moment, however, we'd like to try defining quality from our own
>> use cases and best practices, especially to decide what is in scope and
>> what is not.
>> There is indeed a lot of related work, mostly academic, and this could
>> end in trying to tackle many things, some perhaps less important than
>> others.
>>
>> Cheers,
>>
>> Antoine
>>
>> PS: @Carlos sorry I won't have time to answer on the other
>> (BP/vocabulary) thread very soon...
>>
>> On 4/4/15 3:37 AM, Carlos Iglesias wrote:
>>
>>> Hi Antoine, all,
>>>
>>> I think there is extensive literature on the different data quality
>>> characteristics that may be useful here as well.
>>> Some examples are:
>>>
>>> - Data quality under the computer science perspective
>>> http://www.academia.edu/2746633/Data_quality_under_the_computer_science_perspective
>>>
>>> - Data quality at a glance
>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.8628&rep=rep1&type=pdf
>>>
>>> - A metrics-driven approach for quality assessment of LOD
>>> http://www.scielo.cl/pdf/jtaer/v9n2/art06.pdf
>>>
>>> - Socio-technical impediments of Open Data
>>> http://www.ejeg.com/issue/download.html?idArticle=255
>>>
>>> - Risk Analysis to Overcome Barriers to Open Data
>>> http://www.ejeg.com/issue/download.html?idArticle=296
>>>
>>> - Quality Assessment Methodologies for Linked Open Data
>>> http://www.semantic-web-journal.net/system/files/swj414.pdf
>>>
>>> There are also other authoritative resources we may consider, such as:
>>>
>>> - The Sebastopol principles
>>> https://public.resource.org/8_principles.html
>>>
>>> - ISO 8000 Data quality series.
>>>
>>> - ISO 25012 Data quality model.
>>>
>>> Hope it helps.
>>>   Best,
>>>   CI.
>>>
>>> On 3 April 2015 at 18:42, Antoine Isaac <aisaac@few.vu.nl> wrote:
>>>
>>>     Dear all,
>>>
>>>     One week has passed since our previous report. The situation is
>>> roughly the same. Since there was no reaction to my previous email I'm
>>> trying a different format.
>>>
>>>     We analyzed Q&G aspects in the Use Cases and Requirements FPWD:
>>>     - assessing which requirements should be in scope for the Q&G work
>>> [1]
>>>     - extracting the relevant Q&G stuff from the descriptions of Use
>>> Cases [2]
>>>
>>>     The outcome is that use cases have very diverse views on quality.
>>> There are two main issues for scoping the voc:
>>>
>>>     1. Focusing on expressing metrics for data quality
>>>     VS.
>>>     Also expressing compliance of datasets with the best practices
>>> from our BP WD.
>>>
>>>     2. Focusing on a general framework to express metrics for data
>>> quality and exchange results along specific quality dimensions
>>>     VS.
>>>     Defining specific metrics with such framework.
>>>
>>>
>>>     Meanwhile, we have started extracting requirements from the best
>>> practices [3]
>>>
>>>     This includes identifying 'competency questions' guiding us to add
>>> classes and properties in the voc.
>>>
>>>     In general we feel we don't have much material to continue our work.
>>>     In fact most of the competency questions come from Riccardo, not
>>> from the best practices in the WD.
>>>
>>>     One option is to ask use case owners more precise questions. We
>>> started a questionnaire [4].
>>>
>>>     What is the group's reaction to this?
>>>     Can this be discussed at the F2F?
>>>
>>>     I am afraid that without further input it will be hard to keep to
>>> our schedule [5], which is already very late compared to the charter.
>>>
>>>     Antoine, on behalf of Riccardo, Deirdre and Christophe.
>>>
>>>     [1] https://www.w3.org/2013/dwbp/wiki/Requirements_In_Scope_For_Quality
>>>     [2] https://www.w3.org/2013/dwbp/wiki/Quality_Aspects_In_Use_Cases
>>>     [3] https://www.w3.org/2013/dwbp/wiki/Requirements_From_FPWD_BP
>>>     [4] https://www.w3.org/2013/dwbp/wiki/QualityQuestionnaire
>>>     [5] https://www.w3.org/2013/dwbp/wiki/Data_quality_schedule
>>>
>>>
>>>
>>>
>>> --
>>> ---
>>>
>>> Carlos Iglesias.
>>> Open Data Consultant.
>>> +34 687 917 759
>>> contact@carlosiglesias.es
>>> @carlosiglesias
>>> http://es.linkedin.com/in/carlosiglesiasmoro/en
>>>
>>
>>
>
>
> --
> ---
>
> Carlos Iglesias.
> Internet & Web Consultant.
> +34 687 917 759
> contact@carlosiglesias.es
> @carlosiglesias
> http://es.linkedin.com/in/carlosiglesiasmoro/en
>




-- 
----------------------------------------------------------------------------
Riccardo Albertoni
Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico
Magenes"
Consiglio Nazionale delle Ricerche
via de Marini 6 - 16149 GENOVA - ITALIA
tel. +39-010-6475624 - fax +39-010-6475660
e-mail: Riccardo.Albertoni@ge.imati.cnr.it
Skype: callto://riccardoalbertoni/
LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
www: http://www.ge.imati.cnr.it/Albertoni
http://purl.oclc.org/NET/riccardoAlbertoni
FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf

Received on Tuesday, 7 April 2015 13:31:43 UTC