- From: Riccardo Albertoni <albertoni@ge.imati.cnr.it>
- Date: Tue, 7 Apr 2015 15:31:17 +0200
- To: contact@carlosiglesias.es
- Cc: Antoine Isaac <aisaac@few.vu.nl>, Public DWBP WG <public-dwbp-wg@w3.org>
- Message-ID: <CAOHhXmS2BMPq-2P-c2BMdQWEwHu3LeSJSKieOmorR1S2SnG3UA@mail.gmail.com>
Dear all,

let me share some of my thoughts, hoping they might contribute to the discussion.

1) Antoine has mentioned the following two scope issues: "Quality Vocabulary to express dataset compliance with Best practices" vs "Quality vocabulary to express metrics for data quality". I think both are in scope and should be addressed. I might change my mind after a proper discussion, but in my opinion:

- The latter, "Quality vocabulary to express metrics for data quality", should be addressed by providing an RDF vocabulary, the so-called "Quality Vocabulary". I think the quality vocabulary should be provided by revising and extending Jeremy's DAQ ontology [1], which has been mentioned by Carlos and others, and by specializing some other W3C ontologies. For example, starting from DAQ and other W3C vocabularies, we might
(a) double-check that any kind of quality metric can be easily represented and that the Quality vocabulary can be adopted as a means to exchange quality results;
(b) extend the vocabulary so that it can cover the competency questions derived from the requirement analysis (e.g., my list of CQs from the BP document [2], once the list has been properly revised by the group);
(c) include other quality representations besides metric results.
Don't get me wrong: I am a big supporter of metrics. In my own research activity I am trying to define new metrics for linkset quality (e.g., [3]), but I suspect not all providers want to deal with metrics. Might they need to document quality in a less "machine-oriented" way, for example by providing guided descriptions of known issues? Here it would be of great help if we got a list of the approaches followed in the literature or by people in the group, especially for "non linked data" open datasets. Carlos has already sent some; are there any others, besides those included in [4], that the group considers relevant examples?
- I think the former, "Quality Vocabulary to express dataset compliance with Best practices", should first be addressed in the best practice document, for example by defining a set of levels/profiles for compliance (see the discussion on 5 stars; I tend to endorse Phil's proposal) and by defining a procedure to evaluate compliance (perhaps, later on, we might take advantage of SHACL (Shapes Constraint Language) if it serves the goal). Of course, later on, statements of compliance with a certain level/profile of best practice might be one of the other "quality representations" to put beside metric results.

2) Concerning which quality dimensions to consider: it is surely interesting to know which of the possible quality dimensions are most appealing to the group. At the same time, I suspect plenty of effort is going to be spent on defining quality measures in the coming years, and the set of dimensions/metrics might change a lot in the near and not so near future. So, in my opinion, at least for the moment, we should leave the taxonomy of dimensions and metrics out of the core quality vocabulary and provide it as a sort of non-normative example taxonomy, perhaps in a separate namespace.

I wonder if there are objections or radically different views in the group about these points?

Regards,
Riccardo

[1] http://butterbur04.iai.uni-bonn.de/ontologies/daq/daq
[2] https://www.w3.org/2013/dwbp/wiki/Requirements_From_FPWD_BP
[3] Albertoni, Asunción Gómez-Pérez: Assessing linkset quality for complementing third-party datasets. EDBT/ICDT Workshops 2013: 52-59
[4] https://www.w3.org/2013/dwbp/wiki/Data_quality_notes#Links.2C_related_work

On 6 April 2015 at 11:22, Carlos Iglesias <contact@carlosiglesias.es> wrote:

> Good. I'm adding also the Dataset Quality Vocabulary (daQ) as a reference
> as well http://butterbur04.iai.uni-bonn.de/ontologies/daq/daq
>
> Best,
> CI.
>
> On 4 April 2015 at 18:37, Antoine Isaac <aisaac@few.vu.nl> wrote:
>
>> Hi Carlos,
>>
>> Thanks a lot for the links!
>> I've been collecting a list at
>> https://www.w3.org/2013/dwbp/wiki/Data_quality_notes#Links.2C_related_work
>> I've added your ones that were not there (all but one!)
>>
>> We should certainly study all this at one point.
>> For the moment however we'd like to give it a try to define quality by
>> our own use cases and best practices. Especially for defining what is in
>> scope or not.
>> There is indeed a lot of related work, mostly academic, and this could
>> end in trying to tackle many things, some perhaps less important than
>> others.
>>
>> Cheers,
>>
>> Antoine
>>
>> PS: @Carlos sorry I won't have time to answer on the other
>> (BP/vocabulary) thread very soon...
>>
>> On 4/4/15 3:37 AM, Carlos Iglesias wrote:
>>
>>> Hi Antoine, all,
>>>
>>> I think there is extensive literature on the different data quality
>>> characteristics that may be useful here as well.
>>> Some examples are:
>>>
>>> - Data quality under the computer science perspective
>>> http://www.academia.edu/2746633/Data_quality_under_the_computer_science_perspective
>>>
>>> - Data quality at a glance
>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.8628&rep=rep1&type=pdf
>>>
>>> - A metrics-driven approach for quality assessment of LOD
>>> http://www.scielo.cl/pdf/jtaer/v9n2/art06.pdf
>>>
>>> - Socio-technical impediments of Open Data
>>> http://www.ejeg.com/issue/download.html?idArticle=255
>>>
>>> - Risk Analysis to Overcome Barriers to Open Data
>>> http://www.ejeg.com/issue/download.html?idArticle=296
>>>
>>> - Quality Assessment Methodologies for Linked Open Data
>>> http://www.semantic-web-journal.net/system/files/swj414.pdf
>>>
>>> As well as other authoritative resources we may consider as well such as:
>>>
>>> - The Sebastopol principles
>>> https://public.resource.org/8_principles.html
>>>
>>> - ISO 8000 Data quality series.
>>>
>>> - ISO 25012 Data quality model.
>>>
>>> Hope it helps.
>>> Best,
>>> CI.
>>>
>>> On 3 April 2015 at 18:42, Antoine Isaac <aisaac@few.vu.nl> wrote:
>>>
>>> Dear all,
>>>
>>> One week has passed since our previous report. The same situation is
>>> roughly the same. Since there was no reaction to my previous email I'm
>>> trying a different format.
>>>
>>> We analyzed Q&G aspects in the Use Cases and Requirements FPWD:
>>> - assessing which requirements should be in scope for the Q&G work [1]
>>> - extracting the relevant Q&G stuff from the descriptions of Use Cases [2]
>>>
>>> The outcome is that use cases have very diverse views on quality.
>>> There are two main issues for scoping the voc:
>>>
>>> 1. Focusing on expressing metrics for data quality
>>> VS.
>>> Also expressing compliance of dataset wrt Best practices. from our BP WD.
>>>
>>> 2. Focusing on a general framework to express metrics for data
>>> quality and exchange results along specific quality dimensions
>>> VS.
>>> Defining specific metrics with such framework.
>>>
>>> Meanwhile, we have started extracting requirements from the best
>>> practices [3]
>>>
>>> This includes identifying 'competency questions' guiding us to add
>>> classes and properties in the voc.
>>>
>>> In general we feel we don't have much material to continue our work.
>>> In fact most of the competency questions come from Riccardo, not
>>> from the best practices in the WD.
>>>
>>> One option is to ask use case owners more precise questions. We
>>> started a questionnaire [4].
>>>
>>> What is the group's reaction on this?
>>> Can this be discussed at the F2F?
>>>
>>> I am afraid that without further input it will be hard to keep to
>>> our schedule [5], which is already very late compared to the charter.
>>>
>>> Antoine, on behalf of Riccardo, Deirdre and Christophe.
>>>
>>> [1] https://www.w3.org/2013/dwbp/wiki/Requirements_In_Scope_For_Quality
>>> [2] https://www.w3.org/2013/dwbp/wiki/Quality_Aspects_In_Use_Cases
>>> [3] https://www.w3.org/2013/dwbp/wiki/Requirements_From_FPWD_BP
>>> [4] https://www.w3.org/2013/dwbp/wiki/QualityQuestionnaire
>>> [5] https://www.w3.org/2013/dwbp/wiki/Data_quality_schedule
>>>
>>> --
>>> ---
>>>
>>> Carlos Iglesias.
>>> Open Data Consultant.
>>> +34 687 917 759
>>> contact@carlosiglesias.es
>>> @carlosiglesias
>>> http://es.linkedin.com/in/carlosiglesiasmoro/en
>>>
>
> --
> ---
>
> Carlos Iglesias.
> Internet & Web Consultant.
> +34 687 917 759
> contact@carlosiglesias.es
> @carlosiglesias
> http://es.linkedin.com/in/carlosiglesiasmoro/en

--
----------------------------------------------------------------------------
Riccardo Albertoni
Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico Magenes"
Consiglio Nazionale delle Ricerche
via de Marini 6 - 16149 GENOVA - ITALIA
tel. +39-010-6475624 - fax +39-010-6475660
e-mail: Riccardo.Albertoni@ge.imati.cnr.it
Skype: callto://riccardoalbertoni/
LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
www: http://www.ge.imati.cnr.it/Albertoni
http://purl.oclc.org/NET/riccardoAlbertoni
FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf
Received on Tuesday, 7 April 2015 13:31:43 UTC