- From: Makx Dekkers <mail@makxdekkers.com>
- Date: Tue, 7 Apr 2015 22:31:37 +0200
- To: "'Public DWBP WG'" <public-dwbp-wg@w3.org>
- Message-ID: <000001d07171$d971c760$8c555620$@makxdekkers.com>
I strongly second Riccardo’s suggestion to base the work as much as possible on existing work, and in particular the DaQ vocabulary.

Let me also give the group a short report on the session on quality that I moderated at the Share-PSI workshop in Timisoara last month. If you’re interested, the raw session notes are available at http://www.w3.org/2013/share-psi/wiki/Timisoara/Scribe#Tuesday_17th_March_.2811:30_-_12:40_Parallel_Sessions_B.29.

A major outcome was that people in the session agreed that there are three main aspects of quality for which some form of metric (either quantitative or qualitative) is considered useful:

* Availability
* Processability
* Accuracy/consistency/relevance

(These terms as defined and described in http://www.slideshare.net/OpenDataSupport/open-data-quality-29248578)

For availability, potential metrics that were mentioned were:

* Yes/no, maybe with explanation why the data is not available (privacy, security, archived, lost, not yet captured etc.)
* Open/restricted/registration, again possibly with explanation
* For access/re-use
* Indication of persistence and longevity

For processability:

* Level on the 5-star scale (although there were opinions that it is dangerous to attach value to the linking because the data might be good but link to ‘bad’ data)
* Links to metadata standards used and data model/schema to enable automatic processing

In the discussion related to the cluster accuracy/consistency/relevance, it was also noted that it might be useful to include some information about the context (e.g. why was the data created and what purpose is it supposed to serve).

On another level, the comment was made that quality is not a stable characteristic of a resource – some quality aspects deteriorate over time, e.g. what is fresh today will be stale tomorrow if it is not maintained, updated, refreshed.
At the end, we agreed to look at the ODI certificate approach to see how the elements of the certificate relate to the quality aspects that were discussed.

Hope this helps,

Makx.

From: Riccardo Albertoni [mailto:albertoni@ge.imati.cnr.it]
Sent: 7 April 2015 15:31
To: contact@carlosiglesias.es
Cc: Antoine Isaac; Public DWBP WG
Subject: Re: Data Q&G vocabulary - report and questions for F2F

Dear all,

let me share with you some of my thoughts, hoping they might contribute to the discussion.

1) Antoine has mentioned the following two scope issues: "Quality Vocabulary for express dataset compliance to Best practices" vs "Quality vocabulary to express metrics for data quality". I think both are in scope and should be addressed. I might change my mind after a proper discussion, but in my opinion,

- the latter, "Quality vocabulary to express metrics for data quality", should be addressed by providing an RDF vocabulary, the so-called "Quality Vocabulary". I think the quality vocabulary should be provided by revising and extending Jeremy's DAQ ontology [1], which has been mentioned by Carlos and others, and by specializing some other W3C ontologies. For example, starting from DAQ and other W3C vocabularies, we might (a) double-check that any kind of quality metric can be easily represented and that the Quality vocabulary can be adopted as a means to exchange quality results; (b) extend the vocabulary so that it can cover the competency questions derived from requirement analysis (e.g., my list of CQs from the BP document [2], once the list has been properly revised by the group); (c) include other quality representations besides metrics' results. Don't get me wrong, I am a big supporter of metrics; in fact, in my own research activity, I am trying to define new metrics for linkset quality (e.g., [3]), but I suspect not all the providers want to deal with metrics.
Might they need to document the quality in a less "machine oriented" way, such as by providing guided descriptions of known issues? Here, it would be of great help if we got a list of approaches followed in the literature or by people in the group, especially for "non linked data" open datasets. Carlos has already sent some; are there any others, except those included in [4], that the group considers relevant examples?

- I think the former, "Quality Vocabulary for express dataset compliance to Best practices", should first be addressed in the best practice document. For example, by defining a set of levels/profiles for compliance (see the discussion on 5 stars; I tend to endorse Phil's proposal) and defining a procedure to evaluate compliance (perhaps later we might take advantage of SHACL (Shapes Constraint Language) if it serves the goal). Of course, later, statements of compliance to a certain level/profile of best practice might be one of the other "quality representations" to put alongside metric results.

2) Concerning what quality dimensions to consider: surely it is interesting to know which among the possible quality dimensions are more appealing for the group. At the same time, I suspect plenty of effort is going to be spent defining quality measures in the next years, and it might be that the set of dimensions/metrics changes a lot in the near and not so near future. So in my opinion, at least for the moment, we should leave the taxonomy of dimensions and metrics out of the core quality vocabulary, and we should provide it as a sort of non-normative example taxonomy, perhaps in a separate namespace.

I wonder if there are objections or radically different views in the group about these points?
Regards,
Riccardo

[1] http://butterbur04.iai.uni-bonn.de/ontologies/daq/daq
[2] https://www.w3.org/2013/dwbp/wiki/Requirements_From_FPWD_BP
[3] Albertoni, Asunción Gómez-Pérez: Assessing linkset quality for complementing third-party datasets. EDBT/ICDT Workshops 2013: 52-59
[4] https://www.w3.org/2013/dwbp/wiki/Data_quality_notes#Links.2C_related_work

On 6 April 2015 at 11:22, Carlos Iglesias <contact@carlosiglesias.es> wrote:

Good. I'm also adding the Dataset Quality Vocabulary (daQ) as a reference: http://butterbur04.iai.uni-bonn.de/ontologies/daq/daq

Best,
CI.

On 4 April 2015 at 18:37, Antoine Isaac <aisaac@few.vu.nl> wrote:

Hi Carlos,

Thanks a lot for the links! I've been collecting a list at https://www.w3.org/2013/dwbp/wiki/Data_quality_notes#Links.2C_related_work

I've added your ones that were not there (all but one!)

We should certainly study all this at one point. For the moment however we'd like to give it a try to define quality by our own use cases and best practices. Especially for defining what is in scope or not. There is indeed a lot of related work, mostly academic, and this could end in trying to tackle many things, some perhaps less important than others.

Cheers,
Antoine

PS: @Carlos sorry I won't have time to answer on the other (BP/vocabulary) thread very soon...

On 4/4/15 3:37 AM, Carlos Iglesias wrote:

Hi Antoine, all,

I think there is extensive literature on the different data quality characteristics that may be useful here as well.
Some examples are:

- Data quality under the computer science perspective http://www.academia.edu/2746633/Data_quality_under_the_computer_science_perspective
- Data quality at a glance http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.8628&rep=rep1&type=pdf
- A metrics-driven approach for quality assessment of LOD http://www.scielo.cl/pdf/jtaer/v9n2/art06.pdf
- Socio-technical impediments of Open Data http://www.ejeg.com/issue/download.html?idArticle=255
- Risk Analysis to Overcome Barriers to Open Data http://www.ejeg.com/issue/download.html?idArticle=296
- Quality Assessment Methodologies for Linked Open Data http://www.semantic-web-journal.net/system/files/swj414.pdf

There are also other authoritative resources we may consider, such as:

- The Sebastopol principles https://public.resource.org/8_principles.html
- ISO 8000 Data quality series.
- ISO 25012 Data quality model.

Hope it helps.

Best,
CI.

On 3 April 2015 at 18:42, Antoine Isaac <aisaac@few.vu.nl> wrote:

Dear all,

One week has passed since our previous report. The situation is roughly the same. Since there was no reaction to my previous email I'm trying a different format.

We analyzed Q&G aspects in the Use Cases and Requirements FPWD:

- assessing which requirements should be in scope for the Q&G work [1]
- extracting the relevant Q&G stuff from the descriptions of Use Cases [2]

The outcome is that use cases have very diverse views on quality. There are two main issues for scoping the voc:

1. Focusing on expressing metrics for data quality VS. also expressing compliance of datasets wrt the best practices from our BP WD.
2. Focusing on a general framework to express metrics for data quality and exchange results along specific quality dimensions VS. defining specific metrics with such a framework.
Meanwhile, we have started extracting requirements from the best practices [3]. This includes identifying 'competency questions' guiding us to add classes and properties in the voc.

In general we feel we don't have much material to continue our work. In fact most of the competency questions come from Riccardo, not from the best practices in the WD.

One option is to ask use case owners more precise questions. We started a questionnaire [4].

What is the group's reaction on this? Can this be discussed at the F2F? I am afraid that without further input it will be hard to keep to our schedule [5], which is already very late compared to the charter.

Antoine, on behalf of Riccardo, Deirdre and Christophe.

[1] https://www.w3.org/2013/dwbp/wiki/Requirements_In_Scope_For_Quality
[2] https://www.w3.org/2013/dwbp/wiki/Quality_Aspects_In_Use_Cases
[3] https://www.w3.org/2013/dwbp/wiki/Requirements_From_FPWD_BP
[4] https://www.w3.org/2013/dwbp/wiki/QualityQuestionnaire
[5] https://www.w3.org/2013/dwbp/wiki/Data_quality_schedule

--
---
Carlos Iglesias. Open Data Consultant.
+34 687 917 759
contact@carlosiglesias.es
@carlosiglesias
http://es.linkedin.com/in/carlosiglesiasmoro/en

--
----------------------------------------------------------------------------
Riccardo Albertoni
Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico Magenes"
Consiglio Nazionale delle Ricerche
via de Marini 6 - 16149 GENOVA - ITALIA
tel. +39-010-6475624 - fax +39-010-6475660
e-mail: Riccardo.Albertoni@ge.imati.cnr.it
Skype: callto://riccardoalbertoni/
LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
www: http://www.ge.imati.cnr.it/Albertoni
http://purl.oclc.org/NET/riccardoAlbertoni
FOAF: http://purl.oclc.org/NET/RiccardoAlbertoni/foaf
Received on Tuesday, 7 April 2015 20:32:14 UTC