- From: Antoine Isaac <aisaac@few.vu.nl>
- Date: Fri, 8 May 2015 09:11:00 +0200
- To: Public DWBP WG <public-dwbp-wg@w3.org>
Dear all,

During the F2F I got an action to look at completeness as one of the quality dimensions [1]. At least for me then, it was about trying to gather completeness-related material from our use cases and best practices. Of course there is more about completeness, e.g. in my own (cultural heritage) domain, but I would rather focus on our stuff first, as the outside world is wide [2] and going through everything is far beyond one action.

So my starting point is the pre-F2F gathering of quality-related aspects in the use cases [3]. Completeness (as represented by the reqs R-DataMissingIncomplete and R-QualityCompleteness) is mentioned in many UCs:

1 ASO: Airborne Snow Observatory
4 BuildingEye: SME use of public data
10 The Land Portal
12 LusTRE: Linked Thesaurus fRamework for Environment
14 Mass Spectrometry Imaging (MSI)
15 OKFN Transport WG
16 Open City Data Pipeline
18 Resource Discovery for Extreme Scale Collaboration (RDESC)
19 Recife Open Data Portal
20 Retrato da Violência (Violence Map)
22 Tabulae - how to get value out of data
24 Uruguay Open Data Catalog

The wiki page at [3] has all quality-related extracts in the UC document. Most of these cases talk in very general terms (e.g. 'dataset must be complete'), which strongly hints that completeness is indeed expected to be an indicator for quality.

However, I could find only one use case that really defines concretely what completeness means in its context: it's UC #12, LusTRE, with Riccardo's paper [4]. It is focused on completeness of owl:sameAs linksets, i.e. sets of owl:sameAs links between two different sets. Its goal is to reflect how datasets can be 'complemented' via a linkset. Based on a small set of indicators (number of types, mappable types, etc.), it proposes 3 completeness measures:
- the extent to which a linkset covers (all) types involved in its subject or object datasets.
- the level of completeness of a linkset with respect to (linkable) types involved in its datasets.
- percentage of entities of a selected type considered in the linkset.

One can say that linksets are a very specific case, as completeness is 'derived' from datasets. Still, this case is the only one I've seen with indicators and measures for completeness.

Actually, another UC that brings concrete hints about completeness is UC #3, Bio2RDF [5]. That one doesn't mention explicit completeness-related reqs. However, it does present a number of indicators that I think could relate to completeness:
- total number of triples
- number of unique subjects
- number of unique predicates
- number of unique objects
- number of unique types
- unique predicate-object links and their frequencies
- unique predicate-literal links and their frequencies
- unique subject type-predicate-object type links and their frequencies
- unique subject type-predicate-literal links and their frequencies
- total number of references to a namespace
- total number of inter-namespace references
- total number of inter-namespace-predicate references

But I see there is an issue raised precisely about it [6], questioning whether it relates to quality. If we decide that it's not the case, then the Bio2RDF UC has not much about completeness!

Best,

Antoine

[1] http://www.w3.org/2013/dwbp/track/actions/153
[2] https://www.w3.org/2013/dwbp/wiki/Data_quality_notes#Links.2C_related_work
[3] https://www.w3.org/2013/dwbp/wiki/Quality_Aspects_In_Use_Cases
[4] http://www.edbt.org/Proceedings/2013-Genova/papers/workshops/a8-albertoni.pdf
[5] http://www.w3.org/TR/2015/NOTE-dwbp-ucr-20150224/#UC-Bio2RDF
[6] http://www.w3.org/2013/dwbp/track/issues/164
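
Illustration only: to make the count-based indicators and the "percentage of entities of a selected type considered in the linkset" measure more concrete, here is a minimal sketch in Python. It is not taken from [4] or from the Bio2RDF UC. The triple representation (plain tuples of strings), the function names basic_indicators and entity_coverage, the choice to count only link subjects, and the ex:/other: sample data are all assumptions made for the example.

```python
RDF_TYPE = "rdf:type"
SAME_AS = "owl:sameAs"


def basic_indicators(triples):
    """Simple counts in the spirit of the first Bio2RDF indicators (assumed definitions)."""
    return {
        "triples": len(triples),
        "unique_subjects": len({s for s, _, _ in triples}),
        "unique_predicates": len({p for _, p, _ in triples}),
        "unique_objects": len({o for _, _, o in triples}),
        "unique_types": len({o for _, p, o in triples if p == RDF_TYPE}),
    }


def entity_coverage(dataset, linkset, selected_type):
    """Percentage of entities of selected_type that appear as the subject of an owl:sameAs link."""
    entities = {s for s, p, o in dataset if p == RDF_TYPE and o == selected_type}
    if not entities:
        return 0.0  # assumption: report 0 when the dataset has no entity of that type
    linked = {s for s, p, _ in linkset if p == SAME_AS}
    return 100.0 * len(entities & linked) / len(entities)


# Hypothetical sample data: four concepts, one collection, one label.
dataset = [
    ("ex:a", RDF_TYPE, "skos:Concept"),
    ("ex:b", RDF_TYPE, "skos:Concept"),
    ("ex:c", RDF_TYPE, "skos:Concept"),
    ("ex:d", RDF_TYPE, "skos:Concept"),
    ("ex:e", RDF_TYPE, "skos:Collection"),
    ("ex:a", "skos:prefLabel", "snow"),
]
# Hypothetical linkset: only two of the four concepts are linked.
linkset = [
    ("ex:a", SAME_AS, "other:1"),
    ("ex:b", SAME_AS, "other:2"),
]

indicators = basic_indicators(dataset)
coverage = entity_coverage(dataset, linkset, "skos:Concept")
print(indicators)
print(coverage)
```

With this sample data, two of the four skos:Concept entities occur in the linkset, so the coverage is 50%. Whether such counts say anything about quality is exactly the question raised in [6].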
Received on Friday, 8 May 2015 07:11:29 UTC