- From: Christophe Guéret <christophe.gueret@dans.knaw.nl>
- Date: Fri, 8 May 2015 13:12:45 +0200
- To: Antoine Isaac <aisaac@few.vu.nl>
- CC: Public DWBP WG <public-dwbp-wg@w3.org>
- Message-ID: <CABP9CAG6m+UjGJ0-Z=hqc7Ur23cXHrhKOYEUS6P9WVZ_GdT_=A@mail.gmail.com>
Hi,

My 2 cents on this is that we should remain as generic as possible. As you show with the two examples, different datasets will call for different notions of completeness. I bet we could also find different ways to assess completeness on a single dataset. So let's let the publisher decide what makes sense there, as with the SLAs.

Christophe

--
Sent with difficulties. Sorry for the brevity and typos...

On 8 May 2015 09:11, "Antoine Isaac" <aisaac@few.vu.nl> wrote:

> Dear all,
>
> During the F2F I got an action to look at completeness as one of the
> quality dimensions [1].
>
> At least for me then, it was about trying to gather completeness-related
> material from our use cases and best practices. Of course there is more
> about completeness, e.g. in my own (cultural heritage) domain, but I would
> rather focus on our stuff first, as the outside world is wide [2] and going
> through everything is far beyond one action.
>
> So my starting point is the pre-F2F gathering of quality-related aspects
> in the use cases [3]. Completeness (as represented by the reqs
> R-DataMissingIncomplete and R-QualityCompleteness) is mentioned in many UCs:
> 1 ASO: Airborne Snow Observatory
> 4 BuildingEye: SME use of public data
> 10 The Land Portal
> 12 LusTRE: Linked Thesaurus fRamework for Environment
> 14 Mass Spectrometry Imaging (MSI)
> 15 OKFN Transport WG
> 16 Open City Data Pipeline
> 18 Resource Discovery for Extreme Scale Collaboration (RDESC)
> 19 Recife Open Data Portal
> 20 Retrato da Violência (Violence Map)
> 22 Tabulae - how to get value out of data
> 24 Uruguay Open Data Catalog
>
> The wiki page at [3] has all quality-related extracts in the UC document.
> Most of these cases talk in very general terms (e.g. 'dataset must be
> complete'), which strongly hints that completeness is indeed expected to be
> an indicator for quality.
>
> However, I could find only one use case that really defines concretely
> what completeness means in its context: it's UC #12, LusTRE, with
> Riccardo's paper [4]. It is focused on completeness of owl:sameAs
> linksets, i.e. sets of owl:sameAs links between two different datasets.
> Its goal is to reflect how datasets can be 'complemented' via a linkset.
> Based on a small set of indicators (number of types, mappable types, etc.),
> it proposes 3 completeness measures:
> - the extent to which a linkset covers (all) types involved in its
>   subject or object datasets.
> - the level of completeness of a linkset with respect to (linkable) types
>   involved in its datasets.
> - the percentage of entities of a selected type considered in the linkset.
>
> One can say that linksets are a very specific case, as completeness is
> 'derived' from datasets. Still, this case is the only one I've seen with
> indicators and measures for completeness.
>
> Actually, another UC that brings concrete hints about completeness is
> UC #3, Bio2RDF [5].
> That one doesn't mention explicit completeness-related reqs. However, it
> does present a number of indicators that I think could relate to
> completeness:
> - total number of triples
> - number of unique subjects
> - number of unique predicates
> - number of unique objects
> - number of unique types
> - unique predicate-object links and their frequencies
> - unique predicate-literal links and their frequencies
> - unique subject type-predicate-object type links and their frequencies
> - unique subject type-predicate-literal links and their frequencies
> - total number of references to a namespace
> - total number of inter-namespace references
> - total number of inter-namespace-predicate references
>
> But I see there is an issue raised precisely about it [6], questioning
> whether it relates to quality. If we decide that it's not the case, then
> the Bio2RDF UC does not have much about completeness!
>
> Best,
>
> Antoine
>
> [1] http://www.w3.org/2013/dwbp/track/actions/153
> [2] https://www.w3.org/2013/dwbp/wiki/Data_quality_notes#Links.2C_related_work
> [3] https://www.w3.org/2013/dwbp/wiki/Quality_Aspects_In_Use_Cases
> [4] http://www.edbt.org/Proceedings/2013-Genova/papers/workshops/a8-albertoni.pdf
> [5] http://www.w3.org/TR/2015/NOTE-dwbp-ucr-20150224/#UC-Bio2RDF
> [6] http://www.w3.org/2013/dwbp/track/issues/164
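
To make the Bio2RDF-style indicators from the quoted message a bit more concrete (total triples, unique subjects, predicates, objects and types, and predicate-object frequencies), here is a minimal Python sketch. It is only an illustration: the function name `dataset_indicators`, the prefixed-string representation of triples, and the toy data are all assumptions, not anything taken from the Bio2RDF use case itself.

```python
from collections import Counter

# Assumed shorthand for the rdf:type predicate in this toy representation.
RDF_TYPE = "rdf:type"


def dataset_indicators(triples):
    """Compute a few simple counts over (subject, predicate, object) tuples."""
    triples = list(triples)
    subjects = {s for s, _, _ in triples}
    predicates = {p for _, p, _ in triples}
    objects = {o for _, _, o in triples}
    # A "type" here is any object of an rdf:type statement.
    types = {o for _, p, o in triples if p == RDF_TYPE}
    # How often each distinct predicate-object pair occurs.
    predicate_object_freq = Counter((p, o) for _, p, o in triples)
    return {
        "total_triples": len(triples),
        "unique_subjects": len(subjects),
        "unique_predicates": len(predicates),
        "unique_objects": len(objects),
        "unique_types": len(types),
        "predicate_object_freq": predicate_object_freq,
    }


# Hypothetical toy data, invented purely for illustration.
SAMPLE = [
    ("ex:gene1", "rdf:type", "ex:Gene"),
    ("ex:gene2", "rdf:type", "ex:Gene"),
    ("ex:gene1", "ex:encodes", "ex:protein1"),
    ("ex:protein1", "rdf:type", "ex:Protein"),
    ("ex:gene1", "rdfs:label", "gene one"),
]

stats = dataset_indicators(SAMPLE)
for name, value in stats.items():
    print(name, value)
```

On their own these counts only describe the dataset. Whether they say anything about completeness depends on what the publisher compares them against, which fits the point that the publisher should decide what completeness means for a given dataset.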
Received on Friday, 8 May 2015 11:13:14 UTC