Re: ACTION-153: Completeness as one of the quality dimensions

Hi,

My two cents on this is that we should remain as generic as possible. As you
show with the two examples, different datasets will call for different
notions of completeness. I bet we could also find different ways to assess
completeness on a single dataset. So let's let the publisher decide what
makes sense there, as for the SLAs.

Christophe

--
Sent with difficulties. Sorry for the brevity and typos...
On 8 May 2015 09:11, "Antoine Isaac" <aisaac@few.vu.nl> wrote:

> Dear all,
>
> During the F2F I got an action to look at completeness as one of the
> quality dimensions [1]
>
> At least for me then, it was about trying to gather completeness-related
> material from our use cases and best practices. Of course there is more
> about completeness, e.g. in my own (cultural heritage) domain but I would
> rather focus on our stuff first, as the outside world is wide [2] and going
> through everything is far beyond one action.
>
> So my starting point is the pre-F2F gathering of quality-related aspects
> in the use cases [3]. Completeness (as represented by the requirements
> R-DataMissingIncomplete and R-QualityCompleteness) is mentioned in many UCs:
> 1 ASO: Airborne Snow Observatory
> 4 BuildingEye: SME use of public data
> 10 The Land Portal
> 12 LusTRE: Linked Thesaurus fRamework for Environment
> 14 Mass Spectrometry Imaging (MSI)
> 15 OKFN Transport WG
> 16 Open City Data Pipeline
> 18 Resource Discovery for Extreme Scale Collaboration (RDESC)
> 19 Recife Open Data Portal
> 20 Retrato da Violência (Violence Map)
> 22 Tabulae - how to get value out of data
> 24 Uruguay Open Data Catalog
>
> The wiki page at [3] has all quality-related extracts in the UC document.
> Most of these cases talk in very general terms (e.g. 'dataset must be
> complete') which strongly hints that completeness is indeed expected to be
> an indicator for quality.
>
> However, I could find only one use case that really defines concretely
> what completeness means in its context: it's UC #12, LusTRE, with
> Riccardo's paper [4]. It is focused on completeness of owl:sameAs
> linksets, i.e. sets of owl:sameAs links between two different sets. Its
> goal is to reflect how datasets can be 'complemented' via a linkset. Based
> on a small set of indicators (number of types, mappable types, etc.), it
> proposes 3 completeness measures:
> - the extent to which a linkset covers (all) types involved in its
> subject or object datasets.
> - the level of completeness of a linkset with respect to (linkable) types
> involved in its datasets.
> - the percentage of entities of a selected type considered in the linkset.
>
> One can say that linksets are a very specific case, as completeness is
> 'derived' from datasets. Still, this case is the only one I've seen with
> indicators and measures for completeness.
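
As a rough illustration of the three measures listed above, here is a minimal
sketch that reads each of them as a simple ratio. This reading, the function
names and the sample data are all assumptions made for illustration; the
actual definitions are in the paper [4] and may differ.

```python
from typing import Dict, Iterable, Set

def type_coverage(linked: Set[str], entity_types: Dict[str, str],
                  types: Iterable[str]) -> float:
    """Fraction of the given types with at least one linked entity."""
    wanted = set(types)
    if not wanted:
        return 0.0
    covered = {entity_types[e] for e in linked if e in entity_types}
    return len(covered & wanted) / len(wanted)

def entity_coverage(linked: Set[str], entity_types: Dict[str, str],
                    selected_type: str) -> float:
    """Fraction of entities of one selected type that appear in the linkset."""
    of_type = {e for e, t in entity_types.items() if t == selected_type}
    if not of_type:
        return 0.0
    return len(of_type & linked) / len(of_type)

# Hypothetical subject dataset: entity identifier -> type.
entity_types = {"a1": "Concept", "a2": "Concept", "a3": "Place", "a4": "Person"}
# Hypothetical linkset: (subject entity, object entity) owl:sameAs pairs.
linkset = [("a1", "b1"), ("a3", "b7")]
linked = {s for s, _ in linkset}

all_types = set(entity_types.values())   # every type in the subject dataset
linkable_types = {"Concept", "Place"}    # assumed to have a counterpart type

print(type_coverage(linked, entity_types, all_types))       # over all types
print(type_coverage(linked, entity_types, linkable_types))  # over linkable types
print(entity_coverage(linked, entity_types, "Concept"))     # one selected type
```

Only the subject side is computed here; the object side would be handled the
same way using the second element of each pair.
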
>
>
> Actually, there is another UC that brings concrete hints about
> completeness: UC #3, Bio2RDF [5].
> That one doesn't mention explicit completeness-related requirements.
> However, it does present a number of indicators that I think could relate
> to completeness:
>     total number of triples
>     number of unique subjects
>     number of unique predicates
>     number of unique objects
>     number of unique types
>     unique predicate-object links and their frequencies
>     unique predicate-literal links and their frequencies
>     unique subject type-predicate-object type links and their frequencies
>     unique subject type-predicate-literal links and their frequencies
>     total number of references to a namespace
>     total number of inter-namespace references
>     total number of inter-namespace-predicate references
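
A few of the simpler indicators in this list can be counted directly from a
set of triples. The sketch below is only an illustration under assumptions:
triples are plain (subject, predicate, object) string tuples, the "ex:" names
and the function name are made up, and it is not how Bio2RDF itself computes
its statistics.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def dataset_indicators(triples: Iterable[Triple]) -> Dict[str, object]:
    """Count a subset of the indicators over (subject, predicate, object) triples."""
    data = list(triples)
    return {
        "triples": len(data),
        "unique_subjects": len({s for s, _, _ in data}),
        "unique_predicates": len({p for _, p, _ in data}),
        "unique_objects": len({o for _, _, o in data}),
        # A type is taken to be any object of an rdf:type statement.
        "unique_types": len({o for _, p, o in data if p == RDF_TYPE}),
        # Frequency of each (predicate, object) pair.
        "predicate_object_frequencies": Counter((p, o) for _, p, o in data),
    }

# Hypothetical sample data.
sample_triples = [
    ("ex:gene1", RDF_TYPE, "ex:Gene"),
    ("ex:gene1", "ex:label", "gene one"),
    ("ex:gene2", RDF_TYPE, "ex:Gene"),
    ("ex:gene1", "ex:encodes", "ex:protein1"),
    ("ex:protein1", RDF_TYPE, "ex:Protein"),
]

indicators = dataset_indicators(sample_triples)
for name, value in indicators.items():
    print(name, value)
```

Counts like these describe the size and shape of a dataset; turning them into
a completeness score would still need a reference to compare against, which
this sketch does not attempt.
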
>
> But I see there is an issue raised precisely about it [6], questioning
> whether it relates to quality. If we decide that it's not the case, then
> the Bio2RDF UC does not have much about completeness!
>
> Best,
>
> Antoine
>
> [1] http://www.w3.org/2013/dwbp/track/actions/153
> [2]
> https://www.w3.org/2013/dwbp/wiki/Data_quality_notes#Links.2C_related_work
> [3] https://www.w3.org/2013/dwbp/wiki/Quality_Aspects_In_Use_Cases
> [4]
> http://www.edbt.org/Proceedings/2013-Genova/papers/workshops/a8-albertoni.pdf
> [5] http://www.w3.org/TR/2015/NOTE-dwbp-ucr-20150224/#UC-Bio2RDF
> [6] http://www.w3.org/2013/dwbp/track/issues/164
>
>

Received on Friday, 8 May 2015 11:13:14 UTC