
Re: 15 Ways to Think About Data Quality (Just for a Start)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 12 Apr 2011 10:18:04 -0400
Message-ID: <4DA45F1C.3060003@openlinksw.com>
To: glenn mcdonald <glenn@furia.com>
CC: "public-lod@w3.org" <public-lod@w3.org>
On 4/12/11 9:53 AM, glenn mcdonald wrote:
> On Tue, Apr 12, 2011 at 8:58 AM, Kingsley Idehen 
> <kidehen@openlinksw.com> wrote:
>
>     1.
>     http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson
>     -- basic description of 'Michael Jackson' from DBpedia
>
>
> The very first assertion on this, your first link, is 
> "is sameAs of: Michael Rodrick". And you wonder why I keep distracting 
> your technology demos by talking about data quality...
>
In addition to my prior comments: you could have looked up the source of
the subjectively errant assertion via its source named graph:
http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=2
Or you could have just followed the link:
http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fsw.opencyc.org%2F2008%2F06%2F10%2Fconcept%2FMx4rvWuBAJwpEbGdrcN5Y29ycA
Either way, you would have come to realize that:

1. The DBMS has many Named Graphs
2. The browser page in question scopes queries to all graphs
3. Nothing about this setup enforces owl:sameAs inference, which is why
other links show owl:sameAs reasoning applied to the data in question.
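To make the three points above concrete, here is a toy Python sketch of
the idea, not Virtuoso's actual API or the real DBpedia/OpenCyc data. It
assumes an in-memory quad store where every triple carries the named
graph it was loaded into; the graph names, the ex: IRIs, and the
ex:occupation values are hypothetical. The default describe() scopes to
all graphs and shows the inbound owl:sameAs link without applying it,
provenance() reports which graph asserted that link, and owl:sameAs
reasoning is a separate opt-in rule.

```python
from typing import List, NamedTuple, Optional, Set, Tuple

SAME_AS = "owl:sameAs"


class Quad(NamedTuple):
    graph: str  # named graph the triple was loaded into
    s: str
    p: str
    o: str


# Hypothetical data: graph names and ex: IRIs are illustrative only.
STORE: List[Quad] = [
    Quad("ex:g/dbpedia", "dbpedia:Michael_Jackson", "rdfs:label", "Michael Jackson"),
    Quad("ex:g/dbpedia", "dbpedia:Michael_Jackson", "ex:occupation", "Singer"),
    Quad("ex:g/mappings", "ex:Michael_Rodrick", SAME_AS, "dbpedia:Michael_Jackson"),
    Quad("ex:g/mappings", "ex:Michael_Rodrick", "ex:occupation", "Politician"),
]


def match(s=None, p=None, o=None, graphs: Optional[Set[str]] = None) -> List[Quad]:
    """Pattern match; graphs=None scopes the query to ALL named graphs."""
    return [
        q for q in STORE
        if (graphs is None or q.graph in graphs)
        and (s is None or q.s == s)
        and (p is None or q.p == p)
        and (o is None or q.o == o)
    ]


def provenance(s: str, p: str, o: str) -> Set[str]:
    """Return the named graphs that assert the triple (s, p, o)."""
    return {q.graph for q in match(s, p, o)}


def same_as_set(node: str, graphs: Optional[Set[str]] = None) -> Set[str]:
    """Opt-in rule: nodes reachable from `node` via owl:sameAs in either direction."""
    seen, frontier = {node}, [node]
    while frontier:
        n = frontier.pop()
        for q in match(p=SAME_AS, graphs=graphs):
            for a, b in ((q.s, q.o), (q.o, q.s)):
                if a == n and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def describe(node: str, graphs: Optional[Set[str]] = None, same_as: bool = False
             ) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Outbound (p, o) and inbound (s, p) links; no inference unless same_as=True."""
    nodes = same_as_set(node, graphs) if same_as else {node}
    out = sorted({(q.p, q.o) for n in nodes for q in match(s=n, graphs=graphs)})
    inc = sorted({(q.s, q.p) for n in nodes for q in match(o=n, graphs=graphs)})
    return out, inc


if __name__ == "__main__":
    out, inc = describe("dbpedia:Michael_Jackson")
    print(inc)  # [('ex:Michael_Rodrick', 'owl:sameAs')]
    print(provenance("ex:Michael_Rodrick", SAME_AS, "dbpedia:Michael_Jackson"))
```

Keeping the graph scope, the provenance lookup, and the sameAs rule as
independent knobs is the loose coupling described above: a consumer who
distrusts the mappings graph can drop it from the scope, or leave
reasoning off, without anyone changing the underlying data.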

As I've told you repeatedly, we have Named Rules and Named Graphs. In 
our world these parts are all loosely coupled so that humans and agents 
can pursue their desired world views. I am not trying to enforce 
anything on anyone via our technology. Basically, this is about showing 
the virtues of loosely coupling critical parts of this Linked Data 
ecosystem.

BTW - we are already working with Yago2, ProductOntology, and OpenCyc on
fixes to their DBpedia mappings. It's all part of a virtuous cycle driven
by conversations about the data, with subjective enhancements via
"context lenses" as the final destination.

To conclude, finding the subjectively bad needle in the haystack is in
and of itself immensely valuable to any pursuit of subjective data
quality. You can't fix what you don't know is broken. LOD is a large
community, as is DBpedia, and nobody (as far as I know) has ever espoused
the position that "data quality" is a no-go area. What I think people do
espouse covertly (I might be wrong) is this: make your contribution
rather than berate those already making contributions, however perfect
or imperfect these contributions might be.

-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Tuesday, 12 April 2011 14:18:27 UTC
