
Re: 15 Ways to Think About Data Quality (Just for a Start)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 12 Apr 2011 14:59:23 -0400
Message-ID: <4DA4A10B.308@openlinksw.com>
To: glenn mcdonald <glenn@furia.com>
CC: "public-lod@w3.org" <public-lod@w3.org>
On 4/12/11 1:52 PM, glenn mcdonald wrote:
>
>     You continue to imply that seeing subjectively imperfect data
>     projected via a data oriented tool is problematic re., your "total
>     data experience" world view.
>
>
> I continue to think it's hilarious that you consider it "subjectively 
> imperfect" that your dataset says Michael Jackson and Michael Rodrick 
> are the same person. What would constitute "objectively imperfect" to you?
>
> So yes, I think you should feel a little embarrassed about 
> broadcasting links to a demo in which the very first piece of data one 
> sees is obviously wrong. You've got billions of entities in dbpedia, 
> and the technology doesn't care which one you pick, so surely you 
> could pick one where the errors aren't as prominent. The fact that you 
> didn't, and don't seem to care, sends a message about your attitude 
> towards data.
Simple exercise.

Assumptions:

1. an individual (neither you nor I) stumbles across one of my demonstrations
2. your characterization of the demo producer (me) is 100% accurate
3. the data beholder or consumer seeks to look at the data differently,
modulo my "data quality ambivalence".

Nothing about the DBMS hosting the datasets (where each has a Named 
Graph IRI) prevents the beholder or consumer from achieving the 
following via the available data access endpoints:

1. Accessing and altering the source query or SPARQL protocol URL. I
seldom publish a demo in which the routes to the actual query and data
sources aren't machine and/or human discernible (note the use of
footers, <link/> in <head/>, and HTTP response headers in this regard)

2. Adding or removing pragmas re. inference context (owl:sameAs
expansion, invocation of fuzzy InverseFunctionalProperty rules, or a
combination of both) as part of the view alteration quest outlined above

3. Viewing the original or actual query results via alternative tools
that can process HTTP response payloads; remember, nothing about SPARQL
mandates RDF as the sole query results format across SELECT, DESCRIBE,
or CONSTRUCT queries

4. Sharing a new query, result set, data presentation, etc., via a URL
as part of an evolving conversation about the data in question.
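Item 1 depends on the routes to the query and data sources being machine discernible via <link/> elements and HTTP response headers. As a rough sketch of the consumer side, assuming those routes are advertised in an HTTP Link header using Web Linking (RFC 5988) syntax, a client could extract them as below. The rel value a given demo page actually uses is not stated here, so `alternate` in the usage is only an assumption, and the parser is deliberately simplified.

```python
import re

# One link-value of an HTTP Link header: <target-URI> followed by ;params
_LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def links_by_rel(link_header, rel):
    """Return target URIs from a Link header whose rel list contains `rel`.

    Simplified sketch: assumes parameter values contain no commas or
    semicolons (a full RFC 5988 parser would handle quoted strings).
    """
    found = []
    for target, params in _LINK_RE.findall(link_header):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            # rel may hold several space-separated relation types
            if name.strip().lower() == "rel" and rel in value.strip().strip('"').split():
                found.append(target)
    return found
```

For example, given a header such as `<http://example.org/sparql?query=...>; rel="alternate"`, calling `links_by_rel(header, "alternate")` returns the query URL, which can then be edited and re-run.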
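Items 1 through 4 can be sketched as a round trip on a SPARQL Protocol GET URL: recover the query from a shared URL, toggle an inference pragma, rebuild the URL, and ask for a different result format via the HTTP Accept header. This is a minimal illustration, not OpenLink's tooling. The endpoint and graph IRIs are placeholders, and the pragma text is modeled on Virtuoso's `define input:same-as "yes"` extension, which is not standard SPARQL, so its exact syntax should be checked against the Virtuoso documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Virtuoso-specific owl:sameAs expansion pragma (syntax assumed; not standard SPARQL)
SAME_AS_PRAGMA = 'define input:same-as "yes"'

# Media types for SELECT results; "turtle" applies to CONSTRUCT/DESCRIBE
RESULT_TYPES = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
    "turtle": "text/turtle",
}


def query_url(endpoint, query, graph_iri=None):
    """Build a SPARQL Protocol GET URL: the shareable artefact (item 4)."""
    params = {"query": query}
    if graph_iri:
        params["default-graph-uri"] = graph_iri  # e.g. a dataset's Named Graph IRI
    return endpoint + "?" + urlencode(params)


def parse_query_url(url):
    """Recover endpoint, query text and graph IRI from a shared URL (item 1)."""
    parts = urlsplit(url)
    qs = parse_qs(parts.query)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    graph = qs.get("default-graph-uri", [None])[0]
    return endpoint, qs["query"][0], graph


def toggle_same_as(query, enabled):
    """Add or remove the sameAs pragma line (item 2); exact-match removal only."""
    lines = [l for l in query.splitlines() if l.strip() != SAME_AS_PRAGMA]
    body = "\n".join(lines)
    return f"{SAME_AS_PRAGMA}\n{body}" if enabled else body


def result_request(url, fmt="json"):
    """Prepare, but do not send, a GET asking for another result format (item 3)."""
    return Request(url, headers={"Accept": RESULT_TYPES[fmt]})
```

In use: parse a shared URL with `parse_query_url`, pass the query through `toggle_same_as(q, False)`, rebuild with `query_url`, and hand the new URL to any HTTP client, for instance via `result_request(new_url, "csv")`. Keeping the whole view in the URL is what makes it shareable.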

What I outline above lies at the very core of every demo I produce and 
the very core of Virtuoso's architecture. The problem in our eyes (at 
OpenLink) boils down to dealing with data quality subjectivity at 
InterWeb scales, amongst other matters outside the realm of this 
particular conversation.

As you hopefully know, we went through this entire loop in ODBC, JDBC,
OLE-DB, and ADO.NET land eons ago. The limitations inherent in those
realms heavily influenced Virtuoso's architecture and, as a result,
every single demo URL that I share. Note that there is a little more to
every URL I publish: I am most interested in helping people appreciate
URIs as immensely flexible Data Conductors since (in my world view)
Data == New Electricity.

Remember, I do espouse the mantra: Data is like Wine, while
Application code is like Fish. A Good (Cool) URL or URI should be able
to stand the test of time :-)




-- 

Regards,

Kingsley Idehen	
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Tuesday, 12 April 2011 18:59:46 UTC
