Re: How To Do Deal with the Subjective Issue of Data Quality? from Kingsley Idehen on 2011-04-07 (public-lod@w3.org from April 2011)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 07 Apr 2011 17:17:43 -0400
To: Patrick Logan <patrickdlogan@gmail.com>
CC: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <4D9E29F7.9020404@openlinksw.com>
On 4/7/11 3:17 PM, Patrick Logan wrote:
> Note: my response is only going to public-lod because I wanted to
> choose just one, I subscribe to it, and this is where "data quality"
> takes on new, well, qualities, from its more typical application
> within a single enterprise.

Okay, but note that the DBpedia and Semantic Web mailing lists were
initially on the cc list due to relevance and the splintered nature of
subscriptions. Anyway, let's see how it goes confining this to LOD :-)

> On Thu, Apr 7, 2011 at 11:06 AM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>> Personally, I subscribe to the doctrine that "data quality" is like "beauty":
>> it lies strictly in the eyes of the beholder, i.e., it is a function of said
>> beholder's "context lenses".
> First and foremost, the "eyes of the beholder" have to set different
> expectations for public LOD than they would for something like
> "enterprise LOD".

I do see Linked Data and Linked Open Data as quite distinct. The former 
is about whole data representation using Links, where content creation 
is constrained by a conceptual schema grounded in first-order logic. 
Linked Open Data is about the application of the aforementioned to 
publicly available structured data e.g., via the World Wide Web. That 
said, in either realm, quality remains constrained by context fluidity 
of the beholder.

If I had the time to illustrate "context fluidity", I would make an
animation that showcased individual context halos (around people's
heads, like the Biblical depictions) and the fluidity inherent in their
time-variant states of mind, i.e., a real sense of "current status" or
"current state of mind" beyond what you see today re. Twitter and
OStatus oriented apps.
> Ideally each source of data would publish something about their DQ
> goals and current status, so consumers have an idea what to expect and
> where improvements may be heading.

Via the power of Linked Data graphs, each person can filter data in such
a way that everyone sees what's individually important to them, and then
in the course of information exchange differences are ironed out:
basically a reconciliation process driven by specific pragmatic goals.
In a sense, most enterprises sorta try to work this way, but the IT
infrastructure fails, woefully. Therein lies one of many massive
opportunities for Linked Data across Intranets and the InterWeb.
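
A rough sketch of that filter-then-reconcile idea, in plain Python
rather than any particular RDF toolkit. Everything here is an
assumption made for illustration: the Statement shape, the notion of a
"lens" as a simple predicate function, and the ex: names and values are
all hypothetical.

```python
from collections import defaultdict
from typing import Callable, NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # who published the statement (hypothetical labels)


# A "context lens" is modelled as a plain predicate over statements.
# This is an assumption for the sketch, not a term from any spec.
Lens = Callable[[Statement], bool]


def view(statements: list[Statement], lens: Lens) -> list[Statement]:
    """Return the statements one beholder chooses to see."""
    return [s for s in statements if lens(s)]


def differences(view_a: list[Statement], view_b: list[Statement]) -> dict:
    """Find (subject, predicate) pairs where two views hold different values."""
    def index(v: list[Statement]) -> dict:
        idx = defaultdict(set)
        for s in v:
            idx[(s.subject, s.predicate)].add(s.obj)
        return idx

    a, b = index(view_a), index(view_b)
    shared = a.keys() & b.keys()
    return {key: (a[key], b[key]) for key in shared if a[key] != b[key]}


# Made-up data: two sources agree on one property and disagree on another.
DATA = [
    Statement("ex:Widget", "ex:weightKg", "12", "ex:sourceA"),
    Statement("ex:Widget", "ex:weightKg", "15", "ex:sourceB"),
    Statement("ex:Widget", "ex:maker", "ex:Acme", "ex:sourceA"),
    Statement("ex:Widget", "ex:maker", "ex:Acme", "ex:sourceB"),
]

alice = view(DATA, lambda s: s.source == "ex:sourceA")
bob = view(DATA, lambda s: s.source == "ex:sourceB")

# Only the points of disagreement need a reconciliation conversation.
to_reconcile = differences(alice, bob)
```

Neither view is declared "correct" here; the sketch only surfaces
where two beholders would need to talk.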

> As a community, public LOD providers and consumers have to discuss the
> quality of these various sources and the implications for things such
> as "same as" and "counting".

Yes, discussion is good, but cognitive beings are wired (I believe) with
the ability to observe aspects of the same Subject differently. Thus,
keeping all the layers loose is paramount. I can never really dictate to
you what constitutes quality data; all we can do is attempt to reconcile
individual observations of common Subjects. In some cases we'll agree
and sometimes we won't. The beauty of the Web (to me) is that its
architecture ultimately allows everyone to "agree to disagree", without
going to war. For example, without 404s the Web would have been yet
another failed global network attempt :-)

> A foundation for that are the
> specifications of the public data and how to specify aspects of
> quality, and how to *publish* that DQ information in a consumable
> way... make DQ statements part of the public LOD.

I think the Web will allow user agents to coalesce around data spaces
that offer value. Others will simply wither away over time. No set of
draconian rules will avert this reality because said reality is wired
into the fabric of scale-free networks such as the Web.


I believe Data Wikis will go a long way toward crowdsourcing data
reconciliation. Of course, for that to happen you need access control
lists (ACLs) and verifiable identity, which is why the WebID protocol
(an application of Linked Data) is so important to this whole topic of
subjective data quality.

If the logic is already making its way into the data, why not make
conversations about data reconciliation part of the data too? Wikipedia
sorta works, but Data Wikis will take this matter to much greater
heights. We'll never be able to compute "Why" from "Who", "What",
"Where", and "When" data with 100% precision. Adding reconciliatory
conversations into the data via Data Wikis will get us much closer than
we are today.
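
To make "conversations as part of the data" a little more concrete,
here is a minimal sketch in plain Python. It is not a Data Wiki
implementation: the ex: vocabulary, the statement identifiers, and the
author IRIs (stand-ins for WebIDs, with no verification performed) are
all hypothetical. The point is only that a comment about a claim can be
stored as more triples in the same graph.

```python
import hashlib
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def short_hash(text: str, length: int = 12) -> str:
    """Deterministic short digest used to mint hypothetical identifiers."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:length]


def statement_id(t: Triple) -> str:
    """Mint an identifier for a triple so that it can be talked about."""
    return "ex:stmt/" + short_hash("|".join(t))


def comment_on(t: Triple, author: str, text: str, created: str) -> list[Triple]:
    """Express one remark about a triple as triples of its own."""
    note = statement_id(t) + "/note/" + short_hash(author + created, 8)
    return [
        Triple(note, "ex:about", statement_id(t)),
        Triple(note, "ex:author", author),  # would be a WebID in practice
        Triple(note, "ex:text", text),
        Triple(note, "ex:created", created),
    ]


def conversation(graph: list[Triple], t: Triple) -> list[tuple]:
    """Collect (created, author, text) remarks about a triple, oldest first."""
    sid = statement_id(t)
    notes = {x.subject for x in graph if x.predicate == "ex:about" and x.obj == sid}
    thread = []
    for note in notes:
        props = {x.predicate: x.obj for x in graph if x.subject == note}
        thread.append((props["ex:created"], props["ex:author"], props["ex:text"]))
    return sorted(thread)


# Made-up claim and remarks; the data and the discussion share one graph.
claim = Triple("ex:Widget", "ex:weightKg", "12")
graph = [claim]
graph += comment_on(claim, "ex:people/alice#me",
                    "Another source reports 15; which is newer?",
                    "2011-04-07T17:00:00Z")
graph += comment_on(claim, "ex:people/bob#me",
                    "12 is the shipping weight, 15 includes packaging.",
                    "2011-04-07T18:00:00Z")

thread = conversation(graph, claim)
```

The remarks carry some of the "Why" that the "Who", "What", "Where",
and "When" triples alone cannot supply.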

Kingsley
> -Patrick
>
>


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Thursday, 7 April 2011 21:18:07 UTC