Re: Low Quality Data (was before Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices) from Kingsley Idehen on 2010-10-22 (public-lod@w3.org from October 2010)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Fri, 22 Oct 2010 13:52:20 -0400
To: Juan Sequeda <juanfederico@gmail.com>
CC: Martin Hepp <martin.hepp@ebusiness-unibw.org>, public-lod <public-lod@w3.org>
Message-ID: <4CC1CF54.8020109@openlinksw.com>

On 10/22/10 10:47 AM, Juan Sequeda wrote:
> Martin and all,
>
> Can somebody point me to papers, or give their own definition of
> low-quality data when it comes to LOD? What are the criteria for data
> to be considered low quality?

My Subjective Data Quality Factors:

1. Unambiguous Names -- Names based on resolvable URIs
2. Data Representation Format Dexterity -- HTTP + Content Negotiation 
which loosens the coupling between model Semantics and Data Representation
3. Platform Agnostic Data Access -- HTTP delivers this well
4. Change Sensitivity -- speaks for itself, hopefully
5. Provenance -- data about the data (metadata) that helps establish 
"Who, What, When, Where, and ~ Why" re. curation
6. Mesh Navigability -- inference context enables this
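
To make factor #2 a little more concrete, here is a minimal sketch of server-side content negotiation: the same URI-named resource is offered in several representations, and the client's Accept header decides which one is sent. The function names, the AVAILABLE list and the media types in it are assumptions for illustration only. The matching is deliberately simplified: it ranks by q-value alone and ignores the specificity rules of the HTTP specification.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so header order is kept among equal q-values
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept_header, available):
    """Return the first available media type the client accepts, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in available:
            return media_type
    return None


# Hypothetical set of representations for one resource.
AVAILABLE = ["text/html", "text/turtle", "application/rdf+xml"]

if __name__ == "__main__":
    print(negotiate("text/turtle, text/html;q=0.5", AVAILABLE))
    print(negotiate("application/json", AVAILABLE))
```

The point of the sketch is the decoupling: the name (URI) stays fixed while the representation varies with what the consumer asks for.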

This is why I say: look at Data like a cube of sugar. Especially when 
trying to fashion Linked Data oriented business models. 1-6 nullify many 
of the concerns about data driven business models:

1. Wholesale Imports (crawls) that reconstitute data in a new data space 
-- #1 allows you to brand your data; when combined with licensing, it 
also allows you to track conformance (remember, Web Architecture makes 
the Web sticky via HTTP logs amongst other things, so entropy is your 
friend, ultimately)

2. Attribution -- ditto

3. Data Consumer Identity -- WebID will put an end to API Keys (major 
relics) so QoS based on quality factors #2-6 is absolutely plausible.
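
As a rough illustration of the HTTP-logs point above, the sketch below counts how often each client dereferences URIs in a given namespace, using access-log lines in Common Log Format. The log lines, the "/id/" path prefix and the function name are hypothetical, and a real log format may differ from what the regular expression assumes.

```python
import re
from collections import Counter

# Matches the leading fields of a Common Log Format line:
# client, identity, user, [timestamp], "METHOD path protocol", status
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def dereferences_by_client(lines, prefix):
    """Count GET requests per client for paths starting with `prefix`."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("method") == "GET" and m.group("path").startswith(prefix):
            counts[m.group("client")] += 1
    return counts


# Hypothetical sample log lines (documentation-range IP addresses).
SAMPLE = [
    '192.0.2.1 - - [22/Oct/2010:13:52:20 -0400] "GET /id/product/42 HTTP/1.1" 303 0',
    '192.0.2.1 - - [22/Oct/2010:13:52:21 -0400] "GET /data/product/42 HTTP/1.1" 200 512',
    '198.51.100.7 - - [22/Oct/2010:13:53:00 -0400] "GET /id/product/42 HTTP/1.1" 303 0',
    '192.0.2.1 - - [22/Oct/2010:13:54:00 -0400] "GET /id/product/7 HTTP/1.1" 303 0',
]

if __name__ == "__main__":
    for client, hits in dereferences_by_client(SAMPLE, "/id/").most_common():
        print(client, hits)
```

Because the names resolve back to the publisher's data space, every lookup of a branded URI leaves a trace like this, even after the data has been copied elsewhere.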



Kingsley
>
> Thanks
>
> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com <http://www.juansequeda.com>
>
>
> On Fri, Oct 22, 2010 at 9:01 AM, Martin Hepp 
> <martin.hepp@ebusiness-unibw.org 
> <mailto:martin.hepp@ebusiness-unibw.org>> wrote:
>
>         The Web of documents is an open system built on people
>         agreeing on standards
>         and best practices.
>         Open system means in this context that everybody can publish
>         content and
>         that there are no restrictions on the quality of the content.
>         This is in my opinion one of the central facts that made the
>         Web successful.
>
>         +10000000000
>
>
>         The same is true for the Web of Data. There obviously cannot
>         be any
>         restrictions on what people can/should publish (including,
>         different
>         opinions on a topic, but also including pure SPAM). As on the
>         classic Web,
>         it is a job of the information/data consumer to figure out
>         which data it
>         wants to believe and use (definition of information quality =
>         usefulness of
>         information, which is a subjective thing).
>         +10000000000
>
>
>     The fact that there is obviously a lot of low quality data on the
>     current Web should not encourage us to publish masses of
>     low-quality data and then celebrate ourselves for having achieved
>     a lot. The current Web tolerates buggy markup, broken links, and
>     questionable content of all types. But I hope everybody agrees
>     that the Web is successful because of this tolerance, not because
>     of the buggy content itself. Quite to the contrary, the Web has
>     been broadly adopted because of the large body of commonly agreed
>     high-quality content.
>
>     If you continue to live the linked data landfill style it will
>     fall back on you, reputation-wise, funding-wise, and career-wise.
>     Some rules hold in ecosystems of all kinds and sizes.
>
>     Best
>
>     Martin
>
>


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Friday, 22 October 2010 17:52:50 UTC