
Re: Low Quality Data (was before Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)

From: Chris Bizer <chris@bizer.de>
Date: Fri, 22 Oct 2010 17:00:02 +0200
To: "'Juan Sequeda'" <juanfederico@gmail.com>, "'Martin Hepp'" <martin.hepp@ebusiness-unibw.org>
Cc: "'public-lod'" <public-lod@w3.org>
Message-ID: <016501cb71f9$cde89bc0$69b9d340$@bizer.de>
Hi Juan,

Martin and all,

 

Can somebody point me to papers or maybe give their definition of low-quality
data when it comes to LOD? What are the criteria for data to be considered
low quality?

 

An overview of the literature on data quality, including the different
definitions of the term, can be found in my PhD thesis.

 

See:

http://www.diss.fu-berlin.de/diss/servlets/MCRFileNodeServlet/FUDISS_derivate_000000002736/02_Chapter2-Information-Quality.pdf?hosts=

also

http://www.diss.fu-berlin.de/2007/217/indexe.html

 

All this is from 2008. Thus, I guess there will also be newer stuff around,
but the text should properly reflect the state of the art back then.

Cheers,

Chris

 

 

Thanks


Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com



On Fri, Oct 22, 2010 at 9:01 AM, Martin Hepp
<martin.hepp@ebusiness-unibw.org> wrote:

The Web of documents is an open system built on people agreeing on standards
and best practices.
Open system means in this context that everybody can publish content and
that there are no restrictions on the quality of the content.
This is in my opinion one of the central facts that made the Web successful.

+10000000000


The same is true for the Web of Data. There obviously cannot be any
restrictions on what people can/should publish (including different
opinions on a topic, but also pure spam). As on the classic Web, it is the
job of the information/data consumer to figure out which data it wants to
believe and use (definition of information quality = usefulness of
information, which is a subjective thing).

+10000000000

 

The fact that there is obviously a lot of low-quality data on the current
Web should not encourage us to publish masses of low-quality data and then
celebrate ourselves for having achieved a lot. The current Web tolerates
buggy markup, broken links, and questionable content of all types. But I
hope everybody agrees that the Web is successful because of this tolerance,
not because of the buggy content itself. Quite to the contrary, the Web has
been broadly adopted because of the large amount of commonly agreed
high-quality content.

If you continue to live the linked data landfill style, it will fall back on
you, reputation-wise, funding-wise, and career-wise. Some rules hold in
ecosystems of all kinds and sizes.

Best

Martin

 
Received on Friday, 22 October 2010 14:57:56 UTC
