- From: <paul@ontology2.com>
- Date: Thu, 15 Nov 2012 13:59:39 -0500
- To: <zaveri@informatik.uni-leipzig.de>, "dbpedia-discussion" <dbpedia-discussion@lists.sourceforge.net>, <public-lod@w3.org>
- Cc: <dbpedia-data-quality@googlegroups.com>
I'd be pretty skeptical that the error rate for unpaid evaluators would
be less than the error rate in the data itself. Are you making it clear to
people what the standard of performance is? Are we supposed to check stuff
against a human reading of Wikipedia or actually verify the facts?
When I see data quality problems in Freebase or DBpedia, they often
involve global properties that aren't detectable at the level of individual
nodes. For instance, there are the two great trees: living things and
geographical containment. Often these have obscure breakages at high-level
nodes that will trip up any algorithm that assumes these things really are
trees. It generally turns out that things are sketchy at certain high-level
nodes where some taxonomists introduce levels of classification that others
don't, and don't get me started on those anglophone islands on the other side
of the English Channel. In cases like that you can't count on getting
accurate answers from average people, and your odds aren't even that good if
you ask an expert.
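As a rough illustration of the kind of global check described above, here is a minimal sketch, assuming (child, parent) containment pairs have already been extracted from DBpedia; the place names and data format are illustrative, not from the message:

```python
# A minimal sketch, not from the original message: given (child, parent)
# containment pairs extracted from DBpedia (names below are illustrative),
# flag the two ways the data can break the tree assumption -- nodes with
# more than one parent, and containment cycles.

from collections import defaultdict

def find_tree_breakages(edges):
    """edges: iterable of (child, parent) pairs."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    # Nodes with more than one parent violate the tree assumption outright.
    multi_parent = {n: ps for n, ps in parents.items() if len(ps) > 1}

    # Cycles: walk upward along single-parent chains; revisiting a node
    # means containment loops back on itself somewhere above.
    cycle_nodes = []
    for start in parents:
        seen, node = set(), start
        while node in parents and len(parents[node]) == 1:
            if node in seen:
                cycle_nodes.append(start)
                break
            seen.add(node)
            node = next(iter(parents[node]))
    return multi_parent, cycle_nodes

if __name__ == "__main__":
    # "Jersey" here stands in for the Channel Islands ambiguity: is it
    # contained by the United Kingdom, by France, or by neither?
    sample = [("Leipzig", "Saxony"), ("Saxony", "Germany"),
              ("Jersey", "United Kingdom"), ("Jersey", "France")]
    print(find_tree_breakages(sample))
```

A check like this only says anything useful when run over the whole graph at once, which is exactly why a per-node evaluator can't see the problem.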
Certainly there is a lot of noise in the category assignments in
Wikipedia. It might be reasonable to expect people to flag incorrect
category assignments, but without some global view, finding the ones that
are missing (maybe 40% of them in some cases) is too much to ask.
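One hedged sketch of a global heuristic for that missing-category problem, with made-up names and an arbitrary support threshold (none of this is from the message): suggest a category for an article when most of the articles sharing at least one of its categories also carry it.

```python
# Suggest categories an article may be missing, based on what the articles
# that share its existing categories carry. The 0.8 support threshold and
# the data layout are assumptions for illustration only.

from collections import Counter, defaultdict

def suggest_missing_categories(article_cats, min_support=0.8):
    """article_cats: dict mapping article -> set of category names."""
    # Invert the mapping; this corpus-wide index is the 'global view'.
    cat_articles = defaultdict(set)
    for art, cats in article_cats.items():
        for c in cats:
            cat_articles[c].add(art)

    suggestions = {}
    for art, cats in article_cats.items():
        # Articles sharing at least one category with this one.
        peers = set().union(*(cat_articles[c] for c in cats)) - {art}
        if not peers:
            continue
        # Count categories the peers have but this article lacks.
        counts = Counter(c for p in peers for c in article_cats[p] if c not in cats)
        candidates = [c for c, n in counts.items() if n / len(peers) >= min_support]
        if candidates:
            suggestions[art] = sorted(candidates)
    return suggestions
```

Whether the threshold is right is beside the point; the point is that the computation runs over the whole corpus, not over one article at a time.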