- From: <paul@ontology2.com>
- Date: Thu, 15 Nov 2012 13:59:39 -0500
- To: <zaveri@informatik.uni-leipzig.de>, "dbpedia-discussion" <dbpedia-discussion@lists.sourceforge.net>, <public-lod@w3.org>
- Cc: <dbpedia-data-quality@googlegroups.com>
I'd be pretty skeptical that the error rate for unpaid evaluators would be lower than the error rate in the data itself. Are you making it clear to people what the standard of performance is? Are we supposed to check statements against a human reading of Wikipedia, or actually verify the facts?

When I see data quality problems in Freebase or DBpedia, they often involve global properties that aren't detectable at the level of individual nodes. For instance, there are the two great trees of living things and geographical containment. These often have obscure breakages at high-level nodes that will break any algorithm which assumes they really are trees. Things generally turn out to be sketchy at certain high-level nodes, where some taxonomists introduce levels of classification that others don't; and don't get me started on those anglophone islands on the other side of the English Channel. In cases like that you can't count on getting accurate answers from average people, and your odds aren't much better if you ask an expert.

Certainly there is a lot of noise in the category assignments in Wikipedia. It might be reasonable to expect people to flag incorrect category assignments, but without some global view, finding the ones that are missing (maybe 40% of them in some cases) is too much to ask.
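To make the point concrete, here is a minimal sketch of the kind of global check I have in mind, assuming you have already extracted a containment relation as (child, parent) pairs; the sample edges below are invented for illustration and are not real DBpedia data:

    # Sketch: check whether a "contained in" relation is actually a tree.
    # Two symptoms of breakage: nodes with more than one parent, and cycles.
    from collections import defaultdict

    # Hypothetical (child, parent) pairs standing in for an extracted relation.
    edges = [
        ("Leipzig", "Saxony"),
        ("Saxony", "Germany"),
        ("Germany", "Europe"),
        ("Jersey", "Channel_Islands"),
        ("Jersey", "Normandy"),   # second parent: no longer a tree
    ]

    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    # 1. A tree allows at most one parent per node.
    for node, ps in parents.items():
        if len(ps) > 1:
            print(f"{node} has multiple parents: {sorted(ps)}")

    # 2. A tree has no cycles; standard DFS with white/gray/black coloring.
    def find_cycle():
        WHITE, GRAY, BLACK = 0, 1, 2
        color = defaultdict(int)

        def visit(node):
            color[node] = GRAY
            for parent in parents.get(node, ()):
                if color[parent] == GRAY:
                    return True            # back edge: containment loops on itself
                if color[parent] == WHITE and visit(parent):
                    return True
            color[node] = BLACK
            return False

        return any(color[n] == WHITE and visit(n) for n in list(parents))

    print("cycle detected" if find_cycle() else "no cycles")

The point is that a check like this needs the whole graph in view; no amount of eyeballing individual nodes will surface it.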
Received on Thursday, 15 November 2012 18:59:38 UTC