- From: Paul Houle <ontology2@gmail.com>
- Date: Thu, 5 Aug 2010 10:07:32 -0400
- To: Jörn Hees <j_hees@cs.uni-kl.de>
- Cc: public-lod@w3.org
- Message-ID: <AANLkTinf-UP3VizFJ1cMoP+z8saoCURxKyX7Rd+sx+Vr@mail.gmail.com>
If you want to get something done with DBpedia, you should (i) work from the data dumps, or (ii) give up and use Freebase instead.

I used to spend weeks figuring out how to clean up the mess in DBpedia until the day I wised up and realized I could do in 15 minutes w/ Freebase what takes 2 weeks to do w/ DBpedia, because w/ DBpedia you need to do a huge amount of data cleaning to get anything that makes sense.

The issue here isn't primarily "RDF vs Freebase"; it's really a matter of the business model (or lack thereof) behind DBpedia. Frankly, nobody gets excited when DBpedia doesn't work, and that's the problem.

For instance, nobody at DBpedia seems to give a damn that DBpedia contains 3000 "countries", whereas there are more like 200 actual active countries in the world... Sure, it's great to have a category for things like "Austria-Hungary" and "The Teutonic Knights", but an awful lot of people give up on DBpedia when they see they can't easily get a list of very basic things, like a list of countries.

Now, I was able to, more-or-less, define "active country" as a restriction type: anything that has an ISO country code in Freebase is an active country, or is pretty close. The ISO codes aren't in DBpedia (because they're not in Wikipedia infoboxes), so this can't be done with DBpedia: I'd probably need to code some complex rules that try to guess at this based on category memberships and what facts are available in the infobox.

I complained on both the DBpedia and Freebase discussion lists, and found that: (i) nobody at DBpedia wants to do anything about this, and (ii) the people at Freebase have investigated this and they are going to do something about it.

--------

In my mind, anyway, the semantic web is a set of structured boxes. It's not like there's one "T Box" and one "A Box"; rather, there are nested boxes of increasing specificity.
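To make the "active country" restriction from earlier in this message concrete, here is a minimal sketch in Python. The records and the field names ("name", "iso_code") are made up for illustration; they are not the actual Freebase or DBpedia schema, and this is not the code I actually run.

```python
# Toy records standing in for country-typed entities from a local data dump.
# The field names "name" and "iso_code" are hypothetical, chosen for illustration.
records = [
    {"name": "Austria", "iso_code": "AT"},
    {"name": "Austria-Hungary", "iso_code": None},
    {"name": "Teutonic Knights", "iso_code": None},
    {"name": "Japan", "iso_code": "JP"},
]


def active_countries(entities):
    """Keep only entities that carry a non-empty ISO country code."""
    return [e for e in entities if e.get("iso_code")]


names = [e["name"] for e in active_countries(records)]
print(names)  # ['Austria', 'Japan']
```

The point is that the restriction is a one-line filter when the ISO code is present in the data, and has no equally simple equivalent when it is not.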
In the systems I'm building, a Freebase-DBpedia merge is used as a sort of "T' Box" that helps to structure and interpret information that comes from other sources. With a little thinking about data structures, it's efficient to have a local copy of this data and use it as a skeleton that gets fleshed out with other stuff. Closed-world reasoning about this "taxonomic core" is useful in a number of ways, particularly in the detection of key integrity problems, data holes, inconsistencies, junk data, etc. I think the "dereference and merge" paradigm is useful once you've got the taxocore and you're merging little bits of high-quality data, but w/o control of the taxocore you're just doomed.
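As a rough sketch of what closed-world checking against a taxonomic core can look like, assume the core is just a set of known keys and incoming data is a list of records. The function name, the record layout, and the sample values below are all hypothetical and only meant to illustrate the idea.

```python
from collections import Counter


def check_against_core(core_keys, required_fields, incoming):
    """Closed-world checks: a key that is not in the core counts as an error."""
    problems = {"unknown_key": [], "duplicate_key": [], "missing_field": []}
    counts = Counter(rec.get("key") for rec in incoming)
    for rec in incoming:
        key = rec.get("key")
        # Key integrity: every record must refer to something in the core.
        if key not in core_keys:
            problems["unknown_key"].append(key)
        # Data holes: required fields must be present and non-empty.
        for field in required_fields:
            if rec.get(field) in (None, ""):
                problems["missing_field"].append((key, field))
    # Inconsistency: the same key appearing more than once in one batch.
    problems["duplicate_key"] = [k for k, n in counts.items() if n > 1]
    return problems


core = {"AT", "JP"}  # toy taxonomic core keyed by ISO code
incoming = [
    {"key": "AT", "label": "Austria"},
    {"key": "JP", "label": ""},
    {"key": "XX", "label": "Atlantis"},
    {"key": "AT", "label": "Österreich"},
]
report = check_against_core(core, ["label"], incoming)
print(report)
```

Under an open-world reading, the "XX" record would just be a new fact; treating the core as closed is what lets it be flagged as probable junk.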
Received on Thursday, 5 August 2010 14:08:05 UTC