DBpedia - revised infobox extraction

Hi all,

In the past, DBpedia data extracted from Wikipedia infoboxes always
lacked some structure. We had no ontology of our own to describe and
structure our data, and no class or property definitions of our own.

That made it quite difficult to query for, say, "all people born in
Berlin". There were many different RDF predicates for "born in", such
as dbpedia:birthplace, dbpedia:placebirth, dbpedia:placeofbirth, etc.
And there was no canonical class hierarchy with working inference to
query for "all people".
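
To illustrate the problem, here is a minimal Python sketch of what such
a query had to look like: one UNION branch per predicate variant. Only
the three predicate names come from the list above; the namespace URI
and the Berlin resource URI are assumptions made for illustration.

```python
# Predicate variants named above; the real list was longer ("etc.").
BIRTHPLACE_VARIANTS = ["birthplace", "placebirth", "placeofbirth"]

# Assumed URIs, for illustration only.
PROPERTY_NS = "http://dbpedia.org/property/"
BERLIN = "http://dbpedia.org/resource/Berlin"


def build_born_in_query(place_uri, variants):
    """Build a SPARQL query with one UNION branch per predicate variant."""
    branches = [
        "  { ?person dbpedia:%s <%s> }" % (name, place_uri)
        for name in variants
    ]
    return (
        "PREFIX dbpedia: <%s>\n" % PROPERTY_NS
        + "SELECT DISTINCT ?person WHERE {\n"
        + "\n  UNION\n".join(branches)
        + "\n}"
    )


query = build_born_in_query(BERLIN, BIRTHPLACE_VARIANTS)
print(query)
```

Every additional spelling of "born in" needs another branch, and any
variant missing from the list silently drops results.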

I'm glad to announce that we've made an important first step towards
solving these problems by creating a canonical ontology for DBpedia and
mappings for Wikipedia infoboxes. We've created a flat class hierarchy,
mapped Wikipedia templates to DBpedia classes and rewritten the infobox
extraction code to be configurable on a very granular level.
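
As a rough illustration of the idea (not the actual extraction code),
the mapping step can be sketched in Python as two lookup tables: one
from infobox templates to classes, one from raw infobox keys to a
single canonical property. All template, class and property names
below are hypothetical and chosen only for this example.

```python
# Hypothetical mapping tables; none of these names are taken from the
# actual DBpedia ontology or extraction configuration.
TEMPLATE_TO_CLASS = {
    "Infobox Person": "Person",
    "Infobox Musical artist": "MusicalArtist",
}
PROPERTY_TO_CANONICAL = {
    "birthplace": "birthPlace",
    "placebirth": "birthPlace",
    "placeofbirth": "birthPlace",
}


def map_infobox(template, raw_properties):
    """Return (class name, canonical properties) for one infobox.

    Unmapped templates yield None as the class; unmapped keys are dropped.
    """
    cls = TEMPLATE_TO_CLASS.get(template)
    mapped = {}
    for key, value in raw_properties.items():
        canonical = PROPERTY_TO_CANONICAL.get(key.lower())
        if canonical is not None:
            mapped[canonical] = value
    return cls, mapped


cls, props = map_infobox(
    "Infobox Person", {"PlaceOfBirth": "Berlin", "shoe_size": "43"}
)
print(cls, props)
```

With a mapping like this, all spellings of "born in" end up under one
property, so a query needs only a single triple pattern.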

I wrote a blog post with details and some extraction result statistics
[1]. 

A preview of the class hierarchy is here [2] (a fully browsable version
will follow soon).

The new infobox dataset is available at [3], the corresponding dataset
with rdf:type statements at [4]. That data will be available soon in our
DBpedia SPARQL endpoint. I'll post some demo queries and make the
ontology available as RDFS as well.

Until then, have a look at the new data and let us know your thoughts.

Many thanks to Anja Jentzsch for her great help on building the
ontology.


Any comments are highly appreciated.

Cheers,
Georgi


[1] http://blog.georgikobilarov.com/2008/10/dbpedia-rethinking-wikipedia-infobox-extraction/
[2] http://www4.wiwiss.fu-berlin.de/dbpedia/georgi/dataset/stats.htm
[3] http://www4.wiwiss.fu-berlin.de/dbpedia/dev/infobox/infobox.zip
[4] http://www4.wiwiss.fu-berlin.de/dbpedia/dev/infobox/types.zip

--
Georgi Kobilarov
Freie Universität Berlin
www.georgikobilarov.com

Received on Monday, 6 October 2008 16:34:10 UTC