DBpedia - revised infobox extraction

From: Georgi Kobilarov <georgi.kobilarov@gmx.de>
Date: Mon, 6 Oct 2008 18:34:10 +0200
Message-ID: <180C011CD4FF654AB4B73A9A5AD7472C0A4A85@aristoteles.zuhause.lan>
To: <dbpedia-discussion@lists.sourceforge.net>, <public-lod@w3.org>, <semantic-web@w3c.org>

Hi all,

in the past, DBpedia data extracted from Wikipedia infoboxes always
lacked some structure. We had no ontology of our own to describe and
structure the data, and no class or property definitions of our own.

That made it quite difficult to query for, say, "all people born
in Berlin". There were many different RDF predicates for "born in", such
as dbpedia:birthplace, dbpedia:placebirth, dbpedia:placeofbirth, etc.
And there was no canonical class hierarchy with working inference to
query for "all people".
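
To illustrate the problem, here is a minimal sketch in Python that builds
the kind of SPARQL query one had to write against the old data. The three
predicate names are the ones listed above; the namespace URI bound to the
dbpedia: prefix, the resource name dbpedia:Berlin and the helper
born_in_query are assumptions made for this example only.

```python
# Assumed namespace for the dbpedia: prefix (illustration only).
PREFIX = "PREFIX dbpedia: <http://example.org/dbpedia/>"

# The "born in" predicates named in the message above.
OLD_PREDICATES = [
    "dbpedia:birthplace",
    "dbpedia:placebirth",
    "dbpedia:placeofbirth",
]


def born_in_query(predicates, place):
    """Build a SPARQL query matching ?person via any of the given predicates."""
    # One group graph pattern per predicate, joined with UNION.
    branches = [f"{{ ?person {p} {place} }}" for p in predicates]
    pattern = "\n    UNION\n    ".join(branches)
    return (
        f"{PREFIX}\n"
        "SELECT DISTINCT ?person WHERE {\n"
        f"    {pattern}\n"
        "}"
    )


# Old data: every known spelling needs its own UNION branch.
print(born_in_query(OLD_PREDICATES, "dbpedia:Berlin"))
```

Each additional spelling of the predicate adds another UNION branch, and a
spelling the query author does not know about is silently missed. With a
single canonical property, the same helper would be called with a
one-element list.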

I'm glad to announce that we've made an important first step towards
solving these problems by creating a canonical ontology for DBpedia and
mappings for Wikipedia infobox templates. We've created a flat class
hierarchy, mapped Wikipedia templates to DBpedia classes and rewritten
the infobox extraction code to be configurable on a very granular level.
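
As a rough sketch of the idea (not the actual DBpedia extraction code), a
template-to-class mapping with per-template property mappings could look
like the following in Python. The template name, class name, property
name birthPlace, the MAPPINGS table and the extract function are all
hypothetical and chosen only for illustration.

```python
# Hypothetical mapping table: one entry per infobox template, each with a
# target class and a mapping from raw infobox keys to one canonical property.
MAPPINGS = {
    "Infobox Person": {
        "class": "Person",
        "properties": {
            "birthplace": "birthPlace",
            "placebirth": "birthPlace",
            "placeofbirth": "birthPlace",
        },
    },
}


def extract(subject, template, attributes):
    """Return (subject, predicate, object) triples for one infobox."""
    mapping = MAPPINGS.get(template)
    if mapping is None:
        # Unmapped templates produce no triples in this sketch.
        return []
    triples = [(subject, "rdf:type", mapping["class"])]
    for key, value in attributes.items():
        prop = mapping["properties"].get(key.strip().lower())
        if prop is not None:
            triples.append((subject, prop, value))
    return triples


triples = extract(
    "Some_Person",
    "Infobox Person",
    {"placeofbirth": "Berlin", "shoe_size": "42"},
)
print(triples)
```

In this sketch all three spellings of "born in" end up as the same
property, and keys without a mapping are dropped rather than turned into
ad hoc predicates.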

I wrote a blog post with details and some extraction result statistics

A preview of the class hierarchy is here [2] (a fully browsable version
will follow soon).

The new infobox dataset is available at [3], the corresponding dataset
with rdf:type statements at [4]. That data will be available soon in our
DBpedia SPARQL endpoint. I'll post some demo queries and make the
ontology available as RDFS as well.

Until then, have a look at the new data and let us know your thoughts.

Many thanks to Anja Jentzsch for her great help on building the

Any comments are highly appreciated.


[2] http://www4.wiwiss.fu-berlin.de/dbpedia/georgi/dataset/stats.htm
[3] http://www4.wiwiss.fu-berlin.de/dbpedia/dev/infobox/infobox.zip
[4] http://www4.wiwiss.fu-berlin.de/dbpedia/dev/infobox/types.zip

Georgi Kobilarov
Freie Universität Berlin
Received on Monday, 6 October 2008 16:34:06 UTC