Invitation to contribute to DBpedia by improving the infobox mappings + New Scala-based Extraction Framework

Hi all,

In order to extract high-quality data from Wikipedia, the DBpedia extraction
framework relies on infobox-to-ontology mappings, which define how Wikipedia
infobox templates are mapped to classes of the DBpedia ontology.
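
As a rough illustration of what such a mapping does, here is a minimal Python
sketch. It is not the DBpedia mapping language and not the framework's code;
the table INFOBOX_MAPPINGS, the function map_infobox, and the template, class,
and property names are all assumptions made up for this example.

```python
# Hypothetical illustration only: this is NOT the DBpedia mapping language.
# Template, class, and property names are assumptions chosen for the example.
INFOBOX_MAPPINGS = {
    "Infobox settlement": {
        "ontology_class": "PopulatedPlace",
        "properties": {
            "population_total": "populationTotal",
            "area_km2": "areaTotal",
        },
    },
}


def map_infobox(template_name, infobox_values):
    """Return (ontology_class, mapped_properties), or None if the template is unmapped."""
    mapping = INFOBOX_MAPPINGS.get(template_name)
    if mapping is None:
        return None
    # Keep only the infobox properties that have a mapping; the rest are dropped.
    mapped = {
        mapping["properties"][key]: value
        for key, value in infobox_values.items()
        if key in mapping["properties"]
    }
    return mapping["ontology_class"], mapped


result = map_infobox(
    "Infobox settlement",
    {"population_total": "3431700", "area_km2": "891.85", "nickname": "Spree-Athen"},
)
```

In this sketch, a template without an entry yields nothing at all, and an
infobox property without an entry (here "nickname") is silently dropped.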

Up to now, these mappings were defined only by the DBpedia team. As
Wikipedia is huge and contains many different infobox templates, we were
only able to define mappings for a small subset of all Wikipedia infoboxes,
and only managed to map a subset of the properties of these infoboxes.

In order to enable the DBpedia user community to contribute to improving the
coverage and the quality of the mappings, we have set up a public wiki at 

http://mappings.dbpedia.org/index.php/Main_Page 

which contains: 

1. all mappings that are currently used by the DBpedia extraction framework
2. the definition of the DBpedia ontology and
3. documentation for the DBpedia mapping language as well as step-by-step
guides on how to extend and refine mappings and the ontology.

So if you are using DBpedia data and have always been annoyed that
DBpedia did not properly cover the infobox template that is most important
to you, you are warmly invited to extend the mappings and the ontology in
the wiki. Your edits will be used for the next DBpedia release, which is
expected to be published in the first week of April.

The process of contributing to the ontology and the mappings is as follows:

1.  You familiarize yourself with the DBpedia mapping language by reading
the documentation in the wiki.
2.  In order to prevent spam, the wiki is read-only for new accounts, and new
editors need to be confirmed by a member of the DBpedia team (currently Anja
Jentzsch handles the approvals). Therefore, please create an account in the
wiki for yourself. After this, Anja will give you editing rights and you can
edit the mappings as well as the ontology.
3.  To contribute to the next DBpedia release, you can edit until Sunday,
March 21. After this, we will check the mappings and the ontology definition
in the wiki for consistency and then use both for the next DBpedia release.

So we are starting a kind of social experiment on whether the DBpedia user
community is willing to contribute to the improvement of DBpedia and on how
the DBpedia ontology develops through community contributions :-)

Please excuse that it is currently still rather cumbersome to edit the
mappings and the ontology. We are working on a visual editor for the
mappings as well as a validation service, which will check edits to the
mappings and test the new mappings against example pages from Wikipedia. We
hope to be able to deploy these tools in the next two months, but still
wanted to release the wiki as early as possible in order to allow
community contributions to the DBpedia 3.5 release.

If you have questions about the wiki or the mapping language, please ask
them on the DBpedia mailing list, where Anja and Robert will answer them.

What else is happening around DBpedia?

In order to speed up the data extraction process and to lay a solid
foundation for the DBpedia Live extraction, we have ported the DBpedia
extraction framework from PHP to Scala/Java. The new framework extracts
exactly the same types of data from Wikipedia as the old framework, but
now processes a single page in 13 milliseconds instead of 200
milliseconds. In addition, the new framework can extract data from tables
within articles and can handle multiple infobox templates per article. The
new framework is available under the GPL license in the DBpedia SVN and is
documented at http://wiki.dbpedia.org/Documentation.
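
To put the two per-page timings into perspective, here is a small
back-of-the-envelope calculation in Python. Only the 13 ms and 200 ms figures
come from this announcement; the page count ASSUMED_PAGES is a round number
chosen purely for illustration, and the calculation assumes pages are
processed one after another.

```python
# Per-page processing times quoted in the announcement (milliseconds).
OLD_MS_PER_PAGE = 200
NEW_MS_PER_PAGE = 13

# Assumption for illustration only: a round page count, not a real figure.
ASSUMED_PAGES = 1_000_000


def total_hours(ms_per_page, pages):
    """Total sequential processing time in hours."""
    return ms_per_page * pages / 1000 / 3600


speedup = OLD_MS_PER_PAGE / NEW_MS_PER_PAGE
old_hours = total_hours(OLD_MS_PER_PAGE, ASSUMED_PAGES)
new_hours = total_hours(NEW_MS_PER_PAGE, ASSUMED_PAGES)

print(f"speedup: {speedup:.1f}x")
print(f"old framework: {old_hours:.1f} h, new framework: {new_hours:.1f} h")
```

The per-page figures alone correspond to a speedup of roughly 15x; the
absolute hours depend entirely on the assumed page count.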

The whole DBpedia team is very grateful to two companies that enabled us to
do all this by sponsoring the DBpedia project:

1.  Vulcan Inc. as part of its Project Halo (www.projecthalo.com). Vulcan
Inc. creates and advances a variety of world-class endeavors and high-impact
initiatives that change and improve the way we live, learn, and do business
(http://www.vulcan.com/).
2.  Neofonie GmbH, a Berlin-based company offering leading technologies in
the area of Web search, social media and mobile applications
(http://www.neofonie.de/index.jsp).

Thank you very much for your support!

I personally would also like to thank:

1.  Anja Jentzsch, Robert Isele, and Christopher Sahnwaldt for all their
great work on implementing the new extraction framework and for setting up
the mapping wiki.
2.  Andreas Lange and Sidney Bofah for correcting and extending the mappings
in the wiki.

Cheers, 

Chris 


--
Prof. Dr. Christian Bizer
Web-based Systems Group
Freie Universität Berlin
+49 30 838 55509
http://www.bizer.de
chris@bizer.de

Received on Friday, 12 March 2010 11:25:24 UTC