ANN: DBpedia 3.4 released

Hi all,

we are happy to announce the release of DBpedia 3.4. The new release is
based on Wikipedia dumps dating from September 2009. 

The new DBpedia data set describes more than 2.9 million things, including
282,000 persons, 339,000 places, 88,000 music albums, 44,000 films, 15,000
video games, 119,000 organizations, 130,000 species and 4,400 diseases. The
DBpedia data set now features labels and abstracts for these things in 91
different languages, 807,000 links to images, 3,840,000 links to external
web pages, 4,878,100 external links into other RDF datasets, 415,000
Wikipedia categories, and 75,000 YAGO categories. The data set consists of
479 million pieces of information (RDF triples) out of which 190 million
were extracted from the English edition of Wikipedia and 289 million were
extracted from other language editions. 

The new release provides the following improvements and changes compared to
the DBpedia 3.3 release:

1. the data set has been extracted from more recent Wikipedia dumps.
2. the data set now provides labels, abstracts and infobox data in 91
different languages.
3. we provide two different versions of the DBpedia Infobox Ontology (loose
and strict) in order to meet different application requirements. Please
refer to http://wiki.dbpedia.org/Datasets#h18-11 for details.
4. as Wikipedia has moved to dual-licensing, we also dual-license DBpedia.
The DBpedia 3.4 data set is licensed under the terms of the Creative Commons
Attribution-ShareAlike 3.0 license and the GNU Free Documentation License.
5. the mapping-based infobox data extractor has been improved and now
normalizes units of measurement.
6. various bug fixes and improvements throughout the code base. Please refer
to the change log for the complete list: http://wiki.dbpedia.org/Changelog

You can download the new DBpedia dataset from
http://wiki.dbpedia.org/Downloads34. As usual, the dataset is also available
as Linked Data and via the DBpedia SPARQL endpoint.
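For readers new to the SPARQL endpoint, here is a minimal sketch in Python of
how a client might encode a query as an HTTP GET request following the SPARQL
Protocol (query in the "query" URL parameter, result format requested via the
Accept header). The endpoint address http://dbpedia.org/sparql, the example
query and the helper name build_sparql_request are illustrative assumptions,
not part of this release; the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: public endpoint address for DBpedia; adjust if it differs.
DBPEDIA_SPARQL_ENDPOINT = "http://dbpedia.org/sparql"

# Hypothetical example query: a few things typed as films in the DBpedia ontology.
EXAMPLE_QUERY = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "SELECT ?film WHERE { ?film a dbo:Film } LIMIT 5"
)


def build_sparql_request(query, endpoint=DBPEDIA_SPARQL_ENDPOINT):
    """Build (but do not send) a SPARQL Protocol GET request.

    The query is URL-encoded into the 'query' parameter, and the Accept
    header asks for the standard SPARQL JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_sparql_request(EXAMPLE_QUERY)
    print(req.full_url)
    # To actually run the query (requires network access):
    #   import json
    #   from urllib.request import urlopen
    #   with urlopen(req) as resp:
    #       print(json.load(resp)["results"]["bindings"])
```

Building the request separately from sending it keeps the encoding step easy
to inspect; any HTTP client that can send a GET request with an Accept header
would work the same way.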

Lots of thanks to

* Anja Jentzsch, Christopher Sahnwaldt, Robert Isele, and Paul Kreis (all
Freie Universität Berlin) for improving the DBpedia extraction framework and
for extracting the new data set.
* Jens Lehmann and Sören Auer (both Universität Leipzig) for providing the new
data set via the DBpedia download server at Universität Leipzig.
* Kingsley Idehen and Mitko Iliev (both OpenLink Software) for loading the
dataset into the Virtuoso instance that serves the Linked Data view and
SPARQL endpoint, and OpenLink Software (http://www.openlinksw.com/) as a
whole for providing the server infrastructure for DBpedia.
* neofonie GmbH (http://www.neofonie.de/index.jsp) for supporting the
DBpedia project by paying Christopher Sahnwaldt.

The next steps for the DBpedia project will be to

1. synchronize Wikipedia and DBpedia by deploying the DBpedia live
extraction, which updates the DBpedia knowledge base immediately whenever a
Wikipedia article changes.
2. enable the DBpedia user community to edit and maintain, in a public wiki,
the DBpedia ontology and the infobox mappings that are used by the
extraction framework.
3. increase the quality of the extracted data by improving and fine-tuning
the extraction code.

We hope all of this will happen soon.

More information about DBpedia can be found at http://dbpedia.org/About


Have fun with the new data set!

Cheers

Chris Bizer


--
Chris Bizer
Web-based Systems Group
Freie Universität Berlin
+49 30 838 55509
http://www.bizer.de
chris@bizer.de

Received on Wednesday, 11 November 2009 12:20:42 UTC