ANN: DBpedia Version 2014 released

Hi all,

 

We are happy to announce the release of DBpedia 2014.

 

The most important improvements of the new release compared to DBpedia 3.9
are:

 

1. the new release is based on updated Wikipedia dumps dating from April /
May 2014 (the 3.9 release was based on dumps from March / April 2013),
leading to an overall increase in the number of things described in the
English edition from 4.26 million to 4.58 million.

 

2. the DBpedia ontology has been enlarged and the number of infobox-to-ontology
mappings has risen, leading to richer and cleaner data.

 

The English version of the DBpedia knowledge base currently describes 4.58
million things, out of which 4.22 million are classified in a consistent
ontology (http://wiki.dbpedia.org/Ontology2014), including 1,445,000
persons, 735,000 places (including 478,000 populated places), 411,000
creative works (including 123,000 music albums, 87,000 films and 19,000
video games), 241,000 organizations (including 58,000 companies and 49,000
educational institutions), 251,000 species and 6,000 diseases.

 

We provide localized versions of DBpedia in 125 languages. All these
versions together describe 38.3 million things, out of which 23.8 million
are localized descriptions of things that also exist in the English version
of DBpedia. The full DBpedia data set features 38 million labels and
abstracts in 125 different languages, 25.2 million links to images, 29.8
million links to external web pages, 80.9 million links to Wikipedia
categories, and 41.2 million links to YAGO categories. DBpedia is connected
with other Linked Datasets by around 50 million RDF links.

 

Altogether the DBpedia 2014 release consists of 3 billion pieces of
information (RDF triples), out of which 580 million were extracted from the
English edition of Wikipedia and 2.46 billion were extracted from other
language editions.

 

Detailed statistics about the DBpedia data sets in 28 popular languages are
provided on the Dataset Statistics page
(http://wiki.dbpedia.org/Datasets2014/DatasetStatistics).

 

The main changes between DBpedia 3.9 and 2014 are described below. For
additional, more detailed information please refer to the DBpedia Change Log
(http://wiki.dbpedia.org/Changelog).

 

1. Enlarged Ontology

 

The DBpedia community added new classes and properties to the DBpedia
ontology via the mappings wiki. The DBpedia 2014 ontology encompasses

 

685 classes (DBpedia 3.9: 529)

1,079 object properties (DBpedia 3.9: 927)

1,600 datatype properties (DBpedia 3.9: 1,290)

116 specialized datatype properties (DBpedia 3.9: 116)

47 owl:equivalentClass and 35 owl:equivalentProperty mappings to
http://schema.org

 

2. Additional Infobox to Ontology Mappings

 

The editor community of the mappings wiki also defined many new mappings
from Wikipedia templates to DBpedia classes. For the DBpedia 2014
extraction, we used 4,339 mappings (DBpedia 3.9: 3,177 mappings), which are
distributed as follows over the languages covered in the release.

 

English: 586 mappings

Dutch: 469 mappings

Serbian: 450 mappings

Polish: 383 mappings

German: 295 mappings

Greek: 281 mappings

French: 221 mappings

Portuguese: 211 mappings

Slovenian: 170 mappings

Korean: 148 mappings

Spanish: 137 mappings

Italian: 125 mappings

Belarusian: 125 mappings

Hungarian: 111 mappings

Turkish: 91 mappings

Japanese: 81 mappings

Czech: 66 mappings

Bulgarian: 61 mappings

Indonesian: 59 mappings

Catalan: 52 mappings

Arabic: 52 mappings

Russian: 48 mappings

Basque: 37 mappings

Croatian: 36 mappings

Irish: 17 mappings

Wiki-Commons: 12 mappings

Welsh: 7 mappings

Bengali: 6 mappings

Slovak: 2 mappings
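
As a quick sanity check, the per-language counts listed above can be added up
and compared with the stated totals. The short Python sketch below does only
that; the variable names are made up for illustration, and the numbers are
copied from this announcement.

```python
# Mapping counts per language, copied from the list in this announcement.
MAPPINGS = {
    "English": 586, "Dutch": 469, "Serbian": 450, "Polish": 383,
    "German": 295, "Greek": 281, "French": 221, "Portuguese": 211,
    "Slovenian": 170, "Korean": 148, "Spanish": 137, "Italian": 125,
    "Belarusian": 125, "Hungarian": 111, "Turkish": 91, "Japanese": 81,
    "Czech": 66, "Bulgarian": 61, "Indonesian": 59, "Catalan": 52,
    "Arabic": 52, "Russian": 48, "Basque": 37, "Croatian": 36,
    "Irish": 17, "Wiki-Commons": 12, "Welsh": 7, "Bengali": 6,
    "Slovak": 2,
}

TOTAL_2014 = 4339  # total stated for DBpedia 2014
TOTAL_39 = 3177    # total stated for DBpedia 3.9

total = sum(MAPPINGS.values())
growth = (total - TOTAL_39) / TOTAL_39  # relative increase over 3.9

print(f"entries listed: {len(MAPPINGS)}")
print(f"sum of per-language mappings: {total}")
print(f"matches stated total: {total == TOTAL_2014}")
print(f"relative growth over 3.9: {growth:.1%}")
```

The sum of the listed counts equals the stated 4,339 mappings, which is
roughly a 37% increase over the 3,177 mappings used for DBpedia 3.9.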

 

3. Extended Type System to cover Articles without Infobox

 

Until the DBpedia 3.8 release, a concept was only assigned a type (like
person or place) if the corresponding Wikipedia article contained an infobox
indicating this type. Starting from the 3.9 release, we provide type
statements for articles without an infobox; these are inferred from the link
structure within the DBpedia knowledge base using the algorithm described in
Paulheim/Bizer 2014
(http://www.heikopaulheim.com/documents/ijswis_2014.pdf). For the new
release, an improved version of the algorithm was run to produce type
information for 400,000 things that were formerly not typed. A similar
algorithm (presented in the same paper) was used to identify and remove
potentially wrong statements from the knowledge base.
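
To give a rough feeling for how types can be inferred from link structure,
here is a deliberately simplified toy sketch in Python. It is NOT the
algorithm from the paper: it only looks at incoming links, weights every
property equally, and uses a made-up threshold. All resource names, property
names, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy knowledge base: (subject, property, object) triples. Hypothetical data.
TRIPLES = [
    ("Alice", "birthPlace", "Paris"),
    ("Bob", "birthPlace", "Rome"),
    ("Carol", "birthPlace", "Smallville"),
    ("Erin", "deathPlace", "Rome"),
    ("Frank", "deathPlace", "Smallville"),
    ("Book1", "author", "Alice"),
    ("Book2", "author", "Dave"),
]

# Types already known, e.g. from infoboxes. "Smallville" and "Dave" are untyped.
KNOWN_TYPES = {"Paris": ["Place"], "Rome": ["Place"], "Alice": ["Person"]}


def learn_type_distributions(triples, types):
    """For each property, count the known types of the objects it points at."""
    dist = defaultdict(Counter)
    for _subject, prop, obj in triples:
        for t in types.get(obj, ()):
            dist[prop][t] += 1
    return dist


def infer_type(resource, triples, dist, threshold=0.5):
    """Average the per-property type distributions over incoming links.

    Returns the best-scoring type, or None if there is no evidence or the
    averaged score stays below the (arbitrary) threshold.
    """
    scores = Counter()
    links = 0
    for _subject, prop, obj in triples:
        if obj != resource or not dist.get(prop):
            continue
        total = sum(dist[prop].values())
        for t, count in dist[prop].items():
            scores[t] += count / total  # each link casts one normalized vote
        links += 1
    if links == 0:
        return None
    best, score = scores.most_common(1)[0]
    return best if score / links >= threshold else None


dist = learn_type_distributions(TRIPLES, KNOWN_TYPES)
for name in ("Smallville", "Dave", "Nobody"):
    print(name, "->", infer_type(name, TRIPLES, dist))
```

In this toy data, "Smallville" is only ever the object of birthPlace and
deathPlace, whose typed objects are all places, so it is assigned the type
Place. The published approach is more refined than this sketch; please refer
to the paper for the actual method and its evaluation.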

 

4. New and updated RDF Links into External Data Sources

 

We updated the following RDF link sets pointing at other Linked Data
sources: Freebase, Wikidata, Geonames and GADM. For an overview of all
data sets that are interlinked from DBpedia please refer to
http://wiki.dbpedia.org/Interlinking.

 

 

*** Accessing the DBpedia 2014 Release ***

 

You can download the new DBpedia datasets from
http://wiki.dbpedia.org/Downloads.

 

As usual, the new dataset is also available as Linked Data and via the
DBpedia SPARQL endpoint at http://dbpedia.org/sparql. 
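
For readers who want to try the endpoint from a script, the following is a
minimal Python sketch using only the standard library. It assumes the
endpoint accepts a standard SPARQL protocol GET request with a "query"
parameter and can return JSON results; the helper names (build_request,
run_query) and the example query are illustrative and not part of DBpedia.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in this announcement

# Illustrative query: a few resources typed as video games in the ontology.
EXAMPLE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?game WHERE { ?game a dbo:VideoGame } LIMIT 5
"""


def build_request(query: str) -> urllib.request.Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_query(query: str, timeout: float = 30.0) -> list:
    """Send the query and return the list of result bindings.

    This performs a network call and assumes the endpoint is reachable.
    """
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]
```

Calling run_query(EXAMPLE_QUERY) should return one dictionary per result
row; under the assumptions above, row["game"]["value"] would hold the URI of
a matching resource.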

 

 

*** Credits ***

 

Lots of thanks to

 

1.       Daniel Fleischhacker and Volha Bryl (University of Mannheim,
Germany) for improving the DBpedia extraction framework, for extracting the
DBpedia 2014 data sets for all 125 languages, for generating the updated RDF
links to external data sets, and for generating the statistics about the new
release.

2.       All editors who contributed to the DBpedia ontology mappings via
the Mappings Wiki.

3.       The whole DBpedia Internationalization Committee for pushing the
DBpedia internationalization forward.

4.       Dimitris Kontokostas (University of Leipzig) for improving the
DBpedia extraction framework and loading the new release onto the DBpedia
download server in Leipzig.

5.       Heiko Paulheim (University of Mannheim, Germany) for re-running his
algorithm to generate additional type statements for formerly untyped
resources and to identify and remove wrong statements.

6.       Petar Ristoski (University of Mannheim, Germany) for generating the
updated links pointing at the GADM database of Global Administrative Areas.
Petar will also generate an updated release of DBpedia as Tables soon.

7.       Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for
providing the links from DOLCE to DBpedia ontology.

8.       Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink
Software) for loading the new data set into the Virtuoso instance that
serves the Linked Data view and SPARQL endpoint. 

9.       OpenLink Software (http://www.openlinksw.com/) altogether for
providing the server infrastructure for DBpedia.

10.   Michael Moore (University of Waterloo, as an intern at the University
of Mannheim) for implementing the anchor text extractor and for contributing
to the statistics scripts.

11.   Ali Ismayilov (University of Bonn) for implementing Wikidata
extraction, on which the interlanguage link generation was based.

12.   Gaurav Vaidya (University of Colorado Boulder) for implementing and
running Wikimedia Commons extraction.

13.   Andrea Di Menna, Jona Christopher Sahnwaldt, Julien Cojan, Julien Plu,
Nilesh Chakraborty and others who contributed improvements to the DBpedia
extraction framework via the source code repository on GitHub. 

14.   All GSoC mentors and students for working directly or indirectly on
this release:
https://github.com/dbpedia/extraction-framework/graphs/contributors 

 

The work on the DBpedia 2014 release was financially supported by the
European Commission through the project LOD2 - Creating Knowledge out of
Linked Data (http://lod2.eu/).

 

More information about DBpedia can be found at http://dbpedia.org/About as
well as in the new overview article about the project available at
http://wiki.dbpedia.org/Publications.

 

Have fun with the new DBpedia 2014 release!

 

Cheers,

 

Daniel Fleischhacker, Volha Bryl, and Christian Bizer

 

 

--

Prof. Dr. Christian Bizer

Data and Web Science Group

University of Mannheim, Germany 
chris@informatik.uni-mannheim.de

www.bizer.de

 

Received on Tuesday, 9 September 2014 09:07:34 UTC