ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

Hi all,

We are happy to announce the release of DBpedia 3.9.

The most important improvements of the new release compared to DBpedia 3.8
are:

1. the new release is based on updated Wikipedia dumps dating from March /
April 2013 (the 3.8 release was based on dumps from June 2012), leading to
an overall increase in the number of concepts in the English edition from
3.7 to 4.0 million things.

2. the DBpedia ontology is enlarged and the number of infobox to ontology
mappings has risen, leading to richer and cleaner concept descriptions.

3. we extended the DBpedia type system to also cover Wikipedia articles that
do not contain an infobox.

4. we provide links pointing from DBpedia concepts to Wikidata concepts and
updated the links pointing at YAGO concepts and classes, making it easier to
integrate knowledge from these sources.

The English version of the DBpedia knowledge base currently describes 4.0
million things, out of which 3.22 million are classified in a consistent
ontology, including 832,000 persons, 639,000 places (including 427,000
populated places), 372,000 creative works (including 116,000 music albums,
78,000 films and 18,500 video games), 209,000 organizations (including
49,000 companies and 45,000 educational institutions), 226,000 species and
5,600 diseases.

We provide localized versions of DBpedia in 119 languages. All these
versions together describe 24.9 million things, out of which 16.8 million
overlap (are interlinked) with the concepts from the English DBpedia. The
full DBpedia data set features labels and abstracts for 12.6 million unique
things in 119 different languages; 24.6 million links to images and 27.6
million links to external web pages; 45.0 million external links into other
RDF datasets, 67.0 million links to Wikipedia categories, and 41.2 million
links to YAGO categories.

Altogether the DBpedia 3.9 release consists of 2.46 billion pieces of
information (RDF triples) out of which 470 million were extracted from the
English edition of Wikipedia, 1.98 billion were extracted from other
language editions, and about 45 million are links to external data sets.

Detailed statistics about the DBpedia data sets in 24 popular languages are
provided at http://wiki.dbpedia.org/Datasets39/DatasetStatistics

The main changes between DBpedia 3.8 and 3.9 are described below. For
additional, more detailed information please refer to the Change Log
(http://wiki.dbpedia.org/Changelog)


1. Enlarged Ontology

The DBpedia community added new classes and properties to the DBpedia
ontology via the mappings wiki. The DBpedia 3.9 ontology encompasses

529 classes (DBpedia 3.8: 359)
927 object properties (DBpedia 3.8: 800)
1290 datatype properties (DBpedia 3.8: 859)
116 specialized datatype properties (DBpedia 3.8: 116)
46 owl:equivalentClass and 31 owl:equivalentProperty mappings to
http://schema.org


2. Additional Infobox to Ontology Mappings

The editors of the mappings wiki also defined many new mappings from
Wikipedia templates to DBpedia classes. For the DBpedia 3.9 extraction, we
used 3177 mappings (DBpedia 3.8: 2347 mappings), which are distributed as
follows over the languages covered in the release:

English: 431 mappings
Polish: 382 mappings
Dutch: 335 mappings
German: 219 mappings
Greek: 215 mappings
Portuguese: 211 mappings
Slovenian: 170 mappings
French: 165 mappings
Korean: 148 mappings
Spanish: 137 mappings
Hungarian: 111 mappings
Turkish: 91 mappings
Japanese: 72 mappings
Czech: 66 mappings
Italian: 62 mappings
Bulgarian: 61 mappings
Indonesian: 59 mappings
Catalan: 52 mappings
Arabic: 51 mappings
Russian: 48 mappings
Croatian: 36 mappings
Basque: 32 mappings
Irish: 17 mappings
Bengali: 6 mappings
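
As a quick sanity check, the per-language counts above can be summed and
compared with the totals given in this section. The following is only a
small sketch; the variable names are chosen for illustration, and the
numbers are copied from the list above.

```python
# Mappings per language, copied from the list above.
mappings_per_language = {
    "English": 431, "Polish": 382, "Dutch": 335, "German": 219,
    "Greek": 215, "Portuguese": 211, "Slovenian": 170, "French": 165,
    "Korean": 148, "Spanish": 137, "Hungarian": 111, "Turkish": 91,
    "Japanese": 72, "Czech": 66, "Italian": 62, "Bulgarian": 61,
    "Indonesian": 59, "Catalan": 52, "Arabic": 51, "Russian": 48,
    "Croatian": 36, "Basque": 32, "Irish": 17, "Bengali": 6,
}

# Total number of mappings used for the 3.9 extraction.
total = sum(mappings_per_language.values())

# Growth compared to the 2347 mappings used for DBpedia 3.8.
new_since_38 = total - 2347

print(total)         # 3177
print(new_since_38)  # 830
```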


3. Extended Type System to cover Articles without Infobox

Until the DBpedia 3.8 release, a concept was only assigned a type (like
person or place) if the corresponding Wikipedia article contained an infobox
indicating this type. The new 3.9 release now also contains type statements
for articles without an infobox that were inferred based on the link structure
within the DBpedia knowledge base using the algorithm described in
Paulheim/Bizer 2013 [1]. Applying the algorithm allowed us to provide type
information for 440,000 concepts that were formerly not typed. A similar
algorithm was also used to identify and remove potentially wrong links from
the knowledge base.
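
To give a rough intuition for how types can be inferred from link
structure, here is a deliberately simplified toy sketch. It is not the
algorithm from [1]: it only looks at incoming links, lets each property
vote with the type distribution of the resources it usually points to, and
accepts the best type above a threshold. The function name, the threshold,
and the example data are all made up for illustration.

```python
from collections import Counter, defaultdict

def infer_types(triples, known_types, threshold=0.5):
    """Toy link-based type inference (illustrative only, not the method in [1])."""
    # For each property, count the known types of the resources it points to.
    type_counts = defaultdict(Counter)
    for _, prop, obj in triples:
        if obj in known_types:
            type_counts[prop][known_types[obj]] += 1

    # Collect the incoming properties of every untyped resource.
    incoming = defaultdict(list)
    for _, prop, obj in triples:
        if obj not in known_types:
            incoming[obj].append(prop)

    inferred = {}
    for resource, props in incoming.items():
        scores = Counter()
        voters = 0
        for prop in props:
            total = sum(type_counts[prop].values())
            if total == 0:
                continue  # this property carries no type evidence
            voters += 1
            for type_name, count in type_counts[prop].items():
                scores[type_name] += count / total
        if voters == 0:
            continue
        # Accept the best-scoring type only if its average vote is high enough.
        best, score = max(scores.items(), key=lambda item: item[1])
        if score / voters >= threshold:
            inferred[resource] = best
    return inferred

# Hypothetical example data.
triples = [
    ("Film_A", "director", "Person_1"),
    ("Film_B", "director", "Person_2"),
    ("Film_C", "director", "Person_X"),
    ("Film_A", "location", "Place_1"),
    ("Film_C", "unknownProp", "Thing_Y"),
]
known_types = {"Person_1": "Person", "Person_2": "Person", "Place_1": "Place"}

result = infer_types(triples, known_types)
print(result)  # {'Person_X': 'Person'}
```

In this toy data, Person_X is only ever the object of "director", which
otherwise points at persons, so it is typed as a person. Thing_Y stays
untyped because its only incoming property carries no evidence.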


4. New and updated RDF Links into External Data Sources

We added RDF links to Wikidata and updated the following RDF link sets
pointing at other Linked Data sources: YAGO, Freebase, Geonames, GADM and
EUNIS. For an overview of all data sets that are interlinked from DBpedia
please refer to http://wiki.dbpedia.org/Interlinking


5. New Find Related Concepts Service

We offer a new service for finding resources that are related to a given
DBpedia seed resource. More information about the service can be found at
http://wiki.dbpedia.org/FindRelated



Accessing the DBpedia 3.9 Release:

You can download the new DBpedia datasets from
http://wiki.dbpedia.org/Downloads39

As usual, the dataset is also available as Linked Data and via the DBpedia
SPARQL endpoint at http://dbpedia.org/sparql
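
As a minimal sketch of how the endpoint might be queried from a script, the
snippet below builds a GET request URL for a query that counts the films in
the DBpedia ontology. The "query" parameter follows the SPARQL protocol;
the "format" parameter is an assumption about the Virtuoso instance behind
the endpoint, and the helper name build_query_url is hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

def build_query_url(sparql, endpoint=ENDPOINT):
    """Build a GET URL for a SPARQL query (the format parameter is an assumption)."""
    params = {
        "query": sparql,
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)

# Count the resources typed as films in the DBpedia ontology.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT (COUNT(?film) AS ?films) WHERE { ?film a dbo:Film }
"""

url = build_query_url(QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client, for example
urllib.request.urlopen(url), and the response parsed as JSON if the endpoint
honours the requested format.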


Lots of thanks to:

* Jona Christopher Sahnwaldt (Freelancer funded by the University of
Mannheim, Germany) for improving the DBpedia extraction framework, for
extracting the DBpedia 3.9 data sets for all 119 languages, and for
generating the updated RDF links to external data sets.
* All editors who contributed to the DBpedia ontology mappings via the
Mappings Wiki.
* Heiko Paulheim (University of Mannheim, Germany) for inventing and
implementing the algorithm to generate additional type statements for
formerly untyped resources.
* The whole Internationalization Committee for pushing the DBpedia
internationalization forward.
* Dimitris Kontokostas (University of Leipzig) for improving the DBpedia
extraction framework and loading the new release onto the DBpedia download
server in Leipzig.
* Volha Bryl (University of Mannheim, Germany) for generating the statistics
about the new release.
* Petar Ristoski (University of Mannheim, Germany) for generating the
updated links pointing at the GADM database of Global Administrative Areas.
* Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink
Software) for loading the new data set into the Virtuoso instance that
serves the Linked Data view and SPARQL endpoint.
* OpenLink Software (http://www.openlinksw.com/) altogether for providing
the server infrastructure for DBpedia.
* Julien Cojan, Andrea Di Menna, Ahmed Ktob, Julien Plu, Jim Regan and
others who contributed improvements to the DBpedia extraction framework via
the source code repository on GitHub.

The work on the DBpedia 3.9 release was financially supported by the
European Commission through the project LOD2 - Creating Knowledge out of
Linked Data (http://lod2.eu/).


More information about DBpedia can be found at http://dbpedia.org/About as well
as in the new overview article [2] about the project.

Have fun with the new DBpedia release!

Cheers,

Christian Bizer and Christopher Sahnwaldt



[1] http://www.heikopaulheim.com/docs/iswc2013.pdf
[2] http://svn.aksw.org/papers/2013/SWJ_DBpedia/public.pdf

Received on Monday, 23 September 2013 10:27:30 UTC