
ANN: DBpedia Version 2015-10 released

From: Dimitris Kontokostas <jimkont@gmail.com>
Date: Mon, 4 Apr 2016 15:31:42 +0300
Message-ID: <CA+u4+a0GtWXE5zPpCB8KFddmEn_3dfo3Owc0M=JkvTYzs4CfXA@mail.gmail.com>
To: Linked Data community <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
A few days late, because for some reason Markus could not send it to the
public-lod / semantic-web lists. Enjoy the new release!


---------- Forwarded message ----------
From: Markus Freudenberg <markus.freudenberg@gmail.com>
Date: Thu, Mar 31, 2016 at 2:51 PM
Subject: [Dbpedia-discussion] ANN: DBpedia Version 2015-10 released
To: DBpedia <dbpedia-discussion@lists.sourceforge.net>,
dbpedia-ontology@list.sourceforge.net, semantic-web@w3.org,
public-lod@w3.org, wikidata@lists.wikimedia.org,

Hereby we announce the release of DBpedia 2015-10 (also known as: 2015 B).

This DBpedia release is based on updated Wikipedia dumps dating from
October 2015 featuring a significantly expanded base of information as well
as richer and (hopefully) cleaner data conforming to the DBpedia ontology.

You can download the new DBpedia datasets in RDF format from
http://wiki.dbpedia.org/Downloads2015-10 or directly here:


The English version of the DBpedia knowledge base currently describes 6.2M
things, of which 4.6M have abstracts, 955K have geo coordinates and 1.54M
have depictions. In total, 5M resources are classified in a consistent
ontology, consisting of 1.6M persons, 800K places (including 500K populated
places), 480K works (including 133K music albums, 102K films and 20K video
games), 267K organizations (including 66K companies and 52K educational
institutions), 293K species and 5K diseases. The total number of resources
in English DBpedia is 16.4M, which, besides the 6.2M described things,
includes 1.3M SKOS concepts (categories), 7.1M redirect pages, 254K
disambiguation pages and 1.6M intermediate nodes.

Altogether the DBpedia 2015-10 release consists of 8.8 billion (2015-04:
6.9 billion) pieces of information (RDF triples) out of which 1.1 billion
(2015-04: 737 million) were extracted from the English edition of
Wikipedia, 4.4 billion (2015-04: 3.8 billion) were extracted from other
language editions and 3.2 billion (2015-04: 2.4 billion) from DBpedia
Commons and Wikidata. In general we observed a significant growth in raw
infobox and mapping-based statements of close to 10%.
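
As a rough cross-check of the figures above, the following sketch computes
the relative growth of the overall triple counts between 2015-04 and
2015-10. It uses only the rounded numbers quoted in this announcement, so
the percentages are approximate. The dictionary keys and the function name
are made up for this illustration. Note that these are total triple counts,
not the raw infobox and mapping-based statements for which the ~10% growth
is reported.

```python
# Rounded triple counts quoted in the announcement: (2015-04, 2015-10).
# The keys are illustrative labels, not official dataset names.
TRIPLES = {
    "all": (6_900_000_000, 8_800_000_000),
    "english": (737_000_000, 1_100_000_000),
    "other_languages": (3_800_000_000, 4_400_000_000),
    "commons_and_wikidata": (2_400_000_000, 3_200_000_000),
}


def growth_percent(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


for label, (old, new) in TRIPLES.items():
    print(f"{label}: {growth_percent(old, new):.1f}%")
# all: 27.5%
# english: 49.3%
# other_languages: 15.8%
# commons_and_wikidata: 33.3%
```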

Thorough statistics can be found on the DBpedia website
and general information on the DBpedia datasets here

The DBpedia community added new classes and properties to the DBpedia
ontology via the mappings wiki. The DBpedia 2015-10 ontology encompasses


   739 classes (DBpedia 2015-04: 735)

   1,099 object properties (DBpedia 2015-04: 1,098)

   1,596 datatype properties (DBpedia 2015-04: 1,583)

   132 specialized datatype properties (DBpedia 2015-04: 132)

   407 owl:equivalentClass and 222 owl:equivalentProperty mappings to
   external vocabularies (DBpedia 2015-04: 408 and 200)

The editor community of the mappings wiki also defined many new mappings
from Wikipedia templates to DBpedia classes. For the DBpedia 2015-10
extraction, we used a total of 5553 template mappings (DBpedia 2015-04:
4317 mappings). For the first time the top language, gauged by number of
mappings, is Dutch (606 mappings), surpassing the English community (600
mappings).

(Breaking) Changes


   English DBpedia switched to IRIs from URIs. Some URIs will not resolve
   and we provide the “uri-same-as-iri” dataset for English to ease the
   transition. For more technical details on this issue read section 6
   <http://svn.aksw.org/papers/2011/DBpedia_I18n/public.pdf> p. 19-23 (old
   but still valid)

   The instance-types dataset is now split into two files:

      instance-types (containing only direct types)

      instance-types-transitive (containing the transitive types of a
      resource based on the DBpedia ontology)

   The mappingbased-properties file is now split into three (3) files:

      “geo-coordinates-mappingbased”, which contains the coordinates
      originating from the mappings wiki. The “geo-coordinates” dataset
      continues to provide the coordinates originating from the GeoExtractor

      “mappingbased-literals”, which contains mapping-based facts with
      literal values

      “mappingbased-objects”, which contains mapping-based facts with object
      values

      In addition, the “mappingbased-objects-disjoint-[domain|range]”
      datasets hold facts that were filtered out of “mappingbased-objects”
      as errors but are still provided

   We added a new extractor for citation data

   All datasets are available in .ttl and .tql serializations (the .nt and
   .nq datasets were dropped for reasons of redundancy and server capacity).

   We are providing DBpedia as a Docker image.
   Dockerized-DBpedia <https://github.com/dbpedia/Dockerized-DBpedia>:
   Creates and runs a Virtuoso Open Source instance preloaded with the latest
   DBpedia dataset inside a Docker container.

   Starting with this release we provide extensive dataset metadata by
   adding DataIDs <http://dbpedia.org/projects/dbpedia-dataid> for all
   extracted languages to the respective language directories.

   In addition we revamped the dataset table on the download page
   <http://wiki.dbpedia.org/Downloads2015-10>. It’s created dynamically
   based on the DataIDs of all languages. Likewise, the tables on the
   statistics page are now based on files
   <http://downloads.dbpedia.org/2015-10/statistics/> providing information
   about all mapping languages.

   From now on we also include the original Wikipedia dump files
   alongside the extracted datasets (‘pages_articles.xml.bz2’).

   A complete changelog can always be found in the git log
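
To illustrate the instance-types split listed above, here is a minimal
sketch of how transitive types could be derived from a direct type by
walking up a class hierarchy. The subclass edges below are assumptions made
for this example and are not read from the released ontology file. The
function name is hypothetical, and the sketch assumes each class has at most
one parent. Whether the released transitive file repeats the direct type is
not stated here, so the sketch returns ancestors only.

```python
# Illustrative subclass edges (child -> parent). These are assumptions for
# this sketch, not loaded from the DBpedia ontology release.
SUBCLASS_OF = {
    "dbo:MusicalArtist": "dbo:Artist",
    "dbo:Artist": "dbo:Person",
    "dbo:Person": "dbo:Agent",
}


def transitive_types(direct_type, subclass_of=SUBCLASS_OF):
    """Return the ancestors of direct_type, nearest ancestor first."""
    ancestors = []
    seen = {direct_type}  # guards against accidental cycles in the input
    current = subclass_of.get(direct_type)
    while current is not None and current not in seen:
        ancestors.append(current)
        seen.add(current)
        current = subclass_of.get(current)
    return ancestors


print(transitive_types("dbo:MusicalArtist"))
# ['dbo:Artist', 'dbo:Person', 'dbo:Agent']
```

In this picture, the direct type would go to instance-types and the returned
ancestors to instance-types-transitive.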

Upcoming Changes


   We are working to move away from the mappings wiki but we will have at
   least one more mapping sprint.

   We have some cool ideas <http://wiki.dbpedia.org/ideas/> for GSoC this
   year. Additional mentors are more than welcome :)

Extended Type System to cover Articles without Infobox

Until the DBpedia 3.8 release, a concept was only assigned a type (like
person or place) if the corresponding Wikipedia article contains an infobox
indicating this type. Starting from the 3.9 release, we provide type
statements for articles without infobox that are inferred based on the link
structure within the DBpedia knowledge base using the algorithm described in
Paulheim/Bizer 2014 <http://www.heikopaulheim.com/documents/ijswis_2014.pdf>.
For the new release, an improved version of the algorithm was run to
produce type information for 400,000 things that were formerly not typed. A
similar algorithm (presented in the same paper) was used to identify and
remove potentially wrong statements from the knowledge base.
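
The idea of inferring types from link structure can be illustrated with a
deliberately simplified sketch. This is not the algorithm from
Paulheim/Bizer 2014, which is more elaborate. It only assumes that, for each
property, we know what share of its objects carry each type, and it averages
those shares over the incoming links of an untyped resource. The property
statistics, the threshold and the function name are all made up for this
example.

```python
from collections import defaultdict

# Hypothetical statistics: for each property, the share of its objects
# that carry a given type. The numbers are invented for illustration.
TYPE_DISTRIBUTION = {
    "dbo:birthPlace": {"dbo:Place": 1.0},
    "dbo:location": {"dbo:Place": 0.5, "dbo:Organisation": 0.5},
}


def infer_types(incoming_properties, distribution=TYPE_DISTRIBUTION,
                threshold=0.5):
    """Average the per-property type shares over all incoming links and
    keep the types whose average reaches the threshold."""
    if not incoming_properties:
        return []
    scores = defaultdict(float)
    for prop in incoming_properties:
        for rdf_type, share in distribution.get(prop, {}).items():
            scores[rdf_type] += share
    count = len(incoming_properties)
    return sorted(t for t, s in scores.items() if s / count >= threshold)


# A resource that is the object of one birthPlace and one location link.
print(infer_types(["dbo:birthPlace", "dbo:location"]))
# ['dbo:Place']
```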

In addition, this release includes four new type datasets, although they
are not included in the online SPARQL endpoint: 1) LHD datasets
<http://ner.vse.cz/datasets/linkedhypernyms/> for English, German and Dutch
and 2) DBTax for English.

Both of these datasets use a typing system beyond the DBpedia ontology and
we provide a subset, mapped to the DBpedia ontology (dbo) and a full one
with all types (ext).


Lots of thanks to


   Markus Freudenberg (University of Leipzig / DBpedia Association) for
   taking over the whole release process and creating the revamped download &
   statistics pages.

   Dimitris Kontokostas (University of Leipzig / DBpedia Association) for
   conveying his considerable knowledge of the extraction and release process.

   Volha Bryl (University of Mannheim / Springer) for their work on
   previous releases and their continuous support in this release.

   All editors that contributed to the DBpedia ontology mappings via the
   Mappings Wiki.

   The whole DBpedia Internationalization Committee for pushing the DBpedia
   internationalization forward.

   Heiko Paulheim (University of Mannheim) for re-running his algorithm to
   generate additional type statements for formerly untyped resources and
   identify and remove wrong statements.

   Václav Zeman and the whole LHD team (University of Prague) for their
   contribution of additional DBpedia types

   Marco Fossati (FBK) for contributing the DBTax types

   Alan Meehan (TCD) for performing a big external link cleanup

   Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing
   the links from DOLCE to DBpedia ontology.

   Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink
   Software) for loading the new data set into the Virtuoso instance that
   provides 5-Star Linked Open Data publication and SPARQL Query Services.

   OpenLink Software (http://www.openlinksw.com/) altogether for providing
   the SPARQL Query Services and Linked Open Data publishing infrastructure
   for DBpedia in addition to their continuous infrastructure support.

   Ruben Verborgh from Ghent University – iMinds for publishing the dataset
   as Triple Pattern Fragments <http://fragments.dbpedia.org>, and iMinds
   for sponsoring DBpedia’s Triple Pattern Fragments server.

   Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata

   Vladimir Alexiev (Ontotext) for leading a successful mapping and
   ontology clean up effort.

   All the GSoC students and mentors working directly or indirectly on the
   DBpedia release

   Special thanks to members of the DBpedia Association
   <http://dbpedia.org/dbpedia-association>, the AKSW
   <http://aksw.org/About.html> and the department for Business Information
   Systems <http://bis.informatik.uni-leipzig.de/en/Welcome> of the
   University of Leipzig.

The work on the DBpedia 2015-10 release was financially supported by the
European Commission through the project ALIGNED – quality-centric, software
and data engineering (http://aligned-project.eu/).

More information about DBpedia is found at http://dbpedia.org as well as in
the new overview article about the project available at

Have fun with the new DBpedia 2015-10 release!


Markus Freudenberg, Dimitris Kontokostas, Sebastian Hellmann


Kontokostas Dimitris
Received on Monday, 4 April 2016 12:32:33 UTC
