ANN: DBpedia Version 2015-04 released

Dear all,


we are happy to announce the release of DBpedia 2015-04 (also known as:
2015 A). The new release is based on updated Wikipedia dumps dating from
February/March 2015 and features an enlarged DBpedia ontology with more
infobox to ontology mappings, leading to richer and cleaner data.


http://wiki.dbpedia.org/Downloads2015-04


The English version of the DBpedia knowledge base currently describes 5.9M
things out of which 4.3M resources have abstracts, 452K geo coordinates and
1.45M depictions. In total, 4 million resources are classified in a
consistent ontology, consisting of 2.06M persons, 682K places (including
455K populated places), 376K creative works (including 92K music albums,
90K films and 17K video games), 188K organizations (including 51K companies
and 33K educational institutions), 278K species and 5K diseases. The total
number of resources in English DBpedia is 15.3M, which, besides the 5.9M
things described above, includes 1.2M SKOS concepts (categories), 6.83M
redirect pages, 256K disambiguation pages and 1.13M intermediate nodes.


We provide localized versions of DBpedia in 128 languages. All these
versions together describe 38.3 million things, out of which 23.8 million
are localized descriptions of things that also exist in the English version
of DBpedia. The full DBpedia data set features 38 million labels and
abstracts in 128 different languages, 25.2 million links to images and 29.8
million links to external web pages; 80.9 million links to Wikipedia
categories, and 41.2 million links to YAGO categories. DBpedia is connected
with other Linked Datasets by around 50 million RDF links.


In addition, we provide DBpedia datasets for Wikimedia Commons and Wikidata
<http://downloads.dbpedia.org/2015-04/ext/>.


Altogether the DBpedia 2015-04 release consists of 6.9 billion pieces of
information (RDF triples) out of which 737 million were extracted from the
English edition of Wikipedia, 3.76 billion were extracted from other
language editions and 2.4 billion from DBpedia Commons and Wikidata.
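
As a quick sanity check on the figures above, the three partial counts can be summed. This is only a sketch: the variable names are illustrative, and the inputs are the rounded values quoted in this announcement, so the result is approximate.

```python
# Triple counts quoted in this announcement, in billions (rounded values).
english = 0.737              # extracted from the English edition of Wikipedia
other_languages = 3.76       # extracted from other language editions
commons_and_wikidata = 2.4   # DBpedia Commons and Wikidata

total = english + other_languages + commons_and_wikidata

# The rounded parts add up to just under the 6.9 billion stated above.
print(f"{total:.3f} billion triples in total")
```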


Thorough statistics can be found on the DBpedia website
<http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-04/dataset-2015-04-statistics>
and general information on the DBpedia datasets here
<http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets>.


From this release on, we will try to provide two releases per year, one in
April and the next in October. The 2015-04 release was delayed by 3 months,
but we will try to keep to the schedule and release 2015-10 at the end of
October or in early November.


One of our plans for the next release is to remove the URI encoding of
English DBpedia (dbpedia.org) and switch to IRIs only. This will simplify
the release process and align English DBpedia with all other DBpedia
language datasets. We know that this will probably break some links to
DBpedia, but we feel it is the only way to move forward. If you have any
objections to this change, please let us know now.
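
To illustrate what such a switch means for an individual identifier, here is a minimal sketch that turns a percent-encoded URI into the corresponding IRI. The helper name to_iri and the example resource are assumptions made for illustration; the actual encoding rules of the extraction framework are more specific, and a real migration would need to keep reserved characters encoded.

```python
from urllib.parse import unquote


def to_iri(uri: str) -> str:
    """Decode percent-encoded UTF-8 sequences so the identifier reads as an IRI.

    Hypothetical helper for illustration only: it decodes every escape,
    including reserved characters that a real migration would leave encoded.
    """
    return unquote(uri, encoding="utf-8", errors="strict")


# Illustrative URI-encoded identifier containing a non-ASCII character.
encoded = "http://dbpedia.org/resource/Bras%C3%ADlia"
print(to_iri(encoded))
```

Clients that store the encoded form would need a mapping like this (or the redirects, if any are provided) to keep their links working.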


A complete list of changes in this release can be found on GitHub
<https://github.com/dbpedia/extraction-framework/issues?q=milestone%3A2015-04+is%3Aclosed>.


From this release on, we have adjusted the folder structure of the download
page, giving us more flexibility to offer more datasets in the near future:

http://downloads.dbpedia.org/2015-04/


Enlarged Ontology

The DBpedia community added new classes and properties to the DBpedia
ontology via the mappings wiki. The DBpedia 2015 ontology encompasses

   - 735 classes (DBpedia 2014: 685)
   - 1,098 object properties (DBpedia 2014: 1,079)
   - 1,583 datatype properties (DBpedia 2014: 1,600)
   - 132 specialized datatype properties (DBpedia 2014: 116)
   - 408 owl:equivalentClass and 200 owl:equivalentProperty mappings to
     external vocabularies


Additional Infobox to Ontology Mappings

The editor community of the mappings wiki also defined many new mappings
from Wikipedia templates to DBpedia classes. There are six new languages
with mappings: Arabic, Bulgarian, Armenian, Romanian, Swedish and Ukrainian.

For the DBpedia 2015 extraction, we used a total of 4317 template mappings
(DBpedia 2014: 3814 mappings).


Extended Type System to cover Articles without Infobox

Until the DBpedia 3.8 release, a concept was only assigned a type (like
person or place) if the corresponding Wikipedia article contained an infobox
indicating this type. Starting from the 3.9 release, we provide type
statements for articles without an infobox that are inferred based on the
link structure within the DBpedia knowledge base using the algorithm described
in Paulheim/Bizer 2014
<http://www.heikopaulheim.com/documents/ijswis_2014.pdf>. For the new
release, an improved version of the algorithm was run to produce type
information for 400,000 things that were formerly not typed. A similar
algorithm (presented in the same paper) was used to identify and remove
potentially wrong statements from the knowledge base.
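
The general idea of inferring types from link structure can be sketched as follows. This is a greatly simplified illustration with uniform weights and invented numbers, not the algorithm from the paper: each property that points at an untyped resource "votes" with the type distribution observed for its objects, and types with a high enough averaged score are kept. The table, the function name infer_types and the threshold are all assumptions.

```python
from collections import defaultdict

# Toy statistics (invented for illustration): for each property, the share of
# its objects that carry a given type in the already typed part of the data.
TYPE_GIVEN_INCOMING_PROPERTY = {
    "dbo:birthPlace": {"dbo:Place": 0.95, "dbo:Person": 0.01},
    "dbo:location": {"dbo:Place": 0.90, "dbo:Organisation": 0.05},
    "dbo:author": {"dbo:Person": 0.85, "dbo:Organisation": 0.10},
}


def infer_types(incoming_properties, threshold=0.5):
    """Average the per-property type distributions and keep confident types."""
    scores = defaultdict(float)
    # Ignore properties for which no statistics are available.
    known = [p for p in incoming_properties if p in TYPE_GIVEN_INCOMING_PROPERTY]
    for prop in known:
        for rdf_type, share in TYPE_GIVEN_INCOMING_PROPERTY[prop].items():
            scores[rdf_type] += share / len(known)  # uniform weight per property
    return {t: s for t, s in scores.items() if s >= threshold}


# An untyped resource that is the object of birthPlace and location links.
print(infer_types(["dbo:birthPlace", "dbo:location"]))
```

In this toy setting only dbo:Place passes the threshold. The same kind of score could, in principle, flag existing statements whose object type is very unlikely for the property.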

In addition, this release includes four new type datasets, which are not
loaded into the online SPARQL endpoint: 1) LHD datasets
<http://ner.vse.cz/datasets/linkedhypernyms/> for English, German and Dutch
and 2) DBTax
<http://it.dbpedia.org/2015/02/dbpedia-italiana-release-3-4-wikidata-e-dbtax/>
for English.

Both of these datasets use a typing system that goes beyond the DBpedia
ontology. For each of them we provide a subset mapped to the DBpedia
ontology (dbo) and a full version with all types (ext).


New and updated RDF Links into External Data Sources

We updated the following RDF link sets pointing at other Linked Data
sources: Freebase, Wikidata, Geonames and GADM.


Accessing the DBpedia 2015-04 Release

You can download the new DBpedia datasets in RDF format from
http://wiki.dbpedia.org/Downloads or

http://downloads.dbpedia.org/2015-04/


Additional external dataset contributions

From this release on, we will provide additional datasets related to
DBpedia. For 2015-04 we provide a pagerank dataset for English and German,
contributed by HPI.

http://downloads.dbpedia.org/2015-04/ext/


As usual, the new dataset is also published in 5-Star Linked Open Data form
and accessible via the SPARQL Query Service endpoint at
http://dbpedia.org/sparql and Triple Pattern Fragments service at
http://fragments.dbpedia.org/.
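
As a minimal sketch of how the SPARQL endpoint can be queried over HTTP, the snippet below builds a GET request URL using the standard "query" parameter of the SPARQL protocol. The example query (counting resources typed as dbo:Film) and the helper name build_request_url are assumptions chosen for illustration; the network call is left commented out.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in this announcement

# Illustrative query: count the resources typed as films.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT (COUNT(?film) AS ?films) WHERE { ?film a dbo:Film }
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL carrying the query in the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(QUERY)
print(url)

# To actually run the query (requires network access):
#   import urllib.request
#   req = urllib.request.Request(
#       url, headers={"Accept": "application/sparql-results+json"})
#   print(urllib.request.urlopen(req).read().decode("utf-8"))
```

The result depends on the data currently loaded into the endpoint, so no expected output is shown here.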


Credits

Lots of thanks to

   - Markus Freudenberg (University of Leipzig) for taking over the whole
     release process.
   - Volha Bryl and Daniel Fleischhacker (University of Mannheim) for their
     work on the previous release and their continuous support in this
     release.
   - Alexandru Todor (University of Berlin) for contributing time and
     computing resources for the abstract extraction.
   - All editors that contributed to the DBpedia ontology mappings via the
     Mappings Wiki.
   - The whole DBpedia Internationalization Committee for pushing the DBpedia
     internationalization forward.
   - Heiko Paulheim (University of Mannheim) for re-running his algorithm to
     generate additional type statements for formerly untyped resources and
     to identify and remove wrong statements.
   - Václav Zeman and the whole LHD team (University of Prague) for their
     contribution of additional DBpedia types.
   - Marco Fossati (FBK) for contributing the DBTax types.
   - Petar Ristoski (University of Mannheim) for generating the updated links
     pointing at the GADM database of Global Administrative Areas. Petar will
     also generate an updated release of DBpedia as Tables soon.
   - Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing
     the links from DOLCE to the DBpedia ontology.
   - Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink
     Software) for loading the new data set into the Virtuoso instance that
     provides 5-Star Linked Open Data publication and SPARQL Query Services.
   - OpenLink Software (http://www.openlinksw.com/) altogether for providing
     the SPARQL Query Services and Linked Open Data publishing infrastructure
     for DBpedia in addition to their continuous infrastructure support.
   - Ruben Verborgh from Ghent University – iMinds for publishing the dataset
     as Triple Pattern Fragments <http://fragments.dbpedia.org>, and iMinds
     for sponsoring DBpedia’s Triple Pattern Fragments server.
   - Magnus Knuth (HPI) for providing a pagerank dataset for English and
     German.
   - Ali Ismayilov (University of Bonn) for implementing the DBpedia Wikidata
     dataset.
   - Vladimir Alexiev (Ontotext) for leading a successful mapping and
     ontology cleanup effort.
   - Nono314
     <https://github.com/dbpedia/extraction-framework/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3ANono314>
     for contributing a lot of improvements and bug fixes in the extraction
     framework, as well as other community members
     <https://github.com/dbpedia/extraction-framework/graphs/contributors?from=2014-09-01&to=2015-07-16&type=c>.
   - All the GSoC students and mentors working directly or indirectly on the
     DBpedia release.


The work on the DBpedia 2015-04 release was financially supported by the
European Commission through the project ALIGNED – quality-centric, software
and data engineering (http://aligned-project.eu/).

More information about DBpedia can be found at http://dbpedia.org as well as in
the new overview article about the project available at
http://wiki.dbpedia.org/Publications.

Have fun with the new DBpedia 2015-04 release!


Cheers,

Markus Freudenberg, Dimitris Kontokostas, Sebastian Hellmann

Received on Thursday, 3 September 2015 14:22:07 UTC