Re: ANN: DBpedia Version 2015-10 released

From: Michael Brunnbauer <brunni@netestate.de>
Date: Tue, 5 Apr 2016 14:08:13 +0200
To: Dimitris Kontokostas <jimkont@gmail.com>
Cc: "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <20160405120812.GA20204@netestate.de>

Hello Dimitris,

I got DBpedia 2015-04 from 

 http://downloads.dbpedia.org/2015-04/core/

It seems that 

 http://downloads.dbpedia.org/2015-10/core/

also contains a reasonable and current subset of DBpedia. Is this correct?
How does it differ from

 http://downloads.dbpedia.org/2015-10/core-i18n/en/

?

As a result of your switch to ttl, there is now a mixture of .nt and .ttl 
files in 

 http://downloads.dbpedia.org/2015-10/core/

but the .ttl files do not seem to use full Turtle syntax at first glance.
Can they be parsed as N-Triples? I ask because I usually had to fix some
syntax errors before Jena would parse your N-Triples files.
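
One rough way to test this before handing a file to a parser is to check
whether every line already has the one-triple-per-line shape of N-Triples.
The sketch below is only a heuristic under stated assumptions: the regular
expressions approximate the N-Triples grammar (they do not handle \u escapes
inside IRIs, for example), and the function name looks_like_ntriples and the
sample triples are hypothetical, not taken from the DBpedia files.

```python
import re

# Approximations of N-Triples terms (assumption: good enough for a first look,
# not a substitute for a real parser).
IRI = r'<[^<>"{}|^`\\\s]*>'
BNODE = r'_:[A-Za-z0-9_][A-Za-z0-9_.\-]*'
LITERAL = (
    r'"(?:[^"\\\n\r]|\\.)*"'                                # quoted string with escapes
    r'(?:\^\^' + IRI + r'|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?'   # optional datatype or language tag
)
TRIPLE = re.compile(
    r'^(?:' + IRI + r'|' + BNODE + r')\s+'                  # subject
    + IRI + r'\s+'                                          # predicate
    + r'(?:' + IRI + r'|' + BNODE + r'|' + LITERAL + r')'   # object
    + r'\s*\.\s*(?:#.*)?$'                                  # terminating dot, optional comment
)


def looks_like_ntriples(lines):
    """Return (True, None) if every non-empty, non-comment line is a single
    triple; otherwise return (False, line_number) for the first offender."""
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue
        if not TRIPLE.match(stripped):
            return False, number
    return True, None


if __name__ == '__main__':
    # Hypothetical sample: the second line uses a Turtle-only prefixed name.
    sample = [
        '<http://example.org/s> <http://example.org/p> "Berlin"@en .',
        'ex:s ex:p ex:o .',
    ]
    print(looks_like_ntriples(sample))
```

For a real dump the function can be fed an open file object line by line; if
the file is bz2-compressed, bz2.open(path, 'rt', encoding='utf-8') from the
standard library yields text lines. A line that passes this check can still
be rejected by a strict parser, so it only narrows down where to look.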

Regards,

Michael Brunnbauer

On Mon, Apr 04, 2016 at 03:31:42PM +0300, Dimitris Kontokostas wrote:
> A few days late but for some reason it could not be sent by Markus to
> public-lod / semantic-web lists, enjoy the new release!
> 
> Dimitris
> 
> ---------- Forwarded message ----------
> From: Markus Freudenberg <markus.freudenberg@gmail.com>
> Date: Thu, Mar 31, 2016 at 2:51 PM
> Subject: [Dbpedia-discussion] ANN: DBpedia Version 2015-10 released
> To: DBpedia <dbpedia-discussion@lists.sourceforge.net>,
> dbpedia-developers@list.sourceforge.net,
> dbpedia-ontology@list.sourceforge.net, semantic-web@w3.org,
> public-lod@w3.org, wikidata@lists.wikimedia.org,
> dbp-spotlight-users@lists.sourceforge.net
> 
> 
> Hereby we announce the release of DBpedia 2015-10 (also known as: 2015 B).
> 
> 
> This DBpedia release is based on updated Wikipedia dumps dating from
> October 2015 featuring a significantly expanded base of information as well
> as richer and (hopefully) cleaner data conforming to the DBpedia ontology.
> 
> You can download the new DBpedia datasets in RDF format from
> http://wiki.dbpedia.org/Downloads2015-10 or directly here:
> http://downloads.dbpedia.org/2015-10/.
> 
> Statistics
> 
> The English version of the DBpedia knowledge base currently describes 6.2M
> things, of which 4.6M have abstracts, 955K have geo coordinates and 1.54M
> have depictions. In total, 5M resources are classified in a consistent
> ontology, consisting of 1.6M persons, 800K places (including 500K populated
> places), 480K works (including 133K music albums, 102K films and 20K video
> games), 267K organizations (including 66K companies and 52K educational
> institutions), 293K species and 5K diseases. The total number of resources
> in English DBpedia is 16.4M, which, besides the 4.6M resources with
> abstracts, includes 1.3M SKOS concepts (categories), 7.1M redirect pages,
> 254K disambiguation pages and 1.6M intermediate nodes.
> 
> Altogether the DBpedia 2015-10 release consists of 8.8 billion (2015-04:
> 6.9 billion) pieces of information (RDF triples) out of which 1.1 billion
> (2015-04: 737 million) were extracted from the English edition of
> Wikipedia, 4.4 billion (2015-04: 3.8 billion) were extracted from other
> language editions and 3.2 billion (2015-04: 2.4 billion) from DBpedia
> Commons and Wikidata. In general we observed a significant growth in raw
> infobox and mapping-based statements of close to 10%.
> 
> Thorough statistics can be found on the DBpedia website
> <http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-10/dataset-2015-10-statistics>
> and general information on the DBpedia datasets here
> <http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets>.
> 
> Community
> 
> The DBpedia community added new classes and properties to the DBpedia
> ontology via the mappings wiki. The DBpedia 2015-10 ontology encompasses
> 
>    - 739 classes (DBpedia 2015-04: 735)
>    - 1,099 object properties (DBpedia 2015-04: 1,098)
>    - 1,596 datatype properties (DBpedia 2015-04: 1,583)
>    - 132 specialized datatype properties (DBpedia 2015-04: 132)
>    - 407 owl:equivalentClass and 222 owl:equivalentProperty mappings to
>      external vocabularies (DBpedia 2015-04: 408 and 200)
> 
> 
> The editor community of the mappings wiki also defined many new mappings
> from Wikipedia templates to DBpedia classes. For the DBpedia 2015-10
> extraction, we used a total of 5553 template mappings (DBpedia 2015-04:
> 4317 mappings). For the first time the top language, gauged by number of
> mappings, is Dutch (606 mappings), surpassing the English community (600
> mappings).
> 
> (Breaking) Changes
> 
>    - English DBpedia switched to IRIs from URIs. Some URIs will not resolve
>      and we provide the "uri-same-as-iri" dataset for English to ease the
>      transition. For more technical details on this issue read section 6
>      <http://svn.aksw.org/papers/2011/DBpedia_I18n/public.pdf> p. 19-23 (old
>      but still valid)
>    - The instance-types dataset is now split into two files:
>       - instance-types (containing only direct types)
>       - instance-types-transitive (containing the transitive types of a
>         resource based on the DBpedia ontology)
>    - The mappingbased-properties file is now split into three (3) files:
>       - "geo-coordinates-mappingbased", which contains the coordinates
>         originating from the mappings wiki. The "geo-coordinates" dataset
>         continues to provide the coordinates originating from the
>         GeoExtractor
>       - "mappingbased-literals", which contains mapping-based facts with
>         literal values
>       - "mappingbased-objects", which contains mapping-based facts with
>         object values
>       - the "mappingbased-objects-disjoint-[domain|range]" files contain
>         facts that are filtered out from the "mappingbased-objects"
>         datasets as errors but are still provided
>    - We added a new extractor for citation data
>    - All datasets are available in .ttl and .tql serialization (the nt and
>      nq datasets were omitted for reasons of redundancy and server capacity).
>    - We are providing DBpedia as a Docker image.
>      Dockerized-DBpedia <https://github.com/dbpedia/Dockerized-DBpedia>:
>      Creates and runs a Virtuoso Open Source instance preloaded with the
>      latest DBpedia dataset inside a Docker container.
>    - Starting with this release we provide extensive dataset metadata by
>      adding DataIDs <http://dbpedia.org/projects/dbpedia-dataid> for all
>      extracted languages to the respective language directories.
>    - In addition we revamped the dataset table on the download page
>      <http://wiki.dbpedia.org/Downloads2015-10>. It's created dynamically
>      based on the DataIDs of all languages. Likewise the tables on the
>      statistics page
>      <http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-10/dataset-2015-10-statistics>
>      are now based on files <http://downloads.dbpedia.org/2015-10/statistics/>
>      providing information about all mapping languages.
>    - From now on we also include the original Wikipedia dump files
>      alongside the extracted datasets ("pages_articles.xml.bz2").
>    - A complete changelog can always be found in the git log
>      <https://github.com/dbpedia/extraction-framework/compare/DBpedia_2015-04...master>
> 
> Upcoming Changes
> 
>    - We are working to move away from the mappings wiki but we will have at
>      least one more mapping sprint.
>    - We have some cool ideas <http://wiki.dbpedia.org/ideas/> for GSoC this
>      year. Additional mentors are more than welcome :)
> 
> 
> Extended Type System to cover Articles without Infobox
> 
> Until the DBpedia 3.8 release, a concept was only assigned a type (like
> person or place) if the corresponding Wikipedia article contains an infobox
> indicating this type. Starting from the 3.9 release, we provide type
> statements for articles without infobox that are inferred based on the link
> structure within the DBpedia knowledge base using the algorithm described in
> Paulheim/Bizer 2014 <http://www.heikopaulheim.com/documents/ijswis_2014.pdf>.
> For the new release, an improved version of the algorithm was run to
> produce type information for 400,000 things that were formerly not typed. A
> similar algorithm (presented in the same paper) was used to identify and
> remove potentially wrong statements from the knowledge base.
> 
> In addition, this release includes four new type datasets, which are not
> included in the online SPARQL endpoint: 1) LHD datasets
> <http://ner.vse.cz/datasets/linkedhypernyms/> for English, German and Dutch
> and 2) DBTax
> <http://it.dbpedia.org/2015/02/dbpedia-italiana-release-3-4-wikidata-e-dbtax/>
> for English.
> 
> Both of these datasets use a typing system beyond the DBpedia ontology, and
> we provide both a subset mapped to the DBpedia ontology (dbo) and a full
> version with all types (ext).
> 
> Credits
> 
> Lots of thanks to
> 
>    - Markus Freudenberg (University of Leipzig / DBpedia Association) for
>      taking over the whole release process and creating the revamped
>      download & statistics pages.
>    - Dimitris Kontokostas (University of Leipzig / DBpedia Association) for
>      conveying his considerable knowledge of the extraction and release
>      process.
>    - Volha Bryl (University of Mannheim / Springer) for their work on
>      previous releases and their continuous support in this release.
>    - All editors that contributed to the DBpedia ontology mappings via the
>      Mappings Wiki.
>    - The whole DBpedia Internationalization Committee for pushing the
>      DBpedia internationalization forward.
>    - Heiko Paulheim (University of Mannheim) for re-running his algorithm to
>      generate additional type statements for formerly untyped resources and
>      to identify and remove wrong statements.
>    - Václav Zeman and the whole LHD team (University of Prague) for their
>      contribution of additional DBpedia types
>    - Marco Fossati (FBK) for contributing the DBTax types
>    - Alan Meehan (TCD) for performing a big external link cleanup
>    - Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing
>      the links from DOLCE to DBpedia ontology.
>    - Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink
>      Software) for loading the new data set into the Virtuoso instance that
>      provides 5-Star Linked Open Data publication and SPARQL Query Services.
>    - OpenLink Software (http://www.openlinksw.com/) altogether for providing
>      the SPARQL Query Services and Linked Open Data publishing
>      infrastructure for DBpedia in addition to their continuous
>      infrastructure support.
>    - Ruben Verborgh from Ghent University – iMinds for publishing the
>      dataset as Triple Pattern Fragments <http://fragments.dbpedia.org>, and
>      iMinds for sponsoring DBpedia's Triple Pattern Fragments server.
>    - Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata
>      dataset.
>    - Vladimir Alexiev (Ontotext) for leading a successful mapping and
>      ontology clean up effort.
>    - All the GSoC students and mentors working directly or indirectly on the
>      DBpedia release
>    - Special thanks to members of the DBpedia Association
>      <http://dbpedia.org/dbpedia-association>, the AKSW
>      <http://aksw.org/About.html> and the department for Business Information
>      Systems <http://bis.informatik.uni-leipzig.de/en/Welcome> of the
>      University of Leipzig.
> 
> 
> The work on the DBpedia 2015-10 release was financially supported by the
> European Commission through the project ALIGNED – quality-centric, software
> and data engineering (http://aligned-project.eu/).
> 
> More information about DBpedia can be found at http://dbpedia.org as well as in
> the new overview article about the project available at
> http://wiki.dbpedia.org/Publications.
> 
> Have fun with the new DBpedia 2015-10 release!
> 
> Cheers,
> 
> Markus Freudenberg, Dimitris Kontokostas, Sebastian Hellmann
> 
> 
> -- 
> Kontokostas Dimitris

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Received on Tuesday, 5 April 2016 12:08:39 UTC
