ANN: DBpedia version 2016-04 released

FYI

On 10/15/16 9:14 AM, Markus Freudenberg wrote:
>
> Hereby we announce the release of DBpedia 2016-04. The new release is
> based on updated Wikipedia dumps dating from March/April 2016
> featuring a significantly expanded base of information as well as
> richer and (hopefully) cleaner data based on the DBpedia ontology.
>
>
> You can download the new DBpedia datasets in a variety of RDF-document
> formats from: http://wiki.dbpedia.org/downloads-2016-04 or directly
> here: http://downloads.dbpedia.org/2016-04/
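
A minimal sketch of reading one of the downloaded dumps, in case it helps. It assumes the .ttl.bz2 files hold one triple per line in an N-Triples-like layout; full Turtle syntax (prefixes, multi-line statements) is not handled. The function name iter_triples and the regular expression are my own illustration, not part of any DBpedia tooling.

```python
import bz2
import re

# Matches one simple triple per line: <subject> <predicate> object .
# Assumption: one triple per line; anything else is silently skipped.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def iter_triples(path):
    """Yield (subject, predicate, object) strings from a .ttl.bz2 file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment lines and blank lines
            match = TRIPLE.match(line)
            if match:
                yield match.groups()
```

Calling iter_triples on a locally saved dump file streams the triples without decompressing the whole file to disk first. For anything beyond a quick look, a real RDF parser is the safer choice.
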
>
>
>     Support DBpedia
>
> During the latest DBpedia meeting in Leipzig we discussed ways
> to support DBpedia <http://blog.dbpedia.org/?p=210> and what benefits
> this support would bring
> <http://wiki.dbpedia.org/why-is-dbpedia-so-important>. For the next
> two months, we are aiming to raise money to support the hosting of the
> main services and the next DBpedia release (especially to shorten
> release intervals). On top of that we need to buy a new server to host
> DBpedia Spotlight, which has so far been generously hosted by third
> parties. If you use DBpedia and want us to keep going forward, we
> kindly invite you to donate here <http://wiki.dbpedia.org/donate> or
> become a member of the DBpedia association
> <http://wiki.dbpedia.org/membership>.
>
>
>     Statistics
>
> The English version of the DBpedia knowledge base currently describes
> 6.0M entities, of which 4.6M have abstracts, 1.53M have geo coordinates
> and 1.6M have depictions. In total, 5.2M resources are classified in a
> consistent ontology, consisting of 1.5M persons, 810K places
> (including 505K populated places), 490K works (including 135K music
> albums, 106K films and 20K video games), 275K organizations (including
> 67K companies and 53K educational institutions), 301K species and 5K
> diseases. The total number of resources in English DBpedia is 16.9M,
> which, besides the 6.0M entities, includes 1.7M SKOS concepts
> (categories), 7.3M redirect pages, 260K disambiguation pages and 1.7M
> intermediate nodes.
>
>
> Altogether the DBpedia 2016-04 release consists of 9.5 billion
> (2015-10: 8.8 billion) pieces of information (RDF triples) out of
> which 1.3 billion (2015-10: 1.1 billion) were extracted from the
> English edition of Wikipedia, 5.0 billion (2015-04: 4.4 billion) were
> extracted from other language editions and 3.2 billion (2015-10: 3.2
> billion) from DBpedia Commons and Wikidata. In general, we observed a
> growth in mapping-based statements of about 2%.
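
As a quick sanity check, the totals quoted above can be recomputed from their parts. This is only arithmetic on the rounded figures in the announcement; the variable names are mine.

```python
# Resource counts for English DBpedia, in thousands (rounded figures
# as quoted in the announcement).
resources_k = {
    "entities": 6000,
    "skos_concepts": 1700,
    "redirects": 7300,
    "disambiguations": 260,
    "intermediate_nodes": 1700,
}
total_resources_k = sum(resources_k.values())

# Triple counts in units of 100 million (rounded figures as quoted).
triples_100m = {"english": 13, "other_languages": 50, "commons_wikidata": 32}
total_triples_100m = sum(triples_100m.values())

# Overall growth relative to the 2015-10 total of 8.8 billion triples.
growth_pct = round((total_triples_100m - 88) / 88 * 100, 1)

print(total_resources_k, total_triples_100m, growth_pct)
```

The parts add up to roughly 16.96M resources, which agrees with the quoted 16.9M up to rounding of the individual figures, and to exactly 9.5 billion triples, about 8% more than in 2015-10.
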
>
>
> Thorough statistics can be found on the DBpedia website
> <http://wiki.dbpedia.org/dbpedia-2016-04-statisticsdatasets/dataset-2015-10/dataset-2015-10-statistics> and
> general information on the DBpedia datasets here
> <http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets>.
>
>
>     Community
>
> The DBpedia community added new classes and properties to the DBpedia
> ontology via the mappings wiki. The DBpedia 2016-04 ontology encompasses:
>
>  *
>
>     754 classes (DBpedia 2015-10: 739)
>
>  *
>
>     1,103 object properties (DBpedia 2015-10: 1,099)
>
>  *
>
>     1,608 datatype properties (DBpedia 2015-10: 1,596)
>
>  *
>
>     132 specialized datatype properties (DBpedia 2015-10: 132)
>
>  *
>
>     410 owl:equivalentClass and 221 owl:equivalentProperty mappings
>     to external vocabularies (DBpedia 2015-04: 407 - 221)
>
>
> The editor community of the mappings wiki also defined many new
> mappings from Wikipedia templates to DBpedia classes. For the DBpedia
> 2016-04 extraction, we used a total of 5800 template mappings (DBpedia
> 2015-10: 5553 mappings). For the second time the top language, gauged
> by the number of mappings, is Dutch (646 mappings), followed by the
> English community (604 mappings).
>
>
>     (Breaking) Changes
>
>  *
>
>     In addition to the datasets normalized to English DBpedia (en-uris),
>     we now provide normalized datasets based on the DBpedia
>     Wikidata (DBw) datasets (wkd-uris). These sorted datasets will be
>     the foundation for the upcoming fusion process with Wikidata. The
>     DBw-based URIs will be the only ones provided in subsequent
>     releases.
>
>  *
>
>     We now filter out triples from the Raw Infobox Extractor that are
>     already mapped. E.g. no more “<x> dbo:birthPlace <z>” and “<x>
>     dbp:birthPlace|dbp:placeOfBirth|... <z>” in the same resource.
>     These triples are now moved to the “infobox-properties-mapped”
>     datasets and not loaded on the main endpoint. See issue 22
>     <https://github.com/dbpedia/extraction-framework/issues/22> for
>     more details.
>
>  *
>
>     Major improvements in our citation extraction. See here
>     <http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg07762.html> for
>     more details.
>
>  *
>
>     We incorporated the statistical distribution approach
>     <http://www.heikopaulheim.com/docs/iswc2013.pdf> of Heiko Paulheim
>     to create type statements automatically and provide them as an
>     additional dataset (instance_types_sdtyped_dbo).
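
To illustrate the Raw Infobox Extractor filtering mentioned in the list above, here is a rough sketch. It assumes a raw dbp: triple counts as "already mapped" when a mapping-based dbo: triple exists with the same subject and object; the actual rule in the extraction framework may differ (see issue 22). The function name and the sample data are hypothetical.

```python
def split_raw_triples(mapped, raw):
    """Split raw infobox triples into (kept, already_mapped).

    Assumption: a raw triple is "already mapped" if some mapping-based
    triple shares its subject and object, whatever the property name.
    """
    mapped_pairs = {(s, o) for s, _, o in mapped}
    kept, already_mapped = [], []
    for triple in raw:
        s, _, o = triple
        if (s, o) in mapped_pairs:
            already_mapped.append(triple)
        else:
            kept.append(triple)
    return kept, already_mapped


# Hypothetical sample data using prefixed names for brevity.
mapped = [("dbr:X", "dbo:birthPlace", "dbr:Z")]
raw = [
    ("dbr:X", "dbp:birthPlace", "dbr:Z"),
    ("dbr:X", "dbp:placeOfBirth", "dbr:Z"),
    ("dbr:X", "dbp:nickname", "dbr:N"),
]
kept, already_mapped = split_raw_triples(mapped, raw)
```

In this sketch the two birth-place triples would end up in the "infobox-properties-mapped" side, and only the nickname triple would stay in the raw infobox dataset.
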
>
>
> In case you missed it, here is what we changed in the previous release (2015-10):
>
>  *
>
>     English DBpedia switched to IRIs. This can be a breaking change to
>     some applications that need to change their stored DBpedia
>     resource URIs / links. We provide the “uri-same-as-iri” dataset
>     for English to ease the transition.
>
>  *
>
>     The instance-types dataset is now split into two files:
>     instance-types (containing only direct types) and
>     instance-types-transitive containing the transitive types of a
>     resource based on the DBpedia ontology
>
>  *
>
>     The mappingbased-properties file is now split into three (3) files:
>
>      o
>
>         “geo-coordinates-mappingbased” that contains the coordinates
>         originating from the mappings wiki. The “geo-coordinates” dataset
>         continues to provide the coordinates originating from the
>         GeoExtractor
>
>      o
>
>         “mappingbased-literals” that contains mapping-based facts with
>         literal values
>
>      o
>
>         “mappingbased-objects” that contains mapping-based facts with
>         object values
>
>      o
>
>         the “mappingbased-objects-disjoint-[domain|range]” are facts
>         that are filtered out from the “mappingbased-objects” datasets
>         as errors but are still provided
>
>  *
>
>     We added a new extractor for citation data that provides two files:
>
>      o
>
>         citation links: linking resources to citations
>
>      o
>
>         citation data: trying to get additional data from citations.
>         This is a quite interesting dataset but we need help to clean
>         it up
>
>  *
>
>     All datasets are available in .ttl and .tql serialization (the nt and
>     nq datasets were dropped for reasons of redundancy and server capacity).
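
The split between direct and transitive types described in the list above can be pictured with a small sketch. The class hierarchy below is a toy excerpt written from memory and should be treated as illustrative, not as the authoritative DBpedia ontology. I also assume that the transitive file does not repeat the direct type.

```python
def transitive_types(direct_type, parent_of):
    """Return the ancestors of direct_type, nearest first, excluding itself."""
    ancestors = []
    seen = {direct_type}
    current = parent_of.get(direct_type)
    # Walk up the single-parent hierarchy; the seen set guards against cycles.
    while current is not None and current not in seen:
        ancestors.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return ancestors


# Toy subclass hierarchy (child -> parent), for illustration only.
PARENT_OF = {
    "dbo:MusicalArtist": "dbo:Artist",
    "dbo:Artist": "dbo:Person",
    "dbo:Person": "dbo:Agent",
    "dbo:Agent": "owl:Thing",
}

direct = "dbo:MusicalArtist"
inherited = transitive_types(direct, PARENT_OF)
```

Under these assumptions, instance-types would carry only the direct type for a resource, and instance-types-transitive would carry the classes in inherited.
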
>
>
>     Upcoming Changes
>
>  *
>
>     Dataset normalization: We are going to normalize datasets based on
>     Wikidata URIs and no longer on the English language edition, as a
>     prerequisite to finally start the fusion process with Wikidata.
>
>  *
>
>     RML Integration: Wouter Maroy has already provided the necessary
>     groundwork for switching the mappings wiki to an RML-based approach
>     <https://drive.google.com/file/d/0B7je1jgVmCgISXBPOHc3NDktblU/view?usp=sharing> on
>     GitHub. We are not there yet but this is at the top of our list of
>     changes.
>
>  *
>
>     Starting with the next release we are adding datasets with NIF
>     annotations
>     <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html> of
>     the abstracts (as we already provided those for the 2015-04
>     release
>     <http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/>). We
>     will eventually extend the NIF annotation dataset to cover the
>     whole Wikipedia article of a resource.
>
>
>     New Datasets
>
>  *
>
>     SDTypes: We extended the coverage of the automatically created type
>     statements (instance_types_sdtyped_dbo) to English, German and
>     Dutch (see above).
>
>  *
>
>     Extensions: In the extension folder (2016-04/ext
>     <http://downloads.dbpedia.org/2016-04/ext/>) we provide two new
>     datasets, both of which are to be considered experimental:
>
>      o
>
>         DBpedia World Facts: This dataset is authored by the DBpedia
>         association itself. It lists all countries, all currencies in
>         use and (most) languages spoken in the world as well as how
>         these concepts relate to each other (spoken in, primary
>         language etc.) and useful properties like ISO codes (ontology
>         diagram
>         <https://raw.githubusercontent.com/dbpedia/WorldFacts/master/DBpediaWorldFactsOntology.png>).
>         This dataset extends the very useful LEXVO
>         <http://www.lexvo.org> dataset with facts from DBpedia and the
>         CIA Factbook
>         <https://www.cia.gov/library/publications/the-world-factbook/>.
>         Please report any errors or suggestions regarding this
>         dataset to Markus <mailto:markus.freudenberg@gmail.com>.
>
>      o
>
>         Lector Facts: This experimental dataset was provided by Matteo
>         Cannaviccio and demonstrates his approach
>         <http://dl.acm.org/citation.cfm?id=2932203> to generating facts
>         by using common sequences of words (i.e. phrases) that are
>         frequently used to describe instances of binary relations in a
>         text. We are looking into using this approach as a regular
>         extraction step. It would be helpful to get some feedback from
>         you.
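
For readers unfamiliar with phrase-based fact generation, a toy sketch of the general idea follows. To be clear, this is not the Lector implementation: it simply counts which phrase most often connects entity pairs with a known relation and then proposes that relation for new pairs joined by the same phrase. All names, the min_count threshold, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_phrases(sentences, known_facts, min_count=2):
    """Map each phrase to the relation it most often co-occurs with."""
    counts = defaultdict(Counter)
    for subj, phrase, obj in sentences:
        relation = known_facts.get((subj, obj))
        if relation is not None:
            counts[phrase][relation] += 1
    learned = {}
    for phrase, relation_counts in counts.items():
        relation, count = relation_counts.most_common(1)[0]
        if count >= min_count:  # ignore phrases seen too rarely
            learned[phrase] = relation
    return learned


def propose_facts(sentences, known_facts, learned):
    """Propose facts for entity pairs that have no known relation yet."""
    return [
        (subj, learned[phrase], obj)
        for subj, phrase, obj in sentences
        if phrase in learned and (subj, obj) not in known_facts
    ]


# Hypothetical input: (subject, connecting phrase, object) per sentence.
sentences = [
    ("Ada", "was born in", "London"),
    ("Bob", "was born in", "Paris"),
    ("Eve", "was born in", "Rome"),
    ("Eve", "visited", "Oslo"),
]
known_facts = {("Ada", "London"): "birthPlace", ("Bob", "Paris"): "birthPlace"}
learned = learn_phrases(sentences, known_facts)
proposed = propose_facts(sentences, known_facts, learned)
```

Here the phrase "was born in" is learned from two known pairs and then yields one proposed fact for the third pair, while the rarely seen phrase "visited" is ignored.
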
>
>
>
>     Credits
>
> Lots of thanks to
>
>  *
>
>     Markus Freudenberg (University of Leipzig / DBpedia Association)
>     for taking over the whole release process and creating the
>     revamped download & statistics pages.
>
>  *
>
>     Dimitris Kontokostas (University of Leipzig / DBpedia Association)
>     for conveying his considerable knowledge of the extraction and
>     release process.
>
>  *
>
>     All editors that contributed to the DBpedia ontology mappings via
>     the Mappings Wiki.
>
>  *
>
>     The whole DBpedia Internationalization Committee for pushing the
>     DBpedia internationalization forward.
>
>  *
>
>     Heiko Paulheim (University of Mannheim) for providing the
>     necessary code for his algorithm to generate additional type
>     statements for formerly untyped resources and to identify and remove
>     wrong statements. This is now part of the DIEF.
>
>  *
>
>     Václav Zeman, Thomas Klieger and the whole LHD team (University of
>     Prague) for their contribution of additional DBpedia types
>
>  *
>
>     Marco Fossati (FBK) for contributing the DBTax types
>
>  *
>
>     Alan Meehan (TCD) for performing a big external link cleanup
>
>  *
>
>     Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for
>     providing the links from DOLCE to DBpedia ontology.
>
>  *
>
>     Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink
>     Software) for loading the new data set into the Virtuoso instance
>     that provides 5-Star Linked Open Data publication and SPARQL Query
>     Services.
>
>  *
>
>     OpenLink Software (http://www.openlinksw.com/) collectively for
>     providing the SPARQL Query Services and Linked Open Data
>     publishing infrastructure for DBpedia in addition to their
>     continuous infrastructure support.
>
>  *
>
>     Ruben Verborgh from Ghent University – iMinds for publishing the
>     dataset as Triple Pattern Fragments
>     <http://fragments.dbpedia.org/>, and iMinds for sponsoring
>     DBpedia’s Triple Pattern Fragments server.
>
>  *
>
>     Ali Ismayilov (University of Bonn) for extending the DBpedia
>     Wikidata dataset.
>
>  *
>
>     Vladimir Alexiev (Ontotext) for leading a successful mapping and
>     ontology clean up effort.
>
>  *
>
>     All the GSoC students and mentors who directly or indirectly
>     influenced the DBpedia release
>
>  *
>
>     Special thanks to members of the DBpedia Association
>     <http://dbpedia.org/dbpedia-association>, the AKSW
>     <http://aksw.org/About.html> and the department for Business
>     Information Systems
>     <http://bis.informatik.uni-leipzig.de/en/Welcome> of the University
>     of Leipzig.
>
>
>
>
> The work on the DBpedia 2016-04 release was financially supported by
> the European Commission through the project ALIGNED – quality-centric,
> software and data engineering (http://aligned-project.eu/).
>
> More information about DBpedia is found at http://dbpedia.org
> <http://dbpedia.org/> as well as in the new overview article about the
> project available at http://wiki.dbpedia.org/Publications
> <http://wiki.dbpedia.org/Publications>.
>
> Have fun with the new DBpedia 2016-04 release!
>
> Cheers,
>
> Markus Freudenberg, Dimitris Kontokostas, Sebastian Hellmann


-- 
Regards,

Kingsley Idehen	      
Founder & CEO 
OpenLink Software   (Home Page: http://www.openlinksw.com)

Received on Saturday, 15 October 2016 15:44:16 UTC