- From: Dimitris Kontokostas <jimkont@gmail.com>
- Date: Mon, 4 Apr 2016 15:31:42 +0300
- To: Linked Data community <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <CA+u4+a0GtWXE5zPpCB8KFddmEn_3dfo3Owc0M=JkvTYzs4CfXA@mail.gmail.com>
A few days late, but for some reason Markus could not send this to the public-lod / semantic-web lists. Enjoy the new release!

Dimitris

---------- Forwarded message ----------
From: Markus Freudenberg <markus.freudenberg@gmail.com>
Date: Thu, Mar 31, 2016 at 2:51 PM
Subject: [Dbpedia-discussion] ANN: DBpedia Version 2015-10 released
To: DBpedia <dbpedia-discussion@lists.sourceforge.net>, dbpedia-developers@list.sourceforge.net, dbpedia-ontology@list.sourceforge.net, semantic-web@w3.org, public-lod@w3.org, wikidata@lists.wikimedia.org, dbp-spotlight-users@lists.sourceforge.net

Hereby we announce the release of DBpedia 2015-10 (also known as: 2015 B). This DBpedia release is based on updated Wikipedia dumps dating from October 2015, featuring a significantly expanded base of information as well as richer and (hopefully) cleaner data conforming to the DBpedia ontology.

You can download the new DBpedia datasets in RDF format from http://wiki.dbpedia.org/Downloads2015-10 or directly here: http://downloads.dbpedia.org/2015-10/.

Statistics

The English version of the DBpedia knowledge base currently describes 6.2M things, of which 4.6M have abstracts, 955K have geo coordinates and 1.54M have depictions. In total, 5M resources are classified in a consistent ontology, consisting of 1.6M persons, 800K places (including 500K populated places), 480K works (including 133K music albums, 102K films and 20K video games), 267K organizations (including 66K companies and 52K educational institutions), 293K species and 5K diseases. The total number of resources in English DBpedia is 16.4M, which, besides the 4.6M resources with abstracts, includes 1.3M SKOS concepts (categories), 7.1M redirect pages, 254K disambiguation pages and 1.6M intermediate nodes.

Altogether, the DBpedia 2015-10 release consists of 8.8 billion (2015-04: 6.9 billion) pieces of information (RDF triples), out of which 1.1 billion (2015-04: 737 million) were extracted from the English edition of Wikipedia, 4.4 billion (2015-04: 3.8 billion) were extracted from other language editions and 3.2 billion (2015-04: 2.4 billion) from DBpedia Commons and Wikidata. In general, we observed a significant growth in raw infobox and mapping-based statements of close to 10%.

Thorough statistics can be found on the DBpedia website <http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-10/dataset-2015-10-statistics> and general information on the DBpedia datasets here <http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets>.

Community

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2015-10 ontology encompasses

- 739 classes (DBpedia 2015-04: 735)
- 1,099 object properties (DBpedia 2015-04: 1,098)
- 1,596 datatype properties (DBpedia 2015-04: 1,583)
- 132 specialized datatype properties (DBpedia 2015-04: 132)
- 407 owl:equivalentClass and 222 owl:equivalentProperty mappings to external vocabularies (DBpedia 2015-04: 408 - 200)

The editors community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2015-10 extraction, we used a total of 5553 template mappings (DBpedia 2015-04: 4317 mappings). For the first time, the top language, gauged by number of mappings, is Dutch (606 mappings), surpassing the English community (600 mappings).

(Breaking) Changes

- English DBpedia switched to IRIs from URIs. Some URIs will not resolve, and we provide the “uri-same-as-iri” dataset for English to ease the transition. For more technical details on this issue, read section 6 <http://svn.aksw.org/papers/2011/DBpedia_I18n/public.pdf>, pp. 19-23 (old but still valid).
- The instance-types dataset is now split into two files:
  - instance-types (containing only direct types)
  - instance-types-transitive (containing the transitive types of a resource based on the DBpedia ontology)
- The mappingbased-properties file is now split into three (3) files:
  - “geo-coordinates-mappingbased”, which contains the coordinates originating from the mappings wiki. The “geo-coordinates” dataset continues to provide the coordinates originating from the GeoExtractor.
  - “mappingbased-literals”, which contains mapping-based facts with literal values
  - “mappingbased-objects”, which contains mapping-based facts with object values
    - The “mappingbased-objects-disjoint-[domain|range]” datasets contain facts that are filtered out from the “mappingbased-objects” datasets as errors but are still provided.
- We added a new extractor for citation data.
- All datasets are available in .ttl and .tql serialization (the nt and nq datasets were omitted for reasons of redundancy and server capacity).
- We are providing DBpedia as a Docker image. Dockerized-DBpedia <https://github.com/dbpedia/Dockerized-DBpedia> creates and runs a Virtuoso Open Source instance preloaded with the latest DBpedia dataset inside a Docker container.
- Starting with this release, we provide extensive dataset metadata by adding DataIDs <http://dbpedia.org/projects/dbpedia-dataid> for all extracted languages to the respective language directories.
- In addition, we revamped the dataset table on the download page <http://wiki.dbpedia.org/Downloads2015-10>. It is created dynamically based on the DataIDs of all languages. Likewise, the tables on the statistics page <http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-10/dataset-2015-10-statistics> are now based on files <http://downloads.dbpedia.org/2015-10/statistics/> providing information about all mapping languages.
- From now on, we also include the original Wikipedia dump files alongside the extracted datasets (‘pages_articles.xml.bz2’).
- A complete changelog can always be found in the git log <https://github.com/dbpedia/extraction-framework/compare/DBpedia_2015-04...master>.

Upcoming Changes

- We are working to move away from the mappings wiki, but we will have at least one more mapping sprint.
- We have some cool ideas <http://wiki.dbpedia.org/ideas/> for GSoC this year. Additional mentors are more than welcome :)

Extended Type System to cover Articles without Infobox

Until the DBpedia 3.8 release, a concept was only assigned a type (like person or place) if the corresponding Wikipedia article contained an infobox indicating this type. Starting from the 3.9 release, we provide type statements for articles without an infobox; these are inferred from the link structure within the DBpedia knowledge base using the algorithm described in Paulheim/Bizer 2014 <http://www.heikopaulheim.com/documents/ijswis_2014.pdf>. For the new release, an improved version of the algorithm was run to produce type information for 400,000 things that were formerly not typed. A similar algorithm (presented in the same paper) was used to identify and remove potentially wrong statements from the knowledge base.

In addition, this release includes four new type datasets, although they are not included in the online SPARQL endpoint: 1) LHD datasets <http://ner.vse.cz/datasets/linkedhypernyms/> for English, German and Dutch and 2) DBTax <http://it.dbpedia.org/2015/02/dbpedia-italiana-release-3-4-wikidata-e-dbtax/> for English. Both of these datasets use a typing system beyond the DBpedia ontology, and we provide a subset mapped to the DBpedia ontology (dbo) and a full one with all types (ext).

Credits

Lots of thanks to

- Markus Freudenberg (University of Leipzig / DBpedia Association) for taking over the whole release process and creating the revamped download & statistics pages.
- Dimitris Kontokostas (University of Leipzig / DBpedia Association) for conveying his considerable knowledge of the extraction and release process.
- Volha Bryl (University of Mannheim / Springer) for their work on previous releases and their continuous support in this release.
- All editors who contributed to the DBpedia ontology mappings via the Mappings Wiki.
- The whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.
- Heiko Paulheim (University of Mannheim) for re-running his algorithm to generate additional type statements for formerly untyped resources and to identify and remove wrong statements.
- Václav Zeman and the whole LHD team (University of Prague) for their contribution of additional DBpedia types.
- Marco Fossati (FBK) for contributing the DBTax types.
- Alan Meehan (TCD) for performing a big external link cleanup.
- Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing the links from DOLCE to the DBpedia ontology.
- Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that provides 5-Star Linked Open Data publication and SPARQL Query Services.
- OpenLink Software (http://www.openlinksw.com/) altogether for providing the SPARQL Query Services and Linked Open Data publishing infrastructure for DBpedia in addition to their continuous infrastructure support.
- Ruben Verborgh from Ghent University – iMinds for publishing the dataset as Triple Pattern Fragments <http://fragments.dbpedia.org>, and iMinds for sponsoring DBpedia’s Triple Pattern Fragments server.
- Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata dataset.
- Vladimir Alexiev (Ontotext) for leading a successful mapping and ontology cleanup effort.
- All the GSoC students and mentors working directly or indirectly on the DBpedia release.
- Special thanks to members of the DBpedia Association <http://dbpedia.org/dbpedia-association>, the AKSW <http://aksw.org/About.html> and the department for Business Information Systems <http://bis.informatik.uni-leipzig.de/en/Welcome> of the University of Leipzig.

The work on the DBpedia 2015-10 release was financially supported by the European Commission through the project ALIGNED – quality-centric, software and data engineering (http://aligned-project.eu/).

More information about DBpedia can be found at http://dbpedia.org as well as in the new overview article about the project, available at http://wiki.dbpedia.org/Publications.

Have fun with the new DBpedia 2015-10 release!

Cheers,
Markus Freudenberg, Dimitris Kontokostas, Sebastian Hellmann

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

--
Kontokostas Dimitris
Received on Monday, 4 April 2016 12:32:33 UTC