- From: Michael Brunnbauer <brunni@netestate.de>
- Date: Tue, 5 Apr 2016 14:08:13 +0200
- To: Dimitris Kontokostas <jimkont@gmail.com>
- Cc: "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <20160405120812.GA20204@netestate.de>
Hello Dimitris,

I got DBpedia 2015-04 from http://downloads.dbpedia.org/2015-04/core/

It seems that http://downloads.dbpedia.org/2015-10/core/ also contains a
reasonable and current subset of DBpedia. Is this correct? How does it differ
from http://downloads.dbpedia.org/2015-10/core-i18n/en/ ?

As a result of your switch to ttl, there is now a mixture of .nt and .ttl
files in http://downloads.dbpedia.org/2015-10/core/ but at first glance the
.ttl files do not seem to use full Turtle syntax. Can they be parsed as
N-Triples? I ask because I usually had to fix some syntax errors before Jena
would parse your N-Triples files.

Regards,

Michael Brunnbauer

On Mon, Apr 04, 2016 at 03:31:42PM +0300, Dimitris Kontokostas wrote:
> A few days late but for some reason it could not be sent by Markus to
> public-lod / semantic-web lists, enjoy the new release!
>
> Dimitris
>
> ---------- Forwarded message ----------
> From: Markus Freudenberg <markus.freudenberg@gmail.com>
> Date: Thu, Mar 31, 2016 at 2:51 PM
> Subject: [Dbpedia-discussion] ANN: DBpedia Version 2015-10 released
> To: DBpedia <dbpedia-discussion@lists.sourceforge.net>,
> dbpedia-developers@list.sourceforge.net,
> dbpedia-ontology@list.sourceforge.net, semantic-web@w3.org,
> public-lod@w3.org, wikidata@lists.wikimedia.org,
> dbp-spotlight-users@lists.sourceforge.net
>
>
> Hereby we announce the release of DBpedia 2015-10 (also known as: 2015 B).
>
>
> This DBpedia release is based on updated Wikipedia dumps dating from
> October 2015 featuring a significantly expanded base of information as well
> as richer and (hopefully) cleaner data conforming to the DBpedia ontology.
>
> You can download the new DBpedia datasets in RDF format from
> http://wiki.dbpedia.org/Downloads2015-10 or directly here:
> http://downloads.dbpedia.org/2015-10/.
>
> Statistics
>
> The English version of the DBpedia knowledge base currently describes 6.2M
> things of which 4.6M have abstracts, 955K have geo coordinates and 1.54M
> have depictions.
> In total, 5M resources are classified in a consistent ontology,
> consisting of 1.6M persons, 800K places (including 500K populated
> places), 480K works (including 133K music albums, 102K films and 20K video
> games), 267K organizations (including 66K companies and 52K educational
> institutions), 293K species and 5K diseases. The total number of resources
> in English DBpedia is 16.4M that, besides the 4.6M resources with
> abstracts, includes 1.3M skos concepts (categories), 7.1M redirect pages,
> 254K disambiguation pages and 1.6M intermediate nodes.
>
> Altogether the DBpedia 2015-10 release consists of 8.8 billion (2015-04:
> 6.9 billion) pieces of information (RDF triples) out of which 1.1 billion
> (2015-04: 737 million) were extracted from the English edition of
> Wikipedia, 4.4 billion (2015-04: 3.8 billion) were extracted from other
> language editions and 3.2 billion (2015-04: 2.4 billion) from DBpedia
> Commons and Wikidata. In general we observed a significant growth in raw
> infobox and mapping-based statements of close to 10%.
>
> Thorough statistics can be found on the DBpedia website
> <http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-10/dataset-2015-10-statistics>
> and general information on the DBpedia datasets here
> <http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets>.
>
> Community
>
> The DBpedia community added new classes and properties to the DBpedia
> ontology via the mappings wiki. The DBpedia 2015-10 ontology encompasses
>
> - 739 classes (DBpedia 2015-04: 735)
> - 1,099 object properties (DBpedia 2015-04: 1,098)
> - 1,596 datatype properties (DBpedia 2015-04: 1,583)
> - 132 specialized datatype properties (DBpedia 2015-04: 132)
> - 407 owl:equivalentClass and 222 owl:equivalentProperty mappings to
>   external vocabularies (DBpedia 2015-04: 408 - 200)
>
> The editors community of the mappings wiki also defined many new mappings
> from Wikipedia templates to DBpedia classes.
> For the DBpedia 2015-10 extraction, we used a total of 5553 template
> mappings (DBpedia 2015-04: 4317 mappings). For the first time the top
> language, gauged by number of mappings, is Dutch (606 mappings),
> surpassing the English community (600 mappings).
>
> (Breaking) Changes
>
> - English DBpedia switched to IRIs from URIs. Some URIs will not resolve
>   and we provide the "uri-same-as-iri" dataset for English to ease the
>   transition. For more technical details on this issue read section 6
>   <http://svn.aksw.org/papers/2011/DBpedia_I18n/public.pdf> p. 19-23 (old
>   but still valid)
> - The instance-types dataset is now split into two files:
>   - instance-types (containing only direct types)
>   - Instance-types-transitive (containing the transitive types of a
>     resource based on the DBpedia ontology)
> - The mappingbased-properties file is now split into three (3) files:
>   - "geo-coordinates-mappingbased", which contains the coordinates
>     originating from the mappings wiki. "geo-coordinates" continues to
>     provide the coordinates originating from the GeoExtractor
>   - "mappingbased-literals", which contains mapping-based facts with
>     literal values
>   - "mappingbased-objects", which contains mapping-based facts with
>     object values
> - The "mappingbased-objects-disjoint-[domain|range]" datasets are facts
>   that are filtered out from the "mappingbased-objects" datasets as errors
>   but are still provided
> - We added a new extractor for citation data
> - All datasets are available in .ttl and .tql serialization (nt, nq
>   dataset were neglected for reasons of redundancy and server capacity).
> - We are providing DBpedia as a Docker image.
>   Dockerized-DBpedia <https://github.com/dbpedia/Dockerized-DBpedia>:
>   Creates and runs a Virtuoso Open Source instance preloaded with the
>   latest DBpedia dataset inside a Docker container.
> - Starting with this release we provide extensive dataset metadata by
>   adding DataIDs <http://dbpedia.org/projects/dbpedia-dataid> for all
>   extracted languages to the respective language directories.
> - In addition we revamped the dataset table on the download-page
>   <http://wiki.dbpedia.org/Downloads2015-10>. It's created dynamically
>   based on the DataIDs of all languages. Likewise the tables on the
>   statistics-page
>   <http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-10/dataset-2015-10-statistics>
>   are now based on files <http://downloads.dbpedia.org/2015-10/statistics/>
>   providing information about all mapping languages.
> - From now on we also include the original Wikipedia dump files
>   alongside the extracted datasets ("pages_articles.xml.bz2").
> - A complete changelog can always be found in the git log
>   <https://github.com/dbpedia/extraction-framework/compare/DBpedia_2015-04...master>
>
> Upcoming Changes
>
> - We are working to move away from the mappings wiki but we will have at
>   least one more mapping sprint.
> - We have some cool ideas <http://wiki.dbpedia.org/ideas/> for gsoc this
>   year. Additional mentors are more than welcome :)
>
> Extended Type System to cover Articles without Infobox
>
> Until the DBpedia 3.8 release, a concept was only assigned a type (like
> person or place) if the corresponding Wikipedia article contained an infobox
> indicating this type. Starting from the 3.9 release, we provide type
> statements for articles without infobox that are inferred based on the link
> structure within the DBpedia knowledge base using the algorithm described in
> Paulheim/Bizer 2014 <http://www.heikopaulheim.com/documents/ijswis_2014.pdf>.
> For the new release, an improved version of the algorithm was run to
> produce type information for 400,000 things that were formerly not typed.
> A similar algorithm (presented in the same paper) was used to identify and
> remove potentially wrong statements from the knowledge base.
>
> In addition, this release includes four new type datasets, although not
> included in the online sparql endpoint: 1) LHD datasets
> <http://ner.vse.cz/datasets/linkedhypernyms/> for English, German and Dutch
> and 2) DBTax
> <http://it.dbpedia.org/2015/02/dbpedia-italiana-release-3-4-wikidata-e-dbtax/>
> for English.
>
> Both of these datasets use a typing system beyond the DBpedia ontology and
> we provide a subset, mapped to the DBpedia ontology (dbo) and a full one
> with all types (ext).
>
> Credits
>
> Lots of thanks to
>
> - Markus Freudenberg (University of Leipzig / DBpedia Association) for
>   taking over the whole release process and creating the revamped download &
>   statistics pages.
> - Dimitris Kontokostas (University of Leipzig / DBpedia Association) for
>   conveying his considerable knowledge of the extraction and release process.
> - Volha Bryl (University of Mannheim / Springer) for their work on
>   previous releases and their continuous support in this release.
> - All editors that contributed to the DBpedia ontology mappings via the
>   Mappings Wiki.
> - The whole DBpedia Internationalization Committee for pushing the DBpedia
>   internationalization forward.
> - Heiko Paulheim (University of Mannheim) for re-running his algorithm to
>   generate additional type statements for formerly untyped resources and
>   to identify and remove wrong statements.
> - Václav Zeman and the whole LHD team (University of Prague) for their
>   contribution of additional DBpedia types
> - Marco Fossati (FBK) for contributing the DBTax types
> - Alan Meehan (TCD) for performing a big external link cleanup
> - Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing
>   the links from DOLCE to DBpedia ontology.
> - Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink
>   Software) for loading the new data set into the Virtuoso instance that
>   provides 5-Star Linked Open Data publication and SPARQL Query Services.
> - OpenLink Software (http://www.openlinksw.com/) altogether for providing
>   the SPARQL Query Services and Linked Open Data publishing infrastructure
>   for DBpedia in addition to their continuous infrastructure support.
> - Ruben Verborgh from Ghent University - iMinds for publishing the dataset
>   as Triple Pattern Fragments <http://fragments.dbpedia.org>, and iMinds
>   for sponsoring DBpedia's Triple Pattern Fragments server.
> - Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata
>   dataset.
> - Vladimir Alexiev (Ontotext) for leading a successful mapping and
>   ontology clean up effort.
> - All the GSoC students and mentors working directly or indirectly on the
>   DBpedia release
> - Special thanks to members of the DBpedia Association
>   <http://dbpedia.org/dbpedia-association>, the AKSW
>   <http://aksw.org/About.html> and the department for Business Information
>   Systems <http://bis.informatik.uni-leipzig.de/en/Welcome> of the
>   University of Leipzig.
>
> The work on the DBpedia 2015-10 release was financially supported by the
> European Commission through the project ALIGNED - quality-centric, software
> and data engineering (http://aligned-project.eu/).
>
> More information about DBpedia is found at http://dbpedia.org as well as in
> the new overview article about the project available at
> http://wiki.dbpedia.org/Publications.
>
> Have fun with the new DBpedia 2015-10 release!
>
> Cheers,
>
> Markus Freudenberg, Dimitris Kontokostas, Sebastian Hellmann
>
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
> --
> Kontokostas Dimitris

--
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89
++  E-Mail brunni@netestate.de
++  http://www.netestate.de/
++
++  Registered office: München, HRB Nr.142452 (Handelsregister B München)
++  VAT ID: DE221033342
++  Managing directors: Michael Brunnbauer, Franz Brunnbauer
++  Authorized signatory: Dipl. Kfm. (Univ.) Markus Hendel
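
A note on the N-Triples question raised in this message: N-Triples is a line-based subset of Turtle, so one cheap way to see whether a .ttl dump stays within that subset is to test every line against the rough shape of an N-Triples statement. The following is a minimal sketch under that assumption, not a validating parser. The names `TRIPLE_LINE` and `suspicious_lines` and the sample triples are hypothetical, and the pattern is only a heuristic; a real check would still rely on an actual parser such as Jena.

```python
import re

# Rough shape of one N-Triples statement: subject, predicate, object, final dot.
# Heuristic only; this is not the full N-Triples grammar.
TRIPLE_LINE = re.compile(
    r'^(<[^<>\s]*>|_:\S+)\s+'   # subject: IRI or blank node
    r'<[^<>\s]*>\s+'            # predicate: IRI
    r'(<[^<>\s]*>|_:\S+|".*"(\^\^<[^<>\s]*>|@[A-Za-z0-9-]+)?)'  # object
    r'\s*\.\s*$'                # terminating dot
)


def suspicious_lines(lines):
    """Return (line_number, text) pairs that do not look like N-Triples."""
    found = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue  # blank lines and comment lines are allowed
        if not TRIPLE_LINE.match(text):
            found.append((number, text))
    return found


if __name__ == "__main__":
    # Hypothetical sample: one plain triple and two Turtle-only constructs.
    sample = [
        '<http://example.org/s> <http://example.org/p> "label"@en .',
        '@prefix ex: <http://example.org/> .',
        'ex:s a ex:Thing ;',
    ]
    for number, text in suspicious_lines(sample):
        print(number, text)
```

For a compressed dump, the lines could be supplied by `bz2.open(path, "rt", encoding="utf-8")`, where `path` stands for whichever file was downloaded. An empty result only means that no line failed this rough pattern, not that the file is guaranteed to be valid N-Triples.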
Received on Tuesday, 5 April 2016 12:08:39 UTC