- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Sat, 15 Oct 2016 11:43:48 -0400
- To: "public-lod@w3.org" <public-lod@w3.org>
- Cc: 'W3C Web Schemas Task Force' <public-vocabs@w3.org>, business-of-linked-data-bold <business-of-linked-data-bold@googlegroups.com>, Virtuoso-users <Virtuoso-users@lists.sourceforge.net>, bio2rdf <bio2rdf@googlegroups.com>
- Message-ID: <a67cfc5f-ef04-d387-0a5a-9a1996d46dc1@openlinksw.com>
FYI
On 10/15/16 9:14 AM, Markus Freudenberg wrote:
>
> Hereby we announce the release of DBpedia 2016-04. The new release is
> based on updated Wikipedia dumps dating from March/April 2016
> featuring a significantly expanded base of information as well as
> richer and (hopefully) cleaner data based on the DBpedia ontology.
>
>
> You can download the new DBpedia datasets in a variety of RDF-document
> formats from: http://wiki.dbpedia.org/downloads-2016-04 or directly
> here: http://downloads.dbpedia.org/2016-04/
>
>
> Support DBpedia
>
> During the latest DBpedia meeting in Leipzig we discussed ways to
> support DBpedia <http://blog.dbpedia.org/?p=210> and what benefits
> this support would bring
> <http://wiki.dbpedia.org/why-is-dbpedia-so-important>. For the next
> two months, we are aiming to raise money to support the hosting of the
> main services and the next DBpedia release (especially to shorten
> release intervals). On top of that, we need to buy a new server to host
> DBpedia Spotlight, which so far has been generously hosted by third
> parties. If you use DBpedia and want us to keep going forward, we
> kindly invite you to donate here <http://wiki.dbpedia.org/donate> or
> become a member of the DBpedia association
> <http://wiki.dbpedia.org/membership>.
>
>
> Statistics
>
> The English version of the DBpedia knowledge base currently describes
> 6.0M entities, of which 4.6M have abstracts, 1.53M have geo coordinates
> and 1.6M have depictions. In total, 5.2M resources are classified in a
> consistent ontology, consisting of 1.5M persons, 810K places
> (including 505K populated places), 490K works (including 135K music
> albums, 106K films and 20K video games), 275K organizations (including
> 67K companies and 53K educational institutions), 301K species and 5K
> diseases. The total number of resources in English DBpedia is 16.9M,
> which, besides the 6.0M resources, includes 1.7M SKOS concepts
> (categories), 7.3M redirect pages, 260K disambiguation pages and 1.7M
> intermediate nodes.
>
>
> Altogether the DBpedia 2016-04 release consists of 9.5 billion
> (2015-10: 8.8 billion) pieces of information (RDF triples) out of
> which 1.3 billion (2015-10: 1.1 billion) were extracted from the
> English edition of Wikipedia, 5.0 billion (2015-04: 4.4 billion) were
> extracted from other language editions and 3.2 billion (2015-10: 3.2
> billion) from DBpedia Commons and Wikidata. In general, we observed a
> growth in mapping-based statements of about 2%.
>
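As a quick sanity check on the triple counts quoted above, the three source groups can be summed and compared with the stated total. This is only a sketch over the rounded figures in the announcement; the variable names are made up for illustration, and the growth figure it prints is derived here, not quoted from the release notes.

```python
# Figures quoted in the announcement, in billions of RDF triples (rounded).
triples_2016_04 = {
    "English Wikipedia": 1.3,
    "other language editions": 5.0,
    "DBpedia Commons and Wikidata": 3.2,
}
total_2016_04 = 9.5  # stated total for 2016-04
total_2015_10 = 8.8  # stated total for 2015-10

# Sum of the three groups, rounded to the precision used in the announcement.
component_sum = round(sum(triples_2016_04.values()), 1)

# Relative growth of the stated totals, in percent (derived, not quoted).
growth_pct = round((total_2016_04 - total_2015_10) / total_2015_10 * 100, 1)

print("sum of groups (billions):", component_sum)
print("growth over 2015-10 (%):", growth_pct)
```

Note that this overall growth in triples is a different quantity from the roughly 2% growth in mapping-based statements mentioned above.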
>
> Thorough statistics can be found on the DBpedia website
> <http://wiki.dbpedia.org/dbpedia-2016-04-statisticsdatasets/dataset-2015-10/dataset-2015-10-statistics> and
> general information on the DBpedia datasets here
> <http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets>.
>
>
> Community
>
> The DBpedia community added new classes and properties to the DBpedia
> ontology via the mappings wiki. The DBpedia 2016-04 ontology encompasses:
>
> * 754 classes (DBpedia 2015-10: 739)
> * 1,103 object properties (DBpedia 2015-10: 1,099)
> * 1,608 datatype properties (DBpedia 2015-10: 1,596)
> * 132 specialized datatype properties (DBpedia 2015-10: 132)
> * 410 owl:equivalentClass and 221 owl:equivalentProperty mappings to
>   external vocabularies (DBpedia 2015-04: 407 - 221)
>
>
> The editor community of the mappings wiki also defined many new
> mappings from Wikipedia templates to DBpedia classes. For the DBpedia
> 2016-04 extraction, we used a total of 5800 template mappings (DBpedia
> 2015-10: 5553 mappings). For the second time the top language, gauged
> by the number of mappings, is Dutch (646 mappings), followed by the
> English community (604 mappings).
>
>
> (Breaking) Changes
>
> * In addition to the datasets normalized to English DBpedia (en-uris), we
>   now provide normalized datasets based on the DBpedia
>   Wikidata (DBw) datasets (wkd-uris). These sorted datasets will be
>   the foundation for the upcoming fusion process with Wikidata. The
>   DBw-based URIs will be the only ones provided from the following
>   releases on.
> * We now filter out triples from the Raw Infobox Extractor that are
>   already mapped. E.g. no more “<x> dbo:birthPlace <z>” and “<x>
>   dbp:birthPlace|dbp:placeOfBirth|... <z>” in the same resource.
>   These triples are now moved to the “infobox-properties-mapped”
>   datasets and not loaded on the main endpoint. See issue 22
>   <https://github.com/dbpedia/extraction-framework/issues/22> for
>   more details.
> * Major improvements in our citation extraction. See here
>   <http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg07762.html> for
>   more details.
> * We incorporated the statistical distribution approach
>   <http://www.heikopaulheim.com/docs/iswc2013.pdf> of Heiko Paulheim
>   in creating type statements automatically and provide them as an
>   additional dataset (instance_types_sdtyped_dbo).
>
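The filtering of already-mapped raw infobox triples described above can be pictured with a small sketch. This is not the extraction framework's code: the `RAW_TO_MAPPED` table, the `split_raw_triples` helper and the sample triples are all hypothetical, and the rule used here (same subject and object, raw property mapped to the ontology property) is only one plausible reading of the announcement.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


# Hypothetical table: raw infobox property -> ontology property it is mapped to.
RAW_TO_MAPPED = {
    "dbp:birthPlace": "dbo:birthPlace",
    "dbp:placeOfBirth": "dbo:birthPlace",
}


def split_raw_triples(
    raw: Iterable[Triple], mapped: Iterable[Triple]
) -> Tuple[List[Triple], List[Triple]]:
    """Split raw triples into (kept, moved).

    A raw triple is "moved" when the mapped data already holds the same
    subject and object under the corresponding ontology property.
    """
    mapped_set = set(mapped)
    kept: List[Triple] = []
    moved: List[Triple] = []
    for t in raw:
        target = RAW_TO_MAPPED.get(t.p)
        if target is not None and Triple(t.s, target, t.o) in mapped_set:
            moved.append(t)  # would go to "infobox-properties-mapped"
        else:
            kept.append(t)  # stays in the raw infobox dataset
    return kept, moved


# Made-up sample data, for illustration only.
mapped = [Triple("dbr:Ada_Lovelace", "dbo:birthPlace", "dbr:London")]
raw = [
    Triple("dbr:Ada_Lovelace", "dbp:birthPlace", "dbr:London"),
    Triple("dbr:Ada_Lovelace", "dbp:placeOfBirth", "dbr:London"),
    Triple("dbr:Ada_Lovelace", "dbp:knownFor", "dbr:Mathematics"),
]
kept, moved = split_raw_triples(raw, mapped)
print("kept:", [t.p for t in kept])
print("moved:", [t.p for t in moved])
```

With this reading, only the unmapped `dbp:knownFor` triple stays in the raw dataset, and the two birth-place duplicates are set aside.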
>
> In case you missed it, what we changed in the previous release (2015-10)
>
> * English DBpedia switched to IRIs. This can be a breaking change for
>   some applications that need to change their stored DBpedia
>   resource URIs / links. We provide the “uri-same-as-iri” dataset
>   for English to ease the transition.
> * The instance-types dataset is now split into two files:
>   instance-types (containing only direct types) and
>   instance-types-transitive (containing the transitive types of a
>   resource based on the DBpedia ontology).
> * The mappingbased-properties file is now split into three (3) files:
>   o “geo-coordinates-mappingbased”, which contains the coordinates
>     originating from the mappings wiki. The “geo-coordinates” dataset
>     continues to provide the coordinates originating from the
>     GeoExtractor.
>   o “mappingbased-literals”, which contains mapping-based facts with
>     literal values.
>   o “mappingbased-objects”, which contains mapping-based facts with
>     object values.
>   o The “mappingbased-objects-disjoint-[domain|range]” datasets
>     contain facts that are filtered out from the
>     “mappingbased-objects” datasets as errors but are still provided.
> * We added a new extractor for citation data that provides two files:
>   o citation links: linking resources to citations
>   o citation data: trying to get additional data from citations.
>     This is a quite interesting dataset but we need help to clean
>     it up
> * All datasets are available in .ttl and .tql serialization (nt, nq
>   datasets were omitted for reasons of redundancy and server capacity).
>
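The split between direct and transitive types mentioned above can be illustrated by walking up a class hierarchy. The sketch below assumes a single-parent excerpt of the DBpedia ontology; the `SUPERCLASS` table and the `transitive_types` helper are hypothetical, and whether the real instance-types-transitive file also repeats the direct type is not stated in the announcement.

```python
from typing import List

# Hypothetical excerpt of the class hierarchy: class -> direct superclass.
SUPERCLASS = {
    "dbo:MusicalArtist": "dbo:Artist",
    "dbo:Artist": "dbo:Person",
    "dbo:Person": "dbo:Agent",
}


def transitive_types(direct_type: str) -> List[str]:
    """Return the ancestors of direct_type, nearest first.

    The direct type itself is not included, mirroring the idea that
    instance-types holds direct types and a second file holds the rest.
    """
    ancestors: List[str] = []
    seen = {direct_type}  # guards against accidental cycles in the table
    current = SUPERCLASS.get(direct_type)
    while current is not None and current not in seen:
        ancestors.append(current)
        seen.add(current)
        current = SUPERCLASS.get(current)
    return ancestors


print(transitive_types("dbo:MusicalArtist"))
```

A resource whose direct type has no entry in the table simply gets an empty list of transitive types.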
>
> Upcoming Changes
>
> * Dataset normalization: We are going to normalize datasets based on
>   Wikidata URIs and no longer on the English language edition, as a
>   prerequisite to finally start the fusion process with Wikidata.
> * RML Integration: Wouter Maroy has already provided the necessary
>   groundwork for switching the mappings wiki to an RML-based approach
>   <https://drive.google.com/file/d/0B7je1jgVmCgISXBPOHc3NDktblU/view?usp=sharing> on
>   GitHub. We are not there yet but this is at the top of our list of
>   changes.
> * Starting with the next release we are adding datasets with NIF
>   annotations
>   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html> of
>   the abstracts (as we already provided those for the 2015-04
>   release
>   <http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/>). We
>   will eventually extend the NIF annotation dataset to cover the
>   whole Wikipedia article of a resource.
>
>
> New Datasets
>
> * SDTypes: We extended the coverage of the automatically created type
>   statements (instance_types_sdtyped_dbo) to English, German and
>   Dutch (see above).
> * Extensions: In the extension folder (2016-04/ext
>   <http://downloads.dbpedia.org/2016-04/ext/>) we provide two new
>   datasets, both of which are to be considered experimental:
>   o DBpedia World Facts: This dataset is authored by the DBpedia
>     association itself. It lists all countries, all currencies in
>     use and (most) languages spoken in the world as well as how
>     these concepts relate to each other (spoken in, primary
>     language etc.) and useful properties like ISO codes (ontology
>     diagram
>     <https://raw.githubusercontent.com/dbpedia/WorldFacts/master/DBpediaWorldFactsOntology.png>).
>     This dataset extends the very useful LEXVO
>     <http://www.lexvo.org> dataset with facts from DBpedia and the
>     CIA Factbook
>     <https://www.cia.gov/library/publications/the-world-factbook/>.
>     Please report any errors or suggestions regarding this
>     dataset to Markus <mailto:markus.freudenberg@gmail.com>.
>   o Lector Facts: This experimental dataset was provided by Matteo
>     Cannaviccio and demonstrates his approach
>     <http://dl.acm.org/citation.cfm?id=2932203> to generating facts
>     by using common sequences of words (i.e. phrases) that are
>     frequently used to describe instances of binary relations in a
>     text. We are looking into using this approach as a regular
>     extraction step. It would be helpful to get some feedback from
>     you.
>
>
>
>
>
> Credits
>
> Lots of thanks to
>
> * Markus Freudenberg (University of Leipzig / DBpedia Association)
>   for taking over the whole release process and creating the
>   revamped download & statistics pages.
> * Dimitris Kontokostas (University of Leipzig / DBpedia Association)
>   for conveying his considerable knowledge of the extraction and
>   release process.
> * All editors that contributed to the DBpedia ontology mappings via
>   the Mappings Wiki.
> * The whole DBpedia Internationalization Committee for pushing the
>   DBpedia internationalization forward.
> * Heiko Paulheim (University of Mannheim) for providing the
>   necessary code for his algorithm to generate additional type
>   statements for formerly untyped resources and to identify and remove
>   wrong statements, which is now part of the DIEF.
> * Václav Zeman, Thomas Klieger and the whole LHD team (University of
>   Prague) for their contribution of additional DBpedia types.
> * Marco Fossati (FBK) for contributing the DBTax types.
> * Alan Meehan (TCD) for performing a big external link cleanup.
> * Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for
>   providing the links from DOLCE to the DBpedia ontology.
> * Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink
>   Software) for loading the new data set into the Virtuoso instance
>   that provides 5-Star Linked Open Data publication and SPARQL Query
>   Services.
> * OpenLink Software (http://www.openlinksw.com/) collectively for
>   providing the SPARQL Query Services and Linked Open Data
>   publishing infrastructure for DBpedia in addition to their
>   continuous infrastructure support.
> * Ruben Verborgh from Ghent University – iMinds for publishing the
>   dataset as Triple Pattern Fragments
>   <http://fragments.dbpedia.org/>, and iMinds for sponsoring
>   DBpedia’s Triple Pattern Fragments server.
> * Ali Ismayilov (University of Bonn) for extending the DBpedia
>   Wikidata dataset.
> * Vladimir Alexiev (Ontotext) for leading a successful mapping and
>   ontology clean up effort.
> * All the GSoC students and mentors who directly or indirectly
>   influenced the DBpedia release.
> * Special thanks to members of the DBpedia Association
>   <http://dbpedia.org/dbpedia-association>, the AKSW
>   <http://aksw.org/About.html> and the department for Business
>   Information Systems
>   <http://bis.informatik.uni-leipzig.de/en/Welcome> of the University
>   of Leipzig.
>
>
>
>
> The work on the DBpedia 2016-04 release was financially supported by
> the European Commission through the project ALIGNED – quality-centric,
> software and data engineering (http://aligned-project.eu/).
>
> More information about DBpedia is found at http://dbpedia.org
> <http://dbpedia.org/> as well as in the new overview article about the
> project available at http://wiki.dbpedia.org/Publications
> <http://wiki.dbpedia.org/Publications>.
>
> Have fun with the new DBpedia 2016-04 release!
>
> Cheers,
>
> Markus Freudenberg, Dimitris Kontokostas, Sebastian Hellmann
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software (Home Page: http://www.openlinksw.com)
Weblogs (Blogs):
Legacy Blog: http://www.openlinksw.com/blog/~kidehen/
Blogspot Blog: http://kidehen.blogspot.com
Medium Blog: https://medium.com/@kidehen
Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen
Web Identities (WebID):
Personal: http://kingsley.idehen.net/dataspace/person/kidehen#this
: http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
Attachments
- application/pkcs7-signature attachment: S/MIME Cryptographic Signature
Received on Saturday, 15 October 2016 15:44:15 UTC