- From: Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>
- Date: Wed, 19 Jan 2011 16:49:01 +0100
- To: Chris Bizer <chris@bizer.de>
- CC: dbpedia-announcements@lists.sourceforge.net, dbpedia-discussion@lists.sourceforge.net, 'Semantic Web' <semantic-web@w3.org>, 'public-lod' <public-lod@w3.org>
Dear Chris and the DBpedia crew,

As always, a new version of DBpedia is very good news for the Semantic Web and Linked Data.

I was wondering: do you keep older versions of the DBpedia datasets? If so, would you allow people to download them for research purposes? This would be very useful for studying the dynamics of RDF data, or of DBpedia itself. There are already papers on the dynamics of Wikipedia, but I am not aware of corresponding work for DBpedia.

Regards,
AZ.

On 17/01/2011 14:10, Chris Bizer wrote:
> Hi all,
>
> we are happy to announce the release of DBpedia 3.6. The new release is
> based on Wikipedia dumps dating from October/November 2010.
>
> The new DBpedia dataset describes more than 3.5 million things, of which
> 1.67 million are classified in a consistent ontology, including 364,000
> persons, 462,000 places, 99,000 music albums, 54,000 films, 16,500 video
> games, 148,000 organizations, 148,000 species and 5,200 diseases.
>
> The DBpedia dataset features labels and abstracts for 3.5 million things in
> up to 97 different languages; 1,850,000 links to images and 5,900,000 links
> to external web pages; 6,500,000 external links into other RDF datasets, and
> 632,000 Wikipedia categories.
>
> The dataset consists of 672 million pieces of information (RDF triples) out
> of which 286 million were extracted from the English edition of Wikipedia
> and 386 million were extracted from other language editions and links to
> external datasets.
>
> Along with the release of the new datasets, we are happy to announce the
> initial release of the DBpedia MappingTool
> (http://mappings.dbpedia.org/index.php/MappingTool): a graphical user
> interface to support the community in creating and editing mappings as well
> as the ontology.
>
> The new release provides the following improvements and changes compared to
> the DBpedia 3.5.1 release:
>
> 1. Improved DBpedia Ontology as well as improved Infobox mappings using
> http://mappings.dbpedia.org/.
>
> Furthermore, there are now also mappings in languages other than English.
> These improvements are largely due to collective work by the community.
> There are 13.8 million RDF statements based on mappings (11.1 million in
> version 3.5.1). All this data is in the /ontology/ namespace. Note that this
> data is of much higher quality than the Raw Infobox data in the /property/
> namespace.
>
> Statistics of the mappings wiki on the date of release 3.6:
>
> + Mappings:
>   + English: 315 Infobox mappings (covers 1124 templates including
>     redirects)
>   + Greek: 137 Infobox mappings (covers 192 templates including
>     redirects)
>   + Hungarian: 111 Infobox mappings (covers 151 templates including
>     redirects)
>   + Croatian: 36 Infobox mappings (covers 67 templates including
>     redirects)
>   + German: 9 Infobox mappings
>   + Slovenian: 4 Infobox mappings
> + Ontology:
>   + 272 classes
> + Properties:
>   + 629 object properties
>   + 706 datatype properties (they are all in the /datatype/ namespace)
>
> 2. Some commonly used property names changed.
>
> + Please see http://dbpedia.org/ChangeLog and
> http://dbpedia.org/Datasets/Properties to know which relations changed and
> update your applications accordingly!
>
> 3. New Datatypes for increased quality in mapping-based properties
>
> + xsd:positiveInteger, xsd:nonNegativeInteger, xsd:nonPositiveInteger,
> xsd:negativeInteger
>
> 4. Improved parsing coverage.
>
> + Parsing of lists of elements in Infobox property values that improves the
> completeness of extracted facts.
> + Method to deal with missing repeated links in Infoboxes that do appear
> somewhere else on the page.
> + Flag templates are parsed.
> + Various improvements on internationalization.
>
> 5. Improved recognition of
>
> + Wikipedia namespace identifiers.
> + Wikipedia language codes.
> + Category hierarchies.
>
> 6. Disambiguation links for acronyms (all upper-case title) are now
> extracted (for example, Kilobyte and Knowledge_base for "KB"):
>
> + Wikilinks consisting of multiple words: If the starting letters of the
> words appear in correct order (with possible gaps) and cover all acronym
> letters.
> + Wikilinks consisting of a single word: If the case-insensitive longest
> common subsequence with the acronym is equal to the acronym.
>
> 7. Encoding (bugfixes):
>
> + The new datasets support the complete range of Unicode code points (up to
> 0x10ffff). 16-bit code points start with '\u', code points larger than
> 16 bits start with '\U'.
> + Commas and ampersands do not get encoded anymore in URIs. Please see
> http://dbpedia.org/URIencoding for an explanation regarding the DBpedia URI
> encoding scheme.
>
> 8. Extended Datasets:
>
> + Thanks to Johannes Hoffart (Max-Planck-Institut für Informatik) for
> contributing links to YAGO2.
> + Freebase links have been updated. They now refer to mids
> (http://wiki.freebase.com/wiki/Machine_ID) because guids have been
> deprecated.
>
> You can download the new DBpedia dataset from http://dbpedia.org/Downloads36
>
> As usual, the dataset is also available as Linked Data and via the DBpedia
> SPARQL endpoint at http://dbpedia.org/sparql
>
> Lots of thanks to:
>
> + All editors that contributed to the DBpedia ontology mappings via the
> Mappings Wiki.
> + Max Jakob (Freie Universität Berlin, Germany) for improving the DBpedia
> extraction framework and for extracting the new datasets.
> + Robert Isele and Anja Jentzsch (both Freie Universität Berlin, Germany)
> for helping Max with their expertise on the extraction framework.
> + Paul Kreis (Freie Universität Berlin, Germany) for analyzing the DBpedia
> data of the previous release and suggesting ways to increase quality and
> quantity. Some results of his work were implemented in this release.
> + Dimitris Kontokostas (Aristotle University of Thessaloniki, Greece), Jimmy
> O'Regan (Eolaistriu Technologies, Ireland), José Paulo Leal (University of
> Porto, Portugal) for providing patches to improve the extraction framework.
> + Jens Lehmann and Sören Auer (both Universität Leipzig, Germany) for
> providing the new dataset via the DBpedia download server at Universität
> Leipzig.
> + Kingsley Idehen and Mitko Iliev (both OpenLink Software) for loading the
> dataset into the Virtuoso instance that serves the Linked Data view and
> SPARQL endpoint. OpenLink Software (http://www.openlinksw.com/) altogether
> for providing the server infrastructure for DBpedia.
>
> The work on the new release was financially supported by:
>
> + Neofonie GmbH, a Berlin-based company offering leading technologies in the
> area of Web search, social media and mobile applications
> (http://www.neofonie.de/).
> + The European Commission through the project LOD2 - Creating Knowledge out
> of Linked Data (http://lod2.eu/).
> + Vulcan Inc. as part of its Project Halo (http://www.projecthalo.com/).
> Vulcan Inc. creates and advances a variety of world-class endeavors and high
> impact initiatives that change and improve the way we live, learn, do
> business (http://www.vulcan.com/).
>
> More information about DBpedia is found at http://dbpedia.org/About
>
> Have fun with the new dataset!
>
> The whole DBpedia team also congratulates Wikipedia on its 10th birthday,
> which was this weekend!
>
> Cheers,
>
> Chris Bizer
>
>
> --
> Prof. Dr. Christian Bizer
> Web-based Systems Group
> Freie Universität Berlin
> +49 30 838 55509
> http://www.bizer.de
> chris@bizer.de
>
>

-- 
Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
France

Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex
France

antoine.zimmermann@insa-lyon.fr
http://zimmer.aprilfoolsreview.com/
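
For readers who want to experiment with the acronym rule described in item 6 of the quoted announcement, here is a minimal Python sketch of how the two conditions might be checked. It is not the DBpedia extraction framework's code. The function names (`is_subsequence`, `matches_acronym`) are hypothetical, and two points are assumptions: words are taken to be separated by underscores or whitespace, and the multi-word comparison is case-insensitive like the single-word one. The single-word test relies on the fact that the longest common subsequence of a word and an acronym equals the acronym exactly when the acronym is a subsequence of the word.

```python
import re


def is_subsequence(needle: str, haystack: str) -> bool:
    """Return True if the characters of needle occur in haystack in order."""
    remaining = iter(haystack)
    # "ch in remaining" advances the iterator, so order is preserved.
    return all(ch in remaining for ch in needle)


def matches_acronym(acronym: str, link_title: str) -> bool:
    """Check a wikilink title against an acronym (hypothetical helper).

    Multi-word titles: the acronym letters must appear, in order and
    possibly with gaps, among the words' initial letters.
    Single-word titles: the acronym must be a case-insensitive
    subsequence of the word (LCS with the acronym equals the acronym).
    """
    target = acronym.lower()
    # Assumption: words are separated by underscores or whitespace.
    words = [w for w in re.split(r"[_\s]+", link_title) if w]
    if not target or not words:
        return False
    if len(words) == 1:
        return is_subsequence(target, words[0].lower())
    initials = "".join(w[0] for w in words).lower()
    return is_subsequence(target, initials)


if __name__ == "__main__":
    for title in ("Kilobyte", "Knowledge_base", "Kernel", "Bank_of_Korea"):
        print(title, matches_acronym("KB", title))
```

With the announcement's own example, both `Kilobyte` and `Knowledge_base` are accepted for "KB", while a title such as `Bank_of_Korea` is rejected because its initials contain the letters in the wrong order.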
Received on Wednesday, 19 January 2011 15:49:39 UTC