- From: Roberto Mirizzi <roberto.mirizzi@gmail.com>
- Date: Tue, 18 Jan 2011 12:00:29 +0100
- To: Chris Bizer <chris@bizer.de>
- CC: dbpedia-announcements@lists.sourceforge.net, 'public-lod' <public-lod@w3.org>, 'Semantic Web' <semantic-web@w3.org>, dbpedia-discussion@lists.sourceforge.net
Hi all,

and congratulations on the new release and all the improvements.

Anyway, I have some remarks about the "Improved parsing coverage". :-)
Looking at the new dataset, I've found an issue that did not exist in
the previous dataset. For example, if you look at this page:

http://dbpedia.org/page/PHP

you will find that the object of the property "dbpprop:influencedBy" is
a string "C, Perl, Java, C++, Tcl", while in the previous version of the
dataset (I've loaded it on my local machine) there are five distinct
triples having as object, respectively:
"dbpedia:C_(programming_language)", "dbpedia:Java_(programming_language)",
"dbpedia:Perl", "dbpedia:Tcl" and "dbpedia:C++_(programming_language)".

I think this problem should be fixed, especially because a corresponding
"ontology property" still does not exist for such properties.

All the best,
roberto

On 17/01/2011 14:10, Chris Bizer wrote:
> Hi all,
>
> we are happy to announce the release of DBpedia 3.6. The new release is
> based on Wikipedia dumps dating from October/November 2010.
>
> The new DBpedia dataset describes more than 3.5 million things, of which
> 1.67 million are classified in a consistent ontology, including 364,000
> persons, 462,000 places, 99,000 music albums, 54,000 films, 16,500 video
> games, 148,000 organizations, 148,000 species and 5,200 diseases.
>
> The DBpedia dataset features labels and abstracts for 3.5 million things in
> up to 97 different languages; 1,850,000 links to images and 5,900,000 links
> to external web pages; 6,500,000 external links into other RDF datasets, and
> 632,000 Wikipedia categories.
>
> The dataset consists of 672 million pieces of information (RDF triples) out
> of which 286 million were extracted from the English edition of Wikipedia
> and 386 million were extracted from other language editions and links to
> external datasets.
>
> Along with the release of the new datasets, we are happy to announce the
> initial release of the DBpedia MappingTool
> (http://mappings.dbpedia.org/index.php/MappingTool): a graphical user
> interface to support the community in creating and editing mappings as well
> as the ontology.
>
> The new release provides the following improvements and changes compared to
> the DBpedia 3.5.1 release:
>
> 1. Improved DBpedia Ontology as well as improved Infobox mappings using
> http://mappings.dbpedia.org/.
>
> Furthermore, there are now also mappings in languages other than English.
> These improvements are largely due to collective work by the community.
> There are 13.8 million RDF statements based on mappings (11.1 million in
> version 3.5.1). All this data is in the /ontology/ namespace. Note that this
> data is of much higher quality than the Raw Infobox data in the /property/
> namespace.
>
> Statistics of the mappings wiki on the date of release 3.6:
>
> + Mappings:
>   + English: 315 Infobox mappings (covers 1124 templates including
>     redirects)
>   + Greek: 137 Infobox mappings (covers 192 templates including
>     redirects)
>   + Hungarian: 111 Infobox mappings (covers 151 templates including
>     redirects)
>   + Croatian: 36 Infobox mappings (covers 67 templates including
>     redirects)
>   + German: 9 Infobox mappings
>   + Slovenian: 4 Infobox mappings
> + Ontology:
>   + 272 classes
> + Properties:
>   + 629 object properties
>   + 706 datatype properties (they are all in the /datatype/ namespace)
>
> 2. Some commonly used property names changed.
>
> + Please see http://dbpedia.org/ChangeLog and
> http://dbpedia.org/Datasets/Properties to know which relations changed and
> update your applications accordingly!
>
> 3. New Datatypes for increased quality in mapping-based properties
>
> + xsd:positiveInteger, xsd:nonNegativeInteger, xsd:nonPositiveInteger,
> xsd:negativeInteger
>
> 4. Improved parsing coverage.
>
> + Parsing of lists of elements in Infobox property values that improves the
> completeness of extracted facts.
> + Method to deal with missing repeated links in Infoboxes that do appear
> somewhere else on the page.
> + Flag templates are parsed.
> + Various improvements on internationalization.
>
> 5. Improved recognition of
>
> + Wikipedia namespace identifiers.
> + Wikipedia language codes.
> + Category hierarchies.
>
> 6. Disambiguation links for acronyms (all upper-case title) are now
> extracted (for example, Kilobyte and Knowledge_base for "KB"):
>
> + Wikilinks consisting of multiple words: If the starting letters of the
> words appear in correct order (with possible gaps) and cover all acronym
> letters.
> + Wikilinks consisting of a single word: If the case-insensitive longest
> common subsequence with the acronym is equal to the acronym.
>
> 7. Encoding (bugfixes):
>
> + The new datasets support the complete range of Unicode code points (up to
> 0x10ffff). 16-bit code points start with '\u', code points larger than
> 16-bits start with '\U'.
> + Commas and ampersands do not get encoded anymore in URIs. Please see
> http://dbpedia.org/URIencoding for an explanation regarding the DBpedia URI
> encoding scheme.
>
> 8. Extended Datasets:
>
> + Thanks to Johannes Hoffart (Max-Planck-Institut für Informatik) for
> contributing links to YAGO2.
> + Freebase links have been updated. They now refer to mids
> (http://wiki.freebase.com/wiki/Machine_ID) because guids have been
> deprecated.
>
> You can download the new DBpedia dataset from http://dbpedia.org/Downloads36
>
> As usual, the dataset is also available as Linked Data and via the DBpedia
> SPARQL endpoint at http://dbpedia.org/sparql
>
> Lots of thanks to:
>
> + All editors that contributed to the DBpedia ontology mappings via the
> Mappings Wiki.
> + Max Jakob (Freie Universität Berlin, Germany) for improving the DBpedia
> extraction framework and for extracting the new datasets.
> + Robert Isele and Anja Jentzsch (both Freie Universität Berlin, Germany)
> for helping Max with their expertise on the extraction framework.
> + Paul Kreis (Freie Universität Berlin, Germany) for analyzing the DBpedia
> data of the previous release and suggesting ways to increase quality and
> quantity. Some results of his work were implemented in this release.
> + Dimitris Kontokostas (Aristotle University of Thessaloniki, Greece), Jimmy
> O'Regan (Eolaistriu Technologies, Ireland), José Paulo Leal (University of
> Porto, Portugal) for providing patches to improve the extraction framework.
> + Jens Lehmann and Sören Auer (both Universität Leipzig, Germany) for
> providing the new dataset via the DBpedia download server at Universität
> Leipzig.
> + Kingsley Idehen and Mitko Iliev (both OpenLink Software) for loading the
> dataset into the Virtuoso instance that serves the Linked Data view and
> SPARQL endpoint. OpenLink Software (http://www.openlinksw.com/) altogether
> for providing the server infrastructure for DBpedia.
>
> The work on the new release was financially supported by:
>
> + Neofonie GmbH, a Berlin-based company offering leading technologies in the
> area of Web search, social media and mobile applications
> (http://www.neofonie.de/).
> + The European Commission through the project LOD2 - Creating Knowledge out
> of Linked Data (http://lod2.eu/).
> + Vulcan Inc. as part of its Project Halo (http://www.projecthalo.com/).
> Vulcan Inc. creates and advances a variety of world-class endeavors and high
> impact initiatives that change and improve the way we live, learn, do
> business (http://www.vulcan.com/).
>
> More information about DBpedia is found at http://dbpedia.org/About
>
> Have fun with the new dataset!
>
> The whole DBpedia team also congratulates Wikipedia to its 10th Birthday
> which was this weekend!
>
> Cheers,
>
> Chris Bizer
>
> --
> Prof. Dr. Christian Bizer
> Web-based Systems Group
> Freie Universität Berlin
> +49 30 838 55509
> http://www.bizer.de
> chris@bizer.de

--
Roberto Mirizzi
Information System Lab
Politecnico di Bari (Italy)
http://sisinflab.poliba.it/mirizzi
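
The regression reported in the reply above (one comma-separated literal where the previous release had one triple per resource) can be illustrated with a small check. This is only a sketch under stated assumptions: the `Triple` type, the `split_items` and `collapsed_lists` helpers, and the in-memory sample data are hypothetical, modelled on the PHP example in the message. It does not query DBpedia and is not part of the DBpedia extraction framework.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """Hypothetical minimal triple; is_literal marks a plain string object."""
    subject: str
    predicate: str
    obj: str
    is_literal: bool


# Shape reported for the new release: one literal holding the whole list.
NEW_RELEASE = [
    Triple("dbpedia:PHP", "dbpprop:influencedBy", "C, Perl, Java, C++, Tcl", True),
]

# Shape reported for the previous release: one resource per triple.
PREVIOUS_RELEASE = [
    Triple("dbpedia:PHP", "dbpprop:influencedBy", "dbpedia:C_(programming_language)", False),
    Triple("dbpedia:PHP", "dbpprop:influencedBy", "dbpedia:Java_(programming_language)", False),
    Triple("dbpedia:PHP", "dbpprop:influencedBy", "dbpedia:Perl", False),
    Triple("dbpedia:PHP", "dbpprop:influencedBy", "dbpedia:Tcl", False),
    Triple("dbpedia:PHP", "dbpprop:influencedBy", "dbpedia:C++_(programming_language)", False),
]


def split_items(value: str) -> List[str]:
    """Split a comma-separated literal into its trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def collapsed_lists(triples: List[Triple]) -> List[Triple]:
    """Return triples whose object is a literal that looks like a list."""
    return [t for t in triples if t.is_literal and len(split_items(t.obj)) > 1]


if __name__ == "__main__":
    for t in collapsed_lists(NEW_RELEASE):
        print(t.subject, t.predicate, split_items(t.obj))
```

Running the same check over a full dump of the /property/ namespace would give a rough count of affected properties, although a comma inside a literal does not always indicate a collapsed list of links, so the result would need manual review.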
Received on Tuesday, 18 January 2011 11:00:54 UTC