
ANN: DBpedia 3.6 released

From: Chris Bizer <chris@bizer.de>
Date: Mon, 17 Jan 2011 14:10:43 +0100
To: <dbpedia-announcements@lists.sourceforge.net>
Cc: <dbpedia-discussion@lists.sourceforge.net>, "'Semantic Web'" <semantic-web@w3.org>, "'public-lod'" <public-lod@w3.org>
Message-ID: <01df01cbb647$f29f71e0$d7de55a0$@bizer.de>
Hi all, 

we are happy to announce the release of DBpedia 3.6. The new release is
based on Wikipedia dumps dating from October/November 2010. 

The new DBpedia dataset describes more than 3.5 million things, of which
1.67 million are classified in a consistent ontology, including 364,000
persons, 462,000 places, 99,000 music albums, 54,000 films, 16,500 video
games, 148,000 organizations, 148,000 species and 5,200 diseases. 

The DBpedia dataset features labels and abstracts for 3.5 million things in
up to 97 different languages; 1,850,000 links to images and 5,900,000 links
to external web pages; 6,500,000 external links into other RDF datasets, and
632,000 Wikipedia categories. 

The dataset consists of 672 million pieces of information (RDF triples) out
of which 286 million were extracted from the English edition of Wikipedia
and 386 million were extracted from other language editions and links to
external datasets. 

Along with the release of the new datasets, we are happy to announce the
initial release of the DBpedia MappingTool
(http://mappings.dbpedia.org/index.php/MappingTool): a graphical user
interface to support the community in creating and editing mappings as well
as the ontology. 

The new release provides the following improvements and changes compared to
the DBpedia 3.5.1 release: 

1. Improved DBpedia Ontology as well as improved Infobox mappings using
http://mappings.dbpedia.org/. 

Furthermore, there are now also mappings in languages other than English.
These improvements are largely due to collective work by the community.
There are 13.8 million RDF statements based on mappings (11.1 million in
version 3.5.1). All this data is in the /ontology/ namespace. Note that this
data is of much higher quality than the Raw Infobox data in the /property/
namespace. 
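
As a small illustration of the namespace distinction above, the following
Python sketch sorts predicate URIs into mapping-based and raw Infobox
properties by prefix. It assumes that the /ontology/ and /property/
namespaces expand to http://dbpedia.org/ontology/ and
http://dbpedia.org/property/; the function name and the sample URIs are
made up for the example.

```python
# Assumed full namespace URIs for the /ontology/ and /property/ namespaces.
ONTOLOGY_NS = "http://dbpedia.org/ontology/"
PROPERTY_NS = "http://dbpedia.org/property/"


def classify_predicate(uri: str) -> str:
    """Label a predicate URI by the namespace it belongs to (hypothetical helper)."""
    if uri.startswith(ONTOLOGY_NS):
        return "mapping-based"
    if uri.startswith(PROPERTY_NS):
        return "raw-infobox"
    return "other"


if __name__ == "__main__":
    # Illustrative predicate URIs only, not taken from the release notes.
    for predicate in (
        "http://dbpedia.org/ontology/birthPlace",
        "http://dbpedia.org/property/birthPlace",
        "http://www.w3.org/2000/01/rdf-schema#label",
    ):
        print(predicate, "->", classify_predicate(predicate))
```

An application that prefers the higher-quality data could use such a check
to keep only the mapping-based statements.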

Statistics of the mappings wiki on the date of release 3.6: 

+ Mappings: 
     + English: 315 Infobox mappings (covers 1124 templates including
redirects) 
     + Greek: 137 Infobox mappings (covers 192 templates including
redirects) 
     + Hungarian: 111 Infobox mappings (covers 151 templates including
redirects) 
     + Croatian: 36 Infobox mappings (covers 67 templates including
redirects) 
     + German: 9 Infobox mappings 
     + Slovenian: 4 Infobox mappings 
+ Ontology: 
     + 272 classes 
+ Properties: 
     + 629 object properties 
     + 706 datatype properties (they are all in the /datatype/ namespace) 

2.  Some commonly used property names changed. 

+ Please see http://dbpedia.org/ChangeLog and
http://dbpedia.org/Datasets/Properties to find out which relations changed,
and update your applications accordingly! 

3. New Datatypes for increased quality in mapping-based properties 

+ xsd:positiveInteger, xsd:nonNegativeInteger, xsd:nonPositiveInteger,
xsd:negativeInteger 
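
To make the value ranges of these four datatypes concrete, here is a
minimal Python sketch that checks whether a lexical value conforms to one
of them. The ranges follow the XML Schema definitions; the function name
and the way it is called are assumptions for illustration and say nothing
about how the extraction framework itself assigns the datatypes.

```python
import re

# Value constraints of the four XSD integer subtypes listed above.
CONSTRAINTS = {
    "xsd:positiveInteger": lambda n: n > 0,
    "xsd:nonNegativeInteger": lambda n: n >= 0,
    "xsd:nonPositiveInteger": lambda n: n <= 0,
    "xsd:negativeInteger": lambda n: n < 0,
}

# Lexical form of an XSD integer: optional sign followed by ASCII digits.
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def conforms(lexical: str, datatype: str) -> bool:
    """Return True if the lexical value is valid for the given datatype."""
    if datatype not in CONSTRAINTS:
        raise ValueError("unsupported datatype: " + datatype)
    if not INTEGER_PATTERN.fullmatch(lexical):
        return False
    return CONSTRAINTS[datatype](int(lexical))


if __name__ == "__main__":
    print(conforms("42", "xsd:positiveInteger"))
    print(conforms("0", "xsd:positiveInteger"))
    print(conforms("0", "xsd:nonNegativeInteger"))
```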

4. Improved parsing coverage.

+ Parsing of lists of elements in Infobox property values that improves the
completeness of extracted facts. 
+ A method for handling links that are missing from an Infobox because
they are repeated and already appear somewhere else on the page. 
+ Flag templates are parsed. 
+ Various improvements on internationalization. 

5. Improved recognition of 

+ Wikipedia namespace identifiers. 
+ Wikipedia language codes. 
+ Category hierarchies. 

6. Disambiguation links for acronyms (titles that are all upper case) are
now extracted (for example, Kilobyte and Knowledge_base for "KB"): 

+ Wikilinks consisting of multiple words are extracted if the starting
letters of the words appear in the correct order (with possible gaps) and
cover all acronym letters. 
+ Wikilinks consisting of a single word are extracted if the
case-insensitive longest common subsequence with the acronym is equal to
the acronym. 
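
The two acronym rules can be sketched in Python as follows. This is one
reading of the rules, not the extraction framework's code: the function
names are made up, words are assumed to be separated by underscores or
spaces, and the multi-word rule is interpreted as "the acronym is a
subsequence of the word initials". For the single-word rule, the longest
common subsequence equals the acronym exactly when the acronym is a
subsequence of the word, so a subsequence test is used for both cases.

```python
def is_subsequence(needle: str, haystack: str) -> bool:
    """True if the characters of needle occur in haystack in order, gaps allowed."""
    remaining = iter(haystack)
    # "ch in remaining" consumes the iterator up to the first match,
    # so later characters can only match further to the right.
    return all(ch in remaining for ch in needle)


def matches_acronym(acronym: str, link_title: str) -> bool:
    """Check a wikilink title against an acronym (hypothetical helper)."""
    letters = acronym.lower()
    words = link_title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Multiple words: compare against the starting letters of the words.
        initials = "".join(word[0] for word in words).lower()
        return is_subsequence(letters, initials)
    # Single word: compare against the letters of the word itself.
    return is_subsequence(letters, words[0].lower())


if __name__ == "__main__":
    for title in ("Kilobyte", "Knowledge_base", "Kilogram"):
        print(title, matches_acronym("KB", title))
```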

7. Encoding (bugfixes): 

+ The new datasets support the complete range of Unicode code points (up to
0x10ffff). Code points that fit in 16 bits start with '\u'; code points
that need more than 16 bits start with '\U'. 
+ Commas and ampersands do not get encoded anymore in URIs. Please see
http://dbpedia.org/URIencoding for an explanation regarding the DBpedia URI
encoding scheme. 
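
The '\u' and '\U' escapes mentioned in the first point above can be
illustrated with a short Python sketch. It assumes N-Triples-style escapes
with four hexadecimal digits after '\u' and eight after '\U', and it
escapes every non-ASCII character; which characters the DBpedia dumps
actually escape is not stated here, so treat this as an assumption. The
function names are made up for the example.

```python
def escape_code_point(code_point: int) -> str:
    """Escape one code point using \\u (16-bit) or \\U (larger) notation."""
    if code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point <= 0xFFFF:
        return "\\u%04X" % code_point
    return "\\U%08X" % code_point


def escape_non_ascii(text: str) -> str:
    """Leave ASCII characters as they are and escape everything else."""
    return "".join(
        ch if ord(ch) < 0x80 else escape_code_point(ord(ch)) for ch in text
    )


if __name__ == "__main__":
    print(escape_non_ascii("Universität"))
    print(escape_non_ascii("\U0001D11E"))
```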

8. Extended Datasets: 

+ Thanks to Johannes Hoffart (Max-Planck-Institut für Informatik) for
contributing links to YAGO2. 
+ Freebase links have been updated. They now refer to mids
(http://wiki.freebase.com/wiki/Machine_ID) because guids have been
deprecated. 

You can download the new DBpedia dataset from http://dbpedia.org/Downloads36

As usual, the dataset is also available as Linked Data and via the DBpedia
SPARQL endpoint at http://dbpedia.org/sparql
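
As a rough sketch of how the endpoint can be queried from a script, the
following Python code sends a query over HTTP using the standard "query"
parameter of the SPARQL protocol. The example query, the class URI
http://dbpedia.org/ontology/Person, the requested JSON result format and
the function names are assumptions chosen for illustration.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: a few resources typed with an assumed ontology class.
QUERY = """
SELECT ?person WHERE {
  ?person a <http://dbpedia.org/ontology/Person> .
} LIMIT 5
"""


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Build a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query: str) -> dict:
    """Send the query and parse the response as SPARQL JSON results."""
    request = urllib.request.Request(
        build_query_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_query_url(QUERY))
```

Calling run_query(QUERY) performs the network request; with the JSON
results format, the rows are found under the "results" and "bindings" keys
of the returned dictionary.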

Lots of thanks to: 

+ All editors that contributed to the DBpedia ontology mappings via the
Mappings Wiki. 
+ Max Jakob (Freie Universität Berlin, Germany) for improving the DBpedia
extraction framework and for extracting the new datasets. 
+ Robert Isele and Anja Jentzsch (both Freie Universität Berlin, Germany)
for helping Max with their expertise on the extraction framework. 
+ Paul Kreis (Freie Universität Berlin, Germany) for analyzing the DBpedia
data of the previous release and suggesting ways to increase quality and
quantity. Some results of his work were implemented in this release. 
+ Dimitris Kontokostas (Aristotle University of Thessaloniki, Greece), Jimmy
O'Regan (Eolaistriu Technologies, Ireland), José Paulo Leal (University of
Porto, Portugal) for providing patches to improve the extraction framework. 
+ Jens Lehmann and Sören Auer (both Universität Leipzig, Germany) for
providing the new dataset via the DBpedia download server at Universität
Leipzig. 
+ Kingsley Idehen and Mitko Iliev (both OpenLink Software) for loading the
dataset into the Virtuoso instance that serves the Linked Data view and
SPARQL endpoint, and OpenLink Software (http://www.openlinksw.com/) as a
whole for providing the server infrastructure for DBpedia. 

The work on the new release was financially supported by: 

+ Neofonie GmbH, a Berlin-based company offering leading technologies in the
area of Web search, social media and mobile applications
(http://www.neofonie.de/). 
+ The European Commission through the project LOD2 - Creating Knowledge out
of Linked Data (http://lod2.eu/). 
+ Vulcan Inc. as part of its Project Halo (http://www.projecthalo.com/).
Vulcan Inc. creates and advances a variety of world-class endeavors and high
impact initiatives that change and improve the way we live, learn, do
business (http://www.vulcan.com/). 

More information about DBpedia can be found at http://dbpedia.org/About 

Have fun with the new dataset! 

The whole DBpedia team also congratulates Wikipedia on its 10th birthday,
which was this weekend! 

Cheers, 

Chris Bizer 


--
Prof. Dr. Christian Bizer
Web-based Systems Group
Freie Universität Berlin
+49 30 838 55509
http://www.bizer.de
chris@bizer.de
Received on Monday, 17 January 2011 13:10:10 UTC
