Re: [semanticweb] ANN: DBpedia 3.5 released

Something I have experienced for many years:

My grandma's homepage is more accessible than dbpedia.org, the flagship(!)
of 'linked data'...

On Mon, 12 Apr 2010 12:06:01 +0300, Chris Bizer <chris@bizer.de> wrote:

> Hi all,
>
> we are happy to announce the release of DBpedia 3.5.
>
> The new release is based on Wikipedia dumps dating from March 2010. Compared
> to the 3.4 release, we were able to increase the quality of the DBpedia
> knowledge base by employing a new data extraction framework which applies
> various data cleansing heuristics as well as by extending the
> infobox-to-ontology mappings that guide the data extraction process.
>
> The new DBpedia knowledge base describes more than 3.4 million things, out
> of which 1.47 million are classified in a consistent ontology, including
> 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000
> video games, 140,000 organizations, 146,000 species and 4,600 diseases. The
> DBpedia data set features labels and abstracts for these 3.2 million things
> in up to 92 different languages; 1,460,000 links to images and 5,543,000
> links to external web pages; 4,887,000 external links into other RDF
> datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories. The
> DBpedia knowledge base altogether consists of over 1 billion pieces of
> information (RDF triples) out of which 257 million were extracted from the
> English edition of Wikipedia and 766 million were extracted from other
> language editions.
>
> The new release provides the following improvements and changes compared to
> the DBpedia 3.4 release:
>
> 1. The DBpedia extraction framework has been completely rewritten in Scala.
> The new framework dramatically reduces the extraction time of a single
> Wikipedia article from over 200 to about 13 milliseconds. All features of
> the previous PHP framework have been ported. In addition, the new framework
> can extract data from Wikipedia tables based on table-to-ontology mappings
> and is able to extract multiple infoboxes out of a single Wikipedia article.
> The data from each infobox is represented as a separate RDF resource. All
> resources that are extracted from a single page can be connected using
> custom RDF properties which are also defined in the mappings. A lot of work
> also went into the value parsers and the DBpedia 3.5 dataset should
> therefore be much cleaner than its predecessors. In addition, units of
> measurement are normalized to their respective SI unit, which makes querying
> DBpedia easier.
>
> 2. The mapping language that is used to map Wikipedia infoboxes to the
> DBpedia Ontology has been redesigned. The documentation of the new mapping
> language is found at
> http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/trunk/extraction/core/doc/mapping%20language/
>
> 3. In order to enable the DBpedia user community to extend and refine the
> infobox-to-ontology mappings, the mappings can be edited on the newly
> created wiki hosted on http://mappings.dbpedia.org. At the moment, 303
> template mappings are defined, which cover (including redirects) 1055
> templates. On the wiki, the DBpedia Ontology can be edited by the community
> as well. At the moment, the ontology consists of 259 classes and about 1,200
> properties.
>
> 4. The ontology properties extracted from infoboxes are now split into two
> data sets: 1. The Ontology Infobox Properties dataset contains the
> properties as they are defined in the ontology (e.g. length). The range of a
> property is either an xsd schema type or a dimension of measurement, in
> which case the value is normalized to the respective SI unit. 2. The
> Ontology Infobox Properties (Specific) dataset contains properties which
> have been specialized for a specific class using a specific unit, e.g. the
> property height is specialized on the class Person using the unit
> centimeters instead of meters. For further details please refer to
> http://wiki.dbpedia.org/Datasets#h18-11.
>
> 5. The framework now resolves template redirects, making it possible to
> cover all redirects to an infobox on Wikipedia with a single mapping.
>
> 6. Three new extractors have been implemented: 1. PageIdExtractor, extracting
> the Wikipedia page ID of each page. 2. RevisionExtractor, extracting the
> latest revision of a page. 3. PNDExtractor, extracting PND
> (Personennamendatei) identifiers.
>
> 7. The data set now provides labels, abstracts, page links and infobox data
> in 92 different languages, which have been extracted from recent Wikipedia
> dumps as of March 2010.
>
> 8. In addition to the N-Triples datasets, N-Quads datasets are provided which
> include a provenance URI for each statement. The provenance URI denotes the
> origin of the extracted triple in Wikipedia (for details see
> http://wiki.dbpedia.org/Datasets#h18-18).
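
To illustrate item 8: an N-Quads line is an N-Triples statement with a
fourth term, which here carries the provenance URI. Below is a minimal Python
sketch of splitting such a line. The sample line, and in particular the shape
of its provenance URI, is made up for illustration and is not taken from the
3.5 dumps; the pattern is simplified and skips blank nodes.

```python
import re

# Simplified pattern: IRI subject, IRI predicate, IRI-or-literal object and an
# IRI as fourth term. It ignores blank nodes and is not a full N-Quads parser.
NQUAD = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'
    r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s+<([^>]*)>\s*\.\s*$'
)

def parse_nquad(line):
    """Split one N-Quads line into subject, predicate, object and provenance."""
    match = NQUAD.match(line)
    if match is None:
        raise ValueError("not a supported N-Quads line: %r" % line)
    subject, predicate, obj, provenance = match.groups()
    return {"subject": subject, "predicate": predicate,
            "object": obj, "provenance": provenance}

# Hypothetical sample line; the provenance URI format is an assumption.
SAMPLE = ('<http://dbpedia.org/resource/Berlin> '
          '<http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en '
          '<http://en.wikipedia.org/wiki/Berlin#absolute-line=1> .')

quad = parse_nquad(SAMPLE)
print(quad["provenance"])
```

Dropping the fourth term gives back the plain N-Triples statement, so tools
that only understand triples can still use the data.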
>
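
The unit handling described in items 1 and 4 can be sketched roughly as
follows. This is only an illustration in Python, not the Scala framework's
code: the conversion table and both function names are hypothetical. The
generic property keeps the value in the SI unit (meters for a length), while
the class-specific property re-expresses it in the unit chosen for that class
(centimeters for the height of a Person).

```python
# Hypothetical conversion factors from a source unit to meters, the SI unit
# for length. The real framework's unit table is not shown in the mail.
TO_METERS = {"m": 1.0, "km": 1000.0, "cm": 0.01, "ft": 0.3048, "mi": 1609.344}

def to_si_length(value, unit):
    """Normalize a length to meters, as for the generic ontology property."""
    return value * TO_METERS[unit]

def to_specific_unit(value_in_meters, unit):
    """Express a normalized length in a class-specific unit."""
    return value_in_meters / TO_METERS[unit]

# An infobox value of "6 ft" becomes meters for the generic property ...
height_m = to_si_length(6, "ft")
# ... and centimeters for the property specialized on the class Person.
height_cm = to_specific_unit(height_m, "cm")
print(height_m, height_cm)
```

Storing one normalized value per property is what lets a query compare
lengths without caring which unit the original infobox used.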
> You can download the new DBpedia dataset from
> http://wiki.dbpedia.org/Downloads35. As usual, the data set is also
> available as Linked Data and via the DBpedia SPARQL endpoint.
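
For anyone who wants to try the SPARQL endpoint, here is a minimal Python
sketch that only builds the request URL and does not send anything. The
endpoint address http://dbpedia.org/sparql and the ontology namespace are not
given in the mail, the "format" parameter is a Virtuoso convenience rather
than part of the SPARQL protocol, and whether this exact query returns rows
against the 3.5 data is an assumption.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # assumed public endpoint address

# Ask for ten persons with a height, using the ontology property named in
# the announcement. The prefix and the class/property names are assumptions.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person ?height WHERE {
  ?person a dbo:Person ;
          dbo:height ?height .
} LIMIT 10
"""

def build_request_url(query, endpoint=ENDPOINT):
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    params = {"query": query.strip(),
              "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)

url = build_request_url(QUERY)
print(url)
```

The printed URL can be pasted into a browser or fetched with any HTTP client.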
>
> Lots of thanks to:
>
> * Robert Isele, Anja Jentzsch, Christopher Sahnwaldt, and Paul Kreis (all
> Freie Universität Berlin) for reimplementing the DBpedia extraction
> framework in Scala, for extending the infobox-to-ontology mappings and for
> extracting the new DBpedia 3.5 knowledge base.
>
> * Jens Lehmann and Sören Auer (both Universität Leipzig) for providing the
> knowledge base via the DBpedia download server at Universität Leipzig.
>
> * Kingsley Idehen and Mitko Iliev (both OpenLink Software) for loading the
> knowledge base into the Virtuoso instance that serves the Linked Data view
> and SPARQL endpoint.
>
> The whole DBpedia team is very thankful to three companies which enabled us
> to do all this by supporting and sponsoring the DBpedia project:
>
> * Neofonie GmbH (http://www.neofonie.de/index.jsp), a Berlin-based company
> offering leading technologies in the area of Web search, social media and
> mobile applications.
>
> * Vulcan Inc. as part of its Project Halo (www.projecthalo.com). Vulcan Inc.
> creates and advances a variety of world-class endeavors and high impact
> initiatives that change and improve the way we live, learn, do business
> (http://www.vulcan.com/).
>
> * OpenLink Software (http://www.openlinksw.com/). OpenLink Software develops
> the Virtuoso Universal Server, an innovative enterprise grade server that
> cost-effectively delivers an unrivaled platform for Data Access, Integration
> and Management.
>
> More information about DBpedia is found at http://dbpedia.org/About
>
> Have fun with the new DBpedia knowledge base!
>
> Cheers,
>
> Chris Bizer
>
>
> --
> Prof. Dr. Christian Bizer
> Web-based Systems Group
> Freie Universität Berlin
> +49 30 838 55509
> http://www.bizer.de
> chris@bizer.de
>

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

Received on Tuesday, 13 April 2010 13:11:51 UTC