- From: <baran@goldmail.de>
- Date: Tue, 13 Apr 2010 14:12:50 +0300
- To: dbpedia-discussion@lists.sourceforge.net, dbpedia-announcements@lists.sourceforge.net, "Chris Bizer" <chris@bizer.de>
- Cc: public-lod@w3.org, 'SW-forum' <semantic-web@w3.org>, semanticweb@yahoogroups.com
- Message-ID: <op.va3n7ochjp4e63@user-pc>
In my experience over many years, my grandma's homepage is more
accessible than dbpedia.org, the flagship(!) of 'linked data'...

On Mon, 12 Apr 2010 12:06:01 +0300, Chris Bizer <chris@bizer.de> wrote:

> Hi all,
>
> we are happy to announce the release of DBpedia 3.5.
>
> The new release is based on Wikipedia dumps dating from March 2010.
> Compared to the 3.4 release, we were able to increase the quality of the
> DBpedia knowledge base by employing a new data extraction framework which
> applies various data cleansing heuristics as well as by extending the
> infobox-to-ontology mappings that guide the data extraction process.
>
> The new DBpedia knowledge base describes more than 3.4 million things, out
> of which 1.47 million are classified in a consistent ontology, including
> 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000
> video games, 140,000 organizations, 146,000 species and 4,600 diseases.
> The DBpedia data set features labels and abstracts for these 3.2 million
> things in up to 92 different languages; 1,460,000 links to images and
> 5,543,000 links to external web pages; 4,887,000 external links into other
> RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories. The
> DBpedia knowledge base altogether consists of over 1 billion pieces of
> information (RDF triples) out of which 257 million were extracted from the
> English edition of Wikipedia and 766 million were extracted from other
> language editions.
>
> The new release provides the following improvements and changes compared
> to the DBpedia 3.4 release:
>
> 1. The DBpedia extraction framework has been completely rewritten in
> Scala. The new framework dramatically reduces the extraction time of a
> single Wikipedia article from over 200 to about 13 milliseconds. All
> features of the previous PHP framework have been ported.
> In addition, the new framework can extract data from Wikipedia tables
> based on table-to-ontology mappings and is able to extract multiple
> infoboxes out of a single Wikipedia article. The data from each infobox
> is represented as a separate RDF resource. All resources that are
> extracted from a single page can be connected using custom RDF properties
> which are also defined in the mappings. A lot of work also went into the
> value parsers and the DBpedia 3.5 dataset should therefore be much cleaner
> than its predecessors. In addition, units of measurement are normalized to
> their respective SI unit, which makes querying DBpedia easier.
>
> 2. The mapping language that is used to map Wikipedia infoboxes to the
> DBpedia Ontology has been redesigned. The documentation of the new mapping
> language is found at
> http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/trunk/extraction/core/doc/mapping%20language/
>
> 3. In order to enable the DBpedia user community to extend and refine the
> infobox to ontology mappings, the mappings can be edited on the newly
> created wiki hosted on http://mappings.dbpedia.org. At the moment, 303
> template mappings are defined, which cover (including redirects) 1055
> templates. On the wiki, the DBpedia Ontology can be edited by the
> community as well. At the moment, the ontology consists of 259 classes and
> about 1,200 properties.
>
> 4. The ontology properties extracted from infoboxes are now split into two
> data sets: 1. The Ontology Infobox Properties dataset contains the
> properties as they are defined in the ontology (e.g. length). The range of
> a property is either an xsd schema type or a dimension of measurement, in
> which case the value is normalized to the respective SI unit. 2. The
> Ontology Infobox Properties (Specific) dataset contains properties which
> have been specialized for a specific class using a specific unit.
> E.g. the property height is specialized on the class Person using the
> unit centimeters instead of meters. For further details please refer to
> http://wiki.dbpedia.org/Datasets#h18-11.
>
> 5. The framework now resolves template redirects, making it possible to
> cover all redirects to an infobox on Wikipedia with a single mapping.
>
> 6. Three new extractors have been implemented: 1. PageIdExtractor
> extracting Wikipedia page IDs for each page. 2. RevisionExtractor
> extracting the latest revision of a page. 3. PNDExtractor extracting PND
> (Personennamendatei) identifiers.
>
> 7. The data set now provides labels, abstracts, page links and infobox
> data in 92 different languages, which have been extracted from recent
> Wikipedia dumps as of March 2010.
>
> 8. In addition to the N-Triples datasets, N-Quads datasets are provided
> which include a provenance URI for each statement. The provenance URI
> denotes the origin of the extracted triple in Wikipedia (for details see:
> http://wiki.dbpedia.org/Datasets#h18-18).
>
> You can download the new DBpedia dataset from
> http://wiki.dbpedia.org/Downloads35. As usual, the data set is also
> available as Linked Data and via the DBpedia SPARQL endpoint.
>
> Lots of thanks to:
>
> * Robert Isele, Anja Jentzsch, Christopher Sahnwaldt, and Paul Kreis (all
> Freie Universität Berlin) for reimplementing the DBpedia extraction
> framework in Scala, for extending the infobox-to-ontology mappings and for
> extracting the new DBpedia 3.5 knowledge base.
>
> * Jens Lehmann and Sören Auer (both Universität Leipzig) for providing the
> knowledge base via the DBpedia download server at Universität Leipzig.
>
> * Kingsley Idehen and Mitko Iliev (both OpenLink Software) for loading the
> knowledge base into the Virtuoso instance that serves the Linked Data view
> and SPARQL endpoint.
>
> The whole DBpedia team is very thankful to three companies which enabled
> us to do all this by supporting and sponsoring the DBpedia project:
>
> * Neofonie GmbH (http://www.neofonie.de/index.jsp), a Berlin-based company
> offering leading technologies in the area of Web search, social media and
> mobile applications.
>
> * Vulcan Inc. as part of its Project Halo (www.projecthalo.com). Vulcan
> Inc. creates and advances a variety of world-class endeavors and high
> impact initiatives that change and improve the way we live, learn, do
> business (http://www.vulcan.com/).
>
> * OpenLink Software (http://www.openlinksw.com/). OpenLink Software
> develops the Virtuoso Universal Server, an innovative enterprise grade
> server that cost-effectively delivers an unrivaled platform for Data
> Access, Integration and Management.
>
> More information about DBpedia is found at http://dbpedia.org/About
>
> Have fun with the new DBpedia knowledge base!
>
> Cheers,
>
> Chris Bizer
>
>
> --
> Prof. Dr. Christian Bizer
> Web-based Systems Group
> Freie Universität Berlin
> +49 30 838 55509
> http://www.bizer.de
> chris@bizer.de
>

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
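
Item 4 of the quoted announcement describes two outputs for a measured infobox value: a general ontology property normalized to the SI unit, and a class-specific property in a more convenient unit (height of a Person in centimeters). The following is only a minimal Python sketch of that idea, not the actual Scala framework. The conversion table, the `SPECIFIC_UNITS` mapping, the dataset labels and the function names are all assumptions made for illustration.

```python
# Illustrative factors for converting a length to the SI unit (metre).
# This table is an assumption, not taken from the DBpedia framework.
METRES_PER_UNIT = {
    "metre": 1.0,
    "centimetre": 0.01,
    "kilometre": 1000.0,
    "foot": 0.3048,
    "inch": 0.0254,
}

# Hypothetical class-specific units: (class, property) -> preferred unit.
SPECIFIC_UNITS = {
    ("Person", "height"): "centimetre",
}


def to_metres(value, unit):
    """Normalize a length given in `unit` to metres."""
    return value * METRES_PER_UNIT[unit]


def extract_length(rdf_class, prop, value, unit):
    """Return (dataset, property, value, unit) rows for one infobox value.

    The first row is always the SI-normalized general property. A second
    row is added only if a class-specific unit is configured.
    """
    metres = to_metres(value, unit)
    rows = [("ontology", prop, metres, "metre")]
    specific_unit = SPECIFIC_UNITS.get((rdf_class, prop))
    if specific_unit is not None:
        specific_value = metres / METRES_PER_UNIT[specific_unit]
        rows.append(
            ("specific", f"{rdf_class}/{prop}", specific_value, specific_unit)
        )
    return rows


if __name__ == "__main__":
    # A person's height given in feet yields both a metre and a centimetre row.
    for row in extract_length("Person", "height", 6.0, "foot"):
        print(row)
```

Keeping the SI value as the single intermediate form means each new source unit needs only one conversion factor, and a query against the general property never has to deal with mixed units.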
Received on Tuesday, 13 April 2010 13:11:51 UTC