Re: ANN: DBpedia 3.5 released from Lee Feigenbaum on 2010-04-12 (semantic-web@w3.org from April 2010)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Mon, 12 Apr 2010 09:19:43 -0400
To: Luis Criado Fernández <lcriadof@yahoo.es>
CC: Chris Bizer <chris@bizer.de>, dbpedia-discussion@lists.sourceforge.net, dbpedia-announcements@lists.sourceforge.net, public-lod@w3.org, SW-forum <semantic-web@w3.org>, semanticweb@yahoogroups.com
Message-ID: <4BC31DEF.4040000@thefigtrees.net>
Hello Luis,

You can accomplish this with SPARQL. Please see 
http://www.cambridgesemantics.com/2008/09/sparql-by-example/#%2825%29 
for an example.
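For instance, the following minimal Python sketch builds a query that keeps only the abstracts in one language. It is not taken from the linked page. It assumes the public endpoint URL http://dbpedia.org/sparql, the namespace http://dbpedia.org/ontology/ for the dbpedia-owl: prefix, and JSON results. The function names are hypothetical. The important part is the FILTER on lang(?abstract), because the abstracts are published as language-tagged literals (e.g. "..."@es).

```python
import json
import urllib.parse
import urllib.request

# Assumption: public DBpedia SPARQL endpoint (not stated in this thread).
ENDPOINT = "http://dbpedia.org/sparql"


def build_abstract_query(resource_uri: str, lang: str) -> str:
    """Return a SPARQL query selecting the abstracts of one resource in one language."""
    return (
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <{resource_uri}> dbpedia-owl:abstract ?abstract .\n"
        # Keep only literals whose language tag matches, e.g. "es" or "es-ES".
        f'  FILTER (langMatches(lang(?abstract), "{lang}"))\n'
        "}"
    )


def fetch_abstracts(resource_uri: str, lang: str) -> list[str]:
    """Send the query with an HTTP GET (SPARQL protocol) and return the literal values."""
    params = urllib.parse.urlencode({"query": build_abstract_query(resource_uri, lang)})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        results = json.load(response)
    return [row["abstract"]["value"] for row in results["results"]["bindings"]]
```

Calling fetch_abstracts("http://dbpedia.org/resource/Madrid", "es") would send the query over the network. langMatches also accepts regional tags such as es-ES, whereas lang(?abstract) = "es" requires an exact match.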

You can also download language-specific subsets of the data from the 
DBpedia downloads page at:

http://wiki.dbpedia.org/Downloads35
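If you work with a downloaded file, each literal in the N-Triples lines ends with its language tag. The following rough Python sketch filters such lines by language. It assumes a simplified layout (IRI subject and predicate, language-tagged literal object, no blank nodes) and leaves N-Triples escape sequences undecoded. abstracts_in_language is a hypothetical helper name, and a proper RDF parser is the safer choice for real dumps.

```python
import re
from typing import Iterable, Iterator, Tuple

# Simplified N-Triples line with a language-tagged literal object:
#   <subject> <predicate> "text"@lang .
# Assumption: no blank nodes and no datatyped literals in the input.
TAGGED_LITERAL = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+"((?:[^"\\]|\\.)*)"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)\s*\.\s*$'
)


def abstracts_in_language(lines: Iterable[str], lang: str) -> Iterator[Tuple[str, str]]:
    """Yield (subject IRI, literal text) for literals tagged with the given language."""
    wanted = lang.lower()
    for line in lines:
        match = TAGGED_LITERAL.match(line.strip())
        if match is None:
            continue  # comment, blank line, or a form this simplified pattern does not handle
        subject, _predicate, text, tag = match.groups()
        if tag.lower() == wanted:  # language tags are case-insensitive
            yield subject, text  # escape sequences such as \u00e9 are left as-is
```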

hope this helps,
Lee


On 4/12/2010 8:42 AM, Luis Criado Fernández wrote:
> Great job, congratulations!
>
> I did not know DBpedia existed. I am very interested in studying it.
>
> If I may ask a question: is there any way to distinguish the language
> of the values of the property "dbpedia-owl:abstract"?
>
> Please excuse my English,
>
> ________________________________
>
> Cheers,
> Luis Criado
> http://lcriadof.blogspot.com/
>
> ----- Original Message ----
> From: Chris Bizer <chris@bizer.de>
> To: dbpedia-discussion@lists.sourceforge.net; dbpedia-announcements@lists.sourceforge.net
> CC: public-lod@w3.org; SW-forum <semantic-web@w3.org>; semanticweb@yahoogroups.com
> Sent: Mon, 12 April 2010 11:06
> Subject: ANN: DBpedia 3.5 released
>
> Hi all,
>
> we are happy to announce the release of DBpedia 3.5.
>
> The new release is based on Wikipedia dumps dating from March 2010. Compared
> to the 3.4 release, we were able to improve the quality of the DBpedia
> knowledge base by employing a new data extraction framework, which applies
> various data cleansing heuristics, and by extending the
> infobox-to-ontology mappings that guide the data extraction process.
>
> The new DBpedia knowledge base describes more than 3.4 million things, out
> of which 1.47 million are classified in a consistent ontology, including
> 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000
> video games, 140,000 organizations, 146,000 species and 4,600 diseases. The
> DBpedia data set features labels and abstracts for 3.2 million of these things
> in up to 92 different languages; 1,460,000 links to images and 5,543,000
> links to external web pages; 4,887,000 external links into other RDF
> datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories. The
> DBpedia knowledge base altogether consists of over 1 billion pieces of
> information (RDF triples) out of which 257 million were extracted from the
> English edition of Wikipedia and 766 million were extracted from other
> language editions.
>
> The new release provides the following improvements and changes compared to
> the DBpedia 3.4 release:
>
> 1. The DBpedia extraction framework has been completely rewritten in Scala.
> The new framework dramatically reduces the extraction time of a single
> Wikipedia article from over 200 to about 13 milliseconds. All features of
> the previous PHP framework have been ported. In addition, the new framework
> can extract data from Wikipedia tables based on table-to-ontology mappings
> and is able to extract multiple infoboxes out of a single Wikipedia article.
> The data from each infobox is represented as a separate RDF resource. All
> resources that are extracted from a single page can be connected using
> custom RDF properties which are also defined in the mappings. A lot of work
> also went into the value parsers, so the DBpedia 3.5 dataset should be
> much cleaner than its predecessors. In addition, units of
> measurement are normalized to their respective SI unit, which makes querying
> DBpedia easier.
>
> 2. The mapping language that is used to map Wikipedia infoboxes to the
> DBpedia Ontology has been redesigned. The documentation of the new mapping
> language is found at
> http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/trunk/extraction/core/doc/mapping%20language/
>
> 3. In order to enable the DBpedia user community to extend and refine the
> infobox-to-ontology mappings, the mappings can be edited on the newly
> created wiki hosted at http://mappings.dbpedia.org. At the moment, 303
> template mappings are defined, which cover (including redirects) 1055
> templates. On the wiki, the DBpedia Ontology can be edited by the community
> as well. At the moment, the ontology consists of 259 classes and about 1,200
> properties.
>
> 4. The ontology properties extracted from infoboxes are now split into two
> data sets: 1. The Ontology Infobox Properties dataset contains the
> properties as they are defined in the ontology (e.g. length). The range of a
> property is either an XML Schema (xsd) datatype or a dimension of
> measurement, in which case the value is normalized to the respective SI
> unit. 2. The Ontology Infobox Properties (Specific) dataset contains
> properties which have been specialized for a specific class using a
> specific unit. For example, the property height is specialized for the
> class Person using the unit centimeters instead of meters. For further
> details please refer to http://wiki.dbpedia.org/Datasets#h18-11.
>
> 5. The framework now resolves template redirects, making it possible to
> cover all redirects to an infobox on Wikipedia with a single mapping.
>
> 6. Three new extractors have been implemented: 1. PageIdExtractor, which
> extracts the Wikipedia page ID of each page. 2. RevisionExtractor, which
> extracts the latest revision of each page. 3. PNDExtractor, which extracts
> PND (Personennamendatei) identifiers.
>
> 7. The data set now provides labels, abstracts, page links and infobox data
> in 92 different languages, which have been extracted from recent Wikipedia
> dumps as of March 2010.
>
> 8. In addition to the N-Triples datasets, N-Quads datasets are provided,
> which attach a provenance URI to each statement. The provenance URI denotes the
> origin of the extracted triple in Wikipedia (For details see:
> http://wiki.dbpedia.org/Datasets#h18-18).
>
> You can download the new DBpedia dataset from
> http://wiki.dbpedia.org/Downloads35. As usual, the data set is also
> available as Linked Data and via the DBpedia SPARQL endpoint.
>
> Lots of thanks to:
>
> * Robert Isele, Anja Jentzsch, Christopher Sahnwaldt, and Paul Kreis (all
> Freie Universität Berlin) for reimplementing the DBpedia extraction
> framework in Scala, for extending the infobox-to-ontology mappings and for
> extracting the new DBpedia 3.5 knowledge base.
>
> * Jens Lehmann and Sören Auer (both Universität Leipzig) for providing the
> knowledge base via the DBpedia download server at Universität Leipzig.
>
> * Kingsley Idehen and Mitko Iliev (both OpenLink Software) for loading the
> knowledge base into the Virtuoso instance that serves the Linked Data view
> and SPARQL endpoint.
>
> The whole DBpedia team is very thankful to three companies which enabled us
> to do all this by supporting and sponsoring the DBpedia project:
>
> * Neofonie GmbH (http://www.neofonie.de/index.jsp), a Berlin-based company
> offering leading technologies in the area of Web search, social media and
> mobile applications.
>
> * Vulcan Inc. as part of its Project Halo (www.projecthalo.com). Vulcan Inc.
> creates and advances a variety of world-class endeavors and high impact
> initiatives that change and improve the way we live, learn, and do business
> (http://www.vulcan.com/).
>
> * OpenLink Software (http://www.openlinksw.com/). OpenLink Software develops
> the Virtuoso Universal Server, an innovative enterprise grade server that
> cost-effectively delivers an unrivaled platform for Data Access, Integration
> and Management.
>
> More information about DBpedia is found at http://dbpedia.org/About
>
> Have fun with the new DBpedia knowledge base!
>
> Cheers,
>
> Chris Bizer
>
>
> --
> Prof. Dr. Christian Bizer
> Web-based Systems Group
> Freie Universität Berlin
> +49 30 838 55509
> http://www.bizer.de
> chris@bizer.de
>
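Regarding item 8 of the quoted announcement: each N-Quads line carries a fourth term, the provenance URI, after the usual subject, predicate and object. The following minimal Python sketch splits such a line into its parts. It assumes simplified terms (IRIs and literals only, no blank nodes). The Quad and parse_quad names are hypothetical, and the exact form of the DBpedia provenance URIs is documented on the Datasets page, not here.

```python
import re
from typing import NamedTuple, Optional

# Simplified term: an IRI <...> or a quoted literal with an optional @lang or ^^<datatype>.
# Assumption: no blank nodes; use a full N-Quads parser for real dumps.
TERM = r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?)'
QUAD = re.compile(rf"^{TERM}\s+{TERM}\s+{TERM}\s+(<[^>]*>)\s*\.\s*$")


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    provenance: str  # fourth term: per the announcement, the triple's origin in Wikipedia


def parse_quad(line: str) -> Optional[Quad]:
    """Split one simplified N-Quads line into its four terms, or return None."""
    match = QUAD.match(line.strip())
    return Quad(*match.groups()) if match else None
```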
Received on Monday, 12 April 2010 13:20:22 UTC