
Re: [Dbpedia-discussion] ANN: DBpedia 3.6 released

From: Roberto Mirizzi <roberto.mirizzi@gmail.com>
Date: Tue, 18 Jan 2011 12:00:29 +0100
Message-ID: <4D3572CD.3040708@gmail.com>
To: Chris Bizer <chris@bizer.de>
CC: dbpedia-announcements@lists.sourceforge.net, 'public-lod' <public-lod@w3.org>, 'Semantic Web' <semantic-web@w3.org>, dbpedia-discussion@lists.sourceforge.net

Hi all,
congratulations on the new release and all the improvements.
That said, I have some remarks about the "Improved parsing coverage". :-)
Looking at the new dataset, I've found an issue that did not exist in the 
previous dataset.
For example, if you look at this page:
http://dbpedia.org/page/PHP
you will find that the object of the property "dbpprop:influencedBy" is 
a single string, "C, Perl, Java, C++, Tcl", whereas in the previous version 
of the dataset (which I've loaded on my local machine) there are five 
distinct triples whose objects are, respectively: 
"dbpedia:C_(programming_language)", 
"dbpedia:Java_(programming_language)", "dbpedia:Perl", "dbpedia:Tcl" and 
"dbpedia:C++_(programming_language)".
I think this problem should be fixed, especially because no corresponding 
"ontology property" exists yet for such properties.

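To make the difference concrete, here is a minimal Python sketch of how such values could be flagged in a local copy of the data. The tuple layout, the helper names (`split_values`, `looks_like_flattened_list`, `flattened`) and the comma heuristic are assumptions made for illustration; they are not part of the DBpedia extraction framework.

```python
# Hypothetical in-memory triples: (subject, predicate, object, object_is_literal).
# The values mirror the PHP example above; the tuple layout is an assumption.
TRIPLES_3_6 = [
    ("dbpedia:PHP", "dbpprop:influencedBy", "C, Perl, Java, C++, Tcl", True),
]

TRIPLES_PREVIOUS = [
    ("dbpedia:PHP", "dbpprop:influencedBy", "dbpedia:C_(programming_language)", False),
    ("dbpedia:PHP", "dbpprop:influencedBy", "dbpedia:Java_(programming_language)", False),
    ("dbpedia:PHP", "dbpprop:influencedBy", "dbpedia:Perl", False),
    ("dbpedia:PHP", "dbpprop:influencedBy", "dbpedia:Tcl", False),
    ("dbpedia:PHP", "dbpprop:influencedBy", "dbpedia:C++_(programming_language)", False),
]


def split_values(literal):
    """Split a comma-separated literal into its trimmed parts."""
    return [part.strip() for part in literal.split(",")]


def looks_like_flattened_list(obj, is_literal):
    """Heuristic: a literal made of two or more non-empty comma-separated parts."""
    if not is_literal:
        return False
    parts = split_values(obj)
    return len(parts) >= 2 and all(parts)


def flattened(triples):
    """Return the triples whose object looks like a list collapsed into one string."""
    return [t for t in triples if looks_like_flattened_list(t[2], t[3])]
```

Run against the two samples, `flattened(TRIPLES_3_6)` returns the single string-valued triple, while `flattened(TRIPLES_PREVIOUS)` returns nothing. A plain comma check will also flag literals that legitimately contain commas, so it is only a starting point for a manual review.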

All the best,
roberto


On 17/01/2011 14:10, Chris Bizer wrote:
> Hi all,
>
> we are happy to announce the release of DBpedia 3.6. The new release is
> based on Wikipedia dumps dating from October/November 2010.
>
> The new DBpedia dataset describes more than 3.5 million things, of which
> 1.67 million are classified in a consistent ontology, including 364,000
> persons, 462,000 places, 99,000 music albums, 54,000 films, 16,500 video
> games, 148,000 organizations, 148,000 species and 5,200 diseases.
>
> The DBpedia dataset features labels and abstracts for 3.5 million things in
> up to 97 different languages; 1,850,000 links to images and 5,900,000 links
> to external web pages; 6,500,000 external links into other RDF datasets, and
> 632,000 Wikipedia categories.
>
> The dataset consists of 672 million pieces of information (RDF triples) out
> of which 286 million were extracted from the English edition of Wikipedia
> and 386 million were extracted from other language editions and links to
> external datasets.
>
> Along with the release of the new datasets, we are happy to announce the
> initial release of the DBpedia MappingTool
> (http://mappings.dbpedia.org/index.php/MappingTool): a graphical user
> interface to support the community in creating and editing mappings as well
> as the ontology.
>
> The new release provides the following improvements and changes compared to
> the DBpedia 3.5.1 release:
>
> 1. Improved DBpedia Ontology as well as improved Infobox mappings using
> http://mappings.dbpedia.org/.
>
> Furthermore, there are now also mappings in languages other than English.
> These improvements are largely due to collective work by the community.
> There are 13.8 million RDF statements based on mappings (11.1 million in
> version 3.5.1). All this data is in the /ontology/ namespace. Note that this
> data is of much higher quality than the Raw Infobox data in the /property/
> namespace.
>
> Statistics of the mappings wiki on the date of release 3.6:
>
> + Mappings:
>       + English: 315 Infobox mappings (covers 1124 templates including
> redirects)
>       + Greek: 137 Infobox mappings (covers 192 templates including
> redirects)
>       + Hungarian: 111 Infobox mappings (covers 151 templates including
> redirects)
>       + Croatian: 36 Infobox mappings (covers 67 templates including
> redirects)
>       + German: 9 Infobox mappings
>       + Slovenian: 4 Infobox mappings
> + Ontology:
>       +  272 classes
> +  Properties:
>       + 629 object properties
>       + 706 datatype properties (they are all in the /datatype/ namespace)
>
> 2.  Some commonly used property names changed.
>
> + Please see http://dbpedia.org/ChangeLog and
> http://dbpedia.org/Datasets/Properties to find out which relations changed,
> and update your applications accordingly!
>
> 3. New Datatypes for increased quality in mapping-based properties
>
> + xsd:positiveInteger, xsd:nonNegativeInteger, xsd:nonPositiveInteger,
> xsd:negativeInteger
>
> 4. Improved parsing coverage.
>
> + Parsing of lists of elements in Infobox property values that improves the
> completeness of extracted facts.
> + Method to deal with missing repeated links in Infoboxes that do appear
> somewhere else on the page.
> + Flag templates are parsed.
> + Various improvements on internationalization.
>
> 5. Improved recognition of
>
> + Wikipedia namespace identifiers.
> + Wikipedia language codes.
> + Category hierarchies.
>
> 6. Disambiguation links for acronyms (all upper-case title) are now
> extracted (for example, Kilobyte and Knowledge_base for "KB"):
>
> + Wikilinks consisting of multiple words: If the starting letters of the
> words appear in correct order (with possible gaps) and cover all acronym
> letters.
> + Wikilinks consisting of a single word: If the case-insensitive longest
> common subsequence with the acronym is equal to the acronym.
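
The two acronym rules quoted above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, underscores are treated as word separators, and the multi-word rule is read as "the acronym is a case-insensitive subsequence of the words' initial letters". The single-word rule is implemented as a subsequence test, which is equivalent to requiring that the longest common subsequence equals the acronym. The actual extraction framework may differ.

```python
def is_subsequence(needle, haystack):
    """True if the characters of needle occur in haystack in order (gaps allowed)."""
    remaining = iter(haystack)
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in needle)


def matches_acronym(acronym, title):
    """Check a wikilink title against an acronym using the two quoted rules."""
    words = title.replace("_", " ").split()
    target = acronym.lower()
    if len(words) > 1:
        # Multi-word rule: initials, in order, must cover every acronym letter.
        initials = "".join(word[0] for word in words).lower()
        return is_subsequence(target, initials)
    if len(words) == 1:
        # Single-word rule: LCS(word, acronym) == acronym, i.e. the acronym
        # is a subsequence of the word, ignoring case.
        return is_subsequence(target, words[0].lower())
    return False
```

With the example from the announcement, both `matches_acronym("KB", "Kilobyte")` and `matches_acronym("KB", "Knowledge_base")` hold, while `matches_acronym("KB", "Base_knowledge")` does not, because the initials appear in the wrong order.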
>
> 7. Encoding (bugfixes):
>
> + The new datasets support the complete range of Unicode code points (up to
> 0x10ffff). Code points that fit in 16 bits are escaped with '\u'; code
> points larger than 16 bits are escaped with '\U'.
> + Commas and ampersands do not get encoded anymore in URIs. Please see
> http://dbpedia.org/URIencoding for an explanation regarding the DBpedia URI
> encoding scheme.
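
As an illustration of the '\u' and '\U' convention described above, here is a minimal Python sketch. It assumes the N-Triples style of escaping (four uppercase hex digits after '\u', eight after '\U') and leaves plain ASCII untouched; the announcement does not state the digit counts, and the function name is hypothetical. Escaping of backslashes, quotes and control characters is deliberately left out.

```python
def escape_code_points(text):
    """Escape non-ASCII characters as \\uXXXX or \\UXXXXXXXX (assumed N-Triples style)."""
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            # Plain ASCII is passed through unchanged in this sketch.
            pieces.append(ch)
        elif code_point <= 0xFFFF:
            # Fits in 16 bits: four hex digits after a lowercase \u.
            pieces.append("\\u%04X" % code_point)
        else:
            # Beyond 16 bits (up to 0x10FFFF): eight hex digits after \U.
            pieces.append("\\U%08X" % code_point)
    return "".join(pieces)
```

For example, `escape_code_points("Universität")` yields `Universit\u00E4t`, and a character such as U+1D11E is written as `\U0001D11E`.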
>
> 8. Extended Datasets:
>
> + Thanks to Johannes Hoffart (Max-Planck-Institut für Informatik) for
> contributing links to YAGO2.
> + Freebase links have been updated. They now refer to mids
> (http://wiki.freebase.com/wiki/Machine_ID) because guids have been
> deprecated.
>
> You can download the new DBpedia dataset from http://dbpedia.org/Downloads36
>
> As usual, the dataset is also available as Linked Data and via the DBpedia
> SPARQL endpoint at http://dbpedia.org/sparql
>
> Lots of thanks to:
>
> + All editors that contributed to the DBpedia ontology mappings via the
> Mappings Wiki.
> + Max Jakob (Freie Universität Berlin, Germany) for improving the DBpedia
> extraction framework and for extracting the new datasets.
> + Robert Isele and Anja Jentzsch (both Freie Universität Berlin, Germany)
> for helping Max with their expertise on the extraction framework.
> + Paul Kreis (Freie Universität Berlin, Germany) for analyzing the DBpedia
> data of the previous release and suggesting ways to increase quality and
> quantity. Some results of his work were implemented in this release.
> + Dimitris Kontokostas (Aristotle University of Thessaloniki, Greece), Jimmy
> O'Regan (Eolaistriu Technologies, Ireland), José Paulo Leal (University of
> Porto, Portugal) for providing patches to improve the extraction framework.
> + Jens Lehmann and Sören Auer (both Universität Leipzig, Germany) for
> providing the new dataset via the DBpedia download server at Universität
> Leipzig.
> + Kingsley Idehen and Mitko Iliev (both OpenLink Software) for loading the
> dataset into the Virtuoso instance that serves the Linked Data view and
> SPARQL endpoint. OpenLink Software (http://www.openlinksw.com/) altogether
> for providing the server infrastructure for DBpedia.
>
> The work on the new release was financially supported by:
>
> + Neofonie GmbH, a Berlin-based company offering leading technologies in the
> area of Web search, social media and mobile applications
> (http://www.neofonie.de/).
> + The European Commission through the project LOD2 - Creating Knowledge out
> of Linked Data (http://lod2.eu/).
> + Vulcan Inc. as part of its Project Halo (http://www.projecthalo.com/).
> Vulcan Inc. creates and advances a variety of world-class endeavors and high
> impact initiatives that change and improve the way we live, learn, do
> business (http://www.vulcan.com/).
>
> More information about DBpedia is found at http://dbpedia.org/About
>
> Have fun with the new dataset!
>
> The whole DBpedia team also congratulates Wikipedia on its 10th birthday,
> which was this weekend!
>
> Cheers,
>
> Chris Bizer
>
>
> --
> Prof. Dr. Christian Bizer
> Web-based Systems Group
> Freie Universität Berlin
> +49 30 838 55509
> http://www.bizer.de
> chris@bizer.de

-- 
Roberto Mirizzi
Information System Lab
Politecnico di Bari (Italy)
http://sisinflab.poliba.it/mirizzi
Received on Tuesday, 18 January 2011 11:03:01 UTC
