[ANN] Experimental Wikimedia Commons RDF extraction with DBpedia

Hi everybody,

We are happy to announce an experimental RDF dump of the Wikimedia Commons. A complete first draft is now available online at http://nl.dbpedia.org/downloads/commonswiki/20140705/, and will eventually be accessible at http://commons.dbpedia.org. A small sample dataset, which may be easier to browse, is available on GitHub at https://github.com/gaurav/commons-extraction/tree/master/commonswiki/20140101

The following datasets showcase some of the improvements that we’ve been working on over the last two months (a short example of querying them follows the list):
 - File information (*-file-information.*) is a completely new dataset that contains information on the files in the Commons, including file and thumbnail URLs, file extensions, file type classes and MIME types.
 - DBpedia’s Mappings Extractor (*-mappingbased-properties.*) uses templates stored on the Mapping server (http://mappings.dbpedia.org/) to create RDF for information-rich templates. This system still has some important limitations, such as not being able to process embedded templates (e.g. license templates inside {{Information}}), but top-level templates are completely configurable. The existing mappings are available at http://mappings.dbpedia.org/index.php/Mapping_commons
 - These mappings include 363 license templates that indicate the licensing of Commons files under public domain, Creative Commons and other open-access licenses. They were created by bots and still require verification before use. They are listed at http://mappings.dbpedia.org/index.php/Category:Commons_media_license
 - The DBpedia Geoextractor (*-geo-coordinates.*) now extracts geographical coordinates from Commons files using the {{Location}} template.
 - The DBpedia SKOS Extractor (*-skos-categories.*) now identifies relationships between Commons categories, building a SKOS-based description of the entire Commons category tree.
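To give an idea of how these dumps could be used, here is a minimal sketch (not part of the dump itself) of querying the geo-coordinates and SKOS-categories datasets with Python and rdflib. It assumes the files are N-Triples and use the standard SKOS and W3C WGS84 vocabularies; the file names below are placeholders for whichever dump files you download.

# Minimal sketch: load two of the Commons dump files and inspect them.
# File names are placeholders; adjust to the files you actually downloaded.
from rdflib import Graph, Namespace

SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")
GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")

g = Graph()
g.parse("commonswiki-20140705-skos-categories.nt", format="nt")  # placeholder
g.parse("commonswiki-20140705-geo-coordinates.nt", format="nt")  # placeholder

# Walk one level of the Commons category tree via skos:broader.
for category, _, broader in g.triples((None, SKOS.broader, None)):
    print(category, "-> broader ->", broader)

# List resources that carry extracted coordinates (wgs84:lat / wgs84:long).
for subject, _, lat in g.triples((None, GEO.lat, None)):
    lon = g.value(subject, GEO.long)
    print(subject, float(lat), float(lon) if lon is not None else None)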

Please have a look and let us know what you think. We’ll be working on a number of open tasks over the next three weeks, listed at https://github.com/gaurav/extraction-framework/issues?state=open -- if you see something wrong with what we’ve done above, or have an issue you’d particularly like us to tackle, please report it there or drop me an e-mail!

This work is sponsored by the Google Summer of Code program
(https://www.google-melange.com/gsoc/project/details/google/gsoc2014/gaurav/5676830073815040).

Thanks!

cheers,
The DBpedia Commons extraction team:
Gaurav Vaidya
Dimitris Kontokostas
Andrea Di Menna
Jimmy O’Regan
