
Re: AW: ANN: RDF Book Mashup - Integrating Web 2.0 data sources like Amazon and Google into the Semantic Web

From: Richard Cyganiak <richard@cyganiak.de>
Date: Fri, 1 Dec 2006 19:19:47 +0100
Message-Id: <BF4AE2F7-0F82-483B-A08E-E760701A99E0@cyganiak.de>
Cc: "Chris Bizer" <chris@bizer.de>, "'Karl Dubost'" <karl@w3.org>, "'Damian Steer'" <damian.steer@hp.com>, <semantic-web@w3.org>
To: Richard Newman <r.newman@reading.ac.uk>

On 1 Dec 2006, at 18:27, Richard Newman wrote:
> Systemone have Wikipedia dumped monthly into RDF:
>
> http://labs.systemone.at/wikipedia3
>
> A public SPARQL endpoint is on their roadmap, but it's only 47  
> million triples, so you should be able to load it in a few minutes  
> on your machine and run queries locally.

Unfortunately, this only represents the hyperlink structure and basic
article metadata in RDF. It does no scraping of data from info boxes
or article content. It might be interesting for analyzing Wikipedia's
link structure or social dynamics, but not for content extraction.

Richard



>
> -R
>
>
> On  1 Dec 2006, at 4:30 AM, Chris Bizer wrote:
>
>>> I wish that wikipedia had a fully exportable database
>>> http://en.wikipedia.org/wiki/Lists_of_films
>>>
>>> For example, being able to export all data of this movie as RDF,
>>> maybe a templating issue at least for the box on the right.
>>> http://en.wikipedia.org/wiki/2046_%28film%29
>>
>> Should be an easy job for a SIMILE-like screen scraper.
>>
>> If you start scraping down from the Wikipedia film list, you should
>> get a fair amount of data.
>>
>> To all the Semantic Wiki guys: Has anybody already done something like this?
>> Are there SPARQL end-points/repositories for Wikipedia-scraped data?
Received on Friday, 1 December 2006 18:20:01 GMT
