Re: AW: ANN: RDF Book Mashup - Integrating Web 2.0 data sources like Amazon and Google into the Semantic Web from Richard Cyganiak on 2006-12-01 (semantic-web@w3.org from December 2006)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Fri, 1 Dec 2006 19:19:47 +0100
To: Richard Newman <r.newman@reading.ac.uk>
Cc: "Chris Bizer" <chris@bizer.de>, "'Karl Dubost'" <karl@w3.org>, "'Damian Steer'" <damian.steer@hp.com>, <semantic-web@w3.org>
Message-Id: <BF4AE2F7-0F82-483B-A08E-E760701A99E0@cyganiak.de>

On 1 Dec 2006, at 18:27, Richard Newman wrote:
> Systemone have Wikipedia dumped monthly into RDF:
>
> http://labs.systemone.at/wikipedia3
>
> A public SPARQL endpoint is on their roadmap, but it's only 47  
> million triples, so you should be able to load it in a few minutes  
> on your machine and run queries locally.

Unfortunately, this dump only represents the hyperlink structure and
basic article metadata in RDF. It does not scrape any data from info
boxes or article content. It might be interesting for analyzing
Wikipedia's link structure or social dynamics, but not for content
extraction.

Richard



>
> -R
>
>
> On  1 Dec 2006, at 4:30 AM, Chris Bizer wrote:
>
>>> I wish that wikipedia had a fully exportable database
>>> http://en.wikipedia.org/wiki/Lists_of_films
>>>
>>> For example, being able to export all data of this movie as RDF,
>>> maybe a templating issue at least for the box on the right.
>>> http://en.wikipedia.org/wiki/2046_%28film%29
>>
>> Should be an easy job for a SIMILE-like screen scraper.
>>
>> If you start scraping down from the Wikipedia film list, you should
>> get a fair amount of data.
>>
>> To all the Semantic Wiki guys: Has anybody already done something
>> like this?
>> Are there SPARQL endpoints/repositories for Wikipedia-scraped data?
>
>
>
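To make the info box scraping idea from the thread concrete, here is a minimal sketch in Python (standard library only). It is not the SIMILE scraper and not anything Systemone ships. It assumes the info box is an HTML table with class "infobox" whose rows hold a <th> label and a <td> value, it ignores nested tables, and the sample markup and the http://example.org/infobox/ property namespace are made up for illustration.

```python
import re
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect (label, value) pairs from the <th>/<td> rows of an info box.

    Assumption: the info box is <table class="infobox"> and contains no
    nested tables. Real Wikipedia markup is messier than this.
    """

    def __init__(self):
        super().__init__()
        self.in_infobox = False
        self.cell = None  # "th" or "td" while inside such a cell
        self.label = ""
        self.value = ""
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if "infobox" in classes:
                self.in_infobox = True
        elif self.in_infobox and tag == "tr":
            self.label, self.value = "", ""
        elif self.in_infobox and tag in ("th", "td"):
            self.cell = tag

    def handle_endtag(self, tag):
        if not self.in_infobox:
            return
        if tag == "table":
            self.in_infobox = False
        elif tag in ("th", "td"):
            self.cell = None
        elif tag == "tr":
            # Collapse runs of whitespace before storing the pair.
            label = " ".join(self.label.split())
            value = " ".join(self.value.split())
            if label and value:
                self.rows.append((label, value))

    def handle_data(self, data):
        if self.cell == "th":
            self.label += data
        elif self.cell == "td":
            self.value += data


def to_ntriples(subject_uri, rows, ns="http://example.org/infobox/"):
    """Turn (label, value) pairs into N-Triples lines with plain literals.

    The namespace is hypothetical; a real export would map labels to an
    agreed vocabulary instead of minting property names from label text.
    """
    lines = []
    for label, value in rows:
        prop = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_").lower()
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_uri}> <{ns}{prop}> "{literal}" .')
    return lines


# Hand-written sample markup, not copied from Wikipedia.
SAMPLE = """
<table class="infobox">
  <tr><th>Directed by</th><td>Wong Kar-wai</td></tr>
  <tr><th>Release date</th><td>2004</td></tr>
</table>
"""

if __name__ == "__main__":
    parser = InfoboxParser()
    parser.feed(SAMPLE)
    for line in to_ntriples("http://en.wikipedia.org/wiki/2046_%28film%29",
                            parser.rows):
        print(line)
```

The resulting N-Triples lines could then be loaded into any triple store next to a link-structure dump, which is roughly the combination the thread is asking for.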

Received on Friday, 1 December 2006 18:20:01 UTC