Re: Wikipedia incremental updates

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Fri, 22 Jan 2010 08:06:13 +0000
To: Nicolas Torzec <torzecn@yahoo-inc.com>
CC: Linked Data community <public-lod@w3.org>
Message-ID: <C77F0CF5.C255%michael.hausenblas@deri.org>

Nicolas,

> Does anyone have experience with that?
>
> Is there any other way to retrieve incremental updates in a reliable and
> continuous way, especially in the same format as the one provided for the
> static dumps (MySQL replication, incremental dumps, etc.)?

I think this is a very timely and important question [1]. We recently did a
demo [2], based on voiD and Atom, to explore what could work, and as a result
a group of people interested in this area has formed [3]. It would be great
if you'd join in and share your use case ...

Cheers,
      Michael

[1] http://esw.w3.org/topic/DatasetDynamics
[2] http://code.google.com/p/dady/wiki/Demos
[3] http://groups.google.com/group/dataset-dynamics

-- 
Dr. Michael Hausenblas
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html



> From: Nicolas Torzec <torzecn@yahoo-inc.com>
> Date: Thu, 21 Jan 2010 19:35:24 -0800
> To: Linked Data community <public-lod@w3.org>
> Subject: Wikipedia incremental updates
> Resent-From: Linked Data community <public-lod@w3.org>
> Resent-Date: Fri, 22 Jan 2010 03:45:06 +0000
> 
> Hi there,
> 
> I am using open data sets such as Wikipedia for data mining and knowledge
> acquisition purposes; the extracted entities and relations are exposed and
> consumed via indices.
> 
> I am already retrieving and processing new Wikipedia static dumps whenever
> they become available, but I would like to go beyond this and use
> incremental/live updates to stay more in sync with Wikipedia content.
> 
> I know that I could use some Web services and IRC channels to track changes
> in Wikipedia but, besides the fact that the web service was designed more
> for tracking individual changes than for monitoring Wikipedia changes
> continuously, both methods still require parsing the update messages (to
> extract the URLs of the new/modified/deleted pages) and then retrieving the
> actual pages (see the sketch after this quoted message).
> 
> Does anyone have experience with that?
> 
> Is there any other way to retrieve incremental updates in a reliable and
> continuous way, especially in the same format as the one provided for the
> static dumps (MySQL replication, incremental dumps, etc.)?
> 
> I have also read that DBpedia is trying to stay more in sync with Wikipedia
> content. How do they plan to keep up with Wikipedia updates?
> 
> 
> Thanks for your help.
> 
> Best,
> Nicolas Torzec.
> 
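
Here is a minimal sketch in Python of the recent-changes polling approach
Nicolas describes above, against the real MediaWiki API (action=query with
list=recentchanges and prop=revisions). The User-Agent string and the loop
structure are illustrative assumptions; a production poller would also need
continuation paging and rate limiting:

    import json
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"
    # Illustrative User-Agent (an assumption; use a descriptive one).
    HEADERS = {"User-Agent": "rc-poller-sketch/0.1"}

    def api_get(params):
        """GET a MediaWiki API query and decode the JSON response."""
        url = API + "?" + urllib.parse.urlencode(params)
        req = urllib.request.Request(url, headers=HEADERS)
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    # 1. Ask for the most recent changes (title, revision ids, timestamp).
    rc = api_get({
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp",
        "rclimit": 10,
        "format": "json",
    })

    # 2. Retrieve the current wikitext of each changed page.
    for change in rc["query"]["recentchanges"]:
        page = api_get({
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "titles": change["title"],
            "format": "json",
        })
        for p in page["query"]["pages"].values():
            # Legacy JSON format: wikitext lives under revisions[0]["*"];
            # deleted pages come back without a "revisions" key.
            text = p.get("revisions", [{}])[0].get("*", "")
            print(change["timestamp"], change["title"], len(text))

Polling list=recentchanges returns titles and revision ids directly, so no
IRC message parsing is needed; each changed page is then fetched through the
same API.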