Re: Wikipedia incremental updates

From: Hugh Williams <hwilliams@openlinksw.com>
Date: Fri, 22 Jan 2010 04:35:39 +0000
Message-Id: <8FDFB0F3-F709-42CB-BEA7-2EE969808D9A@openlinksw.com>
Cc: <public-lod@w3.org>
To: Nicolas Torzec <torzecn@yahoo-inc.com>

Hi Nicolas,

The upcoming DBpedia Live service is hosted at:


This service extracts live updates from Wikipedia as can be seen at:


It is still undergoing testing, but it is available for use and is scheduled to become (and replace) the default DBpedia service soon ...

Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink

On 22 Jan 2010, at 03:35, Nicolas Torzec wrote:

> Hi there,
> I am using open data sets such as Wikipedia for data mining and knowledge acquisition; the extracted entities and relations are exposed and consumed via indices.
> I already retrieve and process new Wikipedia static dumps whenever they become available, but I would like to go beyond this and use incremental/live updates to stay more in sync with Wikipedia content.
> I know that I could use some Web services and IRC channels for tracking changes in Wikipedia but, besides the fact that the web service was designed more for tracking individual changes than for monitoring Wikipedia changes continuously, these two methods would still require parsing the update messages (to extract the URLs of the new/modified/deleted pages) and then retrieving the actual pages.
> Does anyone have experience with that?
> Is there any other way to retrieve incremental updates reliably and continuously, especially in the same format as the one provided for the static dumps (MySQL replication, incremental dumps, ...)?
> I have also read that DBpedia was trying to be more in sync with Wikipedia content. How do they plan to stay in sync with Wikipedia updates?
> Thanks for your help.
> Best,
> Nicolas Torzec.
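
For illustration only, the "parse the update messages, then retrieve the pages" step that Nicolas describes might be sketched as below. This is not how DBpedia Live or any Wikipedia tool is known to work. The field names (`type`, `title`, `logtype`, `logaction`) are an assumption based on the recent-changes output of the MediaWiki web API and should be checked against its documentation; the sample payload and the helper names `classify_changes` and `pages_to_fetch` are hypothetical.

```python
import json

# Made-up sample shaped like a MediaWiki "recentchanges" JSON response.
# The field names are assumed, not taken from the original message.
SAMPLE_RESPONSE = """
{"query": {"recentchanges": [
  {"type": "new",  "title": "Page A", "timestamp": "2010-01-22T03:00:00Z"},
  {"type": "edit", "title": "Page B", "timestamp": "2010-01-22T03:01:00Z"},
  {"type": "log",  "logtype": "delete", "logaction": "delete",
   "title": "Page C", "timestamp": "2010-01-22T03:02:00Z"},
  {"type": "edit", "title": "Page B", "timestamp": "2010-01-22T03:03:00Z"}
]}}
"""


def classify_changes(payload):
    """Group changed page titles into new / modified / deleted buckets."""
    buckets = {"new": [], "modified": [], "deleted": []}
    for change in payload.get("query", {}).get("recentchanges", []):
        title = change.get("title")
        if title is None:
            continue
        kind = change.get("type")
        if kind == "new":
            bucket = "new"
        elif kind == "edit":
            bucket = "modified"
        elif (kind == "log"
              and change.get("logtype") == "delete"
              and change.get("logaction") == "delete"):
            bucket = "deleted"
        else:
            # Ignore other change types (other log events, categorization, ...).
            continue
        # Keep each title once per bucket, in order of first appearance.
        if title not in buckets[bucket]:
            buckets[bucket].append(title)
    return buckets


def pages_to_fetch(buckets):
    """Titles whose current text must be retrieved again (new or modified)."""
    new = buckets["new"]
    return new + [t for t in buckets["modified"] if t not in new]
```

Calling `classify_changes(json.loads(SAMPLE_RESPONSE))` groups the sample into one new, one modified and one deleted page; `pages_to_fetch` then lists the titles that still need a separate page retrieval, which is the extra round trip the question hopes to avoid.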
Received on Friday, 22 January 2010 04:36:13 UTC
