- From: Nicolas Torzec <torzecn@yahoo-inc.com>
- Date: Thu, 21 Jan 2010 19:35:24 -0800
- To: <public-lod@w3.org>
- Message-ID: <C77E5CFC.25C5%torzecn@yahoo-inc.com>
Hi there,

I am using open data sets such as Wikipedia for data mining and knowledge acquisition purposes; the extracted entities and relations are exposed and consumed via indices.

I already retrieve and process new Wikipedia static dumps whenever they become available, but I would like to go beyond this and use incremental/live updates to stay more in sync with Wikipedia content. I know that I could use some Web services and IRC channels for tracking changes in Wikipedia, but besides the fact that the Web service has been designed more for tracking individual changes than for monitoring Wikipedia changes continuously, both methods still require parsing the update messages (to extract the URLs of the new/modified/deleted pages) and then retrieving the actual pages (a rough sketch of what I mean follows below).

Does anyone have experience with that? Is there any other way to retrieve incremental updates in a reliable and continuous way, ideally in the same format as the static dumps (MySQL replication, incremental dumps, etc.)?

I have also read that DBpedia is trying to be more in sync with Wikipedia content. How do they plan to stay in sync with Wikipedia updates?

Thanks for your help.

Best,
Nicolas Torzec.
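For illustration, here is a minimal sketch of the kind of polling workflow described above, assuming the MediaWiki API's list=recentchanges module and the Python requests library; the endpoint, parameter choices, and the fetch_wikitext helper are illustrative assumptions, not a tested or complete implementation.

import time
import requests  # assumed HTTP client; urllib would work as well

API = "https://en.wikipedia.org/w/api.php"  # any MediaWiki endpoint

def poll_recent_changes(since=None):
    # List recent changes (newest first) via the MediaWiki API.
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp",
        "rctype": "edit|new|log",  # edits, page creations, log events (e.g. deletions)
        "rclimit": 500,
        "format": "json",
    }
    if since:
        params["rcend"] = since  # stop at the last timestamp already processed
    r = requests.get(API, params=params, timeout=30)
    r.raise_for_status()
    return r.json()["query"]["recentchanges"]

def fetch_wikitext(title):
    # Retrieve the current wikitext of a changed page.
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    r = requests.get(API, params=params, timeout=30)
    r.raise_for_status()
    return r.json()

# Naive polling loop: a real consumer would deduplicate changes,
# respect rate limits, and persist the last-seen timestamp between runs.
last_seen = None
while True:
    changes = poll_recent_changes(last_seen)
    for change in changes:
        print(change["timestamp"], change["title"])
        # wikitext = fetch_wikitext(change["title"])
    if changes:
        last_seen = changes[0]["timestamp"]
    time.sleep(60)

This only approximates the live feed, of course; it still leaves the question of getting updates in the same format as the static dumps.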
Received on Friday, 22 January 2010 03:45:04 UTC