- From: Peter Ansell <ansell.peter@gmail.com>
- Date: Fri, 22 Jan 2010 13:50:40 +1000
- To: Nicolas Torzec <torzecn@yahoo-inc.com>
- Cc: public-lod@w3.org
Hi Nicolas,

From what I can remember, they have an agreement with Wikipedia that gives
them access to a restricted full-content live update feed. You might want to
ask about it on the DBpedia discussion list at [1] for more information.

Cheers,

Peter

[1] https://lists.sourceforge.net/mailman/listinfo/dbpedia-discussion

2010/1/22 Nicolas Torzec <torzecn@yahoo-inc.com>:
> Hi there,
>
> I am using open data sets such as Wikipedia for data mining and knowledge
> acquisition purposes; the extracted entities and relations are exposed and
> consumed via indices.
>
> I already retrieve and process each new Wikipedia static dump as it becomes
> available, but I would like to go beyond this and use incremental/live
> updates to stay more closely in sync with Wikipedia content.
>
> I know that I could use some web services and IRC channels to track changes
> in Wikipedia, but besides the fact that the web service is designed more for
> tracking individual changes than for monitoring Wikipedia continuously, both
> methods still require parsing the update messages (to extract the URLs of
> the new/modified/deleted pages) and then retrieving the actual pages.
>
> Does anyone have experience with this?
>
> Is there any other way to retrieve incremental updates in a reliable and
> continuous way, especially in the same format as the static dumps?
> (MySQL replication, incremental dumps, ...)
>
> I have also read that DBpedia is trying to stay more in sync with Wikipedia
> content. How do they plan to stay in sync with Wikipedia updates?
>
>
> Thanks for your help.
>
> Best,
> Nicolas Torzec.
>
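As a rough illustration of the approach Nicolas describes (tracking changes via the web service and then retrieving the modified pages), a minimal polling sketch in Python might look like the following. The endpoint and query parameters are standard MediaWiki API values; the starting timestamp, polling interval, and User-Agent string are placeholders, and a real crawler would also need error handling and rate limiting.

    # Minimal sketch (not part of the original thread): poll the MediaWiki
    # recentchanges API and fetch the current wikitext of each changed page.
    import json
    import time
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"

    def api_get(params):
        """Call the MediaWiki API and return the decoded JSON response."""
        query = urllib.parse.urlencode({**params, "format": "json"})
        req = urllib.request.Request(
            f"{API}?{query}",
            headers={"User-Agent": "incremental-sync-sketch/0.1"},  # placeholder UA
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def recent_changes(since):
        """List pages edited or created since the given ISO 8601 timestamp."""
        data = api_get({
            "action": "query",
            "list": "recentchanges",
            "rcstart": since,
            "rcdir": "newer",          # oldest first, so we can resume cleanly
            "rctype": "edit|new",
            "rcprop": "title|timestamp|ids",
            "rclimit": "500",
        })
        return data["query"]["recentchanges"]

    def page_wikitext(title):
        """Fetch the latest wikitext of one page (None if it was deleted)."""
        data = api_get({
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "titles": title,
        })
        page = next(iter(data["query"]["pages"].values()))
        return page["revisions"][0]["*"] if "revisions" in page else None

    if __name__ == "__main__":
        since = "2010-01-22T00:00:00Z"   # arbitrary starting point
        while True:
            for rc in recent_changes(since):
                print(rc["timestamp"], rc["title"])
                # text = page_wikitext(rc["title"])  # feed the indexer here
                since = rc["timestamp"]
            time.sleep(60)               # polling interval chosen arbitrarily

This only covers the polling route; it does not address getting the updates in the same format as the static dumps, which is what the restricted live update feed mentioned above would provide.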
Received on Friday, 22 January 2010 03:51:08 UTC