- From: Peter Ansell <ansell.peter@gmail.com>
- Date: Fri, 22 Jan 2010 13:50:40 +1000
- To: Nicolas Torzec <torzecn@yahoo-inc.com>
- Cc: public-lod@w3.org
Hi Nicolas,

From what I can remember, they have an agreement with Wikipedia that gives
them access to a restricted full-content live update feed. You might want to
ask about it on the DBpedia discussion list at [1] for more information.

Cheers,

Peter

[1] https://lists.sourceforge.net/mailman/listinfo/dbpedia-discussion

2010/1/22 Nicolas Torzec <torzecn@yahoo-inc.com>:
> Hi there,
>
> I am using open data sets such as Wikipedia for data mining and knowledge
> acquisition purposes; the extracted entities and relations are exposed and
> consumed via indices.
>
> I already retrieve and process each new Wikipedia static dump as it becomes
> available, but I would like to go beyond this and use incremental/live
> updates to stay more closely in sync with Wikipedia content.
>
> I know that I could use some web services and IRC channels to track changes
> in Wikipedia, but besides the fact that the web service is designed more for
> tracking individual changes than for monitoring Wikipedia continuously, both
> methods still require parsing the update messages (to extract the URLs of
> the new/modified/deleted pages) and then retrieving the actual pages.
>
> Does anyone have experience with this?
>
> Is there any other way to retrieve incremental updates in a reliable and
> continuous way, especially in the same format as the static dumps?
> (MySQL replication, incremental dumps, ...)
>
> I have also read that DBpedia is trying to stay more in sync with Wikipedia
> content. How do they plan to stay in sync with Wikipedia updates?
>
>
> Thanks for your help.
>
> Best,
> Nicolas Torzec.
>
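As a rough illustration of the approach Nicolas describes (tracking changes via the web service and then retrieving the modified pages), a minimal polling sketch in Python might look like the following. The endpoint and query parameters are standard MediaWiki API values; the starting timestamp, polling interval, and User-Agent string are placeholders, and a real crawler would also need error handling and rate limiting.

    # Minimal sketch (not part of the original thread): poll the MediaWiki
    # recentchanges API and fetch the current wikitext of each changed page.
    import json
    import time
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"

    def api_get(params):
        """Call the MediaWiki API and return the decoded JSON response."""
        query = urllib.parse.urlencode({**params, "format": "json"})
        req = urllib.request.Request(
            f"{API}?{query}",
            headers={"User-Agent": "incremental-sync-sketch/0.1"},  # placeholder UA
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def recent_changes(since):
        """List pages edited or created since the given ISO 8601 timestamp."""
        data = api_get({
            "action": "query",
            "list": "recentchanges",
            "rcstart": since,
            "rcdir": "newer",          # oldest first, so we can resume cleanly
            "rctype": "edit|new",
            "rcprop": "title|timestamp|ids",
            "rclimit": "500",
        })
        return data["query"]["recentchanges"]

    def page_wikitext(title):
        """Fetch the latest wikitext of one page (None if it was deleted)."""
        data = api_get({
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "titles": title,
        })
        page = next(iter(data["query"]["pages"].values()))
        return page["revisions"][0]["*"] if "revisions" in page else None

    if __name__ == "__main__":
        since = "2010-01-22T00:00:00Z"   # arbitrary starting point
        while True:
            for rc in recent_changes(since):
                print(rc["timestamp"], rc["title"])
                # text = page_wikitext(rc["title"])  # feed the indexer here
                since = rc["timestamp"]
            time.sleep(60)               # polling interval chosen arbitrarily

This only covers the polling route; it does not address getting the updates in the same format as the static dumps, which is what the restricted live update feed mentioned above would provide.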
Received on Friday, 22 January 2010 03:51:08 UTC