W3C home > Mailing lists > Public > public-lod@w3.org > April 2009

Re: Keeping crawlers up-to-date

From: Giovanni Tummarello <g.tummarello@gmail.com>
Date: Tue, 28 Apr 2009 15:08:26 +0100
Message-ID: <210271540904280708t40b1a703x27be3039054e2c03@mail.gmail.com>
To: Yves Raimond <yves.raimond@gmail.com>
Cc: Linking Open Data <public-lod@w3.org>, Nicholas J Humfrey <njh@aelius.com>, Patrick Sinclair <metade@gmail.com>
Hi Yves,

Nothing can beat having a semantic sitemap [1]. Basically, you declare that your data changes once a day and give a link to the dump. Done :-)

If you put one up, I am ready to show the updated information in Sindice every day, at no cost to you beyond a single daily dump download.

The sitemap also lets you specify where your SPARQL endpoint is.


[1] http://sw.deri.org/2007/07/sitemapextension/
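For concreteness, a minimal semantic sitemap for a case like yours might look roughly like this (a sketch only: the element names follow the sitemap extension spec linked above, but the URLs are made-up placeholders, not real BBC locations):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <!-- One sc:dataset block per published dataset -->
  <sc:dataset>
    <sc:datasetLabel>BBC Programmes</sc:datasetLabel>
    <!-- Resolvable URIs in the dataset start with this prefix -->
    <sc:linkedDataPrefix>http://example.org/programmes/</sc:linkedDataPrefix>
    <!-- Full dump a crawler can fetch instead of crawling resource by resource -->
    <sc:dataDumpLocation>http://example.org/programmes/all.rdf.gz</sc:dataDumpLocation>
    <!-- Where to send SPARQL queries -->
    <sc:sparqlEndpointLocation>http://example.org/sparql</sc:sparqlEndpointLocation>
    <!-- Standard sitemap element: tells crawlers how often to re-fetch -->
    <changefreq>daily</changefreq>
  </sc:dataset>
</urlset>
```

A crawler that understands the extension then only needs to re-download the dump once a day, rather than tracking 50,000 individual changes.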

On Tue, Apr 28, 2009 at 2:39 PM, Yves Raimond <yves.raimond@gmail.com> wrote:

> Hello!
> I know this issue has been raised during the LOD BOF at WWW 2009, but
> I don't know if any possible solutions emerged from there.
> The problem we are facing is that data on BBC Programmes changes
> approximately 50 000 times a day (new/updated
> broadcasts/versions/programmes/segments etc.). As we'd like to keep a
> set of RDF crawlers up-to-date with our information we were wondering
> how best to ping these. pingthesemanticweb seems like a nice option,
> but it needs the crawlers to ping it often enough to make sure they
> didn't miss a change. Another solution we were thinking of would be to
> stick either Talis changesets [1] or SPARQL/Update statements in a
> message queue, which would then be consumed by the crawlers.
> Has anyone tried to tackle this problem already?
> Cheers!
> y
> [1] http://n2.talis.com/wiki/Changeset
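On the changeset option you mention: a single change in that vocabulary is a cs:ChangeSet resource whose additions and removals are reified rdf:Statements. A sketch of what one queued message might carry, assuming hypothetical example.org URIs (only the cs: and rdf: terms come from the vocabularies; everything else here is illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cs="http://purl.org/vocab/changeset/schema#">
  <cs:ChangeSet rdf:about="http://example.org/changesets/2009-04-28/1">
    <!-- The resource this change is about -->
    <cs:subjectOfChange rdf:resource="http://example.org/programmes/abc123"/>
    <cs:createdDate>2009-04-28T14:00:00Z</cs:createdDate>
    <cs:changeReason>Broadcast details updated</cs:changeReason>
    <!-- Each added triple is a reified rdf:Statement -->
    <cs:addition>
      <rdf:Statement>
        <rdf:subject rdf:resource="http://example.org/programmes/abc123"/>
        <rdf:predicate rdf:resource="http://purl.org/dc/elements/1.1/title"/>
        <rdf:object>Updated programme title</rdf:object>
      </rdf:Statement>
    </cs:addition>
  </cs:ChangeSet>
</rdf:RDF>
```

A crawler consuming the queue would apply the cs:removal statements first, then the cs:addition statements, to keep its copy in sync without re-fetching the whole dump.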
Received on Tuesday, 28 April 2009 14:09:25 UTC
