
Re: Keeping crawlers up-to-date

From: Yves Raimond <yves.raimond@gmail.com>
Date: Tue, 28 Apr 2009 15:15:18 +0100
Message-ID: <82593ac00904280715i4644eba5w73f71b005834257e@mail.gmail.com>
To: giovanni.tummarello@deri.org
Cc: Linking Open Data <public-lod@w3.org>, Nicholas J Humfrey <njh@aelius.com>, Patrick Sinclair <metade@gmail.com>
Hi Giovanni!

>
> nothing can beat having a semantic sitemap [1]. Basically you say that you
> change once a day and give a link to the dump. Done :-)
>
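(For reference, a minimal Semantic Sitemap along the lines Giovanni describes might look roughly like this; the dataset URI, dump URL, and endpoint URL below are placeholders, and the exact element names should be checked against the extension's schema:)

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>Example dataset</sc:datasetLabel>
    <sc:datasetURI>http://example.org/dataset</sc:datasetURI>
    <!-- one dump download per day, as suggested above -->
    <sc:dataDumpLocation>http://example.org/dump.rdf.gz</sc:dataDumpLocation>
    <sc:sparqlEndpointLocation>http://example.org/sparql</sc:sparqlEndpointLocation>
    <sc:changefreq>daily</sc:changefreq>
  </sc:dataset>
</urlset>
```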

Well, the problem is that we don't have an RDF dump, and generating one
is quite costly, given the architecture driving the site (classic
MVC-style, backed by a SQL database). Basically, the simplest way to
produce one would be to crawl the entire site, which brings us back to
my previous question (how to keep the results of these crawls in sync)
:-)
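(One standard way to keep crawl results in sync without re-fetching everything is HTTP conditional requests: the crawler stores each resource's ETag and Last-Modified headers and sends them back on the next visit, so unchanged pages cost only a 304 response. A minimal sketch, with an illustrative cache dict rather than any particular crawler's storage:)

```python
# Sketch: incremental re-crawling via HTTP conditional GET.
# `cached` is a per-URL dict of previously seen validators.
import urllib.error
import urllib.request


def conditional_headers(cached):
    """Build If-None-Match / If-Modified-Since headers from cached metadata."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def fetch_if_changed(url, cached):
    """Return (body, new_cache), or (None, cached) if the server says 304."""
    req = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(req) as resp:
            new_cache = {
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
            }
            return resp.read(), new_cache
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged since last crawl: skip re-processing
            return None, cached
        raise
```

This only helps, of course, if the server emits usable validators for its pages.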

> if you put it i am ready to show in sindice the information updated every
> day, and with no other cost for you than a single dump download.
>
> also the sitemap allows you to specify where your sparql endpoint is.

Same thing as for the RDF dump.

That said, if we had an RDF dump, that would probably be a good way to
go. It might be a bit inefficient, though, as it wouldn't take into
account the data that doesn't change?

Cheers!
y
Received on Tuesday, 28 April 2009 14:16:01 UTC