Re: RDF Dataset Notifications

Hi,

On 17 April 2010 12:22, Giovanni Tummarello <g.tummarello@gmail.com> wrote:
> i tell you what we're going to be supporting in Sindice very soon and
> it would be great if you could add it to the table:
>
> simple existing sitemaps :-). Sitemaps provide the list of URLs to
> crawl and for each one either a "last updated" field or "update
> frequency".
>
> If the website takes care to update the "last updated" field properly
> then even huge datasets can be kept in sync on a daily (or less
> frequent) basis.
>
> by publishing RDF in entity based slices (HTML + RDFa) the mechanism
> simply works fine and it is the same large web publishers have been
> using for years to expose the deep web so it is not difficult to
> explain etc.
>
> for large datasets which are large RDF files, the Semantic Sitemap
> extension does its job for us (dbpedia and many others are in Sindice
> because of that)
>
> What do you think?
> ...
Yes, directed and undirected crawling need to be included.

I've tweaked the spreadsheet into two worksheets:

* Approaches for mirroring data, e.g. exports, crawling, etc.
* Approaches for syndicating notifications/changes

The latter is what I had originally, but the mirroring aspects are
new. I've included semantic sitemaps on there, along with simple
dataset exports, BitTorrent, etc.

A system may choose to just regularly mirror a dataset using a dump,
or via a crawl. Or it may combine an initial mirror with
synchronisation via further update notifications.
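
As a rough illustration of the sitemap-directed case, here is a minimal
Python sketch of how a mirror might decide which URLs to re-fetch after
an initial crawl. It is only a sketch under assumptions, not how Sindice
or any particular crawler works: the function names, the example.org
URLs and the "re-fetch when lastmod is missing" policy are all made up
for illustration. It relies only on the standard sitemap namespace and
the <loc> and <lastmod> elements.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; the URLs are placeholders.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/resource/a</loc><lastmod>2010-04-18</lastmod></url>
  <url><loc>http://example.org/resource/b</loc><lastmod>2010-04-01T09:30:00+00:00</lastmod></url>
  <url><loc>http://example.org/resource/c</loc></url>
</urlset>"""


def parse_lastmod(text):
    """Parse a <lastmod> value (date or datetime) into a UTC-aware datetime.

    Returns None when the value cannot be parsed.
    """
    try:
        parsed = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: values without a timezone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def urls_to_refetch(sitemap_xml, last_sync):
    """Return the <loc> values that look changed since last_sync.

    last_sync must be a timezone-aware datetime. Entries with a missing
    or unparseable <lastmod> are re-fetched, as a conservative choice.
    """
    root = ET.fromstring(sitemap_xml)
    stale = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue
        lastmod_text = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        lastmod = parse_lastmod(lastmod_text) if lastmod_text else None
        if lastmod is None or lastmod > last_sync:
            stale.append(loc)
    return stale
```

Calling urls_to_refetch(SAMPLE, last_sync) with the time of the previous
crawl gives the list of pages to fetch again. This only works as well as
the publisher maintains the "last updated" values, which is why entries
without one are re-fetched here rather than skipped.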

Hopefully the new spreadsheet helps tease some of that out:

http://spreadsheets.google.com/pub?key=tLWdskoM-2--vLjUI05e7qQ&output=html

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.dodds@talis.com
http://www.talis.com

Received on Sunday, 18 April 2010 15:46:45 UTC