Re: The Power of Virtuoso Sponger Technology from Giovanni Tummarello on 2009-10-18 (public-lod@w3.org from October 2009)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Sun, 18 Oct 2009 16:06:45 +0100
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: "public-lod@w3.org" <public-lod@w3.org>, Sindice general discussions list <sindice-general@lists.deri.org>
Message-ID: <210271540910180806w2c84d6e4o92cb44fb4294ab81@mail.gmail.com>

> A) The wrapper's Semantic Sitemap points you at the original Sitemap, and
> says how it is doing the wrapping. And because you know how the wrapper is
> behaving, you can process the standard Sitemap to get the information you
> want about what the wrapping site provides.
> Actually, the "slicing" in the current spec is something similar to this -
> my Linked Data site is a wrapper around my SPARQL endpoint, and I provide a
> description of this along with dumps of the contents of the RDF store.
>

I get it. The problem here is the automation. This would effectively
mean that the Sindice fetcher "takes orders" from one site (site A) to
go and fetch some third-party site (site B) and index it the way site A
says. Seems scary :/ but yes, there is really no work for site A to do.


> B) Another way is for the wrapper to actually process the Sitemap and data
> dumps to produce a Semantic Sitemap and RDF dumps. Really wrapping the whole
> site, not just the data. This would require no extra facilities at the
> Sindice end.

This is better from a security/trust/provenance point of view: site A
fetches the content of site B (let's use the term "fetch" instead of
"crawl" to indicate a bunch of sitemap URLs to be fetched, but they can
easily be hundreds of thousands, so a several-day job), then wraps it,
creates a nice dump, and I am happy.

... this is good but seems to a) require a lot of work from site A,
while the reward is not that clear, and b) put site A in some form of
responsibility for republishing the data of site B (without being a
large automatic service like a search engine).
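
To make the "fetch" step of option B concrete, here is a minimal sketch
of how site A might list the URLs it has to fetch from site B's standard
Sitemap. The sitemap content and the example.org URLs are placeholders,
and the function name is hypothetical; the actual fetching, wrapping and
dumping are left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org Sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_to_fetch(sitemap_xml):
    """Return the page URLs listed in a standard Sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical Sitemap published by site B (placeholder URLs).
SITE_B_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/products/1</loc></url>
  <url><loc>http://example.org/products/2</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in urls_to_fetch(SITE_B_SITEMAP):
        print(url)
```

With hundreds of thousands of entries, the list itself is cheap to
build; the several-day cost is in fetching each URL politely.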


This is still about fetching everything and not about integrating some
form of service description (as Martin suggests). (Note that I am not
SURE we necessarily have to integrate services, but it would seem
logical after all, yet somehow very different from what we have been
considering so far: data explicitly published.)



Is it possible to come up with a super light service description that
would allow me to simply understand when the service needs to be
invoked to possibly answer a query?

Maybe something in the middle? Like product descriptions in RDF and
then a special node for the price that says "see service here" or
"see updated price list here"?

If so, I could index such descriptions, and when somebody asks me I could say:

a) these are the answers I already know
b) these services claim to be able to give you some additional answers
or (probably better) I do the calling for you in parallel, cached mode,
sort the results and return them with the provenance indication etc.?
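
The "probably better" variant could be sketched roughly as follows. This
is only an illustration under loud assumptions: the index contents, the
service names, the "covers" keyword sets standing in for a super light
service description, and the lambdas standing in for remote calls are
all hypothetical, and sorting by answer text is a placeholder for a real
ranking.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Answers already indexed: query -> list of (answer, source).
INDEXED = {
    "price of product 1": [("10 EUR", "http://example.org/dump.rdf")],
}

# Hypothetical light service descriptions: a name, the query words the
# service claims to cover, and a callable standing in for the remote call.
SERVICES = [
    {"name": "http://example.org/price-service", "covers": {"price"},
     "call": lambda query: ["9 EUR"]},
    {"name": "http://example.org/stock-service", "covers": {"stock"},
     "call": lambda query: ["in stock"]},
]


def services_for(query):
    """Return the services whose description matches a word in the query."""
    words = set(query.split())
    return [s for s in SERVICES if s["covers"] & words]


@lru_cache(maxsize=1024)
def call_service(name, query):
    """Invoke one service; repeated (name, query) pairs come from the cache."""
    service = next(s for s in SERVICES if s["name"] == name)
    return tuple(service["call"](query))


def answer(query):
    """Merge indexed answers with service answers, each tagged with its source."""
    results = list(INDEXED.get(query, []))
    candidates = services_for(query)
    if candidates:  # ThreadPoolExecutor needs at least one worker
        with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
            futures = [(s["name"], pool.submit(call_service, s["name"], query))
                       for s in candidates]
            for name, future in futures:
                results.extend((a, name) for a in future.result())
    return sorted(results)  # placeholder ranking: plain sort on answer text


if __name__ == "__main__":
    for value, source in answer("price of product 1"):
        print(value, "from", source)
```

Variant (b) without the calling would simply return
services_for(query) next to the indexed answers and leave the invocation
to the user.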

Giovanni

Received on Sunday, 18 October 2009 15:07:41 UTC