- From: Giovanni Tummarello <giovanni.tummarello@deri.org>
- Date: Sun, 18 Oct 2009 16:06:45 +0100
- To: Hugh Glaser <hg@ecs.soton.ac.uk>
- Cc: "public-lod@w3.org" <public-lod@w3.org>, Sindice general discussions list <sindice-general@lists.deri.org>
> A) The wrapper's Semantic Sitemap points you at the original Sitemap, and
> says how it is doing the wrapping. And because you know how the wrapper is
> behaving, you can process the standard Sitemap to get the information you
> want about what the wrapping site provides.
> Actually, the "slicing" in the current spec is something similar to this -
> my Linked Data site is a wrapper around my SPARQL endpoint, and I provide a
> description of this along with dumps of the contents of the RDF store.

I get it. The problem here is the automation. This would effectively mean the Sindice fetcher "taking orders" from a site (site A) to go and fetch some third-party site (site B) and index it the way site A says. Seems scary :/ but yes, there is really no work for site A to do.

> B) Another way is for the wrapper to actually process the Sitemap and data
> dumps to produce a Semantic Sitemap and RDF dumps. Really wrapping the whole
> site, not just the data. This would require no extra facilities at the
> Sindice end.

This is better from a security/trust/provenance point of view: site A fetches the content of site B (let's use the term "fetch" instead of "crawl" to mean fetching a bunch of Sitemap URLs, though they can easily number in the hundreds of thousands, so it is a several-day job), then wraps it, creates a nice dump, and I am happy. This is good, but it seems to a) require a lot of work from site A, for a reward that is not that clear, and b) put site A in some form of responsibility for republishing site B's data (without being a large automatic service like a search engine).

This is still about fetching everything and not about integrating some form of service description (as Martin suggests). (Note that I am SURE we necessarily have to integrate services, but it would seem logical after all, yet somehow very different from what we have been considering so far: explicitly published data.)
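To make the cost of option B concrete, here is a minimal Python sketch of what site A would have to run: read site B's standard Sitemap (the sitemaps.org 0.9 XML format), fetch each listed URL, and wrap each page into N-Triples lines for a dump. The names `wrap_site`, `fake_fetch` and `label_wrapper`, and the example.org URLs, are hypothetical; fetching is stubbed out so the sketch runs offline, whereas the real job over hundreds of thousands of URLs is the multi-day fetch described above.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List

# XML namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(sitemap_xml: str) -> List[str]:
    """Return the <loc> URLs listed in a plain (non-index) Sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]

def wrap_site(sitemap_xml: str,
              fetch: Callable[[str], str],
              to_ntriples: Callable[[str, str], Iterable[str]]) -> List[str]:
    """Site A's side of option B: fetch every page site B lists, turn it into RDF lines."""
    dump: List[str] = []
    for url in sitemap_urls(sitemap_xml):
        page = fetch(url)                    # hypothetical hook: a polite, rate-limited HTTP GET
        dump.extend(to_ntriples(url, page))  # hypothetical hook: site A's page-to-RDF wrapper
    return dump

SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://b.example.org/product/1</loc></url>
  <url><loc>http://b.example.org/product/2</loc></url>
</urlset>"""

LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

def fake_fetch(url: str) -> str:
    # Stand-in for network access so the sketch runs offline.
    return "Product " + url.rsplit("/", 1)[-1]

def label_wrapper(url: str, page: str) -> List[str]:
    # Toy wrapper: one rdfs:label triple per page (no literal escaping).
    return [f'<{url}> {LABEL} "{page}" .']

if __name__ == "__main__":
    for line in wrap_site(SAMPLE_SITEMAP, fake_fetch, label_wrapper):
        print(line)
```

Everything the wrapper does lives on site A, which is exactly why the responsibility (and the effort) lands there rather than on Sindice.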
Is it possible to come up with a super-light service description that would simply let me understand when the service needs to be invoked to possibly answer a query? Maybe something in the middle? Like product descriptions in RDF, and then a special node for the price that says "see service here" or "see updated price list here"? If so, I could index such descriptions, and when somebody asks me I could say a) these are the answers I already know, b) these services claim to be able to give you some additional answers; or (probably better) I do the calling for you in parallel, cached mode, sort the results and return them with the provenance indication, etc.?

Giovanni
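One possible reading of this proposal, as a minimal Python sketch: a hypothetical `ServiceHint` plays the role of the "see service here" node attached to a subject's property. The index can either return the answers it already knows plus the services that claim to know more (options a and b), or call those services in parallel through a cache, sort, and tag every answer with its provenance. All class names, the `ex:price` property and the example.org URLs are assumptions, and services are modeled as local functions instead of remote endpoints.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Answer:
    value: str
    provenance: str  # URL of the dump or service that supplied the value

@dataclass(frozen=True)
class ServiceHint:
    """Hypothetical 'see service here' node: property `prop` is served by `service_url`."""
    prop: str
    service_url: str

# A service is modeled as a plain function (subject, prop) -> values; really a remote call.
Service = Callable[[str, str], List[str]]

class FederatedIndex:
    def __init__(self, services: Dict[str, Service]) -> None:
        self.services = services                                   # service_url -> callable
        self.facts: Dict[Tuple[str, str], List[Answer]] = {}       # (subject, prop) -> indexed answers
        self.hints: Dict[str, List[ServiceHint]] = {}              # subject -> service hints
        self.cache: Dict[Tuple[str, str, str], List[Answer]] = {}  # (service, subject, prop) -> answers

    def add_fact(self, subject: str, prop: str, value: str, source: str) -> None:
        self.facts.setdefault((subject, prop), []).append(Answer(value, source))

    def add_hint(self, subject: str, hint: ServiceHint) -> None:
        self.hints.setdefault(subject, []).append(hint)

    def known_and_claims(self, subject: str, prop: str) -> Tuple[List[Answer], List[str]]:
        """Options a) and b): answers already indexed, plus services claiming they can add more."""
        urls = [h.service_url for h in self.hints.get(subject, []) if h.prop == prop]
        return list(self.facts.get((subject, prop), [])), list(dict.fromkeys(urls))

    def answer(self, subject: str, prop: str) -> List[Answer]:
        """The 'probably better' option: call claiming services in parallel (cached),
        merge with indexed answers, sort, and keep provenance on every answer."""
        known, claims = self.known_and_claims(subject, prop)
        misses = [url for url in claims if (url, subject, prop) not in self.cache]
        if misses:
            with ThreadPoolExecutor(max_workers=len(misses)) as pool:
                results = pool.map(lambda url: self.services[url](subject, prop), misses)
                for url, values in zip(misses, results):
                    self.cache[(url, subject, prop)] = [Answer(v, url) for v in values]
        extra = [a for url in claims for a in self.cache[(url, subject, prop)]]
        # Placeholder ordering: by value, then source; a real ranking would be domain-specific.
        return sorted(known + extra, key=lambda a: (a.value, a.provenance))

SHOP = "http://shop.example.org"

def price_service(subject: str, prop: str) -> List[str]:
    # Stand-in for a live "updated price list" service.
    return {SHOP + "/p/1": ["19.99"]}.get(subject, [])

idx = FederatedIndex({SHOP + "/prices": price_service})
idx.add_fact(SHOP + "/p/1", "ex:price", "18.50", SHOP + "/dump.nt")
idx.add_hint(SHOP + "/p/1", ServiceHint("ex:price", SHOP + "/prices"))

if __name__ == "__main__":
    print(idx.known_and_claims(SHOP + "/p/1", "ex:price"))
    print(idx.answer(SHOP + "/p/1", "ex:price"))
```

The cache here never expires, which would not suit live data such as prices; a real deployment would need a time-to-live and timeouts on the service calls.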
Received on Sunday, 18 October 2009 15:07:41 UTC