
Re: The Power of Virtuoso Sponger Technology

From: Olaf Hartig <hartig@informatik.hu-berlin.de>
Date: Sun, 18 Oct 2009 17:34:55 +0200
To: "public-lod@w3.org" <public-lod@w3.org>
Message-Id: <200910181734.56930.hartig@informatik.hu-berlin.de>
Hey Giovanni,

On Sunday 18 October 2009 16:01:41 Giovanni Tummarello wrote:
> I'd say, if I understand well,
> that this works only for queries where you need the extra dereferenced
> data just "additionally", e.g. to add a label to your result set

I'm not sure what you mean by "additionally" here. The approach works for all 
queries that can be answered by traversing RDF links and building the result 
during this process. This approach doesn't assume a huge store/index that 
holds large parts of the Web of Data. Instead, all data that contributes to 
the result is discovered during the execution of the query. 
(At least in the pure form of the approach - for efficiency reasons, or to 
allow for more complete results, you may want to reuse the data discovered 
during previous query executions.)
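To make the idea concrete, here is a minimal, self-contained sketch of link traversal based query execution. The URIs, the data, and the `WEB` dictionary are all invented for illustration; a real implementation (such as the SemWeb Client Lib mentioned below) would dereference the URIs over HTTP and parse the returned RDF instead.

```python
# Hypothetical "Web of Data": dereferencing a URI yields the triples
# published in that URI's RDF graph. All data here is invented.
WEB = {
    "http://ex.org/alice": [
        ("http://ex.org/alice", "knows", "http://ex.org/bob"),
        ("http://ex.org/alice", "name", "Alice"),
    ],
    "http://ex.org/bob": [
        ("http://ex.org/bob", "name", "Bob"),
    ],
}

def dereference(uri):
    """Simulate an HTTP look-up of a URI; returns the triples it serves."""
    return WEB.get(uri, [])

def traverse_query(seed_uris, patterns):
    """Evaluate a conjunctive triple-pattern query by looking up URIs as
    they are discovered, instead of querying a pre-built store/index."""
    fetched, data, frontier = set(), [], list(seed_uris)
    while frontier:
        uri = frontier.pop()
        if uri in fetched:
            continue
        fetched.add(uri)
        for triple in dereference(uri):
            data.append(triple)
            # Follow RDF links: any URI mentioned may serve more data.
            for term in triple:
                if term.startswith("http://") and term not in fetched:
                    frontier.append(term)
    # Naive pattern matching over the triples discovered so far.
    solutions = [{}]
    for s, p, o in patterns:
        next_solutions = []
        for binding in solutions:
            for ts, tp, to in data:
                b, ok = dict(binding), True
                for var, term in ((s, ts), (p, tp), (o, to)):
                    if var.startswith("?"):
                        if b.get(var, term) != term:
                            ok = False
                        else:
                            b[var] = term
                    elif var != term:
                        ok = False
                if ok:
                    next_solutions.append(b)
        solutions = next_solutions
    return solutions

# "Whom does Alice know, and what is their name?" - Bob's name is only
# discoverable by following the link from Alice's graph to Bob's.
results = traverse_query(
    ["http://ex.org/alice"],
    [("http://ex.org/alice", "knows", "?x"), ("?x", "name", "?n")],
)
```

Note that no source list is given up front: Bob's graph is found purely by following the `knows` link discovered during execution.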

> if you need the remote, on the fly reference data to e.g. sort by
> price you'd have to fetch all from the remote site ..

True. But fetching all the remote site data that is relevant for the query is 
possible with the link traversal based approach (as long as the single RDF 
graphs from the site are interlinked appropriately).
Sure, this might be less efficient than systems that crawl in advance. 
But what if the descriptions of the price-sorted things come from multiple 
data sources? What if these descriptions change quite frequently, or the list 
of these things changes often? Maybe even the list of sources that provide the 
descriptions changes. In these cases the link traversal based approach 
helps, because it allows for up-to-date answers even if calculating them 
might take some time.
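The sorting step itself is then straightforward: once all relevant bindings have been discovered (possibly from several sources), the engine can order them locally. A tiny sketch with invented data:

```python
# Hypothetical price bindings, discovered at query time from several
# sources; sorting is only possible after all of them have been fetched,
# but the prices are guaranteed to be current.
solutions = [
    {"?item": "http://shopA.example/widget", "?price": 9.50},
    {"?item": "http://shopB.example/widget", "?price": 7.25},
]
by_price = sorted(solutions, key=lambda s: s["?price"])
```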


> Gio
> On Sun, Oct 18, 2009 at 2:57 PM, Olaf Hartig
> <hartig@informatik.hu-berlin.de> wrote:
> > Hey,
> >
> > On Sunday 18 October 2009 09:37:14 Martin Hepp (UniBW) wrote:
> >> [...]
> >> So it will boil down to technology that combines (1) crawling and
> >> caching rather stable data sets with (2) distributing queries and parts
> >> of queries among the right SPARQL endpoints (whatever actual DB
> >> technology they expose).
> >>
> >> You can keep a text index of the whole Web, if crawling cycles in the
> >> order of magnitude of weeks are fine. For structured, linked data that
> >> exposes dynamic database content, "dumb" crawling and caching will not
> >> scale.
> >
> > Interesting discussion!
> >
> > An alternative approach to query federation is the link traversal based
> > query execution as implemented in the SemWeb Client Lib. The main idea of
> > this approach is to look-up URIs during the query execution itself. With
> > this approach you don't rely on the existence of SPARQL endpoints and
> > -even more important- you don't have to know all the sources that
> > contribute to the query result in advance. Plus, the results are based on
> > the most up-to-date data you can get.
> >
> > Greetings,
> > Olaf
Received on Sunday, 18 October 2009 15:35:18 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 16:20:53 UTC