Re: Compiling information from several different triplestores from Paul Gearon on 2009-05-05 (semantic-web@w3.org from May 2009)

From: Paul Gearon <gearon@ieee.org>
Date: Tue, 5 May 2009 10:32:52 -0500
To: Geoff Chappell <geoff@sover.net>
Cc: Nicolas Raoul <nicolas.raoul.lists@gmail.com>, semantic-web@w3.org
Message-ID: <a25ac1f0905050832h2a6cb71byca9a14bd5adb5e33@mail.gmail.com>

On Tue, May 5, 2009 at 10:08 AM, Geoff Chappell <geoff@sover.net> wrote:

<snip/>
>
> Here's an example of what I'm talking about using our Semantics.SDK for
> .NET[1]:
>
>        prefix owl: <...>
>        prefix ex: <...>
>        prefix foaf: <...>
>
>        #sparql extension to support rules
>        rulebase (
>                construct {?s ?p ?o} from {?x owl:sameAs ?s. ?x ?p ?o}
>        )
>
>        select ?f
>        from <http://www.someplace.org/data>
>        from <http://www.someotherplace.org/data>
>        where { ex:Anthony foaf:knows ?f }
>
>
> To do this efficiently, the query processor will need statistics for the
> data sources used. For remote graphs (e.g. sparql endpoints) this means that
> they either need to publish stats in a reasonable form, or the query
> processor would have to generate and cache its own based upon queries
> against the graph.

This is exactly what I want to do in Mulgara. Unfortunately, I've
wanted to do this for a couple of years now, and there are always
other priorities. :-(

The idea is to send the basic graph patterns out to each endpoint, and
ask how large the binding will be. The place with the largest binding
gets the entire query, and it sends out the rest of the query to the
other endpoints. Any endpoints with an empty result aren't sent
anything (for those BGPs) after their initial response. Each endpoint
can whittle down the query, sending the remaining query on to their
peers, and joining whatever result they get to their own local BGP
resolutions. The idea is that only the smallest bindings are
transferred across the network, and after joining a small binding to a
large one you *usually* get another small binding (with more
variables). This is overly simplistic (really! there's a lot more to
do!), but it illustrates the point. I believe that Aduna are working
on something similar for Sesame.
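
To make that a bit more concrete, here is a rough Python sketch of the
cardinality-driven part only. It is not Mulgara code. The endpoints are
just in-memory lists of triples, and the names (match, join,
federated_query) are made up for illustration. It also skips the
"whittling down" step where each endpoint forwards the remaining query
to its peers: every non-empty binding is simply shipped to the endpoint
holding the largest one, and the joins happen there, smallest first.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(triples: List[Triple], pattern: Triple) -> List[Binding]:
    """Resolve one triple pattern locally; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding: Binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in a pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # constant term does not match this triple
        else:
            results.append(binding)
    return results


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Natural join on shared variables (nested loop, good enough for a sketch)."""
    joined = []
    for lrow in left:
        for rrow in right:
            if all(lrow[v] == rrow[v] for v in lrow.keys() & rrow.keys()):
                joined.append({**lrow, **rrow})
    return joined


def federated_query(
    endpoints: Dict[str, List[Triple]], patterns: List[Triple]
) -> Tuple[List[Binding], Optional[str], int]:
    # Step 1: every endpoint resolves every pattern locally and reports its size.
    local = {
        (name, p): match(triples, p)
        for name, triples in endpoints.items()
        for p in patterns
    }
    # Endpoints with an empty result for a pattern drop out for that pattern.
    sizes = {key: len(rows) for key, rows in local.items() if rows}
    if not sizes:
        return [], None, 0

    # Step 2: the endpoint holding the largest binding hosts the query.
    host = max(sizes, key=sizes.get)[0]

    # Step 3: only the bindings held elsewhere cross the network.
    shipped = sum(n for (name, _), n in sizes.items() if name != host)

    # Step 4: at the host, union each pattern's bindings, then join smallest first.
    per_pattern = [
        [row for name in endpoints for row in local[(name, p)]]
        for p in patterns
    ]
    per_pattern.sort(key=len)
    result = per_pattern[0]
    for rows in per_pattern[1:]:
        result = join(result, rows)
    return result, host, shipped


# Hypothetical data: two endpoints, each able to answer one of the two patterns.
endpoints = {
    "a": [("ex:Anthony", "foaf:knows", "ex:Bob"),
          ("ex:Anthony", "foaf:knows", "ex:Carol")],
    "b": [("ex:Bob", "foaf:name", "Bob"),
          ("ex:Carol", "foaf:name", "Carol"),
          ("ex:Dave", "foaf:name", "Dave")],
}
patterns = [("ex:Anthony", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
result, host, shipped = federated_query(endpoints, patterns)
print(host, shipped, result)
```

A real implementation would only ask for counts in step 1 rather than
materializing every binding up front, and would pass intermediate join
results between peers instead of shipping everything to a single host.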

Until these things are available though, we have to rely on systems
that transfer entire graphs, or complete bindings all the time.
Mulgara can send individual bindings to various servers (either the
full list from the FROM clauses, or individually via GRAPH), which
works pretty well, but it doesn't minimize the network traffic.

Regards,
Paul Gearon

Received on Tuesday, 5 May 2009 15:33:35 UTC