Re: Compiling information from several different triplestores from Michael Lang(Jr.) on 2009-05-05 (semantic-web@w3.org from May 2009)

From: Michael Lang(Jr.) <michaelallenlang@gmail.com>
Date: Tue, 5 May 2009 11:38:00 -0400
To: Paul Gearon <gearon@ieee.org>
Cc: Geoff Chappell <geoff@sover.net>, Nicolas Raoul <nicolas.raoul.lists@gmail.com>, semantic-web@w3.org, Alex Hall <alexhall@revelytix.com>, michaelalang@gmail.com
Message-ID: <59c1f5620905050838h1e1325f5if6ec719beda7a3a@mail.gmail.com>

Bringing a co-worker in on the thread.
Mike Lang
Revelytix, Inc.

phone: 410-584-0009 (office)
          443-928-3782 (cell)
skype: michael.allen.lang.jr
aim: MikeJrRevelytix


On Tue, May 5, 2009 at 11:32 AM, Paul Gearon <gearon@ieee.org> wrote:

> On Tue, May 5, 2009 at 10:08 AM, Geoff Chappell <geoff@sover.net> wrote:
>
> <snip/>
> >
> > Here's an example of what I'm talking about using our Semantics.SDK for
> > .NET[1]:
> >
> >        prefix owl: <...>
> >        prefix ex: <...>
> >        prefix foaf: <...>
> >
> >        #sparql extension to support rules
> >        rulebase (
> >                construct {?s ?p ?o} from {?x owl:sameAs ?s. ?x ?p ?o}
> >        )
> >
> >        select ?f
> >        from <http://www.someplace.org/data>
> >        from <http://www.someotherplace.org/data>
> >        where { ex:Anthony foaf:knows ?f }
> >
> >
> > To do this efficiently, the query processor will need statistics for the
> > data sources used. For remote graphs (e.g. SPARQL endpoints) this means that
> > they either need to publish stats in a reasonable form, or the query
> > processor would have to generate and cache its own based upon queries
> > against the graph.
>
> This is exactly what I want to do in Mulgara. Unfortunately, I've
> wanted to do this for a couple of years now, and there are always
> other priorities. :-(
>
> The idea is to send the basic graph patterns out to each endpoint, and
> ask how large the binding will be. The place with the largest binding
> gets the entire query, and it sends out the rest of the query to the
> other endpoints. Any endpoints with an empty result aren't sent
> anything (for those BGPs) after their initial response. Each endpoint
> can whittle down the query, sending the remaining query on to their
> peers, and joining whatever result they get to their own local BGP
> resolutions. The idea is that only the smallest bindings are
> transferred across the network, and after joining a small binding to a
> large one you *usually* get another small binding (with more
> variables). This is overly simplistic (really! there's a lot more to
> do!), but it illustrates the point. I believe that Aduna are working
> on something similar for Sesame.
>
> Until these things are available though, we have to rely on systems
> that transfer entire graphs, or complete bindings all the time.
> Mulgara can send individual bindings to various servers (either the
> full list from the FROM clauses, or individually via GRAPH), which
> works pretty well, but it doesn't minimize the network traffic.
>
> Regards,
> Paul Gearon
>
>
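
For readers following the thread, the first step Paul describes (ask each
endpoint how large its binding for a basic graph pattern would be, drop the
endpoints that come back empty, and hand the query to the endpoint with the
largest binding) might be sketched roughly as below. This is only an
illustration, not Mulgara's or Sesame's implementation. The names
count_matches and plan, the in-memory triple lists standing in for remote
endpoints, and the sample data are all assumptions made for the example; a
real planner would get its counts from a count query or from published
statistics.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
# A triple pattern; None marks a variable position.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def count_matches(triples: List[Triple], pattern: Pattern) -> int:
    """Count the triples that match a pattern (None matches anything)."""
    return sum(
        all(p is None or p == t for p, t in zip(pattern, triple))
        for triple in triples
    )


def plan(
    endpoints: Dict[str, List[Triple]], pattern: Pattern
) -> Tuple[Optional[str], List[str]]:
    """Pick a coordinator and the peers worth contacting for one pattern.

    Hypothetical planner: the local lists stand in for remote endpoints
    that would normally be asked for a binding size over the network.
    """
    sizes = {name: count_matches(ts, pattern) for name, ts in endpoints.items()}
    # Endpoints with an empty binding are sent nothing further.
    live = {name: n for name, n in sizes.items() if n > 0}
    if not live:
        return None, []
    # The largest binding stays where it is; smaller ones travel to it.
    coordinator = max(live, key=live.get)
    peers = sorted((n for n in live if n != coordinator), key=live.get)
    return coordinator, peers


# Made-up data for illustration only.
endpoints = {
    "a": [
        ("ex:Anthony", "foaf:knows", "ex:Bob"),
        ("ex:Anthony", "foaf:knows", "ex:Carol"),
        ("ex:Bob", "foaf:knows", "ex:Dan"),
    ],
    "b": [("ex:Anthony", "foaf:knows", "ex:Eve")],
    "c": [("ex:Zed", "foaf:name", "Zed")],
}
pattern = ("ex:Anthony", "foaf:knows", None)
coordinator, peers = plan(endpoints, pattern)
```

Here "a" holds two matching triples, "b" holds one, and "c" none, so "a"
would coordinate, "b" would ship its single binding, and "c" would not be
contacted again for this pattern. The sketch stops at planning; the
recursive hand-off of the remaining query between peers and the joins
themselves are left out.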

Received on Tuesday, 5 May 2009 15:38:38 UTC