RE: Protocol extensions for federated querying from Seaborne, Andy on 2009-10-22 (public-rdf-dawg@w3.org from October to December 2009)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 22 Oct 2009 15:21:58 +0000
To: Andreas Langegger <al@jku.at>, Paul Gearon <gearon@ieee.org>
CC: "public-rdf-dawg@w3.org" <public-rdf-dawg@w3.org>
Message-ID: <B6CF1054FDC8B845BF93A6645D19BEA3693FAD49DC@GVW1118EXC.americas.hpqcorp.net>


> -----Original Message-----
> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
> On Behalf Of Andreas Langegger
> Sent: 21 October 2009 14:03
> To: Paul Gearon
> Cc: public-rdf-dawg@w3.org
> Subject: Re: Protocol extensions for federated querying
> 
> Hi Paul,
> 
> +1 - would like to see that in SPARQL/Query1.1 also!
> 
> However, I think it would be more convenient and compact, and require
> less markup, if initial bindings could be submitted as part of the query
> rather than in the POST attachment. Small queries could still be issued
> via GET, and if there are many bindings, the client can just use POST
> anyway.
> 
> I have implemented a BINDINGS extension in ARQ, demo running at
> http://ramses.faw.uni-linz.ac.at:8900/snorql/?query=SELECT+*+WHERE+%7B%0D%0A++%3Fs+a+%3Ftype%0D%0A%7D+BINDINGS+%3Ftype+%7B%0D%0A++bsbm%3AProduct+.%0D%0A++foaf%3APerson+.%0D%0A%7D
> 
> Example with multiple variables (empty bindings may be specified with
> "null"):
> SELECT * WHERE {
>    ?s :p ?a ; :p ?b ...
> } BINDINGS ?a ?b {
>    bsbm:Product "34"^^xsd:int .
>    null "23"^^xsd:int .
>    foaf:Person . // remaining slots are interpreted as empty (null)
> }


I like the idea of in-query syntax for this.
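
As a minimal sketch of how the BINDINGS block quoted above can be read, the Python below turns the variable list and the rows into a table of partial solutions. It assumes, following the quoted example, that "null" (None here) and any missing trailing slot both leave the variable unbound. The function name bindings_table and the string encoding of terms are hypothetical, chosen for illustration; this is not ARQ's implementation.

```python
from typing import Dict, List, Optional


def bindings_table(variables: List[str],
                   rows: List[List[Optional[str]]]) -> List[Dict[str, str]]:
    """Turn BINDINGS rows into partial solutions (variable -> term).

    None stands for "null"; rows shorter than the variable list are
    padded, so the remaining slots are treated as unbound.
    """
    solutions = []
    for row in rows:
        if len(row) > len(variables):
            raise ValueError("row has more terms than declared variables")
        padded = row + [None] * (len(variables) - len(row))
        # Unbound slots are simply left out of the solution mapping.
        solutions.append({v: t for v, t in zip(variables, padded) if t is not None})
    return solutions


# The three rows from the quoted example.
TABLE = bindings_table(
    ["?a", "?b"],
    [
        ["bsbm:Product", '"34"^^xsd:int'],
        [None, '"23"^^xsd:int'],
        ["foaf:Person"],
    ],
)
```

Each entry of TABLE is one row of initial bindings; a variable that is absent from a row is free to match anything in the query.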

> 
> The evaluation is simply a Join in ARQ against an OpTable that holds the
> materialized solutions supplied. It is actually very simple to implement
> and worth having in future SPARQL.

An alternative design is to regard the bindings as initial values, with evaluation being anything equivalent to a loop that takes rows one at a time, substitutes them for variables, and evaluates the query.  This is nearly the same as a join, except when certain nested OPTIONAL forms are in the query.  It's a bit more in keeping with streaming the bindings in while streaming results out.  I think this is what is called a "bind join" in the Garlic work (at IBM) from some time ago.
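
The row-at-a-time evaluation described above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not ARQ's implementation: the query is a plain list of triple patterns, terms are strings, variables start with "?", and OPTIONAL is not modelled, so the cases where substitution and a join differ do not arise here. All names (match_pattern, evaluate, bind_join, DATA, PATTERNS) are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must agree with any value it already has.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def evaluate(patterns: List[Triple], data: List[Triple],
             initial: Binding) -> Iterator[Binding]:
    """Evaluate the patterns with the initial row substituted in."""
    solutions = [dict(initial)]
    for pattern in patterns:
        solutions = [ext for sol in solutions for triple in data
                     if (ext := match_pattern(pattern, triple, sol)) is not None]
    yield from solutions


def bind_join(patterns: List[Triple], data: List[Triple],
              rows: Iterable[Binding]) -> Iterator[Binding]:
    """Take supplied rows one at a time and stream results out."""
    for row in rows:
        yield from evaluate(patterns, data, row)


# Data and query modelled on the aliceFoaf example quoted further down.
DATA = [
    ("_:a", "foaf:mbox", "mailto:alice@work.example"),
    ("_:a", "foaf:name", "Alice"),
    ("_:b", "foaf:mbox", "mailto:bob@work.example"),
    ("_:b", "foaf:name", "Bob"),
]
PATTERNS = [("?p2", "foaf:mbox", "?mbox"), ("?p2", "foaf:name", "?name")]
RESULTS = list(bind_join(PATTERNS, DATA, [{"?mbox": "mailto:bob@work.example"}]))
```

Because bind_join is a generator over an iterable of rows, results for the first row can be produced before later rows have arrived, which is the streaming property mentioned above.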

 Andy

> For scalable federation over public SPARQL endpoints, however, I'm more
> than sceptical, having done much research and many experiments in
> this direction. My SemWIQ [1] mediator works only with patched
> endpoints that support SPARQL BINDINGS and RDFStats [2]. I think
> issuing COUNT queries beforehand may not scale well. Initial bindings
> mainly reduce the latency of HTTP connections, but they speed up
> federation only linearly. If there are many distributed joins,
> even bind joins (dynamic optimization by substitution) become
> troublesome...
> 
> Regards,
> Andy
> 
> [1] http://semwiq.sourceforge.net
> [2] http://rdfstats.sourceforge.net

> 
> 
> On Oct 20, 2009, at 9:51 PM, Paul Gearon wrote:
> 
> > Hi everyone,
> >
> > This meets the commitment I made for ACTION-124.
> >
> > So far, all the comments I've seen on federated queries have been
> > about the suggested query syntax. To date I'm in agreement with what
> > I've seen proposed.
> >
> > I am also interested in extending the protocol to support federation a
> > little better. At the moment, all queries are done as a simple request
> > via a GET or a POST. In the case of POST, the endpoint alone is
> > provided in the URL, and the query appears in the body.
> >
> > I'd like to see a form of POST that includes a SPARQL variable binding
> > result in the body (a la http://www.w3.org/TR/rdf-sparql-XMLres/). In
> > this way the receiving query engine can work with prebindings that are
> > provided to it, allowing it to reduce the result that is to be
> > streamed back to the calling engine.
> >
> > To give an example, I'll reference the two datasets found in 8.3 of
> > the SPARQL Query Language document:
> > http://www.w3.org/TR/rdf-sparql-query/#queryDataset
> >
> > If we make the presumption that the named graph
> > http://example.org/foaf/aliceFoaf can be found at
> > http://sparql.org/sparql/, then I might want to issue the following
> > query to get the names of people whose nicknames are in the bobFoaf
> > graph:
> >
> > SELECT ?nick ?name
> > FROM <http://example.org/foaf/bobFoaf>
> > WHERE {
> > ?p1 foaf:nick ?nick .
> > ?p1 foaf:mbox ?mbox
> > SERVICE <http://sparql.org/sparql/> {
> >   SELECT ?mbox ?name
> >   FROM <http://example.org/foaf/aliceFoaf>
> >   WHERE { ?p2 foaf:mbox ?mbox . ?p2 foaf:name ?name }
> > }
> > }
> >
> >
> > The part of the query in the SERVICE block would usually return the
> > following:
> > <?xml version="1.0"?>
> > <sparql xmlns="http://www.w3.org/2005/sparql-results#">
> > <head>
> >   <variable name="mbox"/>
> >   <variable name="name"/>
> > </head>
> > <results>
> >   <result>
> >     <binding name="mbox"><uri>mailto:alice@work.example</uri></binding>
> >     <binding name="name"><literal>Alice</literal></binding>
> >   </result>
> >   <result>
> >     <binding name="mbox"><uri>mailto:bob@work.example</uri></binding>
> >     <binding name="name"><literal>Bob</literal></binding>
> >   </result>
> > </results>
> > </sparql>
> >
> > Note that this is information for both Bob and Alice. This can then be
> > joined to the remainder of the query, which reduces the results to
> > just Bob.
> >
> > However, a query engine may instead want to evaluate Bob first. This
> > may be desirable if some COUNT queries have already been issued, and
> > the query engine knows that the results of the SERVICE block will
> > return a large number of results, while the local data would bind
> > ?mbox to only a few values. In that case, the local binding of ?mbox
> > could be sent along with the query (?p1 and ?nick are not necessary
> > for the remote service). This could be accomplished using a POST that
> > has the query in the URL, and the bindings in the body.
> >
> > POST /sparql/?query=SELECT+%3Fmbox+%3Fname+FROM+%3Chttp%3A%2F%2Fexample.org%2Ffoaf%2FaliceFoaf%3E+WHERE+%7B+%3Fp2+foaf%3Ambox+%3Fmbox+.+%3Fp2+foaf%3Aname+%3Fname+%7D HTTP/1.1
> > Content-Length: xxxxxx
> > Content-Type: multipart/form-data; boundary=ZpwZZc62ZXXjf0InvlrBjTWNrJSp--FL
> > Host: sparql.org
> > Connection: Keep-Alive
> > User-Agent: example
> >
> > --ZpwZZc62ZXXjf0InvlrBjTWNrJSp--FL
> > Content-Disposition: form-data; name="query-prebinding"
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > <?xml version="1.0"?>
> > <sparql xmlns="http://www.w3.org/2005/sparql-results#">
> > <head>
> >   <variable name="mbox"/>
> > </head>
> > <results>
> >   <result>
> >     <binding name="mbox"><uri>mailto:bob@work.example</uri></binding>
> >   </result>
> > </results>
> > </sparql>
> >
> > --ZpwZZc62ZXXjf0InvlrBjTWNrJSp--FL--
> >
> > With this pre-binding, the remote query engine is able to reduce its
> > results to just the one for Bob, thereby cutting the returned size
> > down by nearly half.
> >
> > One potential issue is for very long queries that also want to be
> > placed into the body of a POST. In that case we could simply define
> > the names of each section (in the example above I've used a name of
> > "query-prebinding").
> >
> > What do others think? Does this proposal have merit?
> >
> > Regards,
> > Paul Gearon
> >
> 
> 
> http://www.langegger.at
> ----------------------------------------------------------------------
> Dipl.-Ing.(FH) Andreas Langegger
> FAW - Institute for Application-oriented Knowledge Processing
> Johannes Kepler University Linz
> A-4040 Linz, Altenberger Straße 69
Received on Thursday, 22 October 2009 15:22:53 UTC