- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Thu, 22 Oct 2009 15:21:58 +0000
- To: Andreas Langegger <al@jku.at>, Paul Gearon <gearon@ieee.org>
- CC: "public-rdf-dawg@w3.org" <public-rdf-dawg@w3.org>
> -----Original Message-----
> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
> On Behalf Of Andreas Langegger
> Sent: 21 October 2009 14:03
> To: Paul Gearon
> Cc: public-rdf-dawg@w3.org
> Subject: Re: Protocol extensions for federated querying
>
> Hi Paul,
>
> +1 - would like to see that in SPARQL/Query1.1 also!
>
> However, I think it would be more convenient, compact and also require
> less markup if initial bindings can be submitted as part of the query
> and not in the post attachment. Small queries could still be issued
> via GET and if there are many bindings, the client just can use POST
> anyway.
>
> I have implemented a BINDINGS extension in ARQ, demo running at
> http://ramses.faw.uni-linz.ac.at:8900/snorql/?query=SELECT+*+WHERE+%7B%0D%0A++%3Fs+a+%3Ftype%0D%0A%7D+BINDINGS+%3Ftype+%7B%0D%0A++bsbm%3AProduct+.%0D%0A++foaf%3APerson+.%0D%0A%7D
>
> Example with multiple variables (empty bindings may be specified with
> "null"):
> SELECT * WHERE {
>   ?s :p ?a ; :p ?b ...
> } BINDINGS ?a ?b {
>   bsbm:Product "34"^^xsd:int .
>   null "23"^^xsd:int .
>   foaf:Person . // remaining slots are interpreted as empty (null)
> }

I like the idea of in-query syntax for this.

> The evaluation is simply a Join in ARQ against an OpTable which is the
> materialized solutions supplied. Very simple to implement actually and
> worth having it in future SPARQL.

An alternative design is to regard the bindings as initial values and
evaluation is anything that's equivalent to a loop taking rows one at a
time, substituting for variables and evaluating the query. This is nearly
the same as a join except when certain nested optionals forms are in the
query. It’s a bit more in keeping with streaming the bindings in while
streaming results out. I think this is what is called a "bind join" in
the Garlic (at IBM) work from some time ago.
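
As a rough illustration of the two evaluation strategies discussed above (a join against the supplied table of solutions versus a loop that substitutes one row at a time), here is a toy Python sketch. It is not ARQ code: the in-memory TRIPLES list and the helpers match, evaluate, table_join and bind_join are all hypothetical names made up for this example. It handles a single triple pattern only, so it does not show the nested-OPTIONAL cases where the two strategies can differ.

```python
# Toy data: triples are 3-tuples of strings; variables start with "?".
TRIPLES = [
    ("s1", "rdf:type", "bsbm:Product"),
    ("s2", "rdf:type", "foaf:Person"),
    ("s3", "rdf:type", "foaf:Document"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(pattern, binding=None):
    """Evaluate one triple pattern, optionally starting from initial bindings."""
    binding = binding or {}
    return [m for t in TRIPLES if (m := match(pattern, t, binding)) is not None]


def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)


def table_join(pattern, rows):
    """Evaluate the pattern in full, then join with the supplied rows."""
    return [
        {**sol, **row}
        for sol in evaluate(pattern)
        for row in rows
        if compatible(sol, row)
    ]


def bind_join(pattern, rows):
    """Take the rows one at a time, substituting each before evaluating."""
    return [sol for row in rows for sol in evaluate(pattern, row)]
```

For the pattern ("?s", "rdf:type", "?type") and rows that bind ?type to bsbm:Product and foaf:Person, both functions return the same two solutions; the bind join simply never materializes the s3 row.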
 Andy

> For scalable federation over public SPARQL endpoints I'm however more
> than sceptical since I've done much research and experiments towards
> this direction. My SemWIQ [1] mediator is working with patched
> endpoints only that support SPARQL BINDINGS and RDFStats [2]. I think
> issuing COUNT queries before may not scale well. Initial bindings
> mainly reduce the latency times for HTTP connections, but it does only
> linearly speed up federation. If there are many distributed joins,
> even bind joins (dynamic optimization by substitution) becomes
> troublesome...
>
> Regards,
> Andy
>
> [1] http://semwiq.sourceforge.net
> [2] http://rdfstats.sourceforge.net
>
>
> On Oct 20, 2009, at 9:51 PM, Paul Gearon wrote:
>
> > Hi everyone,
> >
> > This meets the commitment I made for ACTION-124.
> >
> > So far, all the comments I've seen on federated queries have been
> > about the suggested query syntax. To date I'm in agreement with what
> > I've seen proposed.
> >
> > I am also interested in extending the protocol to support federation a
> > little better. At the moment, all queries are done as a simple request
> > via a GET or a POST. In the case of POST, the endpoint alone is
> > provided in the URL, and the query appears in the body.
> >
> > I'd like to see a form of POST that includes a SPARQL variable binding
> > result in the body (a la http://www.w3.org/TR/rdf-sparql-XMLres/). In
> > this way the receiving query engine can work with prebindings that are
> > provided to it, allowing it to reduce the result that is to be
> > streamed back to the calling engine.
> >
> > To give an example, I'll reference the two datasets found in 8.3 of
> > the SPARQL Query Language document:
> > http://www.w3.org/TR/rdf-sparql-query/#queryDataset
> >
> > If we make the presumption that the named graph
> > http://example.org/foaf/aliceFoaf can be found at
> > http://sparql.org/sparql/, then I might want to issue the following
> > query to get the names of people whose nicknames are in the bobFoaf
> > graph:
> >
> > SELECT ?nick ?name
> > FROM <http://example.org/foaf/bobFoaf>
> > WHERE {
> >   ?p1 foaf:nick ?nick .
> >   ?p1 foaf:mbox ?mbox
> >   SERVICE <http://sparql.org/sparql/> {
> >     SELECT ?mbox ?name
> >     FROM <http://example.org/foaf/aliceFoaf>
> >     WHERE { ?p2 foaf:mbox ?mbox . ?p2 foaf:name ?name }
> >   }
> > }
> >
> >
> > The part of the query in the SERVICE block would usually return the
> > following:
> > <?xml version="1.0"?>
> > <sparql xmlns="http://www.w3.org/2005/sparql-results#">
> >   <head>
> >     <variable name="mbox"/>
> >     <variable name="name"/>
> >   </head>
> >   <results>
> >     <result>
> >       <binding name="mbox"><uri>mailto:alice@work.example</uri></binding>
> >       <binding name="name"><literal>Alice</literal></binding>
> >     </result>
> >     <result>
> >       <binding name="mbox"><uri>mailto:bob@work.example</uri></binding>
> >       <binding name="name"><literal>Bob</literal></binding>
> >     </result>
> >   </results>
> > </sparql>
> >
> > Note that this is information for both Bob and Alice. This can then be
> > joined to the remainder of the query, which reduces the results to
> > just Bob.
> >
> > However, a query engine may instead want to evaluate Bob first. This
> > may be desirable if some COUNT queries have already been issued, and
> > the query engine knows that the results of the SERVICE block will
> > return a large number of results, while the local data would bind
> > ?mbox to only a few values. In that case, the local binding of ?mbox
> > could be sent along with the query (?p1 and ?nick are not necessary
> > for the remote service). This could be accomplished using a POST that
> > has the query in the URL, and the bindings in the body.
> >
> > POST /sparql/?query=SELECT+%3Fmbox+%3Fname+FROM+%3Chttp%3A%2F%2Fexample.org%2Ffoaf%2FaliceFoaf%3E+WHERE+%7B+%3Fp2+foaf%3Ambox+%3Fmbox+.+%3Fp2+foaf%3Aname+%3Fname+%7D HTTP/1.1
> > Content-Length: xxxxxx
> > Content-Type: multipart/form-data;
> >   boundary=ZpwZZc62ZXXjf0InvlrBjTWNrJSp--FL
> > Host: sparql.org
> > Connection: Keep-Alive
> > User-Agent: example
> >
> > --ZpwZZc62ZXXjf0InvlrBjTWNrJSp--FL
> > Content-Disposition: form-data; name="query-prebinding"
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > <?xml version="1.0"?>
> > <sparql xmlns="http://www.w3.org/2005/sparql-results#">
> >   <head>
> >     <variable name="mbox"/>
> >   </head>
> >   <results>
> >     <result>
> >       <binding name="mbox"><uri>mailto:bob@work.example</uri></binding>
> >     </result>
> >   </results>
> > </sparql>
> >
> > --ZpwZZc62ZXXjf0InvlrBjTWNrJSp--FL--
> >
> > With this pre-binding, the remote query engine is able to reduce its
> > results to just the one for Bob, thereby cutting the returned size
> > down by nearly half.
> >
> > One potential issue is for very long queries that also want to be
> > placed into the body of a POST. In that case we could simply define
> > the names of each section (in the example above I've used a name of
> > "query-prebinding").
> >
> > What do others think? Does this proposal have merit?
> >
> > Regards,
> > Paul Gearon
> >
>
> http://www.langegger.at
> ----------------------------------------------------------------------
> Dipl.-Ing.(FH) Andreas Langegger
> FAW - Institute for Application-oriented Knowledge Processing
> Johannes Kepler University Linz
> A-4040 Linz, Altenberger Straße 69
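
The multipart POST proposed in the quoted message could be assembled roughly as follows. This is only a sketch of the proposed (not standardized) protocol extension: the part name "query-prebinding" is taken from the example in the message, while the function names prebinding_xml and build_post are hypothetical, the Content-Transfer-Encoding header is left out for brevity, and nothing is sent over the network.

```python
import uuid
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SPARQL_RESULTS_NS = "http://www.w3.org/2005/sparql-results#"


def prebinding_xml(variable, uris):
    """Serialize URI bindings for one variable in the SPARQL results XML format."""
    root = ET.Element("sparql", xmlns=SPARQL_RESULTS_NS)
    head = ET.SubElement(root, "head")
    ET.SubElement(head, "variable", name=variable)
    results = ET.SubElement(root, "results")
    for uri in uris:
        result = ET.SubElement(results, "result")
        binding = ET.SubElement(result, "binding", name=variable)
        ET.SubElement(binding, "uri").text = uri
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


def build_post(path, query, xml_doc, boundary=None):
    """Return (request target, headers, body) with the query in the URL
    and the pre-bindings as a single multipart/form-data part."""
    boundary = boundary or uuid.uuid4().hex
    query_string = urlencode({"query": query})
    target = f"{path}?{query_string}"
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="query-prebinding"\r\n'
        "Content-Type: text/plain; charset=UTF-8\r\n"
        "\r\n"
        f"{xml_doc}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return target, headers, body


if __name__ == "__main__":
    xml_doc = prebinding_xml("mbox", ["mailto:bob@work.example"])
    query = (
        "SELECT ?mbox ?name FROM <http://example.org/foaf/aliceFoaf> "
        "WHERE { ?p2 foaf:mbox ?mbox . ?p2 foaf:name ?name }"
    )
    target, headers, body = build_post("/sparql/", query, xml_doc)
    print(target)
    print(headers)
    print(body.decode("utf-8"))
```

For the very long queries mentioned in the message, the query could be carried as a second named part of the same body instead of in the URL; the name of that part would have to be agreed on, and is not defined here.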
Received on Thursday, 22 October 2009 15:22:53 UTC