Re: questions regarding BINDINGS from Andy Seaborne on 2010-09-11 (public-rdf-dawg@w3.org from July to September 2010)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Sat, 11 Sep 2010 17:24:56 +0100
To: Carlos Buil Aranda <cbuil@fi.upm.es>
CC: public-rdf-dawg@w3.org
Message-ID: <4C8BAD58.8080101@epimorphics.com>
Summary:

   Proposal to use the BINDINGS mechanism for parametrized queries.


On 09/09/10 17:02, Carlos Buil Aranda wrote:
> Hello all,
>
> I'm looking at the BINDINGS description of the SPARQL 1.1 query
> federation and I have a question about it. It says "In order to
> efficiently communicate constraints to sparql endpoints, the queryier
> may follow the WHERE clause with BINDINGS." I want to implement it,
> but I do not totally understand it. Does the sentence mean that it is
> better to use BINDINGS for global constraints?

The BINDINGS clause allows the query client to send some additional data 
to the remote endpoint.  The remote endpoint takes that data and 
executes the query with that data, and returns the results.  This can 
result in considerably less data needing to be sent over the network.

For example:
     SELECT * { ?s ?p ?o }
vs
     SELECT * { ?s ?p ?o }
     BINDINGS ?s { ( <http://example/s1> ) ( <http://example/s2> ) }

on a large dataset.

The federation document currently defines the combination of data with 
the WHERE clause as adding a join to the algebra operations:

(join
    (bgp (triple ?s ?p ?o))
    (table ((?s <http://example/s1>)
            (?s <http://example/s2>)
    )))

or whatever the internal representation of constant data is in the system.
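
To make the join reading concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular engine does it: solutions are modelled as plain dicts, and the names compatible, join, pattern_solutions and bindings_table are made up for this example.

```python
# Hypothetical in-memory model: a solution is a dict from variable name to value.

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    """SPARQL-style join: merge every compatible pair of solutions."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

# Stand-in for the solutions of { ?s ?p ?o } on a tiny made-up dataset.
pattern_solutions = [
    {"s": "http://example/s1", "p": "http://example/p", "o": 1},
    {"s": "http://example/s2", "p": "http://example/p", "o": 2},
    {"s": "http://example/s3", "p": "http://example/p", "o": 3},
]

# The BINDINGS data as a table of rows over ?s.
bindings_table = [{"s": "http://example/s1"}, {"s": "http://example/s2"}]

# Only the rows for s1 and s2 survive the join.
joined = join(pattern_solutions, bindings_table)
```

A real endpoint would of course push the table into the pattern match rather than enumerate every triple first; the sketch only shows what the join computes.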

> As I see it, BINDINGS modifies the solution once all the triples have
> been recovered from multiple SERVICE (or a single query), but FILTER can
> do much of it inside the query. Am I right?  In the example query, would
> it not be possible (in case they were services) to filter their results
> directly to make the query more efficient? There would be less
> data to make the joins, right?

Yes - but it may not be very convenient to do it that way -  it might 
become a very unwieldy filter:

     SELECT *
     { ?s ?p ?o .
       FILTER ( ?s = <http://example/s1> || ?s = <http://example/s2> )
     }

With more variables, some of which can be UNDEF (not bound in that row), 
it's going to get unwieldy.  Instead, the query can be sent and the 
additional bindings streamed after the query.
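
As a rough illustration of how quickly the FILTER form grows, the sketch below generates such a filter from rows of data. The function name rows_to_filter and the use of None to stand for UNDEF are assumptions of this example only, and the FILTER it emits uses "=" so it is only roughly equivalent to the join.

```python
def rows_to_filter(variables, rows):
    """Render rows of values as one FILTER; None stands in for UNDEF here."""
    disjuncts = []
    for row in rows:
        # An UNDEF cell places no constraint on its variable, so skip it.
        tests = [f"?{var} = {val}" for var, val in zip(variables, row) if val is not None]
        # A row that is entirely UNDEF matches anything.
        disjuncts.append("( " + " && ".join(tests) + " )" if tests else "true")
    return "FILTER ( " + " || ".join(disjuncts) + " )"

filter_text = rows_to_filter(
    ["s", "o"],
    [("<http://example/s1>", None), ("<http://example/s2>", "3")],
)
print(filter_text)
```

Each extra row adds a disjunct and each extra variable adds a conjunct, which is the growth the BINDINGS form avoids.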

It is very close to the parameterized query feature the WG originally 
discussed [1,2] (and see also parametrized queries in SQL) but it is not 
quite the same because the proposed definition is a join.

For example:

SELECT *
{ ?s ?p ?o . FILTER ( ?o < ?v ) }
BINDINGS ?v { (1) (2) }

does not yield any results as currently defined (?v is unbound during query 
execution, so the FILTER eliminates everything, and the join is then done on no 
results from the query).
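
The same point as a small Python sketch, under the assumption that solutions are plain dicts and that a comparison on an unbound variable is an error which FILTER treats as false. All names and the three sample solutions are made up for illustration.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    """Merge every compatible pair of solutions."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def filter_o_less_than_v(solutions):
    """FILTER ( ?o < ?v ): an unbound ?v makes the test fail for that solution."""
    return [s for s in solutions if "v" in s and s["o"] < s["v"]]

# Stand-in for the solutions of { ?s ?p ?o }; none of them binds ?v.
pattern_solutions = [{"s": "s1", "o": 0}, {"s": "s2", "o": 1}, {"s": "s3", "o": 5}]

filtered = filter_o_less_than_v(pattern_solutions)  # everything is eliminated
bindings_table = [{"v": 1}, {"v": 2}]
join_result = join(filtered, bindings_table)        # join on no results
```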

But thought of as substitutions for the bindings:

   for each row:
     substitute all occurrences of named variables in the query
     execute modified query

this query does have results because ?v is replaced by 1 and then by 2; 
?v is bound, the test is done, and there may be results.

That is, where SQL typically has positional parameters, SPARQL can use 
named variables as parameters.
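
A minimal sketch of the substitution reading of the FILTER example, again with solutions as plain dicts. The function name and the sample data are hypothetical, and concatenating the results of the per-row executions is an assumption of this sketch, not something the proposal fixes.

```python
def run_with_substitution(pattern_solutions, rows):
    """For each BINDINGS row, substitute ?v into the FILTER and evaluate it."""
    results = []
    for row in rows:
        v = row["v"]  # ?v is a constant for this execution of the query
        for solution in pattern_solutions:
            if solution["o"] < v:  # FILTER ( ?o < 1 ), then FILTER ( ?o < 2 )
                results.append(dict(solution))
    return results

# Stand-in for the solutions of { ?s ?p ?o } on a tiny made-up dataset.
pattern_solutions = [{"s": "s1", "o": 0}, {"s": "s2", "o": 1}, {"s": "s3", "o": 5}]
rows = [{"v": 1}, {"v": 2}]

substituted_result = run_with_substitution(pattern_solutions, rows)
```

With this data the row (1) lets s1 through and the row (2) lets s1 and s2 through, whereas the join form returns nothing for the same query.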

OpenLink have observed that the execution of the modified query can 
often reuse the same query execution plan - see [2].  If an 
implementation chooses to, it can use the join form when the queries are 
equivalent (that's a static test).

I propose we use substitution semantics so that we can parametrize 
FILTERs. There is already an operator in the algebra to do this.

I'm neutral as to whether the protocol is extended as well. [3]

What I think we do need to consider is whether, by using join semantics on a 
feature so closely related to parametrized queries, we are making it 
harder to do parametrized queries at some future time if we decide on a 
different mechanism.


Implementation experience:

ARQ has offered a more limited version of this feature - it's a single row, 
not several - and the feature is quite well used.

> (by the way, in the BINDINGS section, the variable ?iuphar is not
> present inside the query but it is in the select and the bindings part,
> is that ok?)

That is something we need to discuss.

But it could be OK (it returns ?iuphar from the table join).

	Andy

[1] http://www.w3.org/2009/sparql/wiki/Feature:Parameters
[2] http://www.w3.org/2009/sparql/wiki/Extensions_Proposed_By_OpenLink
[3] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2009Mar/0013.html
Received on Saturday, 11 September 2010 16:25:34 UTC