W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > April to June 2007

Re: Filter-scope-1 test

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 28 Jun 2007 14:10:49 +0100
Message-ID: <4683B359.9000805@hp.com>
To: Steve Harris <steve.harris@garlik.com>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>



Steve Harris wrote:
> On 27 Jun 2007, at 17:52, Seaborne, Andy wrote:
> 
>>
>> Steve Harris wrote:
>>> I've a question regarding the filter-scope-1 test, I would have   
>>> expected the results below, which would be equivalent to FILTER 
>>> (0)  after http://www.w3.org/TR/rdf-sparql-query/
>>> ⤠6.2, but instead I see something equivalent to if the OPTIONAL   
>>> block were missing entirely.
>>> I can't find anything in the spec which suggests that evaluating  
>>> a  FILTER() expression with out of scope variables should cause  
>>> the  entire enclosing block to be disregarded. Though, I may have  
>>> just  missed it.
>>> I'm also not sure I like this scoping rule, it rules out some   
>>> potentially useful query forms, like:
>>> SELECT *
>>> WHERE {
>>>    ?person :department ?dept .
>>>    OPTIONAL {
>>>      ?person :project ?proj .
>>>      FILTER(?dept = :technology)
>>>    }
>>> }
>> This is not the same case as the next example because of teh extra  
>> {} below: This is:
>>
>> (leftjoin
>>   (BGP [triple ?person :department ?dept])
>>   (BGP [triple ?person :project ?proj])
>>   (= ?dept <http://example/technology>))
> 
> Yeah, Lee explained that to me on IRC, I meant to post something  
> saying what I'd missed last night, but forgot. Sorry.
> 
> I have {} blindness apparently, my mental parser just chops them out.
> 
> I do regard it as odd (and undesirable) that
>     :x :p ?v . { :x :q ?w . OPTIONAL {  :x :p ?v2 } FILTER (...) }
> is different to
>     :x :p ?v . :x :q ?w . OPTIONAL {  :x :p ?v2 FILTER (...) }
> 
> but that's maybe just down to my defective mental parser :)

It's a consequence of bottom-up evaluation.  The {} have scope implications.

Consider evaluating some function, which has two arguments each of which use a 
variable x (could be a relational algebra but the point is general evaluation)

   function(A(x), B(x))

Does B(x) see the value of x from A(x)?  Bottom up, the answer is "no": the 
evaluation of A(x) and B(x) are isolated from each other.

> 
>> ?dept is in scope because it becomes part of the left join condition.
>>
>> ?person :project ?proj is included when (?dept = :technology).
>>
>>> Data:
>>> @prefix : <http://example/> .
>>> :x :p 1 .
>>> :x :p 2 .
>>> :x :p 3 .
>>> :x :p 4 .
>>> :x :q 1 .
>>> :x :q 2 .
>>> :x :q 3 .
>>> Query:
>>> PREFIX :    <http://example/>
>>> SELECT *
>>> {
>>>      :x :p ?v .
>>>      { :x :q ?w
>>>        OPTIONAL {  :x :p ?v2 FILTER(?v = 1) }
>>>      }
>>> }
>> Because of the extra {} this is a different structure to the first  
>> example:
>>
>> (join
>>   (BGP [triple :x :p ?v])
>>   (leftjoin
>>     (BGP [triple :x :q ?w])
>>     (BGP [triple :x :p ?v2])
>>     (= ?v 1))
>> )
> 
> Or in slightly more steve-friendly notation:
> (triple :x :p ?v) [X] ( (triple :x :q ?w) ]X] (triple :x :p ?v2) s(?v  
> = 1) )

:-)

But note that SPARQL LeftJoin has three arguments, left and right table,s and 
the leftjoin condition (left theta join).  The condition can access teh 
variables in both left and right tables.

> 
> but actually I think it must be equivalent to
> (triple :x :p ?v) [X] ( (triple :x :q ?w) ]X] ( (triple :x :p ?v2) s(? 
> v = 1) ) )
> 
> Otherwise there would be no bindings
> 
>>> What I would expect: (excuse the use of NULL)
>> I'm having difficulty at this point : Could you explain why you  
>> expect 48 (4*3*4) results all of which have ?v2 null?  Even old  
>> style, that wouldn't happen (i.e. get 4 matches when ?v = 1 and one  
>> otherwise).
> 
> It's to do with my execution method. I do all the joins, then apply  
> the FILTERs afterwards, this used to be fine, and was a valid  
> optimisation under the old logic, but isn't anymore.
> 
> [ it might seem like an odd optimisation, but it's possible for me to  
> perform many algebraic joins in a very parallel way, but not FILTERs,  
> so the FILTERs would end up serialising everything ]

Not odd at all - I know SQL optimizers that do exactly the same for exactly 
the same reasons.

> 
>> It seems to have include { :x :p ?v2 } matches multiple times even  
>> when ?v != 1  And for whatever evaluation we have, why is ?v2 null  
>> when ?v = 1?
> 
> Well, I don't do enough inspection to know that ?v = 1 cannot  
> possibly be true, so I evaluate the join, then later, by inspection I  
> can tell that the OPTIONAL branch cannot succeed, as ?v = 1 must  
> always fail. But I've already done the join at that point.
> 
> I now get that this optimisation cannot be applied in the general  
> case anymore (with REDUCED, if I make things DISTINCT internally I  
> think it's still valid), I'll have to add some code that determines  
> when it can be applied and when I have to halt and inline the FILTER  
> evaluation.
> 
> Unfortunately the current version of Dave's parser flattens out plain  
> {} expressions, which made query processing easier, but there's not  
> enough information left to tell the difference.
> Changing that is probably going to break my execution engine, but it  
> seems like it needs a radical overhaul anyway.
> 
>> In evaluation terms, the extra nesting given by the {} introduce a  
>> JOIN.
> 
> There are the same number of triple patterns, so the same number of  
> joins, surely?

Same number of joins - different scoping.  Basic graph patterns are not merely 
a join of the triple patterns until filters have been moved around.  Nor if 
it's not simple entailment.

> 
> I also don't get why it's desirable to have {}s affect the scope of  
> variables, especially when OPTIONAL{} and GRAPH{} don't.

They do act in the same way.

{ pattern1 } join { pattern2 }

the evaluation of operator lets variables in the left and right equate. 
OPTIONAL and GRAPH are no different really.

> Lee said  
> it's something to do with having different FILTER expressions in  
> different branches of a UNION, but that doesn't seem like a good  
> enough reason to add such an apparently arbitrary scoping rule.  
> There's probably some underlying logic, but it's not clear to me what  
> it is.
> 
> - Steve

Thanks for the update

	Andy

> 

-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Thursday, 28 June 2007 13:11:02 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:36 GMT