Re: Filter-scope-1 test

On 27 Jun 2007, at 17:52, Seaborne, Andy wrote:

>
>
> Steve Harris wrote:
>> I've a question regarding the filter-scope-1 test, I would have   
>> expected the results below, which would be equivalent to FILTER 
>> (0)  after http://www.w3.org/TR/rdf-sparql-query/
>> ยง 6.2, but instead I see something equivalent to if the OPTIONAL   
>> block were missing entirely.
>> I can't find anything in the spec which suggests that evaluating  
>> a  FILTER() expression with out of scope variables should cause  
>> the  entire enclosing block to be disregarded. Though, I may have  
>> just  missed it.
>> I'm also not sure I like this scoping rule, it rules out some   
>> potentially useful query forms, like:
>> SELECT *
>> WHERE {
>>    ?person :department ?dept .
>>    OPTIONAL {
>>      ?person :project ?proj .
>>      FILTER(?dept = :technology)
>>    }
>> }
>
> This is not the same case as the next example because of teh extra  
> {} below: This is:
>
> (leftjoin
>   (BGP [triple ?person :department ?dept])
>   (BGP [triple ?person :project ?proj])
>   (= ?dept <http://example/technology>))

Yeah, Lee explained that to me on IRC, I meant to post something  
saying what I'd missed last night, but forgot. Sorry.

I have {} blindness apparently, my mental parser just chops them out.

I do regard it as odd (and undesirable) that
    :x :p ?v . { :x :q ?w . OPTIONAL {  :x :p ?v2 } FILTER (...) }
is different to
    :x :p ?v . :x :q ?w . OPTIONAL {  :x :p ?v2 FILTER (...) }

but that's maybe just down to my defective mental parser :)

> ?dept is in scope because it becomes part of the left join condition.
>
> ?person :project ?proj is included when (?dept = :technology).
>
>> Data:
>> @prefix : <http://example/> .
>> :x :p 1 .
>> :x :p 2 .
>> :x :p 3 .
>> :x :p 4 .
>> :x :q 1 .
>> :x :q 2 .
>> :x :q 3 .
>> Query:
>> PREFIX :    <http://example/>
>> SELECT *
>> {
>>      :x :p ?v .
>>      { :x :q ?w
>>        OPTIONAL {  :x :p ?v2 FILTER(?v = 1) }
>>      }
>> }
>
> Because of the extra {} this is a different structure to the first  
> example:
>
> (join
>   (BGP [triple :x :p ?v])
>   (leftjoin
>     (BGP [triple :x :q ?w])
>     (BGP [triple :x :p ?v2])
>     (= ?v 1))
> )

Or in slightly more steve-friendly notation:
(triple :x :p ?v) [X] ( (triple :x :q ?w) ]X] (triple :x :p ?v2) s(?v  
= 1) )

but actually I think it must be equivalent to
(triple :x :p ?v) [X] ( (triple :x :q ?w) ]X] ( (triple :x :p ?v2) s(? 
v = 1) ) )

Otherwise there would be no bindings

>> What I would expect: (excuse the use of NULL)
>
> I'm having difficulty at this point : Could you explain why you  
> expect 48 (4*3*4) results all of which have ?v2 null?  Even old  
> style, that wouldn't happen (i.e. get 4 matches when ?v = 1 and one  
> otherwise).

It's to do with my execution method. I do all the joins, then apply  
the FILTERs afterwards, this used to be fine, and was a valid  
optimisation under the old logic, but isn't anymore.

[ it might seem like an odd optimisation, but it's possible for me to  
perform many algebraic joins in a very parallel way, but not FILTERs,  
so the FILTERs would end up serialising everything ]

> It seems to have include { :x :p ?v2 } matches multiple times even  
> when ?v != 1  And for whatever evaluation we have, why is ?v2 null  
> when ?v = 1?

Well, I don't do enough inspection to know that ?v = 1 cannot  
possibly be true, so I evaluate the join, then later, by inspection I  
can tell that the OPTIONAL branch cannot succeed, as ?v = 1 must  
always fail. But I've already done the join at that point.

I now get that this optimisation cannot be applied in the general  
case anymore (with REDUCED, if I make things DISTINCT internally I  
think it's still valid), I'll have to add some code that determines  
when it can be applied and when I have to halt and inline the FILTER  
evaluation.

Unfortunately the current version of Dave's parser flattens out plain  
{} expressions, which made query processing easier, but there's not  
enough information left to tell the difference.
Changing that is probably going to break my execution engine, but it  
seems like it needs a radical overhaul anyway.

> In evaluation terms, the extra nesting given by the {} introduce a  
> JOIN.

There are the same number of triple patterns, so the same number of  
joins, surely?

I also don't get why it's desirable to have {}s affect the scope of  
variables, especially when OPTIONAL{} and GRAPH{} don't. Lee said  
it's something to do with having different FILTER expressions in  
different branches of a UNION, but that doesn't seem like a good  
enough reason to add such an apparently arbitrary scoping rule.  
There's probably some underlying logic, but it's not clear to me what  
it is.

- Steve

-- 
Steve Harris
Garlik Limited
Gainsborough House
2 Sheen Road
Richmond  TW9 1AE

T   +44(0)20 8973 2465
F   +44(0)20 8973 2301
www.garlik.com

Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: 42-46 High Street, Esher, Surrey KT10 9QY

Received on Thursday, 28 June 2007 10:25:44 UTC