W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > April to June 2007

Re: Filter-scope-1 test

From: Steve Harris <steve.harris@garlik.com>
Date: Thu, 28 Jun 2007 15:53:35 +0100
Message-Id: <3D628D24-79B9-4056-BF6E-0BC785DEB337@garlik.com>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
To: andy.seaborne@hp.com

On 28 Jun 2007, at 14:10, Seaborne, Andy wrote:
>
>
> Steve Harris wrote:

>>> This is not the same case as the next example because of teh  
>>> extra  {} below: This is:
>>>
>>> (leftjoin
>>>   (BGP [triple ?person :department ?dept])
>>>   (BGP [triple ?person :project ?proj])
>>>   (= ?dept <http://example/technology>))
>> Yeah, Lee explained that to me on IRC, I meant to post something   
>> saying what I'd missed last night, but forgot. Sorry.
>> I have {} blindness apparently, my mental parser just chops them out.
>> I do regard it as odd (and undesirable) that
>>     :x :p ?v . { :x :q ?w . OPTIONAL {  :x :p ?v2 } FILTER (...) }
>> is different to
>>     :x :p ?v . :x :q ?w . OPTIONAL {  :x :p ?v2 FILTER (...) }
>> but that's maybe just down to my defective mental parser :)
>
> It's a consequence of bottom-up evaluation.  The {} have scope  
> implications.

Yes, but only in some cases it seems.

> Consider evaluating some function, which has two arguments each of  
> which use a variable x (could be a relational algebra but the point  
> is general evaluation)
>
>   function(A(x), B(x))
>
> Does B(x) see the value of x from A(x)?  Bottom up, the answer is  
> "no": the evaluation of A(x) and B(x) are isolated from each other.

Well, it depends what you mean by "see" surely? It's an  
implementation decision whether you pass values of x on, you can do A 
(x1), b(x2), x = x1 U x2 if you like.

SPARQL is still supposed to be a declarative language AFAICT.

>>> ?dept is in scope because it becomes part of the left join  
>>> condition.
>>>
>>> ?person :project ?proj is included when (?dept = :technology).
>>>
>>>> Data:
>>>> @prefix : <http://example/> .
>>>> :x :p 1 .
>>>> :x :p 2 .
>>>> :x :p 3 .
>>>> :x :p 4 .
>>>> :x :q 1 .
>>>> :x :q 2 .
>>>> :x :q 3 .
>>>> Query:
>>>> PREFIX :    <http://example/>
>>>> SELECT *
>>>> {
>>>>      :x :p ?v .
>>>>      { :x :q ?w
>>>>        OPTIONAL {  :x :p ?v2 FILTER(?v = 1) }
>>>>      }
>>>> }
>>> Because of the extra {} this is a different structure to the  
>>> first  example:
>>>
>>> (join
>>>   (BGP [triple :x :p ?v])
>>>   (leftjoin
>>>     (BGP [triple :x :q ?w])
>>>     (BGP [triple :x :p ?v2])
>>>     (= ?v 1))
>>> )
>> Or in slightly more steve-friendly notation:
>> (triple :x :p ?v) [X] ( (triple :x :q ?w) ]X] (triple :x :p ?v2) s 
>> (?v  = 1) )
>
> :-)
>
> But note that SPARQL LeftJoin has three arguments, left and right  
> table,s and the leftjoin condition (left theta join).  The  
> condition can access teh variables in both left and right tables.

SPARQL OPTIONALs are not generally theta joins, but I'll let that one  
pass :)

I think (hope) it's true that OPTIONAL(A, B, F) = (A ]X] (B s(F))),  
modulo the scope thing.

>> but actually I think it must be equivalent to
>> (triple :x :p ?v) [X] ( (triple :x :q ?w) ]X] ( (triple :x :p ?v2)  
>> s(? v = 1) ) )
>> Otherwise there would be no bindings
>>>> What I would expect: (excuse the use of NULL)
>>> I'm having difficulty at this point : Could you explain why you   
>>> expect 48 (4*3*4) results all of which have ?v2 null?  Even old   
>>> style, that wouldn't happen (i.e. get 4 matches when ?v = 1 and  
>>> one  otherwise).
>> It's to do with my execution method. I do all the joins, then  
>> apply  the FILTERs afterwards, this used to be fine, and was a  
>> valid  optimisation under the old logic, but isn't anymore.
>> [ it might seem like an odd optimisation, but it's possible for me  
>> to  perform many algebraic joins in a very parallel way, but not  
>> FILTERs,  so the FILTERs would end up serialising everything ]
>
> Not odd at all - I know SQL optimizers that do exactly the same for  
> exactly the same reasons.

OK, well in any case I'll have to do inline FILTER evaluation in some  
cases, in order to get the cardinality correct. I think in the  
absence of FILTERs inside OPTIONAL/UNIONs I'm still OK to apply  
FILTERs at the end.

>>> It seems to have include { :x :p ?v2 } matches multiple times  
>>> even  when ?v != 1  And for whatever evaluation we have, why is ? 
>>> v2 null  when ?v = 1?
>> Well, I don't do enough inspection to know that ?v = 1 cannot   
>> possibly be true, so I evaluate the join, then later, by  
>> inspection I  can tell that the OPTIONAL branch cannot succeed,  
>> as ?v = 1 must  always fail. But I've already done the join at  
>> that point.
>> I now get that this optimisation cannot be applied in the general   
>> case anymore (with REDUCED, if I make things DISTINCT internally  
>> I  think it's still valid), I'll have to add some code that  
>> determines  when it can be applied and when I have to halt and  
>> inline the FILTER  evaluation.
>> Unfortunately the current version of Dave's parser flattens out  
>> plain  {} expressions, which made query processing easier, but  
>> there's not  enough information left to tell the difference.
>> Changing that is probably going to break my execution engine, but  
>> it  seems like it needs a radical overhaul anyway.
>>> In evaluation terms, the extra nesting given by the {} introduce  
>>> a  JOIN.
>> There are the same number of triple patterns, so the same number  
>> of  joins, surely?
>
> Same number of joins - different scoping.  Basic graph patterns are  
> not merely a join of the triple patterns until filters have been  
> moved around.  Nor if it's not simple entailment.
>
>> I also don't get why it's desirable to have {}s affect the scope  
>> of  variables, especially when OPTIONAL{} and GRAPH{} don't.
>
> They do act in the same way.
>
> { pattern1 } join { pattern2 }
>
> the evaluation of operator lets variables in the left and right  
> equate. OPTIONAL and GRAPH are no different really.

Well, they behave differently.

    (1) ?x :p ?y { ?x :r ?w } OPTIONAL { ?x :q ?z }
and
    (2) ?x :p ?y OPTIONAL { ?x :r ?w } OPTIONAL{ ?x :q ?z }

are different as far as I can tell. IIUC the righthand optional only  
shares the middle ?x in (1) and both in (2)?

It seems to me that the whole language would be much easier to  
understand and implement if there were no scope.

It's my feeling that the CR document does not explain them adequately  
in any case. I've read every section that mentions "scope" (mostly  
concerning only bNodes AFAICT) twice, and the whole document once  
through, and I still don't see where it's defined, or see an  
illuminating example.

- Steve
Received on Thursday, 28 June 2007 14:53:46 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:36 GMT