Re: order of FILTERs in a BGP? scope of FILTERs? from Lee Feigenbaum on 2006-09-13 (public-rdf-dawg@w3.org from July to September 2006)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Wed, 13 Sep 2006 13:42:45 -0400
To: andy.seaborne@hp.com
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <OF7BFC7A36.9E437A27-ON852571E8.005FB4E8-852571E8.00614BA9@us.ibm.com>
"Seaborne, Andy" <andy.seaborne@hp.com> wrote on 09/07/2006 05:43:17 AM:
> Lee Feigenbaum wrote:
> > 
> > It does not explicitly state what solutions a FILTER is tested 
against, so 
> > I see two possibilities:
> > 
> > 1) a FILTER operates globally, regardless of where it occurs in a 
query 
> > (across UNIONs, GRAPHs, groups, OPTIONALs, BGPs, etc.)
> > 2) a FILTER is limited to restricting the solutions of the 
> > FilteredBasicGraphPattern in which it occurs
> > 
> > I suspect #2 is the intended meaning, but because it is tied so 
closely to 
> > the grammar and because this part of the grammar is defined 
recursively ( 
> > FilteredBasicGraphPattern       ::=     BlockOfTriples? ( Constraint 
'.'? 
> > FilteredBasicGraphPattern  )? ), this seems to make the behavior of 
> > FILTERs order-dependent. 
> 
> Great example - could these be submitted as test cases, please?

I'll try to get a few of these checked in as test cases in the next week 
or so.

> 
> #2 is closest to intended meaning - when the matching was by 
> subgraph, FILTERs 
> were part of the group, and as the whole group is a conjunction, they 
weren't 
> attached to a specific BasicGraphPattern (which may be empty in which 
case it 
> is a condition of the group again).

Right, this has always seemed to be the case to me -- a FILTER affects the 
solutions generated from the conjunctive group in which it is a part. But 
the spec doesn't say anything like this.

> Now, however, they are.  The sec11 text is written in the sense of each 
> solution in the whatever-the-context-is is tried against the filter to 
see if 
> it is acceptable or not.
> 
> Eric?

Again, right. Section 11 is silent as to the context where FILTERs are 
evaluated, and nowhere else in the spec makes this clear. is there a group 
consensus that the intention is for FILTERs to be scoped to groups (curly 
braces)? I'll try to highlight what I think this means in the test cases I 
submit.



> >>From some experiments with SPARQLer, I'm still at a bit of a loss:
> > 
> > Data (available at: http://thefigtrees.net/lee/sw/data/g.n3 )
> > 
> > @prefix : <http://example.org/> .
> > 
> > :s1 :p1 :o1 .
> > :s2 :p2 :o2 .
> > 
> > And consider the query (query #1):
> > 
> > PREFIX : <http://example.org/>
> > SELECT *
> > FROM <http://thefigtrees.net/lee/sw/data/g.n3> 
> > {
> >   ?s :p1 :o1 .
> >   ?t :p2 :o2 .
> > }
> > 
> > which has solution:
> > 
> > ?s      ?t
> > --      --
> > :s1     :s2
> > 
> > Now, we add a FILTER (query #2):
> > 
> > PREFIX : <http://example.org/>
> > SELECT *
> > FROM <http://thefigtrees.net/lee/sw/data/g.n3> 
> > {
> >   ?s :p1 :o1 .
> >   ?t :p2 :o2 .
> >   FILTER (?t = :s2) .
> > }
> > 
> > and we get the same single solution:
> > 
> > ?s      ?t
> > --      --
> > :s1     :s2
> > 
> > Now, we change the position of the FILTER (query #3):
> > 
> > PREFIX : <http://example.org/>
> > SELECT *
> > FROM <http://thefigtrees.net/lee/sw/data/g.n3> 
> > {
> >   FILTER (?t = :s2) .
> >   ?s :p1 :o1 .
> >   ?t :p2 :o2 .
> > }
> > 
> > and now we get *zero* solutions.
> 
> This is a different issue - it's about the way ARQ does things and about 
how 
> it tracks the spec.
> 
> As the spec stands, filters are associated with BGPs syntactically. In 
prefix 
> notation:
> 
>    1 (query
>    2   (prefix : <http://example.org/>)
>    3   (select *)
>    4   (from <http://thefigtrees.net/lee/sw/data/g.n3>)
>    5   (group
>    6     (basicgraphpattern
>    7       (filter (= ?t :s2))
>    8       (triplepattern ?s :p1 :o1)
>    9       (triplepattern ?t :p2 :o2)))
>    10 )
> 
> so ARQ has put the filter as part of the BGP.
> 
> The BGP definition does not cover filters.  PatH did offer sometime ago 
to 
> propose to extend them to filters, but I'm not sure this is still valid 
or 
> even relevant because the BGP definition is in-flux and will have 
tochange to 
> accommodate Jorge Perez's report.
> 
> [[
> To be explicit: Jorge Perez's report is
> 
> Graph: <a> <p> <a> . _:x <p> _:y .
> 
> SELECT * { ?a <p> ?a }
> ==> ?a/<a> ?a/_:x ?a/:y
>     because _:x <p> _:x . and _:y <p> _:y . are entailed with bNodes in 
the 
> scoping set when there is redundancy in the graph.
> ]]
> 
> 
> As the BGP defn changes, we'll see where filters fit in.  If they 
> are not part 
> of the BGP, then reflowing would happen if the GROUP allowed it but 
> some order 
> dependence is needed for the OPTIONAL/!bound idiom (which for betteror 
worse 
> is being used, by users (inc the W3C tech reports query!)).

This is an example of the OPTIONAL/!bound idiom, right? (Find maximum 
asserted price of each book):

SELECT ?book ?price {
  ?book :price ?price .
  OPTIONAL {
    ?book :price ?price2 . 
    FILTER (?price2 > ?price)
  }
  FILTER (!bound(?price2))
}

Why is order dependence needed here? If FILTERs are scoped to the siblings 
of whatever group the FILTERs appear in, wouldn't this order-rearraned 
query yield the same results?

SELECT ?book ?price {
  FILTER(!bound(?price2) .
  ?book :price ?price .
  OPTIONAL {
    ?book :price ?price2 . 
    FILTER (?price2 > ?price)
  }
}

Maybe I don't understand what you meant here...

> ARQ executes by splitting the BGP into blocks of triples and 
> filters.  It does 
> not reflow the BGP to put the filter in the right place.  It dispatches 
> blocks-of-triples to the storage layer (which works for a remote 
DIGreasoner 
> as well as plain storage or a rules engine)
> 
> == Implementation feedback: these systems do not support FILTER-like 
> functionality so putting FILTERs too closely with BGPs may be a 
> restriction to 
> implementability.

What are "these systems"? The storage layers that ARQ uses? I think the 
design should be oriented around what we expect the results of queries 
using FILTERs in various locations to be, rather than ease of 
implementation. Or maybe again I'm missing your point (probably :-).

> == Proposal: FILTERs trail the BGP conceptually.  i.e. they are executed 
to 
> minimise the unbound variables and hence maximise the non-error 
> evaluation and 
> maximise the solutions that can be found.

Is this a proposal for what the scope of FILTER should be? I must say that 
I don't understand this proposal. 

> Reflowing the query is actually quite trivial to do as I've done it in 
other 
> places before.   But currently, ARQ executes the query as it is 
> syntactically, 
> so if the FILTER is too early, it will cause an error and so the 
solution is 
> rejected (the engine is constructive - it starts with a single solution 
of no 
> variables and adds bindings as needed to satisfy the patterns).
> 
> That's what happens in query 3:
> 
> arq.qparse --plan ...Q3... prints the execution plan:
> 
> [BasicGraphPattern
>    [Constraint ( ?t = :s2 )]
>    [Triples
>      ?s :p1 :o1
>      ?t :p2 :o2
>    ]]
> 
> 
> Read that as a functional evaluation.  The constraint is performed, then 
the 
> pair of triples are matched as unit.
> 
> I have not rushed to change this to avoid having to change things 
> twice.  I do 
> not want to prejudge where the BGP decision goes and I don't want to 
make a 
> change that is visible to an application queries then make a second 
change 
> that alters applications again.  Making such changes has a large cost to 

> application writers (and we have asked for implementations so I feel an 
> obligation there - it is reported publicly in the W3C newswire of July 
6). It 
> also impacts my support load.  Putting FILTERs in the right place (not 
just 
> last) is a very large performance improvements (x1000+ is quite 
possible).

Yes, I agree with all this. Didn't mean to conflate ARQ's behavior with 
spec design decisions. It's hard to know for any given SPARQL 
implementation which parts of the spec. the implementation is not in step 
with or chooses its own interpretation due to the spec. being 
underspecified.

I guess my upshot here is that ARQ and other engines' behavior completely 
aside, I think we need to revisit the FILTER specification and make sure 
that the design clearly specifies the scope of FILTER; i.e., what 
solutions a FILTER can eliminate.

I think it's clear to me that attaching the scope of a FILTER to a 
FilteredBGP (i.e., associating the conceptual scope of a FILTER with its 
grammatical socpe) does NOT produce results that people expect.

Lee


>    Andy
> 
> > 
> > So my first questions are: is this behavior what people expect from 
> > SPARQL? is this behavior somehow explained by the current spec text?
> > 
> >>From the SPARQL grammar's point of view, I believe the first of these 
is:
> >   FilteredBasicGraphPattern(BlockOfTriples(...), Constraint(...), 
> > FilteredBasicGraphPattern()) 
> > 
> > while the second is 
> >   FilteredBasicGraphPattern(Constraint(...), 
> > FilteredBasicGraphPattern(BlockOfTriples(...)))
> > 
> > I would think in both cases that the two triples occur in the same 
BGP, 
> > and so the solution {?s/:s1, ?t/:s2} should be FILTERed, and not 
> > eliminated. The observed behavior of SPARQLer in query #3 could only 
occur 
> > (I think) if the FILTER is applied individually to the solutions from 
each 
> > triple pattern before those solutions are joined together.
> > 
> > Things get more confusing (for me, at least). Consider:
> > 
> >   http://www.w3.org/2001/sw/DataAccess/tests/#dawg-bound-query-001
> > 
> > Data:
> > 
> > @prefix : <http://example.org/ns#> . 
> > :a1 :b :c1 . 
> > :c1 :d :e . 
> > :a2 :b :c2 . 
> > :c2 :b :f .
> > 
> > Query (query #4):
> > 
> > PREFIX : <http://example.org/ns#> 
> > SELECT ?a ?c
> > WHERE { 
> >   ?a :b ?c . 
> >   OPTIONAL { ?c :d ?e } . 
> >   FILTER (! bound(?e)) 
> > }
> > 
> > Expected results:
> > 
> > ?c      ?a
> > --      --
> > <http://example.org/ns#c2>      <http://example.org/ns#a2>
> > <http://example.org/ns#f>       <http://example.org/ns#c2>
> > 
> > 
> > These results rely on the three solutions of the OPTIONAL 
binary-operator 
> > being FILTERED by the filter. But the OPTIONAL does not appear in the 
same 
> > FilteredBasicGraphPattern as the FILTER constraint (of course, that's 
not 
> > physically possible). The parse tree for this is something akin to:
> > 
> > Group( Optional(BGP(?a :b ?c), BGP(?c :d ?e)), FilteredBGP(BGP(), 
> > Constraint(!bound(?e))) )
> > 
> > It would require a reading of the spec that takes filters that appear 
as 
> > siblings within a group and applies them to all other siblings in a 
group 
> > to get the behavior expected by this test (SPARQLer conforms to the 
> > behavior expected by this test). But I have a very hard time seeing 
how 
> > any explanation for the behavior expected by this test can also have 
the 
> > diverging behavior that I see from SPARQLer in query #2 and query #3. 
> > 
> > ...
> > 
> > I could go on (e.g., everyone probably agrees that FILTERs don't 
affect 
> > solutions in other branches of a UNION), but I don't think I'll 
enlighten 
> > myself any further at this point and it would probably just make this 
> > message more rambling, so I'll stop for now and hope to be corrected 
as to 
> > what I'm missing or at least pointed to discussions of this which I'm 
> > pretty sure have occurred before. 
> > 
> > 
> > thanks,
> > Lee
> > 
> > 
>
Received on Wednesday, 13 September 2006 17:43:02 UTC