Re: order of FILTERs in a BGP? scope of FILTERs?

Lee Feigenbaum wrote:
> Hi everyone,
> 
> I know that we've discussed this -- or similar -- before, but my searching 
> hasn't found anything conclusive/authoritative. I've been working through 
> some situations involving FILTERs in my implementation and 
> cross-referencing with ARQ's behavior in SPARQLer (http://www.sparql.org) 
> and also trying to work through the "proper" behavior according to the 
> spec. 
> 
> ... And I'm having a lot of difficulties :)
> 
> First, I find the spec (I'm working off of rq24 but don't believe this is 
> any different for rq23 or the published CR) unclear as to what the scope 
> of a FILTER is. The text in section 11 reads:
> 
> """
> Specifically, FILTERs eliminate any solutions that, when substituted into 
> the expression, either result in an effective boolean value of false or 
> produce an error.
> """
> 
> It does not explicitly state what solutions a FILTER is tested against, so 
> I see two possibilities:
> 
> 1) a FILTER operates globally, regardless of where it occurs in a query 
> (across UNIONs, GRAPHs, groups, OPTIONALs, BGPs, etc.)
> 2) a FILTER is limited to restricting the solutions of the 
> FilteredBasicGraphPattern in which it occurs
> 
> I suspect #2 is the intended meaning, but because it is tied so closely to 
> the grammar and because this part of the grammar is defined recursively ( 
> FilteredBasicGraphPattern       ::=     BlockOfTriples? ( Constraint '.'? 
> FilteredBasicGraphPattern  )? ), this seems to make the behavior of 
> FILTERs order-dependent. 

Great example - could these be submitted as test cases, please?


#2 is closest to intended meaning - when the matching was by subgraph, FILTERs 
were part of the group, and as the whole group is a conjunction, they weren't 
attached to a specific BasicGraphPattern (which may be empty in which case it 
is a condition of the group again).

Now, however, they are.  The sec11 text is written in the sense of each 
solution in the whatever-the-context-is is tried against the filter to see if 
it is acceptable or not.

Eric?

> 
>>From some experiments with SPARQLer, I'm still at a bit of a loss:
> 
> Data (available at: http://thefigtrees.net/lee/sw/data/g.n3 )
> 
> @prefix : <http://example.org/> .
> 
> :s1 :p1 :o1 .
> :s2 :p2 :o2 .
> 
> And consider the query (query #1):
> 
> PREFIX : <http://example.org/>
> SELECT *
> FROM <http://thefigtrees.net/lee/sw/data/g.n3> 
> {
>   ?s :p1 :o1 .
>   ?t :p2 :o2 .
> }
> 
> which has solution:
> 
> ?s      ?t
> --      --
> :s1     :s2
> 
> Now, we add a FILTER (query #2):
> 
> PREFIX : <http://example.org/>
> SELECT *
> FROM <http://thefigtrees.net/lee/sw/data/g.n3> 
> {
>   ?s :p1 :o1 .
>   ?t :p2 :o2 .
>   FILTER (?t = :s2) .
> }
> 
> and we get the same single solution:
> 
> ?s      ?t
> --      --
> :s1     :s2
> 
> Now, we change the position of the FILTER (query #3):
> 
> PREFIX : <http://example.org/>
> SELECT *
> FROM <http://thefigtrees.net/lee/sw/data/g.n3> 
> {
>   FILTER (?t = :s2) .
>   ?s :p1 :o1 .
>   ?t :p2 :o2 .
> }
> 
> and now we get *zero* solutions.

This is a different issue - it's about the way ARQ does things and about how 
it tracks the spec.

As the spec stands, filters are associated with BGPs syntactically.  In prefix 
notation:

   1 (query
   2   (prefix : <http://example.org/>)
   3   (select *)
   4   (from <http://thefigtrees.net/lee/sw/data/g.n3>)
   5   (group
   6     (basicgraphpattern
   7       (filter (= ?t :s2))
   8       (triplepattern ?s :p1 :o1)
   9       (triplepattern ?t :p2 :o2)))
   10 )

so ARQ has put the filter as part of the BGP.

The BGP definition does not cover filters.  PatH did offer sometime ago to 
propose to extend them to filters, but I'm not sure this is still valid or 
even relevant because the BGP definition is in-flux and will have to change to 
accommodate Jorge Perez's report.

[[
To be explicit: Jorge Perez's report is

Graph: <a> <p> <a> . _:x <p> _:y .

SELECT * { ?a <p> ?a }
==> ?a/<a> ?a/_:x ?a/:y
    because _:x <p> _:x . and _:y <p> _:y . are entailed with bNodes in the 
scoping set when there is redundancy in the graph.
]]


As the BGP defn changes, we'll see where filters fit in.  If they are not part 
of the BGP, then reflowing would happen if the GROUP allowed it but some order 
dependence is needed for the OPTIONAL/!bound idiom (which for better or worse 
is being used, by users (inc the W3C tech reports query!)).

ARQ executes by splitting the BGP into blocks of triples and filters.  It does 
not reflow the BGP to put the filter in the right place.  It dispatches 
blocks-of-triples to the storage layer (which works for a remote DIG reasoner 
as well as plain storage or a rules engine)

== Implementation feedback: these systems do not support FILTER-like 
functionality so putting FILTERs too closely with BGPs may be a restriction to 
implementability.

== Proposal: FILTERs trail the BGP conceptually.  i.e. they are executed to 
minimise the unbound variables and hence maximise the non-error evaluation and 
maximise the solutions that can be found.

Reflowing the query is actually quite trivial to do as I've done it in other 
places before.   But currently, ARQ executes the query as it is syntactically, 
so if the FILTER is too early, it will cause an error and so the solution is 
rejected (the engine is constructive - it starts with a single solution of no 
variables and adds bindings as needed to satisfy the patterns).

That's what happens in query 3:

arq.qparse --plan ...Q3... prints the execution plan:

[BasicGraphPattern
   [Constraint ( ?t = :s2 )]
   [Triples
     ?s :p1 :o1
     ?t :p2 :o2
   ]]


Read that as a functional evaluation.  The constraint is performed, then the 
pair of triples are matched as unit.

I have not rushed to change this to avoid having to change things twice.  I do 
not want to prejudge where the BGP decision goes and I don't want to make a 
change that is visible to an application queries then make a second change 
that alters applications again.  Making such changes has a large cost to 
application writers (and we have asked for implementations so I feel an 
obligation there - it is reported publicly in the W3C newswire of July 6). It 
also impacts my support load.  Putting FILTERs in the right place (not just 
last) is a very large performance improvements (x1000+ is quite possible).

	Andy

> 
> So my first questions are: is this behavior what people expect from 
> SPARQL? is this behavior somehow explained by the current spec text?
> 
>>From the SPARQL grammar's point of view, I believe the first of these is:
>   FilteredBasicGraphPattern(BlockOfTriples(...), Constraint(...), 
> FilteredBasicGraphPattern()) 
> 
> while the second is 
>   FilteredBasicGraphPattern(Constraint(...), 
> FilteredBasicGraphPattern(BlockOfTriples(...)))
> 
> I would think in both cases that the two triples occur in the same BGP, 
> and so the solution {?s/:s1, ?t/:s2} should be FILTERed, and not 
> eliminated. The observed behavior of SPARQLer in query #3 could only occur 
> (I think) if the FILTER is applied individually to the solutions from each 
> triple pattern before those solutions are joined together.
> 
> Things get more confusing (for me, at least). Consider:
> 
>   http://www.w3.org/2001/sw/DataAccess/tests/#dawg-bound-query-001
> 
> Data:
> 
> @prefix : <http://example.org/ns#> . 
> :a1 :b :c1 . 
> :c1 :d :e . 
> :a2 :b :c2 . 
> :c2 :b :f .
> 
> Query (query #4):
> 
> PREFIX : <http://example.org/ns#> 
> SELECT ?a ?c
> WHERE { 
>   ?a :b ?c . 
>   OPTIONAL { ?c :d ?e } . 
>   FILTER (! bound(?e)) 
> }
> 
> Expected results:
> 
> ?c      ?a
> --      --
> <http://example.org/ns#c2>      <http://example.org/ns#a2>
> <http://example.org/ns#f>       <http://example.org/ns#c2>
> 
> 
> These results rely on the three solutions of the OPTIONAL binary-operator 
> being FILTERED by the filter. But the OPTIONAL does not appear in the same 
> FilteredBasicGraphPattern as the FILTER constraint (of course, that's not 
> physically possible). The parse tree for this is something akin to:
> 
> Group( Optional(BGP(?a :b ?c), BGP(?c :d ?e)), FilteredBGP(BGP(), 
> Constraint(!bound(?e))) )
> 
> It would require a reading of the spec that takes filters that appear as 
> siblings within a group and applies them to all other siblings in a group 
> to get the behavior expected by this test (SPARQLer conforms to the 
> behavior expected by this test). But I have a very hard time seeing how 
> any explanation for the behavior expected by this test can also have the 
> diverging behavior that I see from SPARQLer in query #2 and query #3. 
> 
> ...
> 
> I could go on (e.g., everyone probably agrees that FILTERs don't affect 
> solutions in other branches of a UNION), but I don't think I'll enlighten 
> myself any further at this point and it would probably just make this 
> message more rambling, so I'll stop for now and hope to be corrected as to 
> what I'm missing or at least pointed to discussions of this which I'm 
> pretty sure have occurred before. 
> 
> 
> thanks,
> Lee
> 
> 

Received on Thursday, 7 September 2006 09:43:51 UTC