- From: Lee Feigenbaum <feigenbl@us.ibm.com>
- Date: Wed, 13 Sep 2006 13:42:45 -0400
- To: andy.seaborne@hp.com
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
"Seaborne, Andy" <andy.seaborne@hp.com> wrote on 09/07/2006 05:43:17 AM: > Lee Feigenbaum wrote: > > > > It does not explicitly state what solutions a FILTER is tested against, so > > I see two possibilities: > > > > 1) a FILTER operates globally, regardless of where it occurs in a query > > (across UNIONs, GRAPHs, groups, OPTIONALs, BGPs, etc.) > > 2) a FILTER is limited to restricting the solutions of the > > FilteredBasicGraphPattern in which it occurs > > > > I suspect #2 is the intended meaning, but because it is tied so closely to > > the grammar and because this part of the grammar is defined recursively ( > > FilteredBasicGraphPattern ::= BlockOfTriples? ( Constraint '.'? > > FilteredBasicGraphPattern )? ), this seems to make the behavior of > > FILTERs order-dependent. > > Great example - could these be submitted as test cases, please? I'll try to get a few of these checked in as test cases in the next week or so. > > #2 is closest to intended meaning - when the matching was by > subgraph, FILTERs > were part of the group, and as the whole group is a conjunction, they weren't > attached to a specific BasicGraphPattern (which may be empty in which case it > is a condition of the group again). Right, this has always seemed to be the case to me -- a FILTER affects the solutions generated from the conjunctive group in which it is a part. But the spec doesn't say anything like this. > Now, however, they are. The sec11 text is written in the sense of each > solution in the whatever-the-context-is is tried against the filter to see if > it is acceptable or not. > > Eric? Again, right. Section 11 is silent as to the context where FILTERs are evaluated, and nowhere else in the spec makes this clear. is there a group consensus that the intention is for FILTERs to be scoped to groups (curly braces)? I'll try to highlight what I think this means in the test cases I submit. > >>From some experiments with SPARQLer, I'm still at a bit of a loss: > > > > Data (available at: http://thefigtrees.net/lee/sw/data/g.n3 ) > > > > @prefix : <http://example.org/> . > > > > :s1 :p1 :o1 . > > :s2 :p2 :o2 . > > > > And consider the query (query #1): > > > > PREFIX : <http://example.org/> > > SELECT * > > FROM <http://thefigtrees.net/lee/sw/data/g.n3> > > { > > ?s :p1 :o1 . > > ?t :p2 :o2 . > > } > > > > which has solution: > > > > ?s ?t > > -- -- > > :s1 :s2 > > > > Now, we add a FILTER (query #2): > > > > PREFIX : <http://example.org/> > > SELECT * > > FROM <http://thefigtrees.net/lee/sw/data/g.n3> > > { > > ?s :p1 :o1 . > > ?t :p2 :o2 . > > FILTER (?t = :s2) . > > } > > > > and we get the same single solution: > > > > ?s ?t > > -- -- > > :s1 :s2 > > > > Now, we change the position of the FILTER (query #3): > > > > PREFIX : <http://example.org/> > > SELECT * > > FROM <http://thefigtrees.net/lee/sw/data/g.n3> > > { > > FILTER (?t = :s2) . > > ?s :p1 :o1 . > > ?t :p2 :o2 . > > } > > > > and now we get *zero* solutions. > > This is a different issue - it's about the way ARQ does things and about how > it tracks the spec. > > As the spec stands, filters are associated with BGPs syntactically. In prefix > notation: > > 1 (query > 2 (prefix : <http://example.org/>) > 3 (select *) > 4 (from <http://thefigtrees.net/lee/sw/data/g.n3>) > 5 (group > 6 (basicgraphpattern > 7 (filter (= ?t :s2)) > 8 (triplepattern ?s :p1 :o1) > 9 (triplepattern ?t :p2 :o2))) > 10 ) > > so ARQ has put the filter as part of the BGP. > > The BGP definition does not cover filters. PatH did offer sometime ago to > propose to extend them to filters, but I'm not sure this is still valid or > even relevant because the BGP definition is in-flux and will have tochange to > accommodate Jorge Perez's report. > > [[ > To be explicit: Jorge Perez's report is > > Graph: <a> <p> <a> . _:x <p> _:y . > > SELECT * { ?a <p> ?a } > ==> ?a/<a> ?a/_:x ?a/:y > because _:x <p> _:x . and _:y <p> _:y . are entailed with bNodes in the > scoping set when there is redundancy in the graph. > ]] > > > As the BGP defn changes, we'll see where filters fit in. If they > are not part > of the BGP, then reflowing would happen if the GROUP allowed it but > some order > dependence is needed for the OPTIONAL/!bound idiom (which for betteror worse > is being used, by users (inc the W3C tech reports query!)). This is an example of the OPTIONAL/!bound idiom, right? (Find maximum asserted price of each book): SELECT ?book ?price { ?book :price ?price . OPTIONAL { ?book :price ?price2 . FILTER (?price2 > ?price) } FILTER (!bound(?price2)) } Why is order dependence needed here? If FILTERs are scoped to the siblings of whatever group the FILTERs appear in, wouldn't this order-rearraned query yield the same results? SELECT ?book ?price { FILTER(!bound(?price2) . ?book :price ?price . OPTIONAL { ?book :price ?price2 . FILTER (?price2 > ?price) } } Maybe I don't understand what you meant here... > ARQ executes by splitting the BGP into blocks of triples and > filters. It does > not reflow the BGP to put the filter in the right place. It dispatches > blocks-of-triples to the storage layer (which works for a remote DIGreasoner > as well as plain storage or a rules engine) > > == Implementation feedback: these systems do not support FILTER-like > functionality so putting FILTERs too closely with BGPs may be a > restriction to > implementability. What are "these systems"? The storage layers that ARQ uses? I think the design should be oriented around what we expect the results of queries using FILTERs in various locations to be, rather than ease of implementation. Or maybe again I'm missing your point (probably :-). > == Proposal: FILTERs trail the BGP conceptually. i.e. they are executed to > minimise the unbound variables and hence maximise the non-error > evaluation and > maximise the solutions that can be found. Is this a proposal for what the scope of FILTER should be? I must say that I don't understand this proposal. > Reflowing the query is actually quite trivial to do as I've done it in other > places before. But currently, ARQ executes the query as it is > syntactically, > so if the FILTER is too early, it will cause an error and so the solution is > rejected (the engine is constructive - it starts with a single solution of no > variables and adds bindings as needed to satisfy the patterns). > > That's what happens in query 3: > > arq.qparse --plan ...Q3... prints the execution plan: > > [BasicGraphPattern > [Constraint ( ?t = :s2 )] > [Triples > ?s :p1 :o1 > ?t :p2 :o2 > ]] > > > Read that as a functional evaluation. The constraint is performed, then the > pair of triples are matched as unit. > > I have not rushed to change this to avoid having to change things > twice. I do > not want to prejudge where the BGP decision goes and I don't want to make a > change that is visible to an application queries then make a second change > that alters applications again. Making such changes has a large cost to > application writers (and we have asked for implementations so I feel an > obligation there - it is reported publicly in the W3C newswire of July 6). It > also impacts my support load. Putting FILTERs in the right place (not just > last) is a very large performance improvements (x1000+ is quite possible). Yes, I agree with all this. Didn't mean to conflate ARQ's behavior with spec design decisions. It's hard to know for any given SPARQL implementation which parts of the spec. the implementation is not in step with or chooses its own interpretation due to the spec. being underspecified. I guess my upshot here is that ARQ and other engines' behavior completely aside, I think we need to revisit the FILTER specification and make sure that the design clearly specifies the scope of FILTER; i.e., what solutions a FILTER can eliminate. I think it's clear to me that attaching the scope of a FILTER to a FilteredBGP (i.e., associating the conceptual scope of a FILTER with its grammatical socpe) does NOT produce results that people expect. Lee > Andy > > > > > So my first questions are: is this behavior what people expect from > > SPARQL? is this behavior somehow explained by the current spec text? > > > >>From the SPARQL grammar's point of view, I believe the first of these is: > > FilteredBasicGraphPattern(BlockOfTriples(...), Constraint(...), > > FilteredBasicGraphPattern()) > > > > while the second is > > FilteredBasicGraphPattern(Constraint(...), > > FilteredBasicGraphPattern(BlockOfTriples(...))) > > > > I would think in both cases that the two triples occur in the same BGP, > > and so the solution {?s/:s1, ?t/:s2} should be FILTERed, and not > > eliminated. The observed behavior of SPARQLer in query #3 could only occur > > (I think) if the FILTER is applied individually to the solutions from each > > triple pattern before those solutions are joined together. > > > > Things get more confusing (for me, at least). Consider: > > > > http://www.w3.org/2001/sw/DataAccess/tests/#dawg-bound-query-001 > > > > Data: > > > > @prefix : <http://example.org/ns#> . > > :a1 :b :c1 . > > :c1 :d :e . > > :a2 :b :c2 . > > :c2 :b :f . > > > > Query (query #4): > > > > PREFIX : <http://example.org/ns#> > > SELECT ?a ?c > > WHERE { > > ?a :b ?c . > > OPTIONAL { ?c :d ?e } . > > FILTER (! bound(?e)) > > } > > > > Expected results: > > > > ?c ?a > > -- -- > > <http://example.org/ns#c2> <http://example.org/ns#a2> > > <http://example.org/ns#f> <http://example.org/ns#c2> > > > > > > These results rely on the three solutions of the OPTIONAL binary-operator > > being FILTERED by the filter. But the OPTIONAL does not appear in the same > > FilteredBasicGraphPattern as the FILTER constraint (of course, that's not > > physically possible). The parse tree for this is something akin to: > > > > Group( Optional(BGP(?a :b ?c), BGP(?c :d ?e)), FilteredBGP(BGP(), > > Constraint(!bound(?e))) ) > > > > It would require a reading of the spec that takes filters that appear as > > siblings within a group and applies them to all other siblings in a group > > to get the behavior expected by this test (SPARQLer conforms to the > > behavior expected by this test). But I have a very hard time seeing how > > any explanation for the behavior expected by this test can also have the > > diverging behavior that I see from SPARQLer in query #2 and query #3. > > > > ... > > > > I could go on (e.g., everyone probably agrees that FILTERs don't affect > > solutions in other branches of a UNION), but I don't think I'll enlighten > > myself any further at this point and it would probably just make this > > message more rambling, so I'll stop for now and hope to be corrected as to > > what I'm missing or at least pointed to discussions of this which I'm > > pretty sure have occurred before. > > > > > > thanks, > > Lee > > > > >
Received on Wednesday, 13 September 2006 17:43:02 UTC