Re: order of FILTERs in a BGP? scope of FILTERs? from Seaborne, Andy on 2006-09-15 (public-rdf-dawg@w3.org from July to September 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 15 Sep 2006 09:22:57 +0100
To: Lee Feigenbaum <feigenbl@us.ibm.com>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <450A62E1.3030009@hp.com>
[This is a resend - after 16+ hours, it hasn't appeared in the archive]

Lee Feigenbaum wrote:
 > > "Seaborne, Andy" <andy.seaborne@hp.com> wrote on 09/07/2006 05:43:17 AM:
 >> >> Lee Feigenbaum wrote:
 >>> >>> It does not explicitly state what solutions a FILTER is tested
 > > against, so
 >>> >>> I see two possibilities:
 >>> >>>
 >>> >>> 1) a FILTER operates globally, regardless of where it occurs in a
 > > query
 >>> >>> (across UNIONs, GRAPHs, groups, OPTIONALs, BGPs, etc.)
 >>> >>> 2) a FILTER is limited to restricting the solutions of the
 >>> >>> FilteredBasicGraphPattern in which it occurs
 >>> >>>
 >>> >>> I suspect #2 is the intended meaning, but because it is tied so
 > > closely to
 >>> >>> the grammar and because this part of the grammar is defined
 > > recursively (
 >>> >>> FilteredBasicGraphPattern       ::=     BlockOfTriples? ( Constraint
 > > '.'?
 >>> >>> FilteredBasicGraphPattern  )? ), this seems to make the behavior of
 >>> >>> FILTERs order-dependent.
 >> >> Great example - could these be submitted as test cases, please?
 > >
 > > I'll try to get a few of these checked in as test cases in the next week
 > > or so.
 > >
 >> >> #2 is closest to intended meaning - when the matching was by
 >> >> subgraph, FILTERs
 >> >> were part of the group, and as the whole group is a conjunction, they
 > > weren't
 >> >> attached to a specific BasicGraphPattern (which may be empty in which
 > > case it
 >> >> is a condition of the group again).
 > >
 > > Right, this has always seemed to be the case to me -- a FILTER affects the
 > > solutions generated from the conjunctive group in which it is a part. But
 > > the spec doesn't say anything like this.
 > >
 >> >> Now, however, they are.  The sec11 text is written in the sense of each
 >> >> solution in the whatever-the-context-is is tried against the filter to
 > > see if
 >> >> it is acceptable or not.
 >> >>
 >> >> Eric?
 > >
 > > Again, right. Section 11 is silent as to the context where FILTERs are
 > > evaluated, and nowhere else in the spec makes this clear. is there a group
 > > consensus that the intention is for FILTERs to be scoped to groups (curly
 > > braces)? I'll try to highlight what I think this means in the test cases I
 > > submit.

I think the text should go in rq24/4.7 - it might also be mentioned in sec11.

     ... snip ...

 >> >>
 >> >> As the BGP defn changes, we'll see where filters fit in.  If they
 >> >> are not part
 >> >> of the BGP, then reflowing would happen if the GROUP allowed it but
 >> >> some order
 >> >> dependence is needed for the OPTIONAL/!bound idiom (which for betteror
 > > worse
 >> >> is being used, by users (inc the W3C tech reports query!)).
 > >
 > > This is an example of the OPTIONAL/!bound idiom, right? (Find maximum
 > > asserted price of each book):
 > >
 > > SELECT ?book ?price {
 > >   ?book :price ?price .
 > >   OPTIONAL {
 > >     ?book :price ?price2 .
 > >     FILTER (?price2 > ?price)
 > >   }
 > >   FILTER (!bound(?price2))
 > > }
 > >
 > > Why is order dependence needed here? If FILTERs are scoped to the siblings
 > > of whatever group the FILTERs appear in, wouldn't this order-rearraned
 > > query yield the same results?
 > >
 > > SELECT ?book ?price {
 > >   FILTER(!bound(?price2) .
 > >   ?book :price ?price .
 > >   OPTIONAL {
 > >     ?book :price ?price2 .
 > >     FILTER (?price2 > ?price)
 > >   }
 > > }

I can see what you mean by siblings - it's the reading of the spec consistent
with the first LC where FILTERs are part of GROUPs because conjunction in
groups and conjunction in BGPs came to the same thing.  Ditto the way OPTIONAL
worked - adding into the group.

In this case, the !bound FILTER is now in the LHS of the optional.

The first is:

(query
    (prefix : <http:/example/>)
    (select ?book ?price)
    (group
      (optional
        (basicgraphpattern
          (triplepattern ?book :price ?price))
        (group
          (basicgraphpattern
            (triplepattern ?book :price ?price2)
            (filter (> ?price2 ?price)))))
      (basicgraphpattern
        (filter (! (bound ?price2)))))
)

The second is:

(query
    (prefix : <http:/example/>)
    (select ?book ?price)
    (group
      (optional
        (basicgraphpattern
          (filter (! (bound ?price2)))
          (triplepattern ?book :price ?price))
        (group
          (basicgraphpattern
            (triplepattern ?book :price ?price2)
            (filter (> ?price2 ?price))))))
)


 > >
 > > Maybe I don't understand what you meant here...
 > >
 >> >> ARQ executes by splitting the BGP into blocks of triples and
 >> >> filters.  It does
 >> >> not reflow the BGP to put the filter in the right place.  It dispatches
 >> >> blocks-of-triples to the storage layer (which works for a remote
 > > DIGreasoner
 >> >> as well as plain storage or a rules engine)
 >> >>
 >> >> == Implementation feedback: these systems do not support FILTER-like
 >> >> functionality so putting FILTERs too closely with BGPs may be a
 >> >> restriction to
 >> >> implementability.
 > >
 > > What are "these systems"? The storage layers that ARQ uses? I think the
 > > design should be oriented around what we expect the results of queries
 > > using FILTERs in various locations to be, rather than ease of
 > > implementation. Or maybe again I'm missing your point (probably  :-) .

Not the storage layers ARQ might use necessarily but things like DL-reasoners
or rule engines.  For pure storage, there isn't an issue as to where the
implementation does the filtering (or, in some cases, split between storage
and the general engine).  But some data sources don't speak FILTERs where it
might matter.  For example, DIG can't express FILTERs (true in 1.1 - might be
changed in DIG 2.0 and I missed it).  I agree with you that the design should
be oriented around what the results of queries should be; it is informing that
some systems provide a single block of triple pattern interface.

 > >
 >> >> == Proposal: FILTERs trail the BGP conceptually.  i.e. they are executed
 > > to
 >> >> minimise the unbound variables and hence maximise the non-error
 >> >> evaluation and
 >> >> maximise the solutions that can be found.
 > >
 > > Is this a proposal for what the scope of FILTER should be? I must say that
 > > I don't understand this proposal.

If FILTER is in a BGP, or GROUP, then the correct results will be obtained if
all the matching is done, then any filters applied (c.f SQL).  That might give
us the right model.

However, I wouldn't want the syntax to purely enforce that - in RDQL, having
all constraints fixed after the pattern lead to some queries to become
confusing.  Alberto's variant had in place regex (the object of a triple
pattern could directly be a regex).  Allowing the filter to be written close
to variables it is restricting is clearer when the triple pattern block is
long (and, sometimes, they get really quite long).

	Andy

 > >
 >> >> Reflowing the query is actually quite trivial to do as I've done it in
 > > other
 >> >> places before.   But currently, ARQ executes the query as it is
 >> >> syntactically,
 >> >> so if the FILTER is too early, it will cause an error and so the
 > > solution is
 >> >> rejected (the engine is constructive - it starts with a single solution
 > > of no
 >> >> variables and adds bindings as needed to satisfy the patterns).
 >> >>
 >> >> That's what happens in query 3:
 >> >>
 >> >> arq.qparse --plan ...Q3... prints the execution plan:
 >> >>
 >> >> [BasicGraphPattern
 >> >>    [Constraint ( ?t = :s2 )]
 >> >>    [Triples
 >> >>      ?s :p1 :o1
 >> >>      ?t :p2 :o2
 >> >>    ]]
 >> >>
 >> >>
 >> >> Read that as a functional evaluation.  The constraint is performed, then
 > > the
 >> >> pair of triples are matched as unit.
 >> >>
 >> >> I have not rushed to change this to avoid having to change things
 >> >> twice.  I do
 >> >> not want to prejudge where the BGP decision goes and I don't want to
 > > make a
 >> >> change that is visible to an application queries then make a second
 > > change
 >> >> that alters applications again.  Making such changes has a large cost to
 > >
 >> >> application writers (and we have asked for implementations so I feel an
 >> >> obligation there - it is reported publicly in the W3C newswire of July
 > > 6). It
 >> >> also impacts my support load.  Putting FILTERs in the right place (not
 > > just
 >> >> last) is a very large performance improvements (x1000+ is quite
 > > possible).
 > >
 > > Yes, I agree with all this. Didn't mean to conflate ARQ's behavior with
 > > spec design decisions. It's hard to know for any given SPARQL
 > > implementation which parts of the spec. the implementation is not in step
 > > with or chooses its own interpretation due to the spec. being
 > > underspecified.
 > >
 > > I guess my upshot here is that ARQ and other engines' behavior completely
 > > aside, I think we need to revisit the FILTER specification and make sure
 > > that the design clearly specifies the scope of FILTER; i.e., what
 > > solutions a FILTER can eliminate.
 > >
 > > I think it's clear to me that attaching the scope of a FILTER to a
 > > FilteredBGP (i.e., associating the conceptual scope of a FILTER with its
 > > grammatical socpe) does NOT produce results that people expect.
 > >
 > > Lee
 > >
 > >
 >> >>    Andy
 >> >>
Received on Friday, 15 September 2006 08:23:14 UTC