order of FILTERs in a BGP? scope of FILTERs? from Lee Feigenbaum on 2006-08-28 (public-rdf-dawg@w3.org from July to September 2006)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Mon, 28 Aug 2006 16:22:56 -0400
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <OF8A8A0308.E3B8C932-ON852571D8.0069C577-852571D8.006FF43F@us.ibm.com>
Hi everyone,

I know that we've discussed this -- or similar -- before, but my searching 
hasn't found anything conclusive/authoritative. I've been working through 
some situations involving FILTERs in my implementation and 
cross-referencing with ARQ's behavior in SPARQLer (http://www.sparql.org) 
and also trying to work through the "proper" behavior according to the 
spec. 

... And I'm having a lot of difficulties :)

First, I find the spec (I'm working off of rq24 but don't believe this is 
any different for rq23 or the published CR) unclear as to what the scope 
of a FILTER is. The text in section 11 reads:

"""
Specifically, FILTERs eliminate any solutions that, when substituted into 
the expression, either result in an effective boolean value of false or 
produce an error.
"""

It does not explicitly state what solutions a FILTER is tested against, so 
I see two possibilities:

1) a FILTER operates globally, regardless of where it occurs in a query 
(across UNIONs, GRAPHs, groups, OPTIONALs, BGPs, etc.)
2) a FILTER is limited to restricting the solutions of the 
FilteredBasicGraphPattern in which it occurs

I suspect #2 is the intended meaning, but because it is tied so closely to 
the grammar and because this part of the grammar is defined recursively ( 
FilteredBasicGraphPattern       ::=     BlockOfTriples? ( Constraint '.'? 
FilteredBasicGraphPattern  )? ), this seems to make the behavior of 
FILTERs order-dependent. 

>From some experiments with SPARQLer, I'm still at a bit of a loss:

Data (available at: http://thefigtrees.net/lee/sw/data/g.n3 )

@prefix : <http://example.org/> .

:s1 :p1 :o1 .
:s2 :p2 :o2 .

And consider the query (query #1):

PREFIX : <http://example.org/>
SELECT *
FROM <http://thefigtrees.net/lee/sw/data/g.n3> 
{
  ?s :p1 :o1 .
  ?t :p2 :o2 .
}

which has solution:

?s      ?t
--      --
:s1     :s2

Now, we add a FILTER (query #2):

PREFIX : <http://example.org/>
SELECT *
FROM <http://thefigtrees.net/lee/sw/data/g.n3> 
{
  ?s :p1 :o1 .
  ?t :p2 :o2 .
  FILTER (?t = :s2) .
}

and we get the same single solution:

?s      ?t
--      --
:s1     :s2

Now, we change the position of the FILTER (query #3):

PREFIX : <http://example.org/>
SELECT *
FROM <http://thefigtrees.net/lee/sw/data/g.n3> 
{
  FILTER (?t = :s2) .
  ?s :p1 :o1 .
  ?t :p2 :o2 .
}

and now we get *zero* solutions.

So my first questions are: is this behavior what people expect from 
SPARQL? is this behavior somehow explained by the current spec text?

>From the SPARQL grammar's point of view, I believe the first of these is:
  FilteredBasicGraphPattern(BlockOfTriples(...), Constraint(...), 
FilteredBasicGraphPattern()) 

while the second is 
  FilteredBasicGraphPattern(Constraint(...), 
FilteredBasicGraphPattern(BlockOfTriples(...)))

I would think in both cases that the two triples occur in the same BGP, 
and so the solution {?s/:s1, ?t/:s2} should be FILTERed, and not 
eliminated. The observed behavior of SPARQLer in query #3 could only occur 
(I think) if the FILTER is applied individually to the solutions from each 
triple pattern before those solutions are joined together.

Things get more confusing (for me, at least). Consider:

  http://www.w3.org/2001/sw/DataAccess/tests/#dawg-bound-query-001

Data:

@prefix : <http://example.org/ns#> . 
:a1 :b :c1 . 
:c1 :d :e . 
:a2 :b :c2 . 
:c2 :b :f .

Query (query #4):

PREFIX : <http://example.org/ns#> 
SELECT ?a ?c
WHERE { 
  ?a :b ?c . 
  OPTIONAL { ?c :d ?e } . 
  FILTER (! bound(?e)) 
}

Expected results:

?c      ?a
--      --
<http://example.org/ns#c2>      <http://example.org/ns#a2>
<http://example.org/ns#f>       <http://example.org/ns#c2>


These results rely on the three solutions of the OPTIONAL binary-operator 
being FILTERED by the filter. But the OPTIONAL does not appear in the same 
FilteredBasicGraphPattern as the FILTER constraint (of course, that's not 
physically possible). The parse tree for this is something akin to:

Group( Optional(BGP(?a :b ?c), BGP(?c :d ?e)), FilteredBGP(BGP(), 
Constraint(!bound(?e))) )

It would require a reading of the spec that takes filters that appear as 
siblings within a group and applies them to all other siblings in a group 
to get the behavior expected by this test (SPARQLer conforms to the 
behavior expected by this test). But I have a very hard time seeing how 
any explanation for the behavior expected by this test can also have the 
diverging behavior that I see from SPARQLer in query #2 and query #3. 

...

I could go on (e.g., everyone probably agrees that FILTERs don't affect 
solutions in other branches of a UNION), but I don't think I'll enlighten 
myself any further at this point and it would probably just make this 
message more rambling, so I'll stop for now and hope to be corrected as to 
what I'm missing or at least pointed to discussions of this which I'm 
pretty sure have occurred before. 


thanks,
Lee
Received on Monday, 28 August 2006 20:23:08 UTC