Re: Proposal for simplifying FILTER semantics

P.S. To clarify my proposal (and to support my claim that it is an easy 
change), I have made the main formal modifications that the spec would 
require [1]. There are four changes:

Section 17.4:
* Filter is a unary function, working like BGP
   (results restricted to terms in active graph;
    could be relaxed to allow all bnodes)
* LeftJoin is a binary function, working like LeftJoin(*,*,true)

Section 17.2.3:
* Variables in FILTER are always "visible"

Section 17.2.1:
* The translation of GroupGraphPatterns includes FILTER directly using 
Join (helper variable FS no longer needed), and no more case distinction 
happens for OPTIONAL.

So everything becomes somewhat shorter/simpler. I have not updated any 
informal parts (esp. the translation examples).
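
To make the intended semantics concrete, here is a minimal Python sketch. It 
is only an illustration under simplifying assumptions, not the spec text 
from [1]: terms are plain strings, a BGP is a single triple pattern, and all 
names (GRAPH, bgp, join, filter_gen) are made up for this example. It shows 
a unary Filter that generates solution mappings from the terms of the active 
graph, joined with a BGP, next to the current post-processing reading.

```python
from itertools import product

# Toy "active graph": a set of (subject, predicate, object) triples.
GRAPH = {
    ("a", "p", "1"),
    ("b", "p", "2"),
    ("b", "q", "a"),
}


def graph_terms(graph):
    """Return every term that occurs anywhere in the graph."""
    return {term for triple in graph for term in triple}


def bgp(graph, pattern):
    """Match one triple pattern; strings starting with '?' are variables."""
    results = []
    for triple in graph:
        mapping = {}
        matched = True
        for pat, term in zip(pattern, triple):
            if pat.startswith("?"):
                # A repeated variable must bind to the same term.
                if mapping.get(pat, term) != term:
                    matched = False
                    break
                mapping[pat] = term
            elif pat != term:
                matched = False
                break
        if matched:
            results.append(mapping)
    return results


def compatible(m1, m2):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(omega1, omega2):
    """Join of two multisets of solution mappings."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def filter_gen(graph, variables, condition):
    """Sketch of the proposed unary Filter: generate every binding of the
    given variables to graph terms for which the condition holds."""
    terms = sorted(graph_terms(graph))
    variables = list(variables)
    solutions = []
    for combo in product(terms, repeat=len(variables)):
        mapping = dict(zip(variables, combo))
        if condition(mapping):
            solutions.append(mapping)
    return solutions


def y_is_2(m):
    return m["?y"] == "2"


pattern = ("?x", "p", "?y")

# Proposed reading: Join(BGP, Filter).
joined = join(bgp(GRAPH, pattern), filter_gen(GRAPH, ["?y"], y_is_2))

# Current reading: match first, then discard solutions failing the filter.
post_filtered = [m for m in bgp(GRAPH, pattern) if y_is_2(m)]

# A variable that occurs only in the filter is now instantiated
# instead of being treated as unbound.
z_only = filter_gen(GRAPH, ["?z"], lambda m: m["?z"] == "a")

print(joined)
print(post_filtered)
print(z_only)
```

On this toy graph both readings give the same single solution for the 
pattern plus filter, while the FILTER-only variable ?z receives a binding 
under the proposed reading. The brute-force enumeration in filter_gen is 
only meant to mirror the definition; a real engine would of course not 
enumerate all term combinations.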



On 01/05/11 20:29, Markus Krötzsch wrote:
> Dear WG,
> when working with SPARQL recently, I noticed that certain disjunctive
> queries are most cumbersome/inefficient to formulate due to the special
> post-processing semantics of FILTER expressions. I have written up a
> detailed explanation [1]. In a nutshell: it is *really* hard to combine
> FILTERs and BGPs in disjunctions.
> But the problem has a simple fix:
> * Define FILTER in such a way that it can *create* new solution
> mappings, just like BGP. A FILTER would create all variable bindings (to
> terms from the active graph) that make the filter condition true.
> * Instead of applying filters after matching, the generated solution
> mappings of a FILTER would directly be joined with other parts of the
> query.
> Putting it like this simplifies the whole algebra, both formally and
> conceptually. Moreover, I think that practical implementations are
> already working like that anyway (using FILTER conditions such as "=" to
> pre-generate results instead of waiting until the very end before
> "checking" them).
> The only negative effect that I see is that this would change the
> meaning of variables that occur in filters but in no BGP. Currently,
> such variables are considered "unbound". With the change, they would be
> instantiated to all terms that match. Experimenting with FILTER-only
> variables in some RDF stores, I merely got error messages (and rightly
> so, since a variable that can never be bound is of little use in a
> filter). So I assume that this is a corner case of little practical
> relevance.
> AFAICT, all other queries would give exactly the same results (joining
> having the same effect as filtering). So it seems that I am suggesting a
> largely formal algebra change, but one that would make hitherto useless
> queries very helpful (e.g. to solve the problem in [1]).
> I am aware that this proposal comes at a very late stage, but I think it
> is still feasible to do it. I could help with updating the formal parts
> of the algebra. In any case, I would like to hear the opinion of
> implementers/practitioners, also re [1]. Note that I am writing this
> largely as a user (and teacher) of SPARQL, so when I am investing my
> time here it is merely because I am convinced that it would greatly
> benefit the language.
> Cheers,
> Markus
> [1]

Dr. Markus Krötzsch
Oxford  University  Computing  Laboratory
Room 306, Parks Road, Oxford, OX1 3QD, UK
+44 (0)1865 283529

Received on Monday, 2 May 2011 16:56:24 UTC