Proposal for simplifying FILTER semantics

Dear WG,

When working with SPARQL recently, I noticed that certain disjunctive 
queries are cumbersome and inefficient to formulate because of the 
post-processing semantics of FILTER expressions. I have written up a 
detailed explanation [1]. In a nutshell: it is *really* hard to combine 
FILTERs and BGPs in disjunctions.

But the problem has a simple fix:

* Define FILTER in such a way that it can *create* new solution 
mappings, just like BGP. A FILTER would create all variable bindings (to 
terms from the active graph) that make the filter condition true.
* Instead of applying filters after matching, the generated solution 
mappings of a FILTER would directly be joined with other parts of the query.
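
To make the two points above concrete, here is a minimal Python sketch of 
the idea. It is not taken from any SPARQL implementation: the toy graph, 
the helper names (match_bgp, filter_solutions, join) and the "adult" 
condition are all made up for illustration, and solution mappings are 
simply modelled as dicts. The sketch lets a FILTER enumerate bindings 
over the terms of the active graph, joins them with a BGP result, and 
compares that with ordinary post-filtering.

```python
from itertools import product

# Hypothetical toy "active graph": a set of (subject, predicate, object) triples.
GRAPH = {
    ("alice", "age", 30),
    ("bob", "age", 17),
    ("carol", "age", 42),
}


def graph_terms(graph):
    """All terms that occur anywhere in the graph."""
    return {term for triple in graph for term in triple}


def match_bgp(graph, pattern):
    """Match one triple pattern; variables are strings starting with '?'."""
    solutions = []
    for triple in graph:
        binding = {}
        for pat, term in zip(pattern, triple):
            if isinstance(pat, str) and pat.startswith("?"):
                if binding.get(pat, term) != term:
                    break  # same variable would need two different terms
                binding[pat] = term
            elif pat != term:
                break  # constant in the pattern does not match
        else:
            solutions.append(binding)
    return solutions


def filter_solutions(graph, variables, condition):
    """Proposed reading: FILTER *creates* every binding of its variables
    (to terms of the active graph) that makes the condition true."""
    terms = graph_terms(graph)
    solutions = []
    for values in product(terms, repeat=len(variables)):
        binding = dict(zip(variables, values))
        if condition(binding):
            solutions.append(binding)
    return solutions


def join(left, right):
    """Join two lists of solution mappings on their shared variables."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[v] == b[v] for v in a.keys() & b.keys())
    ]


def as_set(solutions):
    """Order-independent view of a list of solution mappings."""
    return {frozenset(b.items()) for b in solutions}


def adult(binding):
    # Stand-in for FILTER(?age >= 18); non-numeric terms count as "false".
    return isinstance(binding["?age"], int) and binding["?age"] >= 18


bgp = match_bgp(GRAPH, ("?person", "age", "?age"))

# Current semantics: match first, then check the filter on each solution.
post_filtered = [b for b in bgp if adult(b)]

# Proposed semantics: the FILTER generates mappings that are joined in.
generated = filter_solutions(GRAPH, ["?age"], adult)
joined = join(bgp, generated)

print(as_set(joined) == as_set(post_filtered))
```

Enumerating all terms is of course only the naive specification; an 
actual engine would presumably evaluate the join in a smarter order.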

Putting it like this simplifies the whole algebra, both formally and 
conceptually. Moreover, I think that practical implementations are 
already working like that anyway (using FILTER conditions such as "=" to 
pre-generate results instead of waiting until the very end before 
"checking" them).

The only negative effect that I see is that this would change the 
meaning of variables that occur in filters but in no BGP. Currently, 
such variables are considered "unbound". With the change, they would be 
instantiated to all terms that match. Experimenting with FILTER-only 
variables in some RDF stores, I merely got error messages (and rightly 
so, since a variable that can never be bound is of little use in a 
filter). So I assume that this is a corner case of little practical 
relevance.
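
The corner case can be sketched in a few lines of Python. Everything 
here is an illustrative assumption rather than the behaviour of a 
particular store: the term set, the BGP solutions, and the condition, 
which stands in for something like FILTER(?y = ?a) where ?y occurs in no 
BGP. An unbound variable is modelled as a lookup error that drops the 
solution.

```python
from itertools import product

# Hypothetical terms of the active graph and solutions of some BGP.
TERMS = {"alice", "bob", 30, 17}
BGP_SOLUTIONS = [
    {"?p": "alice", "?a": 30},
    {"?p": "bob", "?a": 17},
]


def condition(binding):
    # Stand-in for FILTER(?y = ?a); ?y appears only in the filter.
    return binding["?y"] == binding["?a"]


def post_filter(solutions, cond):
    """Current reading: the filter is checked after matching; an unbound
    variable makes the check fail, so the solution is dropped."""
    kept = []
    for binding in solutions:
        try:
            if cond(binding):
                kept.append(binding)
        except KeyError:  # variable is unbound in this solution
            pass
    return kept


def generate(terms, variables, cond):
    """Proposed reading: bind the filter's variables to all matching terms."""
    candidates = (dict(zip(variables, vals))
                  for vals in product(terms, repeat=len(variables)))
    return [b for b in candidates if cond(b)]


def join(left, right):
    """Join two lists of solution mappings on their shared variables."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[v] == b[v] for v in a.keys() & b.keys())
    ]


current = post_filter(BGP_SOLUTIONS, condition)
proposed = join(BGP_SOLUTIONS, generate(TERMS, ["?y", "?a"], condition))

print(len(current), len(proposed))
```

Under the current reading the query has no solutions at all, while under 
the proposed one each BGP solution is extended by a binding for ?y.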

AFAICT, all other queries would give exactly the same results (joining 
having the same effect as filtering). So it seems that I am suggesting a 
largely formal algebra change, but one that would make hitherto useless 
queries very helpful (e.g. to solve the problem in [1]).

I am aware that this proposal comes at a very late stage, but I think it 
is still feasible to do it. I could help with updating the formal parts 
of the algebra. In any case, I would like to hear the opinion of 
implementers/practitioners, also re [1]. Note that I am writing this 
largely as a user (and teacher) of SPARQL, so when I am investing my 
time here it is merely because I am convinced that it would greatly 
benefit the language.

Cheers,

Markus


[1] http://korrekt.org/page/The_State_of_the_UNION

-- 
Dr. Markus Krötzsch
Oxford  University  Computing  Laboratory
Room 306, Parks Road, Oxford, OX1 3QD, UK
+44 (0)1865 283529    http://korrekt.org/

Received on Sunday, 1 May 2011 19:30:06 UTC