Re: Proposal for simplifying FILTER semantics from Markus Krötzsch on 2011-05-02 (public-rdf-dawg-comments@w3.org from May 2011)

From: Markus Krötzsch <markus.kroetzsch@comlab.ox.ac.uk>
Date: Mon, 02 May 2011 17:56:02 +0100
To: public-rdf-dawg-comments@w3.org
Message-ID: <4DBEE222.9080400@comlab.ox.ac.uk>
P.S. To clarify my proposal (and to support my claim that it is an easy 
change), I have done the main formal modifications that the spec would 
require [1]. There are four changes:

Section 17.4:
* Filter is a unary function, working like BGP
   (results restricted to terms in active graph;
    could be relaxed to allow all bnodes)
* LeftJoin is a binary function, working like LeftJoin(*,*,true)

Section 17.2.3:
* Variables in FILTER are always "visible"

Section 17.2.1:
* The translation of GroupGraphPatterns includes FILTER directly using 
Join (helper variable FS no longer needed), and no more case distinction 
happens for OPTIONAL.
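
To make the difference between the two readings of Filter concrete, here is a minimal Python sketch. It is a toy model, not the formal definitions from the spec: terms are plain strings, blank nodes and the full expression language are ignored, and the "terms in the active graph" are assumed to be every term occurring in any triple position. All helper names (eval_bgp, filter_post, filter_generate, join) are hypothetical. For a filter whose variables are all bound by the BGP, joining with the generated mappings should give the same solutions as post-filtering.

```python
from itertools import product

# Toy model (assumption): RDF terms are strings, variables start with "?",
# a graph is a set of (s, p, o) triples, and a solution mapping is a dict.

def match_triple(pattern, triple, mu):
    """Extend mapping mu so that pattern matches triple; None if impossible."""
    mu = dict(mu)
    for pt, t in zip(pattern, triple):
        if pt.startswith("?"):
            if mu.setdefault(pt, t) != t:
                return None
        elif pt != t:
            return None
    return mu

def eval_bgp(graph, patterns):
    """All mappings that match every triple pattern (no blank-node handling)."""
    solutions = [{}]
    for pat in patterns:
        solutions = [m for mu in solutions for triple in graph
                     if (m := match_triple(pat, triple, mu)) is not None]
    return solutions

def compatible(m1, m2):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(m2.get(k, v) == v for k, v in m1.items())

def join(o1, o2):
    return [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]

def filter_post(variables, pred, solutions):
    """Current reading: keep mappings where pred holds. An unbound
    variable is an evaluation error, so that mapping is dropped."""
    return [mu for mu in solutions
            if all(v in mu for v in variables) and pred(mu)]

def filter_generate(variables, pred, graph):
    """Proposed reading: Filter as a unary operator that generates every
    mapping of its variables to active-graph terms that makes pred true."""
    terms = sorted({t for triple in graph for t in triple})
    candidates = (dict(zip(variables, vals))
                  for vals in product(terms, repeat=len(variables)))
    return [mu for mu in candidates if pred(mu)]

def canon(solutions):
    """Order-insensitive form for comparing solution sequences."""
    return sorted(tuple(sorted(mu.items())) for mu in solutions)

# { ?x :p ?y . FILTER(?y = :b) }
g = {(":a", ":p", ":b"), (":c", ":p", ":d"), (":a", ":q", ":d")}
bgp = [("?x", ":p", "?y")]
vars_, pred = ("?y",), lambda mu: mu["?y"] == ":b"

current = filter_post(vars_, pred, eval_bgp(g, bgp))
proposed = join(eval_bgp(g, bgp), filter_generate(vars_, pred, g))
print(canon(current) == canon(proposed))  # True
```

Enumerating every combination of active-graph terms is only for illustration; as the original proposal quoted below suggests, an engine would rather use conditions such as "=" to produce the bindings directly.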

So everything becomes somewhat shorter/simpler. I have not updated any 
informal parts (esp. the translation examples).

Markus

[1] http://korrekt.org/sparql-proposal/


On 01/05/11 20:29, Markus Krötzsch wrote:
> Dear WG,
>
> when working with SPARQL recently, I noticed that certain disjunctive
> queries are most cumbersome/inefficient to formulate due to the special
> post-processing semantics of FILTER expressions. I have written up a
> detailed explanation [1]. In a nutshell: it is *really* hard to combine
> FILTERs and BGPs in disjunctions.
>
> But the problem has a simple fix:
>
> * Define FILTER in such a way that it can *create* new solution
> mappings, just like BGP. A FILTER would create all variable bindings (to
> terms from the active graph) that make the filter condition true.
> * Instead of applying filters after matching, the generated solution
> mappings of a FILTER would directly be joined with other parts of the
> query.
>
> Putting it like this simplifies the whole algebra, both formally and
> conceptually. Moreover, I think that practical implementations already
> work like that anyway (using FILTER conditions such as "=" to
> pre-generate results instead of waiting until the very end before
> "checking" them).
>
> The only negative effect that I see is that this would change the
> meaning of variables that occur in filters but in no BGP. Currently,
> such variables are considered "unbound". With the change, they would be
> instantiated to all terms that match. Experimenting with FILTER-only
> variables in some RDF stores, I merely got error messages (and rightly
> so, since a variable that can never be bound is of little use in a
> filter). So I assume that this is a corner case of little practical
> relevance.
>
> AFAICT, all other queries would give exactly the same results (joining
> having the same effect as filtering). So it seems that I am suggesting a
> largely formal algebra change, but one that would make hitherto useless
> queries very helpful (e.g. to solve the problem in [1]).
>
> I am aware that this proposal comes at a very late stage, but I think it
> is still feasible to do it. I could help with updating the formal parts
> of the algebra. In any case, I would like to hear the opinion of
> implementers/practitioners, also re [1]. Note that I am writing this
> largely as a user (and teacher) of SPARQL, so when I am investing my
> time here it is merely because I am convinced that it would greatly
> benefit the language.
>
> Cheers,
>
> Markus
>
>
> [1] http://korrekt.org/page/The_State_of_the_UNION
>
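
The FILTER-only corner case and the disjunction problem discussed in the quoted message can be sketched in the same toy style. This is an illustration under assumptions, not the spec's algebra: terms are strings, the query uses a single triple pattern and a filter on one variable, and the helper names (match, bgp, filter_current, filter_proposed) are hypothetical. Under the current translation, a group that contains only a FILTER filters the single empty mapping, so its variable stays unbound and the branch yields nothing; under the proposal, the branch generates the matching active-graph term.

```python
# Toy model (assumption): terms are strings, variables start with "?",
# a graph is a set of (s, p, o) triples, a solution mapping is a dict.

def match(pattern, triple):
    """Bind the variables of one triple pattern against one triple, or return None."""
    mu = {}
    for pt, t in zip(pattern, triple):
        if pt.startswith("?"):
            if mu.setdefault(pt, t) != t:
                return None
        elif pt != t:
            return None
    return mu

def bgp(graph, pattern):
    """All mappings for a single triple pattern."""
    return [m for t in graph if (m := match(pattern, t)) is not None]

def filter_current(var, test, solutions):
    """Current reading: an unbound var is an error, so the mapping is dropped."""
    return [mu for mu in solutions if var in mu and test(mu[var])]

def filter_proposed(var, test, graph):
    """Proposed reading: generate every active-graph term that passes the test."""
    terms = sorted({t for triple in graph for t in triple})
    return [{var: t} for t in terms if test(t)]

def is_c(term):
    return term == ":c"

g = {(":a", ":p", ":b"), (":c", ":q", ":d")}

# { ?x :p :b } UNION { FILTER(?x = :c) }; under the current translation the
# FILTER-only group starts from the single empty mapping [{}].
current = bgp(g, ("?x", ":p", ":b")) + filter_current("?x", is_c, [{}])
proposed = bgp(g, ("?x", ":p", ":b")) + filter_proposed("?x", is_c, g)

print(sorted(mu["?x"] for mu in current))   # [':a']
print(sorted(mu["?x"] for mu in proposed))  # [':a', ':c']
```

With a constant that does not occur in the graph, the proposed branch is empty as well, in line with restricting Filter results to terms in the active graph.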


-- 
Dr. Markus Krötzsch
Oxford  University  Computing  Laboratory
Room 306, Parks Road, Oxford, OX1 3QD, UK
+44 (0)1865 283529    http://korrekt.org/
Received on Monday, 2 May 2011 16:56:24 UTC