BIND semantics from Holger Knublauch on 2012-08-12 (public-rdf-dawg-comments@w3.org from August 2012)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 13 Aug 2012 09:57:30 +1000
To: public-rdf-dawg-comments@w3.org
Message-ID: <502842EA.20309@topquadrant.com>

Dear WG,

as someone who has supported the inclusion of BIND into SPARQL 1.1, 
please allow me to provide some feedback. Overall we (TopQuadrant, Inc) 
are happy that BIND has been added, and we and our customers use it a lot.

However, I believe the semantics of BIND need some tweaking, because the 
current design is unnecessarily restrictive and counterintuitive. In 
terms of the textual syntax, the main problem that should be 
reconsidered is that variables from preceding { ... } blocks 
are not visible in BIND statements, e.g.

     GRAPH <...> {
         ?x rdfs:label ?label .
     }
     BIND (my:function(?label) AS ?str) .

does not work as expected, because ?label is not bound to the value from 
the inner graph block. While the example above is artificial to 
illustrate the syntactic issue, we have many practical use cases where 
this is a real-world problem. Here are some simplified examples to 
illustrate the issues:


1) Redundant. It becomes hard to reuse BIND sequences:

     {
         ?x rdfs:label ?label .
     }
     UNION
     {
         ?x skos:prefLabel ?label .
     }
     BIND (my:stringOperation1(?label) AS ?str) .
     BIND (my:stringOperation2(?str) AS ?str2) .

is currently invalid and would need to be changed to

     {
         ?x rdfs:label ?label .
         BIND (my:stringOperation1(?label) AS ?str) .
         BIND (my:stringOperation2(?str) AS ?str2) .
     }
     UNION
     {
         ?x skos:prefLabel ?label .
         BIND (my:stringOperation1(?label) AS ?str) .
         BIND (my:stringOperation2(?str) AS ?str2) .
     }
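
As an illustration only, the equivalence being asked for can be sketched 
in Python by modeling each solution as a dict. The two string functions 
and the sample data are made-up stand-ins for my:stringOperation1 and 
my:stringOperation2; this is not how any SPARQL engine is implemented.

```python
def string_operation_1(label):
    # Hypothetical stand-in for my:stringOperation1
    return label.strip()


def string_operation_2(text):
    # Hypothetical stand-in for my:stringOperation2
    return text.lower()


def bind_chain(rows):
    """Apply both BINDs, in order, to every solution in rows."""
    out = []
    for row in rows:
        s = string_operation_1(row["label"])
        out.append({**row, "str": s, "str2": string_operation_2(s)})
    return out


# Made-up solutions for the two UNION branches
rdfs_labels = [{"x": "ex:a", "label": " Alpha "}]
skos_labels = [{"x": "ex:b", "label": " Beta "}]

# BINDs duplicated inside each branch (the form that is valid today)
duplicated = bind_chain(rdfs_labels) + bind_chain(skos_labels)

# BINDs written once, after the UNION (the form asked for here)
shared = bind_chain(rdfs_labels + skos_labels)

print(duplicated == shared)
```

Under this simplified model both forms yield the same solutions, so the 
duplication adds nothing but maintenance cost.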


2) Inefficient. We very often need to perform sequences of BINDs 
intermixed with FILTERs, e.g.

     BIND (ex:firstStep(?x) AS ?a) .
     FILTER bound(?a) .
     BIND (ex:secondStep(?a) AS ?b) .
     FILTER (?b > 10) .
     BIND (ex:thirdStep(?b) AS ?c) .

The problem with the above is that SPARQL engines may (or even should) 
move the FILTERs to the end, effectively producing

     BIND (ex:firstStep(?x) AS ?a) .
     BIND (ex:secondStep(?a) AS ?b) .
     BIND (ex:thirdStep(?b) AS ?c) .
     FILTER bound(?a) .
     FILTER (?b > 10) .

However, this is not desirable because the BIND operations may be 
complex operations in themselves and we certainly don't want them to 
execute unnecessarily, or even with very unexpected input values. So the 
trick we used to work around this was to group FILTERs and BINDs 
tightly together, so that execution stops as early as possible, e.g.

     {
         {
             BIND (ex:firstStep(?x) AS ?a) .
             FILTER bound(?a) .
         }
         BIND (ex:secondStep(?a) AS ?b) .
         FILTER (?b > 10) .
     }
     BIND (ex:thirdStep(?b) AS ?c) .

The above pattern unfortunately doesn't work with the current SPARQL 1.1 
spec.
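
The cost difference can be made concrete with a small Python sketch that 
counts how often each step runs. Everything here is an assumption for 
illustration: the three step functions are invented stand-ins for 
ex:firstStep, ex:secondStep and ex:thirdStep, None models an unbound 
variable, and the input values are arbitrary.

```python
from collections import Counter

calls = Counter()  # how often each hypothetical step function runs


def first_step(x):
    calls["first"] += 1
    return x * 2 if x % 2 == 0 else None  # None models "unbound"


def second_step(a):
    calls["second"] += 1
    return None if a is None else a + 5


def third_step(b):
    calls["third"] += 1
    return None if b is None else b * b


def extend(rows, var, fn, arg):
    """BIND (fn(?arg) AS ?var) applied to every solution."""
    return [{**row, var: fn(row[arg])} for row in rows]


def keep(rows, predicate):
    """FILTER: keep only the solutions that satisfy predicate."""
    return [row for row in rows if predicate(row)]


def a_is_bound(row):
    return row["a"] is not None


def b_over_10(row):
    return row["b"] is not None and row["b"] > 10


def interleaved(xs):
    # Each FILTER runs right after the BIND it depends on
    rows = [{"x": x} for x in xs]
    rows = keep(extend(rows, "a", first_step, "x"), a_is_bound)
    rows = keep(extend(rows, "b", second_step, "a"), b_over_10)
    return extend(rows, "c", third_step, "b")


def filters_last(xs):
    # All BINDs run first, the FILTERs are applied at the end
    rows = [{"x": x} for x in xs]
    rows = extend(rows, "a", first_step, "x")
    rows = extend(rows, "b", second_step, "a")
    rows = extend(rows, "c", third_step, "b")
    return keep(keep(rows, a_is_bound), b_over_10)


def run(plan, xs):
    calls.clear()
    rows = plan(xs)
    return rows, sum(calls.values())


xs = [1, 2, 3, 4, 5, 6]
early_rows, early_calls = run(interleaved, xs)
late_rows, late_calls = run(filters_last, xs)
print(early_rows == late_rows, early_calls, late_calls)
```

For this input the sketch prints "True 11 18": both orderings return the 
same solutions, but filtering early needs 11 function calls instead of 
18, and second_step and third_step are never handed an unbound value.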


3) Non-intuitive and inconsistent. I do like the mantra that 
SPARQL is executed from the inside out, so that variables bound in 
inner blocks can generally be used in surrounding blocks. This is how 
BGPs, FILTERs, etc. work. So why does BIND not follow the same principle? 
This is hard to explain to end users. I certainly don't understand the 
reasons for this inconsistency, and I don't think I am a SPARQL beginner.
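
Going back to the GRAPH example at the beginning of this message, the two 
readings of BIND can be contrasted in a rough Python model. This is a 
deliberately simplified sketch and not the algebra from the spec: 
solutions are dicts, str.upper is an arbitrary stand-in for my:function, 
and the sample labels are invented.

```python
def join(left, right):
    """Join two solution lists on their shared variables."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def extend(rows, var, fn, arg):
    """BIND (fn(?arg) AS ?var); ?var stays unbound if ?arg is unbound."""
    out = []
    for row in rows:
        new_row = dict(row)
        if arg in row:
            new_row[var] = fn(row[arg])
        out.append(new_row)
    return out


my_function = str.upper  # hypothetical stand-in for my:function

# Invented solutions produced by the inner GRAPH block
graph_block = [
    {"x": "ex:a", "label": "Alpha"},
    {"x": "ex:b", "label": "Beta"},
]

# Reading 1: BIND only sees the (empty) pattern directly in front of it,
# and its result is joined with the GRAPH block afterwards
empty_pattern = [{}]
narrow = join(graph_block, extend(empty_pattern, "str", my_function, "label"))

# Reading 2: BIND extends everything that precedes it in the group
wide = extend(join(empty_pattern, graph_block), "str", my_function, "label")

print(narrow)
print(wide)
```

In this model the first reading leaves ?str unbound in every solution, 
while the second one computes it from the ?label bound inside the GRAPH 
block, which is the inside-out behavior argued for above.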


Sorry to raise this problem so late in the process, but we only became 
aware of the issue after a very recent bugfix in the SPARQL API that we 
are using. The "bug" that was there before masked the behavior, and 
everything was working fine for us. In fact, we have successfully used 
the syntactic patterns above for many years, for as long as the "bug" 
was present in the API.

Regards,
Holger
Received on Sunday, 12 August 2012 23:58:12 UTC