Subject: Re: ISSUE-68 definition of pre-binding
From: Holger Knublauch <holger@topquadrant.com>
Date: Tue, 22 Mar 2016 10:23:07 +1000
To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <56F0906B.7040102@topquadrant.com>

On 22/03/2016 10:01, Peter F. Patel-Schneider wrote:
> On 03/21/2016 04:21 PM, Holger Knublauch wrote:
>> On 22/03/2016 4:08, Peter F. Patel-Schneider wrote:
>>> The definition of pre-binding in the current editors' draft says
>>>
>>> pre-binding a variable with a value means that, prior to evaluating a query,
>>> the SHACL processor needs to substitute all occurrences of the variable in the
>>> query (including inner scopes and nested SELECT queries) with the provided
>>> value.
>>>
>>> This does not match my intuitions on how pre-binding should work.
>>>
>>> It may match what happens in practice, but I think that for this definition to
>>> be acceptable there will have to be a determination that most SPARQL
>>> implementations use this definition.
>> There are no existing implementations of SHACL, and while I agree it would be
>> ideal if we could simply hook into existing implementations of pre-binding, I
>> don't see how we can realistically make this a requirement.
> I disagree here.  If what the SHACL spec calls pre-binding is different from
> what SPARQL implementations do (and advertise) as pre-binding then we will be
> in another situation where the words in the SHACL spec don't mean what readers
> think they mean.  If SHACL is going to depend on something that it calls
> pre-binding and many SPARQL implementations define something that they call
> pre-binding then the two should match up.
>
>> Having said this, Jena has 3 different implementations of pre-binding:
>> 1) QueryExecution.setInitialBindings (used by SPIN)
>> 2) Parameterized SPARQL strings (text-based, does not support bnodes)
>> 3) Query syntax tree transform (this is closest to what SHACL would use)
>> AFAIK Sesame uses a similar technique to 3, i.e. it inserts variable values
>> into a parsed Query syntax tree. Then, whenever a variable is queried, the
>> system will check if that ?var has a pre-bound value already.
> Hmm.  Then it is probably better to not use the term pre-binding at all.  Call
> it something else, like substitution.

That's fine, I don't care about terminology here.

>
> It seems to me that substitution into the syntax tree is under-defined, just
> like the definition in Appendix C is under-defined.  There are two notions of
> variable identity possible here, one that is strictly based on names and one
> where there may be two variables with the same name.  If the SHACL document
> depends on variable identity then it will depend on which notion is being used.
>
> The current text has this ambiguity in it.  I would read the definition as
> saying that there are only two substitutions done in
>
>    SELECT ?a
>    WHERE { ?a ex:r ex:c .
>            { SELECT ?b
>              WHERE { ?b ex:r ?a .
>                      { SELECT ?b
>                        WHERE { ?b ex:q ?a } } } } }
>
> because the innermost ?a is a different variable.

I suggest we adopt a definition in which every variable with the same name
is pre-bound/substituted, regardless of the scope in which it occurs. We
need this e.g. for the nested SELECTs used by minCount. As long as we make
this clear, I see no issues.
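
To illustrate the name-based reading, here is a minimal Python sketch of
substitution on a toy query tree. Everything in it is an assumption made for
illustration: the `Triple` and `Select` classes, the `substitute` and
`count_term` helpers, and the constant `ex:x` are hypothetical and do not
correspond to Jena or Sesame classes. Projection lists are deliberately left
untouched, which a real definition would still have to address.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Triple:
    s: str
    p: str
    o: str


@dataclass
class Select:
    projection: List[str]
    where: List[Union["Triple", "Select"]]


def substitute(node, bindings):
    """Return a copy of node with every variable named in bindings replaced.

    Matching is purely by name, so nested SELECTs are rewritten as well.
    """
    if isinstance(node, Triple):
        return Triple(*(bindings.get(t, t) for t in (node.s, node.p, node.o)))
    # Sub-SELECT: recurse into its WHERE clause, keep the projection as-is.
    return Select(list(node.projection),
                  [substitute(child, bindings) for child in node.where])


def count_term(node, term):
    """Count occurrences of term in all triple patterns, at any depth."""
    if isinstance(node, Triple):
        return [node.s, node.p, node.o].count(term)
    return sum(count_term(child, term) for child in node.where)


# The three-level query quoted above, with ?a used at every level.
innermost = Select(["?b"], [Triple("?b", "ex:q", "?a")])
middle = Select(["?b"], [Triple("?b", "ex:r", "?a"), innermost])
query = Select(["?a"], [Triple("?a", "ex:r", "ex:c"), middle])

bound = substitute(query, {"?a": "ex:x"})
print(count_term(bound, "ex:x"))  # 3
```

Under this reading all three occurrences of ?a in the triple patterns are
replaced, including the innermost one. A scope-aware reading would stop
before the innermost SELECT and replace only two.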

>
>> Instead of relying on the textual syntax of SPARQL (which introduces problems
>> such as bnodes), could we describe the desired behavior in terms of how Sesame
>> does it? I.e. along the lines of "Given an internal representation of a SPARQL
>> query (such as Algebra or Query objects in Java) pre-binding has the effect
>> that the evaluation of a variable returns the constant of the pre-binding.".
> This still depends on variable identity.
>
>>> There is also no indication of when invalid pre-bindings are supposed to be
>>> reported or how.
>> The Appendix tried to enumerate those cases. I am not sure what else needs to
>> be said here. Do you want us to clarify what kind of error needs to be
>> reported and when? We don't do that in other places either.
> It matters here.
>
> Suppose I write
>
>    SELECT ?a
>    WHERE { { ?a ex:b ex:c }
>            FILTER ( true || EXISTS { GRAPH ?v { ?a ex:b ex:c } } ) }
>
> and pre-bind ?v with a blank node.  If the error is supposed to be caught at
> pre-binding time then there will be no solutions.  If the error is supposed to
> be caught at query execution time then there may well be solutions.

The error cases from the appendix can be caught at "compile" time, as 
they would lead to syntax errors. Handling them at execution time would 
indeed be a mistake.
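
A rough Python sketch of what catching the error at "compile" time could
look like. It assumes terms are plain strings, blank nodes are written with
a `_:` prefix, and the set of variables used in GRAPH position has already
been collected from the parsed query. The names `check_prebindings` and
`run` are hypothetical and not taken from any SPARQL library.

```python
def check_prebindings(graph_name_vars, bindings):
    """Reject pre-bindings that would make the substituted query ill-formed.

    SPARQL only allows a variable or an IRI after GRAPH, so a blank node
    bound to such a variable is an error regardless of the data.
    """
    for var, value in bindings.items():
        if var in graph_name_vars and value.startswith("_:"):
            raise ValueError(
                f"{var} is used as a GRAPH name and cannot be "
                f"pre-bound to blank node {value}"
            )


def run(graph_name_vars, bindings, execute):
    """Validate the pre-bindings first, then hand over to the executor."""
    check_prebindings(graph_name_vars, bindings)
    return execute(bindings)
```

Because the check runs before any evaluation starts, the outcome does not
depend on whether an engine would ever evaluate the offending GRAPH
pattern, as in the `true || EXISTS { ... }` example above.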

Do you have suggestions on how to move forward? Given the above 
clarifications, shall I try to draft a definition based on internal 
representations of Queries instead of textual form? We could then invite 
external feedback together with the 3rd official editor's draft.

Thanks,
Holger

>
>>> The appendix on pre-binding should be sent to several SPARQL experts to see if
>>> they think that it is reasonable.
>> Yes, that should definitely be part of the review. I asked before whether W3C
>> has any process to gather such feedback.
> I believe that there is a formal way of requesting feedback from other W3C
> groups that are currently active.  However, I do not know whether there are
> any active groups that are appropriate to ask.
>
> There is a way to require feedback.  The working group can put explicit
> conditions on exit gates and requiring feedback could be one that this working
> group could use.
>
>> Holger
> peter
Received on Tuesday, 22 March 2016 00:23:42 UTC