Re: ISSUE-68: Simpler definition of pre-binding from Peter F. Patel-Schneider on 2016-04-18 (public-data-shapes-wg@w3.org from April 2016)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Sun, 17 Apr 2016 19:27:47 -0700
To: Holger Knublauch <holger@topquadrant.com>, public-data-shapes-wg@w3.org
Message-ID: <57144623.5040209@gmail.com>

There are several problems here.

1/ It is unclear what is meant by an occurrence of a variable.   Can there be
two different variables with the same name in a SPARQL query, as in
programming languages?

2/ This definition of pre-binding appears to be different from other
definitions of pre-binding and different from previous definitions of
pre-bindings in the SHACL document.  I found a few descriptions of
pre-binding.  The SPIN submission has one that is very different from this
description.  Jena appears to have query solution maps which appear to be very
different.

If SHACL is going to be using something that is different from the usual
meaning of pre-binding then it should not be calling it pre-binding.

3/ The discussion of pre-binding in 6.2.1 does not match the substitution
description.

4/ Textual substitution before SPARQL execution is different from "whenever a
SPARQL processor evaluates a pre-bound variable [...] it must use the given
value" because some variable mentions in SPARQL code do not evaluate the variable.

5/ Substitution will produce illegal SPARQL for all of the SPARQL definitions
of constraint components.

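6/ When the substituted value is a blank node, it will not have the desired
meaning, because a blank node in a SPARQL query pattern acts as a variable
rather than denoting a particular node in the data.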
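
To make points 4/ to 6/ concrete, here is a minimal Python sketch of purely
textual substitution. It is only an illustration under stated assumptions: the
three queries, the IRI, the blank node label and the helper name substitute are
all hypothetical and not taken from the SHACL document, and the regular
expression is a simplification of the SPARQL variable grammar.

```python
import re


def substitute(query: str, name: str, term: str) -> str:
    """Textually replace every mention of variable `name` with `term`.

    Both ?name and $name are matched, since in SPARQL the marker is not
    part of the variable name. The lookahead is a simplification of the
    SPARQL variable-name grammar that is good enough for ASCII names.
    """
    pattern = re.compile(r"[?$]" + re.escape(name) + r"(?![A-Za-z0-9_])")
    return pattern.sub(lambda _match: term, query)


# Hypothetical queries, not taken from the SHACL document.
IN_SELECT = "SELECT $this WHERE { $this a ?type }"
IN_BOUND = "SELECT ?s WHERE { ?s ?p ?o FILTER bound(?this) }"
IN_PATTERN = "ASK { $this a ?type }"

IRI = "<http://example.org/a>"   # hypothetical pre-bound IRI
BNODE = "_:b0"                   # hypothetical pre-bound blank node label

# Not legal SPARQL: a SELECT clause cannot project a bare IRI.
print(substitute(IN_SELECT, "this", IRI))

# Not legal SPARQL: bound() accepts only a variable as its argument.
print(substitute(IN_BOUND, "this", IRI))

# Syntactically legal, but _:b0 in a query pattern behaves like a
# variable, so the query no longer asks about one specific node.
print(substitute(IN_PATTERN, "this", BNODE))
```

The first two results show variable mentions that are never evaluated (a
projection and the argument of bound), which is where substitution and "use the
given value when evaluating" come apart. The third shows that the blank node
case can pass a syntax check and still mean something else.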

peter

PS:  By the way, in SPARQL the ? or $ is not part of the variable so it is not
quite correct to talk about variables that start with $.



On 04/10/2016 05:18 PM, Holger Knublauch wrote:
> (Moved back into an ISSUE-68 thread)
> 
> On 9/04/2016 0:11, Peter F. Patel-Schneider wrote:
>>
>>>> I had thought that pre-binding was the easy one.  To do pre-binding you
>>>> first need to extend SPARQL so that blank nodes can be used in SPARQL
>>>> queries, i.e., that if you have access to an RDF graph you can extract
>>>> identifiers from that graph and use these identifiers in a SPARQL query just
>>>> as if they were IRIs.  Then pre-binding just augments the (outer) SPARQL
>>>> query with a VALUES construct that binds variables to values.
>>>>
>>>> However, apparently this is not the case, as the current document makes
>>>> pre-binding out to be something quite different.  I do not have the
>>>> expertise to fix all the problems with the treatment of pre-binding in the
>>>> current document but I have pointed out a number of problems in it.
>>> This is ISSUE-68. I tried various ways of responding to your concerns, but you
>>> were not happy with either. And I agree this is work in progress. I would like
>>> to be able to finish this once and for all, but always other things pop up in
>>> between. You are raising many other ISSUEs including a full-blown counter
>>> proposal that would replace basically everything, and at the same time put
>>> pressure on me to not do my homework. It shouldn't come as a surprise that I
>>> never have time if I am forced to spend my time responding to all your other
>>> issues. Meanwhile, nobody else in the group steps up to this task either. The
>>> last time I looked into pre-binding a few weeks ago, I was experimenting with
>>> the syntax transform package in Jena. I found a bug that had to be fixed
>>> first, halting my progress:
>>>
>>> https://github.com/apache/jena/commit/bc5ace0e9460ae979079532f610a88b6363e96e5
>>>
>>> I then went on vacation and had plenty of other TopQuadrant work on my plate.
>>> I will try to get back to this topic soon.
>>>
>>> At the same time I still do not understand your problem with the semantics of
>>> pre-binding. Simply using VALUES is not going to work, because we need to be
>>> able to walk into nested scopes and even nested SELECT queries. I had
>>> explained this before. Not sure why you keep repeating the same issue.
>> Pre-binding is currently defined in a very complex manner.
>>
>> There is an initial substitution into SPARQL code.  This substitution changes
>> the behaviour of the SPARQL code in many different ways.  First there is the
>> change that would occur if the affected variable had a top-level binding.
>> However, there are other changes.   Distinct variables with the same name in
>> sub-queries are also changed.  This changes the meaning of sub-queries in a
>> way different than that of a top-level binding.  Second, the substitution
>> makes certain bits of previously-valid syntax invalid, including bindings,
>> GRAPH constructs, the bound function, GROUP BY constructs, and ORDER BY
>> constructs.  Each of these has to be fixed up by a set of compensating code
>> transformations.   There is no certainty that there are not other
>> compensations that need to be made to handle invalid syntax caused by the
>> substitution.  I can easily think of several - simple variables in select
>> clauses, variables in group conditions, variables in bindings, and variables
>> in data blocks.  There could easily be others.  There is also no certainty
>> that the initial substitution does not change the meaning of SPARQL code.  I
>> pointed out above that it does change the meaning of subqueries but there
>> could easily be other changes.
>>
>> Blank nodes then add another complication.  The current document does not give
>> an actual method for handling pre-bound blank nodes.  The document suggests
>> that using an algebra approach would work and so would a substitution
>> approach.  However, there are no details of how to do either and no
>> specification of what either should actually do.
> 
> Ok, I have switched to a minimal yet precise definition of pre-binding now:
> 
>             <p>
>                 <span class="term">pre-binding</span> a variable with a value
> means that, prior to evaluating a query,
>                 the SHACL processor needs to substitute all occurrences of the
> variable in the query (including
>                 inner scopes and nested SELECT queries) with the provided value.
>                 In other words, whenever a SPARQL processor evaluates a
> pre-bound variable, it must use the given value.
>             </p>
> 
> This avoids talking about implementation details. Informally, I could add that
> possible implementation strategies are
> - use of VALUES (in simple cases)
> - Algebra manipulation (as done by Jena setInitialBindings)
> - internal Syntax tree manipulation (as done by Jena syntaxtransforms)
> - run-time variable substitution (as done by Sesame)
> 
> This definition eliminates the bnode issues and other problems that you have
> mentioned. I believe it is sufficiently precise to explain the meaning to
> users and guide implementers, without over-complicating it.
> 
> What else is missing?
> 
> Holger
> 
>
Received on Monday, 18 April 2016 02:28:17 UTC