ISSUE-68: Simpler definition of pre-binding (was: fundamental problems with SHACL) from Holger Knublauch on 2016-04-11 (public-data-shapes-wg@w3.org from April 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 11 Apr 2016 10:18:23 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <570AED4F.205@topquadrant.com>
(Moved back into an ISSUE-68 thread)

On 9/04/2016 0:11, Peter F. Patel-Schneider wrote:
>
>>> I had thought that pre-binding was the easy one.  To do pre-binding you
>>> first need to extend SPARQL so that blank nodes can be used in SPARQL
>>> queries, i.e., that if you have access to an RDF graph you can extract
>>> identifiers from that graph and use these identifiers in a SPARQL query just
>>> as if they were IRIs.  Then pre-binding just augments the (outer) SPARQL
>>> query with a VALUES construct that binds variables to values.
>>>
>>> However, apparently this is not the case, as the current document makes
>>> pre-binding out to be something quite different.  I do not have the
>>> expertise to fix all the problems with the treatment of pre-binding in the
>>> current document but I have pointed out a number of problems in it.
>> This is ISSUE-68. I tried various ways of responding to your concerns, but you
>> were not happy with any of them. And I agree this is work in progress. I would like
>> to be able to finish this once and for all, but always other things pop up in
>> between. You are raising many other ISSUEs including a full-blown counter
>> proposal that would replace basically everything, and at the same time put
>> pressure on me to not do my homework. It shouldn't come as a surprise that I
>> never have time if I am forced to spend my time responding to all your other
>> issues. Meanwhile, nobody else in the group steps up to this task either. The
>> last time I looked into pre-binding a few weeks ago, I was experimenting with
>> the syntax transform package in Jena. I found a bug that had to be fixed
>> first, halting my progress:
>>
>> https://github.com/apache/jena/commit/bc5ace0e9460ae979079532f610a88b6363e96e5
>>
>> I then went on vacation and had plenty of other TopQuadrant work on my plate.
>> I will try to get back to this topic soon.
>>
>> At the same time I still do not understand your problem with the semantics of
>> pre-binding. Simply using VALUES is not going to work, because we need to be
>> able to walk into nested scopes and even nested SELECT queries. I had
>> explained this before. Not sure why you keep repeating the same issue.
> Pre-binding is currently defined in a very complex manner.
>
> There is an initial substitution into SPARQL code.  This substitution changes
> the behaviour of the SPARQL code in many different ways.  First there is the
> change that would occur if the affected variable had a top-level binding.
> However, there are other changes.   Distinct variables with the same name in
> sub-queries are also changed.  This changes the meaning of sub-queries in a
> way different than that of a top-level binding.  Second, the substitution
> makes certain bits of previously-valid syntax invalid, including bindings,
> GRAPH constructs, the bound function, GROUP BY constructs, and ORDER BY
> constructs.  Each of these has to be fixed up by a set of compensating code
> transformations.   There is no certainty that there are not other
> compensations that need to be made to handle invalid syntax caused by the
> substitution.  I can easily think of several - simple variables in select
> clauses, variables in group conditions, variables in bindings, and variables
> in data blocks.  There could easily be others.  There is also no certainty
> that the initial substitution does not change the meaning of SPARQL code.  I
> pointed out above that it does change the meaning of subqueries but there
> could easily be other changes.
>
> Blank nodes then add another complication.  The current document does not give
> an actual method for handling pre-bound blank nodes.  The document suggests
> that using an algebra approach would work and so would a substitution
> approach.  However, there are no details of how to do either and no
> specification of what either should actually do.

Ok, I have switched to a minimal yet precise definition of pre-binding now:

             <p>
                 <span class="term">pre-binding</span> a variable with a value means that, prior to evaluating a query,
                 the SHACL processor needs to substitute all occurrences of the variable in the query (including
                 inner scopes and nested SELECT queries) with the provided value.
                 In other words, whenever a SPARQL processor evaluates a pre-bound variable, it must use the given value.
             </p>

This avoids talking about implementation details. Informally, I could 
add that possible implementation strategies are:
- use of VALUES (in simple cases)
- Algebra manipulation (as done by Jena setInitialBindings)
- internal Syntax tree manipulation (as done by Jena syntaxtransforms)
- run-time variable substitution (as done by Sesame)

This definition eliminates the bnode issues and other problems that you 
have mentioned. I believe it is sufficiently precise to explain the 
meaning to users and guide implementers, without over-complicating it.
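
To illustrate the substitution reading of the definition, here is a
minimal sketch in Python. It is not how Jena or Sesame implement
pre-binding; it is a naive textual replacement, and the function name
pre_bind, the example.org IRIs and the sample query are all made up for
illustration. It only shows that substitution reaches the variable
inside a nested SELECT, where the inner ?this is not projected and an
outer VALUES block would therefore not constrain it. It ignores
variables inside string literals and comments, and it does not repair
places where a constant is not legal syntax (for example a pre-bound
variable in the SELECT clause).

```python
import re


def pre_bind(query: str, bindings: dict) -> str:
    """Naively replace every ?name / $name with its pre-bound RDF term.

    Assumption: variable names use only ASCII letters, digits and '_',
    which is narrower than what the SPARQL grammar allows.
    """
    for name, term in bindings.items():
        # Match ?name or $name, but not a longer name such as ?thisShape.
        pattern = re.compile(r"[?$]" + re.escape(name) + r"(?![A-Za-z0-9_])")
        # A function replacement avoids interpreting backslashes in the term.
        query = pattern.sub(lambda _match, t=term: t, query)
    return query


# Hypothetical query: the inner ?this is hidden from the outer scope
# because the sub-select projects only ?value.
QUERY = """\
SELECT ?value WHERE {
  ?this <http://example.org/child> ?value .
  { SELECT ?value WHERE { ?this <http://example.org/allowed> ?value } }
}"""

result = pre_bind(QUERY, {"this": "<http://example.org/Alice>"})
print(result)
```

Both the outer and the inner occurrence of ?this end up replaced by the
same IRI, which is the behaviour the definition asks for regardless of
which implementation strategy a processor picks.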

What else is missing?

Holger
Received on Monday, 11 April 2016 00:18:55 UTC