Re: ISSUE-68: Simpler definition of pre-binding from Peter F. Patel-Schneider on 2016-04-19 (public-data-shapes-wg@w3.org from April 2016)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Mon, 18 Apr 2016 19:27:45 -0700
To: Holger Knublauch <holger@topquadrant.com>, public-data-shapes-wg@w3.org
Message-ID: <571597A1.4060308@gmail.com>
On 04/18/2016 06:06 PM, Holger Knublauch wrote:
> On 19/04/2016 1:39, Peter F. Patel-Schneider wrote:
>> I see several problems with this wording:
>>
>> 1/ SPARQL only performs evaluation in certain situations.  For example, in
>>
>> SELECT $this WHERE { $this :p ?that }
>>
>> neither this nor that is evaluated at any time.
> 
> I believe both will get evaluated. The query will only return a row if there
> is at least one match for the basic graph pattern ?this :p ?that. Evaluating
> the BGP will evaluate the variables.

As far as I can tell, SPARQL does not evaluate variables in basic graph patterns.
Consider the example from section 18.2.3 of SPARQL 1.1 Query:

Example: Pattern involving BIND:
{ ?s :p ?v . BIND (2*?v AS ?v2) ?s :p1 ?v2 }
Join(
   Extend( BGP(?s :p ?v), ?v2, 2*?v) ,
   BGP(?s :p1 ?v2) )

Note that variables are passed into the BGP expressions without any evaluation
happening.
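To make the distinction concrete, here is a toy model of that algebra expression (a sketch only, with made-up data and helper names, not a conformant SPARQL engine). BGP matching gives ?s and ?v values by matching triples in the graph. The only expression evaluation is the 2*?v inside Extend:

```python
# Toy model of the algebra SPARQL 1.1 gives for
#   { ?s :p ?v . BIND (2*?v AS ?v2) ?s :p1 ?v2 }
#   Join(Extend(BGP(?s :p ?v), ?v2, 2*?v), BGP(?s :p1 ?v2))
# Illustrative sketch only; data and helper names are hypothetical.
from typing import Callable, Dict, List, Tuple

Solution = Dict[str, object]
Triple = Tuple[object, object, object]

GRAPH: List[Triple] = [
    ("s1", ":p", 2), ("s1", ":p1", 4),
    ("s2", ":p", 3), ("s2", ":p1", 5),
]


def is_var(term: object) -> bool:
    return isinstance(term, str) and term.startswith("?")


def bgp(pattern: Triple, graph: List[Triple]) -> List[Solution]:
    """Match one triple pattern. Variables get values by matching graph
    triples; no expression is evaluated here."""
    results: List[Solution] = []
    for triple in graph:
        mu: Solution = {}
        ok = True
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                if p_term in mu and mu[p_term] != g_term:
                    ok = False
                    break
                mu[p_term] = g_term
            elif p_term != g_term:
                ok = False
                break
        if ok:
            results.append(mu)
    return results


def extend(omega: List[Solution], var: str,
           expr: Callable[[Solution], object]) -> List[Solution]:
    """Extend: the only operator here that evaluates an expression (2*?v)."""
    return [{**mu, var: expr(mu)} for mu in omega]


def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Merge every pair of compatible solutions (same value on shared vars)."""
    return [
        {**m1, **m2}
        for m1 in left
        for m2 in right
        if all(m1[k] == m2[k] for k in m1.keys() & m2.keys())
    ]


result = join(
    extend(bgp(("?s", ":p", "?v"), GRAPH), "?v2", lambda mu: 2 * mu["?v"]),
    bgp(("?s", ":p1", "?v2"), GRAPH),
)
print(result)  # [{'?s': 's1', '?v': 2, '?v2': 4}]
```

Only s1 survives the Join: 2*2 = 4 matches its :p1 value, while for s2, 2*3 = 6 does not match 5.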

>> Some other wording is needed.
>>
>> 2/ This description appears to be written so that the $this and $predicate in
>> the subquery are affected even though they are effectively different
>> variables from the ?this and ?predicate in the main query.
>>
>> SELECT $this ($this AS ?subject) $predicate
>> WHERE {
>>     {
>>         SELECT (COUNT(?value) AS ?count)
>>         WHERE {
>>             $this $predicate ?value .
>>         }
>>     }
>>     FILTER (?count < $minCount)
>> }
>>
>> Is this what pre-binding in SPARQL is supposed to do?  If not, some other term
>> should be used.
> 
> Yes, this is what pre-binding in SPARQL is supposed to do. (And I believe we
> have talked about this many times now, and there is even an explicit sentence
> about it).

The initial wording stated, I think, that pre-binding was like using BIND (or
maybe VALUES).  This is very different.  It would be nice to have some
indication that this is indeed what is done in SPARQL implementations.
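A toy model of the minCount query quoted above shows how far apart the two readings are. The data and function names are hypothetical; the sketch assumes SPARQL's bottom-up evaluation of the nested SELECT and is not any engine's implementation:

```python
# Toy model contrasting two readings of "pre-binding" for the minCount query.
# Hypothetical data and helper names; not any engine's actual implementation.

GRAPH = [
    (":alice", ":knows", ":x"), (":alice", ":knows", ":y"),
    (":bob", ":knows", ":x"), (":bob", ":knows", ":y"), (":bob", ":knows", ":z"),
]


def inner_count(graph, this=None, predicate=None):
    """Models SELECT (COUNT(?value) AS ?count) WHERE { $this $predicate ?value }.
    None means the variable is unbound inside the subquery, so it matches anything."""
    return sum(
        1
        for s, p, _ in graph
        if (this is None or s == this) and (predicate is None or p == predicate)
    )


def violates_values_style(graph, this, predicate, min_count):
    """VALUES/BIND-style binding of the outer query only. The subquery is
    evaluated on its own, with its $this and $predicate unbound; it projects
    only ?count, so the join with the outer bindings shares no variables and
    the outer row survives iff the FILTER holds."""
    count = inner_count(graph)
    return count < min_count


def violates_substitution_style(graph, this, predicate, min_count):
    """Pre-binding as in the draft wording: the value also reaches the
    same-named variables inside the nested SELECT."""
    count = inner_count(graph, this, predicate)
    return count < min_count


print(violates_values_style(GRAPH, ":alice", ":knows", 3))        # False
print(violates_substitution_style(GRAPH, ":alice", ":knows", 3))  # True
```

With a VALUES-style binding the nested SELECT counts every triple in the graph (5), so :alice passes. Under the substitution reading it counts only the two :knows values of :alice, and the FILTER fires.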

>> Can SPARQL implementations do this at all in an interoperable fashion?
> 
> Yes, implementations for this exist.

>> 3/ What sorts of values are allowed in pre-binding?
> 
> Any RDF node. Since I don't exclude any node kinds, this is hopefully clear.

But only RDF terms?  I don't see any wording to this effect, but maybe there is
no way to get anything except an RDF term in here.


>> 4/ When do "SHACL processors" evaluate occurrences of variables?
> 
> Changed to "SPARQL processors". That was a typo.
> 
>>
>> 5/ How does this work with blank nodes?
> 
> Bnodes are nodes like any other here. The "substitution" does not go through
> the SPARQL syntax, so it can directly access the node object (e.g. a Node
> instance in Jena).

OK
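Why bypassing the SPARQL syntax matters for blank nodes can be shown with a toy model (hypothetical names and data, not Jena's or any other engine's API). SPARQL treats a blank node label in a graph pattern like a variable. Writing a data blank node into the query text therefore loses its identity, whereas handing the engine the node object keeps it:

```python
# Toy model: a data blank node passed as an object vs. written into query text.
# Hypothetical names and data; not Jena's or any other engine's API.
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class BNode:
    """A blank node from the data graph, compared by its internal id."""
    id: str


@dataclass(frozen=True)
class Var:
    """A query variable (SPARQL also treats query blank nodes this way)."""
    name: str


Term = Union[BNode, Var, str]

b1, b2 = BNode("n1"), BNode("n2")
GRAPH: List[Tuple[Term, str, str]] = [(b1, ":name", "Ann"), (b2, ":name", "Bob")]


def objects(subject: Term, predicate: str) -> List[str]:
    """Evaluate the pattern (subject, predicate, ?o) and return the ?o values."""
    return [
        o for s, p, o in GRAPH
        if p == predicate and (isinstance(subject, Var) or subject == s)
    ]


# Pre-binding with the node object itself: identity is preserved.
by_object = objects(b1, ":name")
# Writing the node into query text as "_:n1 :name ?o" yields a query blank
# node, which matches like a variable: identity is lost.
by_text = objects(Var("_:n1"), ":name")

print(by_object)  # ['Ann']
print(by_text)    # ['Ann', 'Bob']
```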

> Latest version online:
> 
> https://github.com/w3c/data-shapes/commit/1457ae924171fae7536102bbcabddfc4f9509d9f
> 
> 
> Holger
> 
> 
>>
>> I'm having a hard time finding descriptions of pre-binding for SPARQL
>> implementations.  Are there any decent descriptions available?
>>
>>
>> peter
>>
>> On 04/18/2016 06:49 AM, Holger Knublauch wrote:
>>> Oops, yes. I should have taken out the "prior...". Let me try again:
>>>
>>> <span class="term">Pre-binding</span> a variable with a value means that
>>> the SHACL processor needs to evaluate all
>>> occurrences of variables with that same name
>>> (including occurrences in inner scopes and nested SELECT queries)
>>> so that they have the provided value.
>>> In other words, whenever a SPARQL processor evaluates a pre-bound
>>> variable, it must use the given value.
>>>
>>> I don't see why the term "evaluation time" would be unclear. A SPARQL engine
>>> evaluates a query and this happens during a process that takes time.
>>>
>>> I replaced the term "substitution", so that people don't assume query text
>>> replacement.
>>>
>>> Does the second sentence "In other words..." help or shall I delete that?
>>>
>>> Thanks,
>>> Holger
>>>
>>>
>>> On 18/04/2016 21:57, Peter F. Patel-Schneider wrote:
>>>> I don't see how this wording, which appears to be
>>>>
>>>> <span class="term">pre-binding</span> a variable with a value means that,
>>>> prior to evaluating a query, the SHACL processor needs to substitute all
>>>> occurrences of variables with the same name at evaluation time (including
>>>> inner scopes and nested SELECT queries) with the provided value.  In other
>>>> words, whenever a SPARQL processor evaluates a pre-bound variable, it must
>>>> use the given value.
>>>>
>>>> can be considered to be coherent.
>>>>
>>>>
>>>> What is "evaluation time"?  It is not defined anywhere.
>>>>
>>>> How can "prior to evaluating a query" something happen "at evaluation time"?
>>>>
>>>> How can substitution happen at evaluation time at all?
>>>>
>>>>
>>>> peter
>>>>
>>>> On 04/17/2016 09:39 PM, Holger Knublauch wrote:
>>>>> Updated definition here:
>>>>>
>>>>> https://github.com/w3c/data-shapes/commit/3ec678b057a50e1911e9ac93b77df394bf1e45ef
>>>>>
>>>>> Main paragraph is now:
>>>>>
>>>>> pre-binding a variable with a value means that, prior to evaluating a query,
>>>>> the SHACL processor needs to substitute all occurrences of variables with
>>>>> the same name at evaluation time (including inner scopes and nested SELECT
>>>>> queries) with the provided value. In other words, whenever a SPARQL
>>>>> processor evaluates a pre-bound variable, it must use the given value.
>>>>>
>>>>> On 18/04/2016 12:27, Peter F. Patel-Schneider wrote:
>>>>>> There are several problems here.
>>>>>>
>>>>>> 1/ It is unclear what is meant by an occurrence of a variable.  Can there
>>>>>> be two different variables with the same name in a SPARQL query, as in
>>>>>> programming languages?
>>>>> I have changed the prose to clarify that we mean variables with the same
>>>>> name (including those from nested SELECTs).
>>>>>
>>>>>> 2/ This definition of pre-binding appears to be different from other
>>>>>> definitions of pre-binding and different from previous definitions of
>>>>>> pre-bindings in the SHACL document.  I found a few descriptions of
>>>>>> pre-binding.  The SPIN submission has one that is very different from this
>>>>>> description.  Jena appears to have query solution maps which appear to be
>>>>>> very different.
>>>>>>
>>>>>> If SHACL is going to be using something that is different from the usual
>>>>>> meaning of pre-binding then it should not be calling it pre-binding.
>>>>> There is no established definition of this term anywhere. No other W3C spec
>>>>> uses it. I believe we are permitted to define it, and our definition is
>>>>> local to our document anyway. I also believe most terms will already have a
>>>>> usage somewhere else, so we may always conflict. Could you propose a
>>>>> non-conflicting term?
>>>>>
>>>>>> 3/ The discussion of pre-binding in 6.2.1 does not match the substitution
>>>>>> description.
>>>>> I have deleted the offending sentence and left only the reference to the
>>>>> appendix.
>>>>>
>>>>>> 4/ Textual substitution before SPARQL execution is different from
>>>>>> "whenever a SPARQL processor evaluates a pre-bound variable [...] it must
>>>>>> use the given value" because some variable mentions in SPARQL code do not
>>>>>> evaluate the variable.
>>>>> Not sure what you mean here. The spec is no longer referencing textual
>>>>> substitution. So has this gone away?
>>>>>
>>>>>> 5/ Substitution will produce illegal SPARQL for all of the SPARQL
>>>>>> definitions of constraint components.
>>>>> This is not relevant because the spec does not produce new SPARQL. It
>>>>> operates "at evaluation time", and I have clarified this in the wording.
>>>>>
>>>>>> 6/ When the substituted value is a blank node, it will not have the desired
>>>>>> meaning.
>>>>> Why not?
>>>>>
>>>>>
>>>>>> peter
>>>>>>
>>>>>> PS:  By the way, in SPARQL the ? or $ is not part of the variable so it is
>>>>>> not quite correct to talk about variables that start with $.
>>>>> Fixed.
>>>>>
>>>>> Thanks
>>>>> Holger
>>>>>
>>>>>
>>>>>>
>>>>>> On 04/10/2016 05:18 PM, Holger Knublauch wrote:
>>>>>>> (Moved back into an ISSUE-68 thread)
>>>>>>>
>>>>>>> On 9/04/2016 0:11, Peter F. Patel-Schneider wrote:
>>>>>>>>>> I had thought that pre-binding was the easy one.  To do pre-binding you
>>>>>>>>>> first need to extend SPARQL so that blank nodes can be used in SPARQL
>>>>>>>>>> queries, i.e., that if you have access to an RDF graph you can extract
>>>>>>>>>> identifiers from that graph and use these identifiers in a SPARQL query
>>>>>>>>>> just as if they were IRIs.  Then pre-binding just augments the (outer)
>>>>>>>>>> SPARQL query with a VALUES construct that binds variables to values.
>>>>>>>>>>
>>>>>>>>>> However, apparently this is not the case, as the current document makes
>>>>>>>>>> pre-binding out to be something quite different.  I do not have the
>>>>>>>>>> expertise to fix all the problems with the treatment of pre-binding in
>>>>>>>>>> the current document, but I have pointed out a number of problems in it.
>>>>>>>>> This is ISSUE-68. I tried various ways of responding to your concerns,
>>>>>>>>> but you were not happy with any of them. And I agree this is work in
>>>>>>>>> progress. I would like to be able to finish this once and for all, but
>>>>>>>>> other things always pop up in between. You are raising many other
>>>>>>>>> ISSUEs, including a full-blown counter proposal that would replace
>>>>>>>>> basically everything, and at the same time put pressure on me to not do
>>>>>>>>> my homework. It shouldn't come as a surprise that I never have time if I
>>>>>>>>> am forced to spend my time responding to all your other issues.
>>>>>>>>> Meanwhile, nobody else in the group steps up to this task either. The
>>>>>>>>> last time I looked into pre-binding a few weeks ago, I was experimenting
>>>>>>>>> with the syntax transform package in Jena. I found a bug that had to be
>>>>>>>>> fixed first, halting my progress:
>>>>>>>>>
>>>>>>>>> https://github.com/apache/jena/commit/bc5ace0e9460ae979079532f610a88b6363e96e5
>>>>>>>>>
>>>>>>>>> I then went on vacation and had plenty of other TopQuadrant work on my
>>>>>>>>> plate. I will try to get back to this topic soon.
>>>>>>>>>
>>>>>>>>> At the same time I still do not understand your problem with the
>>>>>>>>> semantics of pre-binding. Simply using VALUES is not going to work,
>>>>>>>>> because we need to be able to walk into nested scopes and even nested
>>>>>>>>> SELECT queries. I had explained this before. Not sure why you keep
>>>>>>>>> repeating the same issue.
>>>>>>>> Pre-binding is currently defined in a very complex manner.
>>>>>>>>
>>>>>>>> There is an initial substitution into SPARQL code.  This substitution
>>>>>>>> changes the behaviour of the SPARQL code in many different ways.  First
>>>>>>>> there is the change that would occur if the affected variable had a
>>>>>>>> top-level binding.  However, there are other changes.  Distinct
>>>>>>>> variables with the same name in sub-queries are also changed.  This
>>>>>>>> changes the meaning of sub-queries in a way different than that of a
>>>>>>>> top-level binding.  Second, the substitution makes certain bits of
>>>>>>>> previously-valid syntax invalid, including bindings, GRAPH constructs,
>>>>>>>> the bound function, GROUP BY constructs, and ORDER BY constructs.  Each
>>>>>>>> of these has to be fixed up by a set of compensating code
>>>>>>>> transformations.  There is no certainty that there are not other
>>>>>>>> compensations that need to be made to handle invalid syntax caused by
>>>>>>>> the substitution.  I can easily think of several: simple variables in
>>>>>>>> select clauses, variables in group conditions, variables in bindings,
>>>>>>>> and variables in data blocks.  There could easily be others.  There is
>>>>>>>> also no certainty that the initial substitution does not change the
>>>>>>>> meaning of SPARQL code.  I pointed out above that it does change the
>>>>>>>> meaning of subqueries, but there could easily be other changes.
>>>>>>>>
>>>>>>>> Blank nodes then add another complication.  The current document does
>>>>>>>> not give an actual method for handling pre-bound blank nodes.  The
>>>>>>>> document suggests that using an algebra approach would work and so
>>>>>>>> would a substitution approach.  However, there are no details of how to
>>>>>>>> do either and no specification of what either should actually do.
>>>>>>> Ok, I have switched to a minimal yet precise definition of pre-binding
>>>>>>> now:
>>>>>>>
>>>>>>>               <p>
>>>>>>>                   <span class="term">pre-binding</span> a variable with a value means that, prior to evaluating a query,
>>>>>>>                   the SHACL processor needs to substitute all occurrences of the variable in the query (including
>>>>>>>                   inner scopes and nested SELECT queries) with the provided value.
>>>>>>>                   In other words, whenever a SPARQL processor evaluates a pre-bound variable, it must use the given value.
>>>>>>>               </p>
>>>>>>>
>>>>>>> This avoids talking about implementation details. Informally, I could add
>>>>>>> that possible implementation strategies are
>>>>>>> - use of VALUES (in simple cases)
>>>>>>> - Algebra manipulation (as done by Jena setInitialBindings)
>>>>>>> - internal Syntax tree manipulation (as done by Jena syntaxtransforms)
>>>>>>> - run-time variable substitution (as done by Sesame)
>>>>>>>
>>>>>>> This definition eliminates the bnode issues and other problems that you
>>>>>>> have mentioned. I believe it is sufficiently precise to explain the
>>>>>>> meaning to users and guide implementers, without over-complicating it.
>>>>>>>
>>>>>>> What else is missing?
>>>>>>>
>>>>>>> Holger
>>>>>>>
>>>>>>>
>>>
> 
>
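Point 5 in the exchange quoted above (textual substitution producing illegal SPARQL) can be checked with a deliberately naive regex substitution over the minCount query. The function name and example IRIs are hypothetical. This is the textual approach the draft no longer references, not Jena's syntax transforms:

```python
# Deliberately naive textual substitution, to show the syntax problem only.
# Function name and IRIs are hypothetical.
import re

QUERY = """SELECT $this ($this AS ?subject) $predicate
WHERE {
    {
        SELECT (COUNT(?value) AS ?count)
        WHERE {
            $this $predicate ?value .
        }
    }
    FILTER (?count < $minCount)
}"""


def naive_substitute(query: str, bindings: dict) -> str:
    """Replace every ?name / $name token whose name is in bindings with the
    given term text, ignoring where in the grammar the variable occurs."""
    def repl(match) -> str:
        return bindings.get(match.group(1), match.group(0))
    return re.sub(r"[?$]([A-Za-z_][A-Za-z0-9_]*)", repl, query)


out = naive_substitute(QUERY, {
    "this": "<http://example.org/alice>",
    "predicate": "<http://example.org/knows>",
    "minCount": "3",
})
print(out.splitlines()[0])
# SELECT <http://example.org/alice> (<http://example.org/alice> AS ?subject) <http://example.org/knows>
```

The SPARQL 1.1 grammar rejects that first line: a projection may contain only variables or (expression AS ?var), never bare IRIs. The substitution also reaches the $this and $predicate inside the nested SELECT, which is the change of meaning discussed in point 2.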
Received on Tuesday, 19 April 2016 02:28:19 UTC