Re: ISSUE-68: Simpler definition of pre-binding from Peter F. Patel-Schneider on 2016-04-21 (public-data-shapes-wg@w3.org from April 2016)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 21 Apr 2016 05:20:45 -0700
To: Holger Knublauch <holger@topquadrant.com>, public-data-shapes-wg@w3.org
Message-ID: <5718C59D.1030702@gmail.com>
Sure, basic graph patterns are processed in SPARQL, but this process (Section
18.3) does not involve evaluation of variables in basic graph patterns.  An
implementation of the current spec would not be licensed to do replacement in
basic graph patterns.
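
As a rough illustration of that point, here is a minimal Python sketch of
basic graph pattern matching in the spirit of Section 18.3. The matcher
searches for solution mappings under which the pattern becomes a subset of
the graph; variables are matched against terms, never looked up as
expressions. The "initial" parameter is an assumption of this sketch: it
models one possible reading of pre-binding, in which matching starts from a
given mapping. The function names and toy data are hypothetical, and blank
nodes, filters, and the rest of the algebra are ignored.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]  # variable name (without ? or $) -> RDF term


def is_var(term: str) -> bool:
    # In SPARQL the leading ? or $ is a marker, not part of the variable name.
    return term[:1] in ("?", "$")


def match_bgp(pattern: List[Triple], graph: Set[Triple],
              initial: Optional[Mapping] = None) -> Iterator[Mapping]:
    """Yield each mapping mu, compatible with `initial`, with mu(pattern) in graph."""

    def extend(mu: Mapping, rest: List[Triple]) -> Iterator[Mapping]:
        if not rest:
            yield dict(mu)
            return
        head, tail = rest[0], rest[1:]
        for triple in sorted(graph):  # sorted only to make the order repeatable
            candidate = dict(mu)
            for p_term, g_term in zip(head, triple):
                if is_var(p_term):
                    # Bind the variable, or check agreement with an earlier binding.
                    if candidate.setdefault(p_term[1:], g_term) != g_term:
                        break
                elif p_term != g_term:
                    break
            else:
                # No position failed, so this triple matches the pattern triple.
                yield from extend(candidate, tail)

    yield from extend(dict(initial or {}), list(pattern))


# Toy data (hypothetical): two :p triples and the pattern { $this :p ?that }.
GRAPH = {(":a", ":p", ":b"), (":c", ":p", ":d")}
PATTERN = [("$this", ":p", "?that")]
```

Calling match_bgp(PATTERN, GRAPH) yields one mapping per :p triple, while
match_bgp(PATTERN, GRAPH, {"this": ":a"}) yields only the mapping in which
this is :a. Nothing in the pattern text is rewritten in either case.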

This may be only a matter of terminology.  However the spec is currently so
imprecise that in many places it is not possible for me to determine the
intent of the spec.

When I encounter places where the intent of the spec is undeterminable or the
spec is imprecise or self-contradictory I point them out.  It is vital that
all these places are fixed before the working group is finished.  It is best
to fix these places as soon as possible as it is quite often the case that
elucidating the places where intent is unclear or there are imprecisions or
contradictions will bring out significant problems.

peter
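
For readers following the quoted thread, the nested SELECT example under
point 2/ can be mimicked with a toy Python model. This is only a sketch
under stated assumptions: the graph, the function names, and the two
"readings" are hypothetical, and no real SPARQL engine is involved. It
contrasts a binding that is visible only in the outer scope (as with a
top-level VALUES, where the subquery projects only ?count) with a binding
that is also applied inside the subquery.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Toy graph (hypothetical): :a has two :p values, :b has one.
GRAPH: Set[Triple] = {(":a", ":p", "1"), (":a", ":p", "2"), (":b", ":p", "3")}


def inner_count(graph: Set[Triple], this: Optional[str] = None,
                predicate: Optional[str] = None) -> int:
    """Models: SELECT (COUNT(?value) AS ?count) WHERE { $this $predicate ?value }.

    A parameter left as None stands for an unbound variable.
    """
    return sum(
        1
        for s, p, _ in graph
        if (this is None or s == this) and (predicate is None or p == predicate)
    )


def outer_scope_only(graph: Set[Triple], this: str, predicate: str) -> int:
    # Reading 1: the values are bound only in the outer query. The subquery
    # projects just ?count, so its own this/predicate stay unbound.
    return inner_count(graph)


def all_scopes(graph: Set[Triple], this: str, predicate: str) -> int:
    # Reading 2: the values are also applied inside the nested SELECT.
    return inner_count(graph, this, predicate)


def reports_violation(count: int, min_count: int) -> bool:
    # Models: FILTER (?count < $minCount)
    return count < min_count
```

With this set to :a, predicate set to :p, and a minCount of 3, the first
reading counts every triple in the graph and the filter rejects the row; the
second counts only the two values of :a and the filter accepts it.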


On 04/21/2016 01:01 AM, Holger Knublauch wrote:
> On 19/04/2016 12:27, Peter F. Patel-Schneider wrote:
>> On 04/18/2016 06:06 PM, Holger Knublauch wrote:
>>> On 19/04/2016 1:39, Peter F. Patel-Schneider wrote:
>>>> I see several problems with this wording:
>>>>
>>>> 1/ SPARQL only performs evaluation in certain situations.  For example, in
>>>>
>>>> SELECT $this WHERE { $this :p ?that }
>>>>
>>>> neither this nor that are evaluated at any time.
>>> I believe both will get evaluated. The query will only return a row if there
>>> is at least one match for the basic graph pattern ?this :p ?that. Evaluating
>>> the BGP will evaluate the variables.
>> As far as I can tell, SPARQL does not evaluate variables in basic graph
>> patterns.  Consider the example from 18.2.3
>>
>> Example: Pattern involving BIND:
>> { ?s :p ?v . BIND (2*?v AS ?v2) ?s :p1 ?v2 }
>> Join(
>>     Extend( BGP(?s :p ?v), ?v2, 2*?v) ,
>>     BGP(?s :p1 ?v2) )
>>
>> Note that variables are passed into the BGP expressions without any evaluation
>> happening.
> 
> The section that you quote is only about the mapping of the string-based
> SPARQL syntax into Algebra objects. These Algebra objects are then evaluated
> (using the eval() operations later in the SPARQL 1.1 document). Each step in
> the evaluation takes previous solution bindings as input and produces new
> solution bindings - until they get projected out as SELECT results. This is a
> bit like a stream. The pre-binding basically means that the solution bindings
> coming into each step already contain bindings for the given variable names.
> 
> Anyway, I believe this is now entirely about terminology. The behavior should
> be clear by now. We will define test cases that we could point at to provide
> some more formal backing.
> 
>>
>>>> Some other wording is needed.
>>>>
>>>> 2/ This description appears to be written so that the $this and $predicate in
>>>> the subquery are affected even though they are effectively different
>>>> variables from the ?this and ?predicate in the main query.
>>>>
>>>> SELECT $this ($this AS ?subject) $predicate
>>>> WHERE {
>>>>      {
>>>>          SELECT (COUNT(?value) AS ?count)
>>>>          WHERE {
>>>>              $this $predicate ?value .
>>>>          }
>>>>      }
>>>>      FILTER (?count < $minCount)
>>>> }
>>>>
>>>> Is this what pre-binding in SPARQL is supposed to do?  If not, some other
>>>> term should be used.
>>> Yes, this is what pre-binding in SPARQL is supposed to do. (And I believe we
>>> have talked about this many times now, and there is even an explicit sentence
>>> about it).
>> The initial wording stated, I think, that pre-binding was like using BIND (or
>> maybe VALUES).  This is very different.  It would be nice to have some
>> indication that this is indeed what is done in SPARQL implementations.
> 
> The BIND or VALUES is no longer mentioned, and what we have mentioned in
> previous attempts of this document shouldn't matter to the final version.
> 
>>
>>>> Can SPARQL implementations do this at all in an interoperable fashion?
>>> Yes, implementations for this exist.
>>>> 3/ What sorts of values are allowed in pre-binding?
>>> Any RDF node. Since I don't exclude any node kinds, this is hopefully clear.
>> But only RDF terms?  I don't see any wording to this effect but maybe there is
>> no way to get anything except an RDF term into here.
> 
> Yes, there is nothing else possible here.
> 
> Holger
> 
> 
>>
>>
>>>> 4/ When do "SHACL processors" evaluate occurrences of variables?
>>> Changed to "SPARQL processors". That was a typo.
>>>
>>>> 5/ How does this work with blank nodes?
>>> Bnodes are nodes like any other here. The "substitution" does not go through
>>> the SPARQL syntax, so it can directly access the node object (e.g. Node
>>> instance in Jena)
>> OK
>>
>>> Latest version online:
>>>
>>> https://github.com/w3c/data-shapes/commit/1457ae924171fae7536102bbcabddfc4f9509d9f
>>>
>>>
>>>
>>> Holger
>>>
>>>
>>>>
>>>>
>>>> I'm having a hard time finding descriptions of pre-binding for SPARQL
>>>> implementations.  Are there any decent descriptions available?
>>>>
>>>>
>>>> peter
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 04/18/2016 06:49 AM, Holger Knublauch wrote:
>>>>> Oops, yes. I should have taken out the "prior...". Let me try again:
>>>>>
>>>>> <span class="term">Pre-binding</span> a variable with a value means that
>>>>> the SHACL processor needs to evaluate all
>>>>> occurrences of variables with that same name
>>>>> (including occurrences in inner scopes and nested SELECT queries)
>>>>> so that they have the provided value.
>>>>> In other words, whenever a SPARQL processor evaluates a pre-bound
>>>>> variable, it must use the given value.
>>>>>
>>>>> I don't see why the term "evaluation time" would be unclear. A SPARQL engine
>>>>> evaluates a query and this happens during a process that takes time.
>>>>>
>>>>> I replaced the term "substitution", so that people don't assume query text
>>>>> replacement.
>>>>>
>>>>> Does the second sentence "In other words..." help or shall I delete that?
>>>>>
>>>>> Thanks,
>>>>> Holger
>>>>>
>>>>>
>>>>> On 18/04/2016 21:57, Peter F. Patel-Schneider wrote:
>>>>>> I don't see how this wording, which appears to be
>>>>>>
>>>>>> <span class="term">pre-binding</span> a variable with a value means that,
>>>>>> prior to evaluating a query, the SHACL processor needs to substitute all
>>>>>> occurrences of variables with the same name at evaluation time (including
>>>>>> inner scopes and nested SELECT queries) with the provided value.  In other
>>>>>> words, whenever a SPARQL processor evaluates a pre-bound variable, it must
>>>>>> use the given value.
>>>>>>
>>>>>> can be considered to be coherent.
>>>>>>
>>>>>>
>>>>>> What is "evaluation time"?  It is not defined anywhere.
>>>>>>
>>>>>> How  can "prior to evaluating a query" something happen "at evaluation
>>>>>> time"?
>>>>>>
>>>>>> How can substitution happen at evaluation time at all?
>>>>>>
>>>>>>
>>>>>> peter
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 04/17/2016 09:39 PM, Holger Knublauch wrote:
>>>>>>> Updated definition here:
>>>>>>>
>>>>>>> https://github.com/w3c/data-shapes/commit/3ec678b057a50e1911e9ac93b77df394bf1e45ef
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Main paragraph is now:
>>>>>>>
>>>>>>> pre-binding a variable with a value means that, prior to evaluating a
>>>>>>> query, the SHACL processor needs to substitute all occurrences of
>>>>>>> variables with the same name at evaluation time (including inner scopes
>>>>>>> and nested SELECT queries) with the provided value. In other words,
>>>>>>> whenever a SPARQL processor evaluates a pre-bound variable, it must use
>>>>>>> the given value.
>>>>>>>
>>>>>>> On 18/04/2016 12:27, Peter F. Patel-Schneider wrote:
>>>>>>>> There are several problems here.
>>>>>>>>
>>>>>>>> 1/ It is unclear what is meant by an occurrence of a variable.  Can
>>>>>>>> there be two different variables with the same name in a SPARQL query,
>>>>>>>> as in programming languages?
>>>>>>> I have changed the prose to clarify that we mean variables with the same
>>>>>>> name (including those from nested SELECTs).
>>>>>>>
>>>>>>>> 2/ This definition of pre-binding appears to be different from other
>>>>>>>> definitions of pre-binding and different from previous definitions of
>>>>>>>> pre-bindings in the SHACL document.  I found a few descriptions of
>>>>>>>> pre-binding.  The SPIN submission has one that is very different from
>>>>>>>> this description.  Jena appears to have query solution maps which
>>>>>>>> appear to be very different.
>>>>>>>>
>>>>>>>> If SHACL is going to be using something that is different from the usual
>>>>>>>> meaning of pre-binding then it should not be calling it pre-binding.
>>>>>>> There is no established definition of this term anywhere. No other W3C
>>>>>>> spec uses it. I believe we are permitted to define it, and our definition
>>>>>>> is local to our document anyway. I also believe most terms will already
>>>>>>> have a usage somewhere else, so we may always conflict. Could you propose
>>>>>>> a non-conflicting term?
>>>>>>>
>>>>>>>> 3/ The discussion of pre-binding in 6.2.1 does not match the substitution
>>>>>>>> description.
>>>>>>> I have deleted the offending sentence and left only the reference to the
>>>>>>> appendix.
>>>>>>>
>>>>>>>> 4/ Textual substitution before SPARQL execution is different from
>>>>>>>> "whenever a SPARQL processor evaluates a pre-bound variable [...] it
>>>>>>>> must use the given value" because some variable mentions in SPARQL code
>>>>>>>> do not evaluate the variable.
>>>>>>> Not sure what you mean here. The spec is no longer referencing textual
>>>>>>> substitution. So has this gone away?
>>>>>>>
>>>>>>>> 5/ Substitution will produce illegal SPARQL for all of the SPARQL
>>>>>>>> definitions of constraint components.
>>>>>>> This is not relevant because the spec does not produce new SPARQL. It
>>>>>>> operates "at evaluation time", and I have clarified this in the wording.
>>>>>>>
>>>>>>>> 6/ When the substituted value is a blank node, it will not have the
>>>>>>>> desired meaning.
>>>>>>> Why not?
>>>>>>>
>>>>>>>
>>>>>>>> peter
>>>>>>>>
>>>>>>>> PS:  By the way, in SPARQL the ? or $ is not part of the variable so it
>>>>>>>> is not quite correct to talk about variables that start with $.
>>>>>>> Fixed.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Holger
>>>>>>>
>>>>>>>
>>>>>>>> On 04/10/2016 05:18 PM, Holger Knublauch wrote:
>>>>>>>>> (Moved back into an ISSUE-68 thread)
>>>>>>>>>
>>>>>>>>> On 9/04/2016 0:11, Peter F. Patel-Schneider wrote:
>>>>>>>>>>>> I had thought that pre-binding was the easy one.  To do pre-binding
>>>>>>>>>>>> you first need to extend SPARQL so that blank nodes can be used in
>>>>>>>>>>>> SPARQL queries, i.e., that if you have access to an RDF graph you
>>>>>>>>>>>> can extract identifiers from that graph and use these identifiers
>>>>>>>>>>>> in a SPARQL query just as if they were IRIs.  Then pre-binding just
>>>>>>>>>>>> augments the (outer) SPARQL query with a VALUES construct that
>>>>>>>>>>>> binds variables to values.
>>>>>>>>>>>>
>>>>>>>>>>>> However, apparently this is not the case, as the current document
>>>>>>>>>>>> makes pre-binding out to be something quite different.  I do not
>>>>>>>>>>>> have the expertise to fix all the problems with the treatment of
>>>>>>>>>>>> pre-binding in the current document but I have pointed out a number
>>>>>>>>>>>> of problems in it.
>>>>>>>>>>> This is ISSUE-68. I tried various ways of responding to your
>>>>>>>>>>> concerns, but you were not happy with any of them. And I agree this
>>>>>>>>>>> is work in progress. I would like to be able to finish this once and
>>>>>>>>>>> for all, but other things always pop up in between. You are raising
>>>>>>>>>>> many other ISSUEs including a full-blown counter proposal that would
>>>>>>>>>>> replace basically everything, and at the same time put pressure on
>>>>>>>>>>> me to not do my homework. It shouldn't come as a surprise that I
>>>>>>>>>>> never have time if I am forced to spend my time responding to all
>>>>>>>>>>> your other issues. Meanwhile, nobody else in the group steps up to
>>>>>>>>>>> this task either. The last time I looked into pre-binding a few
>>>>>>>>>>> weeks ago, I was experimenting with the syntax transform package in
>>>>>>>>>>> Jena. I found a bug that had to be fixed first, halting my progress:
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/apache/jena/commit/bc5ace0e9460ae979079532f610a88b6363e96e5
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I then went on vacation and had plenty of other TopQuadrant work on
>>>>>>>>>>> my plate. I will try to get back to this topic soon.
>>>>>>>>>>>
>>>>>>>>>>> At the same time I still do not understand your problem with the
>>>>>>>>>>> semantics of pre-binding. Simply using VALUES is not going to work,
>>>>>>>>>>> because we need to be able to walk into nested scopes and even
>>>>>>>>>>> nested SELECT queries. I had explained this before. Not sure why you
>>>>>>>>>>> keep repeating the same issue.
>>>>>>>>>> Pre-binding is currently defined in a very complex manner.
>>>>>>>>>>
>>>>>>>>>> There is an initial substitution into SPARQL code.  This substitution
>>>>>>>>>> changes the behaviour of the SPARQL code in many different ways.
>>>>>>>>>> First there is the change that would occur if the affected variable
>>>>>>>>>> had a top-level binding.  However, there are other changes.  Distinct
>>>>>>>>>> variables with the same name in sub-queries are also changed.  This
>>>>>>>>>> changes the meaning of sub-queries in a way different than that of a
>>>>>>>>>> top-level binding.  Second, the substitution makes certain bits of
>>>>>>>>>> previously-valid syntax invalid, including bindings, GRAPH
>>>>>>>>>> constructs, the bound function, GROUP BY constructs, and ORDER BY
>>>>>>>>>> constructs.  Each of these has to be fixed up by a set of
>>>>>>>>>> compensating code transformations.  There is no certainty that there
>>>>>>>>>> are not other compensations that need to be made to handle invalid
>>>>>>>>>> syntax caused by the substitution.  I can easily think of several -
>>>>>>>>>> simple variables in select clauses, variables in group conditions,
>>>>>>>>>> variables in bindings, and variables in data blocks.  There could
>>>>>>>>>> easily be others.  There is also no certainty that the initial
>>>>>>>>>> substitution does not change the meaning of SPARQL code.  I pointed
>>>>>>>>>> out above that it does change the meaning of subqueries but there
>>>>>>>>>> could easily be other changes.
>>>>>>>>>>
>>>>>>>>>> Blank nodes then add another complication.  The current document does
>>>>>>>>>> not give an actual method for handling pre-bound blank nodes.  The
>>>>>>>>>> document suggests that using an algebra approach would work and so
>>>>>>>>>> would a substitution approach.  However, there are no details of how
>>>>>>>>>> to do either and no specification of what either should actually do.
>>>>>>>>> Ok, I have switched to a minimal yet precise definition of pre-binding
>>>>>>>>> now:
>>>>>>>>>
>>>>>>>>>                <p>
>>>>>>>>>                    <span class="term">pre-binding</span> a variable with a value
>>>>>>>>>                    means that, prior to evaluating a query,
>>>>>>>>>                    the SHACL processor needs to substitute all occurrences of the
>>>>>>>>>                    variable in the query (including
>>>>>>>>>                    inner scopes and nested SELECT queries) with the provided value.
>>>>>>>>>                    In other words, whenever a SPARQL processor evaluates a
>>>>>>>>>                    pre-bound variable, it must use the given value.
>>>>>>>>>                </p>
>>>>>>>>>
>>>>>>>>> This avoids talking about implementation details. Informally, I could
>>>>>>>>> add that possible implementation strategies are
>>>>>>>>> - use of VALUES (in simple cases)
>>>>>>>>> - Algebra manipulation (as done by Jena setInitialBindings)
>>>>>>>>> - internal Syntax tree manipulation (as done by Jena syntaxtransforms)
>>>>>>>>> - run-time variable substitution (as done by Sesame)
>>>>>>>>>
>>>>>>>>> This definition eliminates the bnode issues and other problems that you
>>>>>>>>> have mentioned. I believe it is sufficiently precise to explain the
>>>>>>>>> meaning to users and guide implementers, without over-complicating it.
>>>>>>>>>
>>>>>>>>> What else is missing?
>>>>>>>>>
>>>>>>>>> Holger
>>>>>>>>>
>>>>>>>>>
>>>
> 
>
Received on Thursday, 21 April 2016 12:21:15 UTC