Re: Related comments from Peter F. Patel-Schneider on 2017-02-08 (public-sparql-exists@w3.org from February 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 8 Feb 2017 10:20:18 -0800
To: Andy Seaborne <andy@apache.org>, public-sparql-exists@w3.org
Message-ID: <b18b059e-d879-3e0d-216c-4dc8211eeda6@gmail.com>
So the idea appears to be to show that the only effect of variable replacement
in an algebra expression is to systematically change the variable in the
resulting solution sequence.  Now this hypothesis has to be turned into an
inductive hypothesis and proven to be true for all of the SPARQL algebra.  If
this can be done, then it is easy to show the PrjMap(P,PV) is fine because the
variable doesn't show up in the resulting solution sequence.

peter

On 02/07/2017 03:30 AM, Andy Seaborne wrote:
> 
> 
> On 06/02/17 11:48, Peter F. Patel-Schneider wrote:
>> On 02/06/2017 02:21 AM, Andy Seaborne wrote:
>>> Peter has some comments on the SHACL comments list that relate to EXISTS. [1]
>>>
>>>> There is no demonstration that the choice of fresh variables in the
>>>> definition of PrjMap(P,PV) is insignificant.
>>>
>>> I hope we can explain in the document to clarify this, but I'm not clear what
>>> you are looking for.
>>
>> That the result of the evaluation doesn't depend on the choice of free
>> variables.
>>
>>> What would constitute such a demonstration?
>>
>> That's a good question.  When I first noticed this problem I was thinking that
>> this was just a t that hadn't been dotted, but I'm really not sure how to go
>> about showing that the choice doesn't matter.
>>
>>> Do you have a example where it is significant?
>>
>> No, at least not yet?  Do you have a demonstration that there are none?
> 
> An explanation:
> 
> Evaluation of a projection results in a solution sequence that can contains
> only variables of the projection and no others.
> 
> For any algebra expression, replacing a variable systematically with a fresh
> variable has a visible effect as a change in the solution sequence binding for
> that variable.
> 
> You can't find the name of a variable during the evaluation of a SPARQL query
> (because graph patterns and solution modifiers are not available as
> datastructures to access). It's call-by-value and even the special forms like
> IF, and COALESCE don't expose the variable name because they only change when
> arguments are evaluated, not pass down the argument expression itself.
> 
> ((
> The best I can think of is expressions that associate a value with a variable
> like
> 
>     IF ( bound(?x) , "x", "not x")
> 
> but that's more of an alias, not the variable name itself. Renaming ?x is not
> observable and the alias is unchanged.
> ))
> 
> For a projection, one can rename the unprojected variables of the expression
> over which the project operates because the renaming changes the solution
> sequence before projection only on variables that projection does not expose.
> 
> It is not visible in the solution sequence result of the projection.
> 
> Another way of thinking about it is that the binding due to unprojected
> variables are not accessible to operations that use the result of the projection.
> 
>>
>>>> The result of PrjMap(X) depends on the order in which the projections
>>>> in X are chosen, but this order is not specified.
>>>
>>> Yes - it would be better to define the order and the outcome is order
>>> dependent with respect to replaced variables but does it make a difference? It
>>> is only variables restricted by scope that are changed.
>>
>> The mappings reach down into sub-expressions and change disconnected variables
>> there so they violate the scoping of SPARQL.
> 
> In fact, there is a design choice here - either choice is workable, both have
> use cases for different audiences. It's not a technical issue - it's a judgement.
> 
> The other design is one where there is no remapping variables and then the
> EXISTS insertion of the current row would affect the disconnected variables.
> 
> It violates the property of SPARQL evaluation that renaming inside project of
> disconnected variables does not matter anywhere else. Optimizers and parallel
> execution exploit that property. (I got a related question from someone about
> this last week - they are implementing some kind of optimized evaluation and
> wanted to discuss the details.)
> 
>>> Do you have a case where it makes an observable difference?
>>
>> NO, at least not yet.  Do you have a demonstration that there are none?
>>
>>> Would a bottom-up replacement be suitable?
>>
>> I think so.  If you fixed the free variables for all the mappings then I think
>> that a bottom-up replacement schedule would produce a unique result.  This
>> remains to be demonstrated but shouldn't be too hard. As well, none of the
>> mappings would affect disconnected variables, I think.
> 
> bottom-up is the safest (the fresh variables must be fresh across all the
> renaming going on) and easiest to explain.
> 
> The requirement is top-down one : to rename at first SELECT down every branch
> of the expression tree where the variable is hidden.  If done top-down, it is
> renamed once.
> 
> It is minimal renaming if done for each variable of scope of the current row
> is considered separately. That makes it more complicated - it might be worth
> non-definitional text to say this but the more direct definition is a
> bottom-up walk; even left-to-right, bottom-up to give a unique walk order.
> 
>> Of course, doing only this part doesn't solve the major problem.
> 
> I'll leave the SHACL specific comments to the SHACL WG and comments list.
> 
>>
>>>     Andy
>>>
>>> [1] https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
>>
>> It seems so obvious that the choice of variables does not matter, but thinking
>> about how to demonstrate that this is so leads to lots of tricky bits.
>>
>> peter
>>
Received on Wednesday, 8 February 2017 18:20:53 UTC