Re: Related comments from Andy Seaborne on 2017-02-07 (public-sparql-exists@w3.org from February 2017)

From: Andy Seaborne <andy@apache.org>
Date: Tue, 7 Feb 2017 11:30:32 +0000
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, public-sparql-exists@w3.org
Message-ID: <b9a9f706-0ec3-b20b-aab9-a3a8c3f5bc74@apache.org>
On 06/02/17 11:48, Peter F. Patel-Schneider wrote:
> On 02/06/2017 02:21 AM, Andy Seaborne wrote:
>> Peter has some comments on the SHACL comments list that relate to EXISTS. [1]
>>
>>> There is no demonstration that the choice of fresh variables in the
>>> definition of PrjMap(P,PV) is insignificant.
>>
>> I hope we can explain in the document to clarify this, but I'm not clear what
>> you are looking for.
>
> That the result of the evaluation doesn't depend on the choice of free variables.
>
>> What would constitute such a demonstration?
>
> That's a good question.  When I first noticed this problem I was thinking that
> this was just a t that hadn't been dotted, but I'm really not sure how to go
> about showing that the choice doesn't matter.
>
>> Do you have an example where it is significant?
>
> No, at least not yet?  Do you have a demonstration that there are none?

An explanation:

Evaluation of a projection results in a solution sequence that can 
contain only the variables of the projection and no others.

For any algebra expression, systematically replacing a variable with a 
fresh variable has only one visible effect: in the solution sequence, 
the binding appears under the fresh variable instead of the original one.
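A minimal Python sketch of this claim, assuming solution mappings are modelled as dicts from variable name to RDF term (terms written as plain strings) and a basic graph pattern is a list of triple patterns. `match_bgp`, `rename_pattern` and `rename_solution` are hypothetical helpers for illustration, not part of any SPARQL library. It checks that evaluating a pattern after renaming ?y to a fresh ?_v1 gives the same solutions, with only the key changed.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]  # variable name -> RDF term (plain strings here)


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_bgp(graph: List[Triple], bgp: List[Triple]) -> List[Solution]:
    """Naive BGP evaluation: extend every partial solution one triple pattern at a time."""
    solutions: List[Solution] = [{}]
    for pattern in bgp:
        extended: List[Solution] = []
        for sol in solutions:
            for triple in graph:
                candidate = dict(sol)
                ok = True
                for p_term, g_term in zip(pattern, triple):
                    if is_var(p_term):
                        # An already-bound variable must agree with the graph term.
                        if candidate.setdefault(p_term, g_term) != g_term:
                            ok = False
                            break
                    elif p_term != g_term:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


def rename_pattern(bgp: List[Triple], old: str, new: str) -> List[Triple]:
    return [tuple(new if t == old else t for t in tp) for tp in bgp]


def rename_solution(sol: Solution, old: str, new: str) -> Solution:
    return {(new if k == old else k): v for k, v in sol.items()}


graph = [(":a", ":p", ":b"), (":b", ":p", ":c")]
bgp = [("?x", ":p", "?y"), ("?y", ":p", "?z")]
original = match_bgp(graph, bgp)
renamed = match_bgp(graph, rename_pattern(bgp, "?y", "?_v1"))

# Same solutions; the only difference is which key carries the binding.
assert renamed == [rename_solution(s, "?y", "?_v1") for s in original]
print(original)  # [{'?x': ':a', '?y': ':b', '?z': ':c'}]
```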

You can't find the name of a variable during the evaluation of a SPARQL 
query, because graph patterns and solution modifiers are not available 
as data structures to access. Evaluation is call-by-value, and even 
special forms like IF and COALESCE don't expose the variable name: they 
only change when arguments are evaluated; they do not pass down the 
argument expression itself.

((
The best I can think of is expressions that associate a value with a 
variable like

     IF ( bound(?x) , "x", "not x")

but that's more of an alias, not the variable name itself. Renaming ?x 
is not observable and the alias is unchanged.
))
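A small sketch of why the IF/bound() alias does not leak the name, assuming a hypothetical tuple-based expression AST (this is not how any particular engine represents expressions). Renaming ?x in both the expression and the solution leaves the value of the expression unchanged, because only the value of the condition selects a branch.

```python
from typing import Dict, Tuple, Union

Solution = Dict[str, str]
# Hypothetical tiny expression AST:
#   ("lit", value) | ("bound", var) | ("if", cond, then_expr, else_expr)
Expr = Tuple


def evaluate(expr: Expr, sol: Solution) -> Union[str, bool]:
    tag = expr[0]
    if tag == "lit":
        return expr[1]
    if tag == "bound":
        return expr[1] in sol
    if tag == "if":
        # Only the *value* of the condition picks a branch; no argument
        # expression (and so no variable name) is handed to the branch.
        return evaluate(expr[2], sol) if evaluate(expr[1], sol) else evaluate(expr[3], sol)
    raise ValueError(f"unknown expression tag {tag!r}")


def rename_expr(expr: Expr, old: str, new: str) -> Expr:
    if expr[0] == "bound" and expr[1] == old:
        return ("bound", new)
    if expr[0] == "if":
        return ("if",) + tuple(rename_expr(e, old, new) for e in expr[1:])
    return expr  # literals such as "x" are values, so they are left alone


def rename_solution(sol: Solution, old: str, new: str) -> Solution:
    return {(new if k == old else k): v for k, v in sol.items()}


alias = ("if", ("bound", "?x"), ("lit", "x"), ("lit", "not x"))
renamed_alias = rename_expr(alias, "?x", "?_v1")

for sol in ({"?x": ":a"}, {}):
    before = evaluate(alias, sol)
    after = evaluate(renamed_alias, rename_solution(sol, "?x", "?_v1"))
    assert before == after
    print(before)  # prints x, then not x
```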

For a projection, one can rename the unprojected variables of the 
expression over which the projection operates, because the renaming 
changes the solution sequence before projection only on variables that 
the projection does not expose.

The renaming is therefore not visible in the solution sequence that the 
projection produces.

Another way of thinking about it is that the bindings of unprojected 
variables are not accessible to operations that use the result of the 
projection.
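A minimal sketch, assuming solutions are Python dicts and using a simplified join on compatible mappings; the inner solution sequence is made-up illustrative data standing for the result of the pattern inside a `SELECT ?x { ... }`. Through the projection, renaming the hidden ?y cannot be observed by an outer join, even when the outer row uses the name ?y for something else; without the projection it can.

```python
from typing import Dict, List

Solution = Dict[str, str]


def project(solutions: List[Solution], variables: List[str]) -> List[Solution]:
    """Keep only the projected variables of each solution."""
    return [{v: s[v] for v in variables if v in s} for s in solutions]


def compatible(a: Solution, b: Solution) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)


def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def rename(solutions: List[Solution], old: str, new: str) -> List[Solution]:
    return [{(new if k == old else k): v for k, v in s.items()} for s in solutions]


# Assumed solutions of the inner pattern of { SELECT ?x { ... ?y ... } }.
inner = [{"?x": ":a", "?y": ":b"}, {"?x": ":c", "?y": ":d"}]
renamed_inner = rename(inner, "?y", "?_v1")
# An outer row that happens to use the name ?y for something else.
outer = [{"?x": ":a", "?y": ":z"}]

# Through the projection, the renaming of ?y cannot be observed.
assert join(outer, project(inner, ["?x"])) == join(outer, project(renamed_inner, ["?x"]))

# Without the projection it can: ?y clashes in one case but not the other.
assert join(outer, inner) == []
assert join(outer, renamed_inner) == [{"?x": ":a", "?y": ":z", "?_v1": ":b"}]
```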

>
>>> The result of PrjMap(X) depends on the order in which the projections
>>> in X are chosen, but this order is not specified.
>>
>> Yes - it would be better to define the order and the outcome is order
>> dependent with respect to replaced variables but does it make a difference? It
>> is only variables restricted by scope that are changed.
>
> The mappings reach down into sub-expressions and change disconnected variables
> there so they violate the scoping of SPARQL.

In fact, there is a design choice here: either choice is workable, and 
both have use cases for different audiences. It's not a technical 
issue; it's a judgement.

The other design is one with no remapping of variables; then the 
EXISTS insertion of the current row would affect the disconnected 
variables.
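A toy model of the two designs, under simplifying assumptions: a single triple pattern inside `EXISTS { SELECT ?z { ?x :p ?z } }`, string terms, and a substitution step that is simplified relative to the SPARQL 1.1 definition. `exists_select` and the `?_v` fresh-name scheme are hypothetical. With no remapping, inserting the current row's ?x reaches the ?x hidden by the SELECT and changes the answer; with remapping, it does not.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]


def match_triple(graph: List[Triple], pattern: Triple) -> List[Solution]:
    """All solutions of one triple pattern over the graph."""
    out: List[Solution] = []
    for triple in graph:
        sol: Solution = {}
        for p, g in zip(pattern, triple):
            if p.startswith("?"):
                if sol.setdefault(p, g) != g:
                    break
            elif p != g:
                break
        else:  # no break: the whole triple matched
            out.append(sol)
    return out


def exists_select(graph: List[Triple], pattern: Triple, projected: Set[str],
                  row: Solution, remap: bool) -> bool:
    """Toy model of EXISTS { SELECT <projected> { <pattern> } } for one current row."""
    if remap:
        # Rename unprojected (disconnected) variables to fresh ones first.
        fresh: Dict[str, str] = {}
        pattern = tuple(
            fresh.setdefault(t, f"?_v{len(fresh) + 1}")
            if t.startswith("?") and t not in projected else t
            for t in pattern)
    # Insert the current row: replace variables it binds by their values.
    pattern = tuple(row.get(t, t) for t in pattern)
    # Projection keeps one output solution per input solution, so EXISTS
    # only needs to know whether any solution remains.
    return bool(match_triple(graph, pattern))


graph = [(":b", ":p", ":c")]
row = {"?x": ":a"}                # current row of the outer query
pattern = ("?x", ":p", "?z")      # ?x here is hidden by SELECT ?z
assert exists_select(graph, pattern, {"?z"}, row, remap=False) is False
assert exists_select(graph, pattern, {"?z"}, row, remap=True) is True
```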

That design violates the property of SPARQL evaluation that renaming 
disconnected variables inside a projection does not matter anywhere 
else. Optimizers and parallel execution exploit that property. (I got a 
related question from someone about this last week: they are 
implementing some kind of optimized evaluation and wanted to discuss 
the details.)

>> Do you have a case where it makes an observable difference?
>
> NO, at least not yet.  Do you have a demonstration that there are none?
>
>> Would a bottom-up replacement be suitable?
>
> I think so.  If you fixed the free variables for all the mappings then I think
> that a bottom-up replacement schedule would produce a unique result.  This
> remains to be demonstrated but shouldn't be too hard. As well, none of the
> mappings would affect disconnected variables, I think.

Bottom-up is the safest (the fresh variables must be fresh across all 
the renaming going on) and the easiest to explain.

The requirement is a top-down one: rename, at the first SELECT, down 
every branch of the expression tree where the variable is hidden. If 
done top-down, each variable is renamed once.

The renaming is minimal if each variable in scope of the current row is 
considered separately. That makes it more complicated. It might be 
worth non-definitional text to say this, but the more direct definition 
is a bottom-up walk, in fact a left-to-right, bottom-up walk, to give a 
unique walk order.
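A minimal sketch of a left-to-right, bottom-up renaming walk, assuming a toy algebra with only BGP, Join and Project. The class and function names and the `?_v` fresh-name scheme are hypothetical, not taken from the SPARQL specification. One shared supply of fresh names is threaded through the whole walk so names are fresh across all renamings, and the hidden variables at each projection are renamed in sorted order, so the result is unique.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterator, Set, Tuple, Union


@dataclass(frozen=True)
class BGP:
    triples: Tuple[Tuple[str, str, str], ...]


@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Project:
    variables: Tuple[str, ...]
    sub: "Node"


Node = Union[BGP, Join, Project]


def in_scope(node: Node) -> Set[str]:
    """Variables visible at the top of node (a projection hides the rest)."""
    if isinstance(node, BGP):
        return {t for tp in node.triples for t in tp if t.startswith("?")}
    if isinstance(node, Join):
        return in_scope(node.left) | in_scope(node.right)
    return set(node.variables)


def all_vars(node: Node) -> Set[str]:
    """Every variable name occurring anywhere in node."""
    if isinstance(node, BGP):
        return in_scope(node)
    if isinstance(node, Join):
        return all_vars(node.left) | all_vars(node.right)
    return set(node.variables) | all_vars(node.sub)


def rename_vars(node: Node, mapping: Dict[str, str]) -> Node:
    if isinstance(node, BGP):
        return BGP(tuple(tuple(mapping.get(t, t) for t in tp) for tp in node.triples))
    if isinstance(node, Join):
        return Join(rename_vars(node.left, mapping), rename_vars(node.right, mapping))
    return Project(tuple(mapping.get(v, v) for v in node.variables),
                   rename_vars(node.sub, mapping))


def fresh_names(node: Node) -> Iterator[str]:
    """One supply for the whole tree, skipping names already used in it."""
    used = all_vars(node)
    return (f"?_v{i}" for i in count(1) if f"?_v{i}" not in used)


def rename_bottom_up(node: Node, fresh: Iterator[str]) -> Node:
    """Post-order, left-to-right walk: children first, then the node itself."""
    if isinstance(node, BGP):
        return node
    if isinstance(node, Join):
        left = rename_bottom_up(node.left, fresh)
        right = rename_bottom_up(node.right, fresh)
        return Join(left, right)
    sub = rename_bottom_up(node.sub, fresh)
    # Variables visible below this projection but not projected are hidden;
    # sorting them fixes the order in which fresh names are handed out.
    hidden = sorted(in_scope(sub) - set(node.variables))
    return Project(node.variables, rename_vars(sub, {v: next(fresh) for v in hidden}))


# SELECT ?z { { SELECT ?z { ?x :p ?z } } ?x :q ?z }
query = Project(("?z",), Join(Project(("?z",), BGP((("?x", ":p", "?z"),))),
                              BGP((("?x", ":q", "?z"),))))
renamed = rename_bottom_up(query, fresh_names(query))
assert renamed == Project(("?z",), Join(Project(("?z",), BGP((("?_v1", ":p", "?z"),))),
                                        BGP((("?_v2", ":q", "?z"),))))
```

The inner ?x is renamed first, so the later renaming of the outer ?x cannot touch it: no mapping reaches a disconnected variable.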

> Of course, doing only this part doesn't solve the major problem.

I'll leave the SHACL specific comments to the SHACL WG and comments list.

>
>>     Andy
>>
>> [1] https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
>
> It seems so obvious that the choice of variables does not matter, but thinking
> about how to demonstrate that this is so leads to lots of tricky bits.
>
> peter
>
Received on Tuesday, 7 February 2017 11:31:08 UTC