- From: Andy Seaborne <andy@apache.org>
- Date: Tue, 7 Feb 2017 11:30:32 +0000
- To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, public-sparql-exists@w3.org
On 06/02/17 11:48, Peter F. Patel-Schneider wrote:
> On 02/06/2017 02:21 AM, Andy Seaborne wrote:
>> Peter has some comments on the SHACL comments list that relate to EXISTS. [1]
>>
>>> There is no demonstration that the choice of fresh variables in the
>>> definition of PrjMap(P,PV) is insignificant.
>>
>> I hope we can explain in the document to clarify this, but I'm not clear what
>> you are looking for.
>
> That the result of the evaluation doesn't depend on the choice of free variables.
>
>> What would constitute such a demonstration?
>
> That's a good question. When I first noticed this problem I was thinking that
> this was just a t that hadn't been dotted, but I'm really not sure how to go
> about showing that the choice doesn't matter.
>
>> Do you have a example where it is significant?
>
> No, at least not yet? Do you have a demonstration that there are none?
An explanation:
Evaluation of a projection results in a solution sequence that can
contains only variables of the projection and no others.
For any algebra expression, replacing a variable systematically with a
fresh variable has a visible effect as a change in the solution sequence
binding for that variable.
You can't find the name of a variable during the evaluation of a SPARQL
query (because graph patterns and solution modifiers are not available
as datastructures to access). It's call-by-value and even the special
forms like IF, and COALESCE don't expose the variable name because they
only change when arguments are evaluated, not pass down the argument
expression itself.
((
The best I can think of is expressions that associate a value with a
variable like
IF ( bound(?x) , "x", "not x")
but that's more of an alias, not the variable name itself. Renaming ?x
is not observable and the alias is unchanged.
))
For a projection, one can rename the unprojected variables of the
expression over which the project operates because the renaming changes
the solution sequence before projection only on variables that
projection does not expose.
It is not visible in the solution sequence result of the projection.
Another way of thinking about it is that the binding due to unprojected
variables are not accessible to operations that use the result of the
projection.
>
>>> The result of PrjMap(X) depends on the order in which the projections
>>> in X are chosen, but this order is not specified.
>>
>> Yes - it would be better to define the order and the outcome is order
>> dependent with respect to replaced variables but does it make a difference? It
>> is only variables restricted by scope that are changed.
>
> The mappings reach down into sub-expressions and change disconnected variables
> there so they violate the scoping of SPARQL.
In fact, there is a design choice here - either choice is workable, both
have use cases for different audiences. It's not a technical issue -
it's a judgement.
The other design is one where there is no remapping variables and then
the EXISTS insertion of the current row would affect the disconnected
variables.
It violates the property of SPARQL evaluation that renaming inside
project of disconnected variables does not matter anywhere else.
Optimizers and parallel execution exploit that property. (I got a
related question from someone about this last week - they are
implementing some kind of optimized evaluation and wanted to discuss the
details.)
>> Do you have a case where it makes an observable difference?
>
> NO, at least not yet. Do you have a demonstration that there are none?
>
>> Would a bottom-up replacement be suitable?
>
> I think so. If you fixed the free variables for all the mappings then I think
> that a bottom-up replacement schedule would produce a unique result. This
> remains to be demonstrated but shouldn't be too hard. As well, none of the
> mappings would affect disconnected variables, I think.
bottom-up is the safest (the fresh variables must be fresh across all
the renaming going on) and easiest to explain.
The requirement is top-down one : to rename at first SELECT down every
branch of the expression tree where the variable is hidden. If done
top-down, it is renamed once.
It is minimal renaming if done for each variable of scope of the current
row is considered separately. That makes it more complicated - it might
be worth non-definitional text to say this but the more direct
definition is a bottom-up walk; even left-to-right, bottom-up to give a
unique walk order.
> Of course, doing only this part doesn't solve the major problem.
I'll leave the SHACL specific comments to the SHACL WG and comments list.
>
>> Andy
>>
>> [1] https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
>
> It seems so obvious that the choice of variables does not matter, but thinking
> about how to demonstrate that this is so leads to lots of tricky bits.
>
> peter
>
Received on Tuesday, 7 February 2017 11:31:08 UTC