Re: Related comments from Andy Seaborne on 2017-02-11 (public-sparql-exists@w3.org from February 2017)

From: Andy Seaborne <andy@apache.org>
Date: Sat, 11 Feb 2017 15:36:41 +0000
To: public-sparql-exists@w3.org
Message-ID: <f5da0654-cf0c-7815-238f-38797f982be2@apache.org>
On 10/02/17 13:21, Peter F. Patel-Schneider wrote:
> I think that you have to show that no form is sensitive to the variable name
> in any other way, for example removing a solution because of some interaction
> between a variable name and anything else.

There is no way to get the name of a variable in expression evaluation.

Nor is there a way to introduce a variable during execution except 
PrjMap when all variables are "fresh" - so making the "fresh" clear it 
means as you go along, not a static choice at the start of PrjMap (which 
I have already said is a good thing to say) means there is no clash.

There is no quoting variables and no eval()-like functionality to create 
expressions during evaluation either.

> And now that I think of it, I don't think that the analysis below is adequate
> as it doesn't show that the variable name doesn't somehow end up in the value
> side of the binding.

The value side of a binding is an RDFTerm or an error. Named variables 
are not RDF terms.  A SPARQL function can not return a variable or 
expression, nor can the pattern binding operations create bindings from 
variable to a variable or expression.

Peter - if you know of problem areas, can we have an example please?

 Andy

>
> peter
>
>
> On 02/10/2017 04:21 AM, Andy Seaborne wrote:
>>
>>
>> On 09/02/17 15:15, Peter F. Patel-Schneider wrote:
>>> OK, that's another piece of the puzzle.
>>>
>>> Any more pieces done?
>>
>> Are you referring to SHACL?
>>
>> For EXISTS, both your comments have answers.
>>
>>     Andy
>>
>>>
>>> peter
>>>
>>> On 02/09/2017 06:08 AM, Andy Seaborne wrote:
>>>> There are four forms that create bindings, all the rest recombine bindings in
>>>> solution mapping/sequences into other solution mapping/sequences but not
>>>> create or modify bindings.
>>>>
>>>>
>>>>     Basic Graph Pattern Matching
>>>>     Property Path Patterns
>>>>     GRAPH ?variable
>>>>     AS
>>>>
>>>> All of these only create bindings according to the variable names used in
>>>> their form and do not introduce a free choice of by name.
>>>>
>>>> A systematic renaming therefore changes the form and it's outcome.
>>>>
>>>>    eval(rename_algebra(X)) is equivalent to  rename_solution(eval(X))
>>>>
>>>>     Andy
>>>>
>>>> On 08/02/17 18:20, Peter F. Patel-Schneider wrote:
>>>>> So the idea appears to be to show that the only effect of variable
>>>>> replacement
>>>>> in an algebra expression is to systematically change the variable in the
>>>>> resulting solution sequence.  Now this hypothesis has to be turned into an
>>>>> inductive hypothesis and proven to be true for all of the SPARQL algebra.  If
>>>>> this can be done, then it is easy to show the PrjMap(P,PV) is fine because
>>>>> the
>>>>> variable doesn't show up in the resulting solution sequence.
>>>>>
>>>>> peter
>>>>>
>>>>> On 02/07/2017 03:30 AM, Andy Seaborne wrote:
>>>>>>
>>>>>>
>>>>>> On 06/02/17 11:48, Peter F. Patel-Schneider wrote:
>>>>>>> On 02/06/2017 02:21 AM, Andy Seaborne wrote:
>>>>>>>> Peter has some comments on the SHACL comments list that relate to EXISTS.
>>>>>>>> [1]
>>>>>>>>
>>>>>>>>> There is no demonstration that the choice of fresh variables in the
>>>>>>>>> definition of PrjMap(P,PV) is insignificant.
>>>>>>>>
>>>>>>>> I hope we can explain in the document to clarify this, but I'm not clear
>>>>>>>> what
>>>>>>>> you are looking for.
>>>>>>>
>>>>>>> That the result of the evaluation doesn't depend on the choice of free
>>>>>>> variables.
>>>>>>>
>>>>>>>> What would constitute such a demonstration?
>>>>>>>
>>>>>>> That's a good question.  When I first noticed this problem I was thinking
>>>>>>> that
>>>>>>> this was just a t that hadn't been dotted, but I'm really not sure how
>>>>>>> to go
>>>>>>> about showing that the choice doesn't matter.
>>>>>>>
>>>>>>>> Do you have a example where it is significant?
>>>>>>>
>>>>>>> No, at least not yet?  Do you have a demonstration that there are none?
>>>>>>
>>>>>> An explanation:
>>>>>>
>>>>>> Evaluation of a projection results in a solution sequence that can contains
>>>>>> only variables of the projection and no others.
>>>>>>
>>>>>> For any algebra expression, replacing a variable systematically with a fresh
>>>>>> variable has a visible effect as a change in the solution sequence
>>>>>> binding for
>>>>>> that variable.
>>>>>>
>>>>>> You can't find the name of a variable during the evaluation of a SPARQL
>>>>>> query
>>>>>> (because graph patterns and solution modifiers are not available as
>>>>>> datastructures to access). It's call-by-value and even the special forms
>>>>>> like
>>>>>> IF, and COALESCE don't expose the variable name because they only change
>>>>>> when
>>>>>> arguments are evaluated, not pass down the argument expression itself.
>>>>>>
>>>>>> ((
>>>>>> The best I can think of is expressions that associate a value with a
>>>>>> variable
>>>>>> like
>>>>>>
>>>>>>     IF ( bound(?x) , "x", "not x")
>>>>>>
>>>>>> but that's more of an alias, not the variable name itself. Renaming ?x is
>>>>>> not
>>>>>> observable and the alias is unchanged.
>>>>>> ))
>>>>>>
>>>>>> For a projection, one can rename the unprojected variables of the expression
>>>>>> over which the project operates because the renaming changes the solution
>>>>>> sequence before projection only on variables that projection does not
>>>>>> expose.
>>>>>>
>>>>>> It is not visible in the solution sequence result of the projection.
>>>>>>
>>>>>> Another way of thinking about it is that the binding due to unprojected
>>>>>> variables are not accessible to operations that use the result of the
>>>>>> projection.
>>>>>>
>>>>>>>
>>>>>>>>> The result of PrjMap(X) depends on the order in which the projections
>>>>>>>>> in X are chosen, but this order is not specified.
>>>>>>>>
>>>>>>>> Yes - it would be better to define the order and the outcome is order
>>>>>>>> dependent with respect to replaced variables but does it make a
>>>>>>>> difference? It
>>>>>>>> is only variables restricted by scope that are changed.
>>>>>>>
>>>>>>> The mappings reach down into sub-expressions and change disconnected
>>>>>>> variables
>>>>>>> there so they violate the scoping of SPARQL.
>>>>>>
>>>>>> In fact, there is a design choice here - either choice is workable, both
>>>>>> have
>>>>>> use cases for different audiences. It's not a technical issue - it's a
>>>>>> judgement.
>>>>>>
>>>>>> The other design is one where there is no remapping variables and then the
>>>>>> EXISTS insertion of the current row would affect the disconnected variables.
>>>>>>
>>>>>> It violates the property of SPARQL evaluation that renaming inside
>>>>>> project of
>>>>>> disconnected variables does not matter anywhere else. Optimizers and
>>>>>> parallel
>>>>>> execution exploit that property. (I got a related question from someone
>>>>>> about
>>>>>> this last week - they are implementing some kind of optimized evaluation and
>>>>>> wanted to discuss the details.)
>>>>>>
>>>>>>>> Do you have a case where it makes an observable difference?
>>>>>>>
>>>>>>> NO, at least not yet.  Do you have a demonstration that there are none?
>>>>>>>
>>>>>>>> Would a bottom-up replacement be suitable?
>>>>>>>
>>>>>>> I think so.  If you fixed the free variables for all the mappings then I
>>>>>>> think
>>>>>>> that a bottom-up replacement schedule would produce a unique result.  This
>>>>>>> remains to be demonstrated but shouldn't be too hard. As well, none of the
>>>>>>> mappings would affect disconnected variables, I think.
>>>>>>
>>>>>> bottom-up is the safest (the fresh variables must be fresh across all the
>>>>>> renaming going on) and easiest to explain.
>>>>>>
>>>>>> The requirement is top-down one : to rename at first SELECT down every
>>>>>> branch
>>>>>> of the expression tree where the variable is hidden.  If done top-down,
>>>>>> it is
>>>>>> renamed once.
>>>>>>
>>>>>> It is minimal renaming if done for each variable of scope of the current row
>>>>>> is considered separately. That makes it more complicated - it might be worth
>>>>>> non-definitional text to say this but the more direct definition is a
>>>>>> bottom-up walk; even left-to-right, bottom-up to give a unique walk order.
>>>>>>
>>>>>>> Of course, doing only this part doesn't solve the major problem.
>>>>>>
>>>>>> I'll leave the SHACL specific comments to the SHACL WG and comments list.
>>>>>>
>>>>>>>
>>>>>>>>     Andy
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
>>>>>>>
>>>>>>> It seems so obvious that the choice of variables does not matter, but
>>>>>>> thinking
>>>>>>> about how to demonstrate that this is so leads to lots of tricky bits.
>>>>>>>
>>>>>>> peter
>>>>>>>
>>>
>>
>
Received on Saturday, 11 February 2017 15:37:17 UTC