Re: Related comments from Peter F. Patel-Schneider on 2017-02-10 (public-sparql-exists@w3.org from February 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Fri, 10 Feb 2017 05:21:22 -0800
To: Andy Seaborne <andy.seaborne@topquadrant.com>, public-sparql-exists@w3.org
Message-ID: <4b2222b6-df40-91cd-55ac-f9f6bf6b0839@gmail.com>
I think that you have to show that no form is sensitive to the variable name
in any other way, for example removing a solution because of some interaction
between a variable name and anything else.

And now that I think of it, I don't think that the analysis below is adequate
as it doesn't show that the variable name doesn't somehow end up in the value
side of the binding.

peter


On 02/10/2017 04:21 AM, Andy Seaborne wrote:
> 
> 
> On 09/02/17 15:15, Peter F. Patel-Schneider wrote:
>> OK, that's another piece of the puzzle.
>>
>> Any more pieces done?
> 
> Are you referring to SHACL?
> 
> For EXISTS, both your comments have answers.
> 
>     Andy
> 
>>
>> peter
>>
>> On 02/09/2017 06:08 AM, Andy Seaborne wrote:
>>> There are four forms that create bindings, all the rest recombine bindings in
>>> solution mapping/sequences into other solution mapping/sequences but not
>>> create or modify bindings.
>>>
>>>
>>>     Basic Graph Pattern Matching
>>>     Property Path Patterns
>>>     GRAPH ?variable
>>>     AS
>>>
>>> All of these only create bindings according to the variable names used in
>>> their form and do not introduce a free choice of by name.
>>>
>>> A systematic renaming therefore changes the form and it's outcome.
>>>
>>>    eval(rename_algebra(X)) is equivalent to  rename_solution(eval(X))
>>>
>>>     Andy
>>>
>>> On 08/02/17 18:20, Peter F. Patel-Schneider wrote:
>>>> So the idea appears to be to show that the only effect of variable
>>>> replacement
>>>> in an algebra expression is to systematically change the variable in the
>>>> resulting solution sequence.  Now this hypothesis has to be turned into an
>>>> inductive hypothesis and proven to be true for all of the SPARQL algebra.  If
>>>> this can be done, then it is easy to show the PrjMap(P,PV) is fine because
>>>> the
>>>> variable doesn't show up in the resulting solution sequence.
>>>>
>>>> peter
>>>>
>>>> On 02/07/2017 03:30 AM, Andy Seaborne wrote:
>>>>>
>>>>>
>>>>> On 06/02/17 11:48, Peter F. Patel-Schneider wrote:
>>>>>> On 02/06/2017 02:21 AM, Andy Seaborne wrote:
>>>>>>> Peter has some comments on the SHACL comments list that relate to EXISTS.
>>>>>>> [1]
>>>>>>>
>>>>>>>> There is no demonstration that the choice of fresh variables in the
>>>>>>>> definition of PrjMap(P,PV) is insignificant.
>>>>>>>
>>>>>>> I hope we can explain in the document to clarify this, but I'm not clear
>>>>>>> what
>>>>>>> you are looking for.
>>>>>>
>>>>>> That the result of the evaluation doesn't depend on the choice of free
>>>>>> variables.
>>>>>>
>>>>>>> What would constitute such a demonstration?
>>>>>>
>>>>>> That's a good question.  When I first noticed this problem I was thinking
>>>>>> that
>>>>>> this was just a t that hadn't been dotted, but I'm really not sure how
>>>>>> to go
>>>>>> about showing that the choice doesn't matter.
>>>>>>
>>>>>>> Do you have a example where it is significant?
>>>>>>
>>>>>> No, at least not yet?  Do you have a demonstration that there are none?
>>>>>
>>>>> An explanation:
>>>>>
>>>>> Evaluation of a projection results in a solution sequence that can contains
>>>>> only variables of the projection and no others.
>>>>>
>>>>> For any algebra expression, replacing a variable systematically with a fresh
>>>>> variable has a visible effect as a change in the solution sequence
>>>>> binding for
>>>>> that variable.
>>>>>
>>>>> You can't find the name of a variable during the evaluation of a SPARQL
>>>>> query
>>>>> (because graph patterns and solution modifiers are not available as
>>>>> datastructures to access). It's call-by-value and even the special forms
>>>>> like
>>>>> IF, and COALESCE don't expose the variable name because they only change
>>>>> when
>>>>> arguments are evaluated, not pass down the argument expression itself.
>>>>>
>>>>> ((
>>>>> The best I can think of is expressions that associate a value with a
>>>>> variable
>>>>> like
>>>>>
>>>>>     IF ( bound(?x) , "x", "not x")
>>>>>
>>>>> but that's more of an alias, not the variable name itself. Renaming ?x is
>>>>> not
>>>>> observable and the alias is unchanged.
>>>>> ))
>>>>>
>>>>> For a projection, one can rename the unprojected variables of the expression
>>>>> over which the project operates because the renaming changes the solution
>>>>> sequence before projection only on variables that projection does not
>>>>> expose.
>>>>>
>>>>> It is not visible in the solution sequence result of the projection.
>>>>>
>>>>> Another way of thinking about it is that the binding due to unprojected
>>>>> variables are not accessible to operations that use the result of the
>>>>> projection.
>>>>>
>>>>>>
>>>>>>>> The result of PrjMap(X) depends on the order in which the projections
>>>>>>>> in X are chosen, but this order is not specified.
>>>>>>>
>>>>>>> Yes - it would be better to define the order and the outcome is order
>>>>>>> dependent with respect to replaced variables but does it make a
>>>>>>> difference? It
>>>>>>> is only variables restricted by scope that are changed.
>>>>>>
>>>>>> The mappings reach down into sub-expressions and change disconnected
>>>>>> variables
>>>>>> there so they violate the scoping of SPARQL.
>>>>>
>>>>> In fact, there is a design choice here - either choice is workable, both
>>>>> have
>>>>> use cases for different audiences. It's not a technical issue - it's a
>>>>> judgement.
>>>>>
>>>>> The other design is one where there is no remapping variables and then the
>>>>> EXISTS insertion of the current row would affect the disconnected variables.
>>>>>
>>>>> It violates the property of SPARQL evaluation that renaming inside
>>>>> project of
>>>>> disconnected variables does not matter anywhere else. Optimizers and
>>>>> parallel
>>>>> execution exploit that property. (I got a related question from someone
>>>>> about
>>>>> this last week - they are implementing some kind of optimized evaluation and
>>>>> wanted to discuss the details.)
>>>>>
>>>>>>> Do you have a case where it makes an observable difference?
>>>>>>
>>>>>> NO, at least not yet.  Do you have a demonstration that there are none?
>>>>>>
>>>>>>> Would a bottom-up replacement be suitable?
>>>>>>
>>>>>> I think so.  If you fixed the free variables for all the mappings then I
>>>>>> think
>>>>>> that a bottom-up replacement schedule would produce a unique result.  This
>>>>>> remains to be demonstrated but shouldn't be too hard. As well, none of the
>>>>>> mappings would affect disconnected variables, I think.
>>>>>
>>>>> bottom-up is the safest (the fresh variables must be fresh across all the
>>>>> renaming going on) and easiest to explain.
>>>>>
>>>>> The requirement is top-down one : to rename at first SELECT down every
>>>>> branch
>>>>> of the expression tree where the variable is hidden.  If done top-down,
>>>>> it is
>>>>> renamed once.
>>>>>
>>>>> It is minimal renaming if done for each variable of scope of the current row
>>>>> is considered separately. That makes it more complicated - it might be worth
>>>>> non-definitional text to say this but the more direct definition is a
>>>>> bottom-up walk; even left-to-right, bottom-up to give a unique walk order.
>>>>>
>>>>>> Of course, doing only this part doesn't solve the major problem.
>>>>>
>>>>> I'll leave the SHACL specific comments to the SHACL WG and comments list.
>>>>>
>>>>>>
>>>>>>>     Andy
>>>>>>>
>>>>>>> [1]
>>>>>>> https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
>>>>>>
>>>>>> It seems so obvious that the choice of variables does not matter, but
>>>>>> thinking
>>>>>> about how to demonstrate that this is so leads to lots of tricky bits.
>>>>>>
>>>>>> peter
>>>>>>
>>
>
Received on Friday, 10 February 2017 13:21:58 UTC