Re: Related comments from Andy Seaborne on 2017-02-10 (public-sparql-exists@w3.org from February 2017)

From: Andy Seaborne <andy.seaborne@topquadrant.com>
Date: Fri, 10 Feb 2017 12:21:28 +0000
To: public-sparql-exists@w3.org
Message-ID: <d1dfa59f-45f4-d37d-24b2-7b310075c8c7@topquadrant.com>
On 09/02/17 15:15, Peter F. Patel-Schneider wrote:
> OK, that's another piece of the puzzle.
>
> Any more pieces done?

Are you referring to SHACL?

For EXISTS, both your comments have answers.

 Andy

>
> peter
>
> On 02/09/2017 06:08 AM, Andy Seaborne wrote:
>> There are four forms that create bindings, all the rest recombine bindings in
>> solution mapping/sequences into other solution mapping/sequences but not
>> create or modify bindings.
>>
>>
>>     Basic Graph Pattern Matching
>>     Property Path Patterns
>>     GRAPH ?variable
>>     AS
>>
>> All of these only create bindings according to the variable names used in
>> their form and do not introduce a free choice of by name.
>>
>> A systematic renaming therefore changes the form and it's outcome.
>>
>>    eval(rename_algebra(X)) is equivalent to  rename_solution(eval(X))
>>
>>     Andy
>>
>> On 08/02/17 18:20, Peter F. Patel-Schneider wrote:
>>> So the idea appears to be to show that the only effect of variable replacement
>>> in an algebra expression is to systematically change the variable in the
>>> resulting solution sequence.  Now this hypothesis has to be turned into an
>>> inductive hypothesis and proven to be true for all of the SPARQL algebra.  If
>>> this can be done, then it is easy to show the PrjMap(P,PV) is fine because the
>>> variable doesn't show up in the resulting solution sequence.
>>>
>>> peter
>>>
>>> On 02/07/2017 03:30 AM, Andy Seaborne wrote:
>>>>
>>>>
>>>> On 06/02/17 11:48, Peter F. Patel-Schneider wrote:
>>>>> On 02/06/2017 02:21 AM, Andy Seaborne wrote:
>>>>>> Peter has some comments on the SHACL comments list that relate to EXISTS.
>>>>>> [1]
>>>>>>
>>>>>>> There is no demonstration that the choice of fresh variables in the
>>>>>>> definition of PrjMap(P,PV) is insignificant.
>>>>>>
>>>>>> I hope we can explain in the document to clarify this, but I'm not clear
>>>>>> what
>>>>>> you are looking for.
>>>>>
>>>>> That the result of the evaluation doesn't depend on the choice of free
>>>>> variables.
>>>>>
>>>>>> What would constitute such a demonstration?
>>>>>
>>>>> That's a good question.  When I first noticed this problem I was thinking
>>>>> that
>>>>> this was just a t that hadn't been dotted, but I'm really not sure how to go
>>>>> about showing that the choice doesn't matter.
>>>>>
>>>>>> Do you have a example where it is significant?
>>>>>
>>>>> No, at least not yet?  Do you have a demonstration that there are none?
>>>>
>>>> An explanation:
>>>>
>>>> Evaluation of a projection results in a solution sequence that can contains
>>>> only variables of the projection and no others.
>>>>
>>>> For any algebra expression, replacing a variable systematically with a fresh
>>>> variable has a visible effect as a change in the solution sequence binding for
>>>> that variable.
>>>>
>>>> You can't find the name of a variable during the evaluation of a SPARQL query
>>>> (because graph patterns and solution modifiers are not available as
>>>> datastructures to access). It's call-by-value and even the special forms like
>>>> IF, and COALESCE don't expose the variable name because they only change when
>>>> arguments are evaluated, not pass down the argument expression itself.
>>>>
>>>> ((
>>>> The best I can think of is expressions that associate a value with a variable
>>>> like
>>>>
>>>>     IF ( bound(?x) , "x", "not x")
>>>>
>>>> but that's more of an alias, not the variable name itself. Renaming ?x is not
>>>> observable and the alias is unchanged.
>>>> ))
>>>>
>>>> For a projection, one can rename the unprojected variables of the expression
>>>> over which the project operates because the renaming changes the solution
>>>> sequence before projection only on variables that projection does not expose.
>>>>
>>>> It is not visible in the solution sequence result of the projection.
>>>>
>>>> Another way of thinking about it is that the binding due to unprojected
>>>> variables are not accessible to operations that use the result of the
>>>> projection.
>>>>
>>>>>
>>>>>>> The result of PrjMap(X) depends on the order in which the projections
>>>>>>> in X are chosen, but this order is not specified.
>>>>>>
>>>>>> Yes - it would be better to define the order and the outcome is order
>>>>>> dependent with respect to replaced variables but does it make a
>>>>>> difference? It
>>>>>> is only variables restricted by scope that are changed.
>>>>>
>>>>> The mappings reach down into sub-expressions and change disconnected
>>>>> variables
>>>>> there so they violate the scoping of SPARQL.
>>>>
>>>> In fact, there is a design choice here - either choice is workable, both have
>>>> use cases for different audiences. It's not a technical issue - it's a
>>>> judgement.
>>>>
>>>> The other design is one where there is no remapping variables and then the
>>>> EXISTS insertion of the current row would affect the disconnected variables.
>>>>
>>>> It violates the property of SPARQL evaluation that renaming inside project of
>>>> disconnected variables does not matter anywhere else. Optimizers and parallel
>>>> execution exploit that property. (I got a related question from someone about
>>>> this last week - they are implementing some kind of optimized evaluation and
>>>> wanted to discuss the details.)
>>>>
>>>>>> Do you have a case where it makes an observable difference?
>>>>>
>>>>> NO, at least not yet.  Do you have a demonstration that there are none?
>>>>>
>>>>>> Would a bottom-up replacement be suitable?
>>>>>
>>>>> I think so.  If you fixed the free variables for all the mappings then I
>>>>> think
>>>>> that a bottom-up replacement schedule would produce a unique result.  This
>>>>> remains to be demonstrated but shouldn't be too hard. As well, none of the
>>>>> mappings would affect disconnected variables, I think.
>>>>
>>>> bottom-up is the safest (the fresh variables must be fresh across all the
>>>> renaming going on) and easiest to explain.
>>>>
>>>> The requirement is top-down one : to rename at first SELECT down every branch
>>>> of the expression tree where the variable is hidden.  If done top-down, it is
>>>> renamed once.
>>>>
>>>> It is minimal renaming if done for each variable of scope of the current row
>>>> is considered separately. That makes it more complicated - it might be worth
>>>> non-definitional text to say this but the more direct definition is a
>>>> bottom-up walk; even left-to-right, bottom-up to give a unique walk order.
>>>>
>>>>> Of course, doing only this part doesn't solve the major problem.
>>>>
>>>> I'll leave the SHACL specific comments to the SHACL WG and comments list.
>>>>
>>>>>
>>>>>>     Andy
>>>>>>
>>>>>> [1] https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
>>>>>
>>>>> It seems so obvious that the choice of variables does not matter, but
>>>>> thinking
>>>>> about how to demonstrate that this is so leads to lots of tricky bits.
>>>>>
>>>>> peter
>>>>>
>
Received on Friday, 10 February 2017 12:22:04 UTC