Re: Related comments from Andy Seaborne on 2017-02-26 (public-sparql-exists@w3.org from February 2017)

From: Andy Seaborne <andy@apache.org>
Date: Sun, 26 Feb 2017 16:32:25 +0000
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, public-sparql-exists@w3.org
Message-ID: <f91e6753-8811-4e75-06bf-fddae5dcc119@apache.org>
With links this time.

On 22/02/17 20:27, Peter F. Patel-Schneider wrote:
> All I am asking for is a real demonstration that inserting fresh variables is
> a benign operation and that the choice of which fresh variable to use is not
> significant.  I view this demonstration as necessary for this proposal.

I have said I will add text to explain the process - I haven't had time 
to do that yet.  I don't know what proof would satisfy you, particularly 
what can be assumed to be already known, and I don't want to write one 
only to have it rejected completely.

> As
> the results of a SPARQL query do depend on the variable names in it, the
> demonstration is going to be different from the standard demonstration that
> renaming variables in a regular programming language does not change the
> result of a program.

In this thread we have that:

* The text for replacement needs to say that fresh variables are unique 
through later replacements as well

* The replacement process should be done bottom-up

* The name of a variable is not accessible to expressions. [1]

* Replacing a variable happens in only 4 places [2].
   The name makes the name in the solutions change.

So a name change of a variable affects the solutions as a rename (c.f. 
relational algebra rename operation).

When applied to a projection, renaming unprojected variables is not 
observable.

> This is not my proposal.  I view it as inferior partly because it does depend
> on this demonstration.  I also view it as inferior as it does not meet the
> goal of not changing the meaning of SPARQL queries that are completely
> unproblematic with the current definition of EXISTS.  For example, it
> introduces a shared variable to both sides of MINUS constructs, potentially
> changing their meaning.

Both proposals make changes around MINUS from what the SPARQL 1.1 spec 
defines - this is issue 4.

Proposal-A changes GRAPH, UNION (and MINUS) queries [3] so that existing 
EXISTS uses are changed. This is addressed by proposal-B by making the 
current row available throughout the EXISTS pattern evaluation.

     Andy


[1]
https://lists.w3.org/Archives/Public/public-sparql-exists/2017Feb/0002.html

[2]
https://lists.w3.org/Archives/Public/public-sparql-exists/2017Feb/0004.html
https://lists.w3.org/Archives/Public/public-sparql-exists/2017Feb/0005.html

[3]
https://lists.w3.org/Archives/Public/public-sparql-exists/2016Sep/0027.html
and examples
https://lists.w3.org/Archives/Public/public-sparql-exists/2016Nov/0026.html




>
> I am not willing to further help with this proposal.
>
> Peter F. Patel-Schneider
> Nuance Communications
>
>
> On 02/14/2017 06:33 AM, Andy Seaborne wrote:
>> Would you like to draft that?
>>
>> I'm able to provide text that explains this I don't think I can to the level
>> you seem to be asking for.
>>
>> It is aliasing and follows from (1) replacing one named variable in an algebra
>> expression, which is a data structure, with a fresh variable, (2) evaluation
>> can not introduce named variables and (3) it's a project.
>>
>> For explanation, saying that the project does not expose the unprojected
>> variables, so the evaluation has the same results after renaming unprojected
>> variables. Much of this come down to "fresh variable" i.e. no clash with an
>> existing one.
>>
>>     Andy
>>
>> On 11/02/17 21:28, Peter F. Patel-Schneider wrote:
>>> I don't know of any particular problem areas.  If I did, I would say so.
>>>
>>> I also don't know that there are no problem areas.  That's what the
>>> demonstration is for  - to show that mapping to fresh variables doesn't affect
>>> anything that it shouoldn't.
>>>
>>> It looks as if the parts may all be in place.  The next step is to write it
>>> all down in a formal proof.   It may be that writing everything down formally
>>> will expose some missing bits.
>>>
>>> peter
>>>
>>>
>>> On 02/11/2017 07:36 AM, Andy Seaborne wrote:
>>>>
>>>> On 10/02/17 13:21, Peter F. Patel-Schneider wrote:
>>>>> I think that you have to show that no form is sensitive to the variable name
>>>>> in any other way, for example removing a solution because of some interaction
>>>>> between a variable name and anything else.
>>>>
>>>> There is no way to get the name of a variable in expression evaluation.
>>>>
>>>> Nor is there a way to introduce a variable during execution except PrjMap when
>>>> all variables are "fresh" - so making the "fresh" clear it means as you go
>>>> along, not a static choice at the start of PrjMap (which I have already said
>>>> is a good thing to say) means there is no clash.
>>>>
>>>> There is no quoting variables and no eval()-like functionality to create
>>>> expressions during evaluation either.
>>>>
>>>>> And now that I think of it, I don't think that the analysis below is adequate
>>>>> as it doesn't show that the variable name doesn't somehow end up in the value
>>>>> side of the binding.
>>>>
>>>> The value side of a binding is an RDFTerm or an error. Named variables are not
>>>> RDF terms.  A SPARQL function can not return a variable or expression, nor can
>>>> the pattern binding operations create bindings from variable to a variable or
>>>> expression.
>>>>
>>>> Peter - if you know of problem areas, can we have an example please?
>>>>
>>>>     Andy
>>>>
>>>>>
>>>>> peter
>>>>>
>>>>>
>>>>> On 02/10/2017 04:21 AM, Andy Seaborne wrote:
>>>>>>
>>>>>>
>>>>>> On 09/02/17 15:15, Peter F. Patel-Schneider wrote:
>>>>>>> OK, that's another piece of the puzzle.
>>>>>>>
>>>>>>> Any more pieces done?
>>>>>>
>>>>>> Are you referring to SHACL?
>>>>>>
>>>>>> For EXISTS, both your comments have answers.
>>>>>>
>>>>>>     Andy
>>>>>>
>>>>>>>
>>>>>>> peter
>>>>>>>
>>>>>>> On 02/09/2017 06:08 AM, Andy Seaborne wrote:
>>>>>>>> There are four forms that create bindings, all the rest recombine
>>>>>>>> bindings in
>>>>>>>> solution mapping/sequences into other solution mapping/sequences but not
>>>>>>>> create or modify bindings.
>>>>>>>>
>>>>>>>>
>>>>>>>>     Basic Graph Pattern Matching
>>>>>>>>     Property Path Patterns
>>>>>>>>     GRAPH ?variable
>>>>>>>>     AS
>>>>>>>>
>>>>>>>> All of these only create bindings according to the variable names used in
>>>>>>>> their form and do not introduce a free choice of by name.
>>>>>>>>
>>>>>>>> A systematic renaming therefore changes the form and it's outcome.
>>>>>>>>
>>>>>>>>    eval(rename_algebra(X)) is equivalent to  rename_solution(eval(X))
>>>>>>>>
>>>>>>>>     Andy
>>>>>>>>
>>>>>>>> On 08/02/17 18:20, Peter F. Patel-Schneider wrote:
>>>>>>>>> So the idea appears to be to show that the only effect of variable
>>>>>>>>> replacement
>>>>>>>>> in an algebra expression is to systematically change the variable in the
>>>>>>>>> resulting solution sequence.  Now this hypothesis has to be turned
>>>>>>>>> into an
>>>>>>>>> inductive hypothesis and proven to be true for all of the SPARQL
>>>>>>>>> algebra.  If
>>>>>>>>> this can be done, then it is easy to show the PrjMap(P,PV) is fine
>>>>>>>>> because
>>>>>>>>> the
>>>>>>>>> variable doesn't show up in the resulting solution sequence.
>>>>>>>>>
>>>>>>>>> peter
>>>>>>>>>
>>>>>>>>> On 02/07/2017 03:30 AM, Andy Seaborne wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/02/17 11:48, Peter F. Patel-Schneider wrote:
>>>>>>>>>>> On 02/06/2017 02:21 AM, Andy Seaborne wrote:
>>>>>>>>>>>> Peter has some comments on the SHACL comments list that relate to
>>>>>>>>>>>> EXISTS.
>>>>>>>>>>>> [1]
>>>>>>>>>>>>
>>>>>>>>>>>>> There is no demonstration that the choice of fresh variables in the
>>>>>>>>>>>>> definition of PrjMap(P,PV) is insignificant.
>>>>>>>>>>>>
>>>>>>>>>>>> I hope we can explain in the document to clarify this, but I'm not
>>>>>>>>>>>> clear
>>>>>>>>>>>> what
>>>>>>>>>>>> you are looking for.
>>>>>>>>>>>
>>>>>>>>>>> That the result of the evaluation doesn't depend on the choice of free
>>>>>>>>>>> variables.
>>>>>>>>>>>
>>>>>>>>>>>> What would constitute such a demonstration?
>>>>>>>>>>>
>>>>>>>>>>> That's a good question.  When I first noticed this problem I was
>>>>>>>>>>> thinking
>>>>>>>>>>> that
>>>>>>>>>>> this was just a t that hadn't been dotted, but I'm really not sure how
>>>>>>>>>>> to go
>>>>>>>>>>> about showing that the choice doesn't matter.
>>>>>>>>>>>
>>>>>>>>>>>> Do you have a example where it is significant?
>>>>>>>>>>>
>>>>>>>>>>> No, at least not yet?  Do you have a demonstration that there are none?
>>>>>>>>>>
>>>>>>>>>> An explanation:
>>>>>>>>>>
>>>>>>>>>> Evaluation of a projection results in a solution sequence that can
>>>>>>>>>> contains
>>>>>>>>>> only variables of the projection and no others.
>>>>>>>>>>
>>>>>>>>>> For any algebra expression, replacing a variable systematically with a
>>>>>>>>>> fresh
>>>>>>>>>> variable has a visible effect as a change in the solution sequence
>>>>>>>>>> binding for
>>>>>>>>>> that variable.
>>>>>>>>>>
>>>>>>>>>> You can't find the name of a variable during the evaluation of a SPARQL
>>>>>>>>>> query
>>>>>>>>>> (because graph patterns and solution modifiers are not available as
>>>>>>>>>> datastructures to access). It's call-by-value and even the special forms
>>>>>>>>>> like
>>>>>>>>>> IF, and COALESCE don't expose the variable name because they only change
>>>>>>>>>> when
>>>>>>>>>> arguments are evaluated, not pass down the argument expression itself.
>>>>>>>>>>
>>>>>>>>>> ((
>>>>>>>>>> The best I can think of is expressions that associate a value with a
>>>>>>>>>> variable
>>>>>>>>>> like
>>>>>>>>>>
>>>>>>>>>>     IF ( bound(?x) , "x", "not x")
>>>>>>>>>>
>>>>>>>>>> but that's more of an alias, not the variable name itself. Renaming
>>>>>>>>>> ?x is
>>>>>>>>>> not
>>>>>>>>>> observable and the alias is unchanged.
>>>>>>>>>> ))
>>>>>>>>>>
>>>>>>>>>> For a projection, one can rename the unprojected variables of the
>>>>>>>>>> expression
>>>>>>>>>> over which the project operates because the renaming changes the
>>>>>>>>>> solution
>>>>>>>>>> sequence before projection only on variables that projection does not
>>>>>>>>>> expose.
>>>>>>>>>>
>>>>>>>>>> It is not visible in the solution sequence result of the projection.
>>>>>>>>>>
>>>>>>>>>> Another way of thinking about it is that the binding due to unprojected
>>>>>>>>>> variables are not accessible to operations that use the result of the
>>>>>>>>>> projection.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> The result of PrjMap(X) depends on the order in which the projections
>>>>>>>>>>>>> in X are chosen, but this order is not specified.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes - it would be better to define the order and the outcome is order
>>>>>>>>>>>> dependent with respect to replaced variables but does it make a
>>>>>>>>>>>> difference? It
>>>>>>>>>>>> is only variables restricted by scope that are changed.
>>>>>>>>>>>
>>>>>>>>>>> The mappings reach down into sub-expressions and change disconnected
>>>>>>>>>>> variables
>>>>>>>>>>> there so they violate the scoping of SPARQL.
>>>>>>>>>>
>>>>>>>>>> In fact, there is a design choice here - either choice is workable, both
>>>>>>>>>> have
>>>>>>>>>> use cases for different audiences. It's not a technical issue - it's a
>>>>>>>>>> judgement.
>>>>>>>>>>
>>>>>>>>>> The other design is one where there is no remapping variables and
>>>>>>>>>> then the
>>>>>>>>>> EXISTS insertion of the current row would affect the disconnected
>>>>>>>>>> variables.
>>>>>>>>>>
>>>>>>>>>> It violates the property of SPARQL evaluation that renaming inside
>>>>>>>>>> project of
>>>>>>>>>> disconnected variables does not matter anywhere else. Optimizers and
>>>>>>>>>> parallel
>>>>>>>>>> execution exploit that property. (I got a related question from someone
>>>>>>>>>> about
>>>>>>>>>> this last week - they are implementing some kind of optimized
>>>>>>>>>> evaluation and
>>>>>>>>>> wanted to discuss the details.)
>>>>>>>>>>
>>>>>>>>>>>> Do you have a case where it makes an observable difference?
>>>>>>>>>>>
>>>>>>>>>>> NO, at least not yet.  Do you have a demonstration that there are none?
>>>>>>>>>>>
>>>>>>>>>>>> Would a bottom-up replacement be suitable?
>>>>>>>>>>>
>>>>>>>>>>> I think so.  If you fixed the free variables for all the mappings
>>>>>>>>>>> then I
>>>>>>>>>>> think
>>>>>>>>>>> that a bottom-up replacement schedule would produce a unique result.
>>>>>>>>>>> This
>>>>>>>>>>> remains to be demonstrated but shouldn't be too hard. As well, none of
>>>>>>>>>>> the
>>>>>>>>>>> mappings would affect disconnected variables, I think.
>>>>>>>>>>
>>>>>>>>>> bottom-up is the safest (the fresh variables must be fresh across all
>>>>>>>>>> the
>>>>>>>>>> renaming going on) and easiest to explain.
>>>>>>>>>>
>>>>>>>>>> The requirement is top-down one : to rename at first SELECT down every
>>>>>>>>>> branch
>>>>>>>>>> of the expression tree where the variable is hidden.  If done top-down,
>>>>>>>>>> it is
>>>>>>>>>> renamed once.
>>>>>>>>>>
>>>>>>>>>> It is minimal renaming if done for each variable of scope of the
>>>>>>>>>> current row
>>>>>>>>>> is considered separately. That makes it more complicated - it might be
>>>>>>>>>> worth
>>>>>>>>>> non-definitional text to say this but the more direct definition is a
>>>>>>>>>> bottom-up walk; even left-to-right, bottom-up to give a unique walk
>>>>>>>>>> order.
>>>>>>>>>>
>>>>>>>>>>> Of course, doing only this part doesn't solve the major problem.
>>>>>>>>>>
>>>>>>>>>> I'll leave the SHACL specific comments to the SHACL WG and comments
>>>>>>>>>> list.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>     Andy
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It seems so obvious that the choice of variables does not matter, but
>>>>>>>>>>> thinking
>>>>>>>>>>> about how to demonstrate that this is so leads to lots of tricky bits.
>>>>>>>>>>>
>>>>>>>>>>> peter
>>>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
Received on Sunday, 26 February 2017 17:00:47 UTC