Re: Related comments from Peter F. Patel-Schneider on 2017-02-22 (public-sparql-exists@w3.org from February 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 22 Feb 2017 12:27:23 -0800
To: Andy Seaborne <andy@apache.org>, public-sparql-exists@w3.org
Message-ID: <c871f02f-3d95-00c2-e07d-a6d1d743e74a@gmail.com>
All I am asking for is a real demonstration that inserting fresh variables is
a benign operation and that the choice of which fresh variable to use is not
significant.  I view this demonstration as necessary for this proposal.  As
the results of a SPARQL query do depend on the variable names in it, the
demonstration is going to be different from the standard demonstration that
renaming variables in a regular programming language does not change the
result of a program.

This is not my proposal.  I view it as inferior partly because it does depend
on this demonstration.  I also view it as inferior as it does not meet the
goal of not changing the meaning of SPARQL queries that are completely
unproblematic with the current definition of EXISTS.  For example, it
introduces a shared variable to both sides of MINUS constructs, potentially
changing their meaning.

I am not willing to further help with this proposal.

Peter F. Patel-Schneider
Nuance Communications


On 02/14/2017 06:33 AM, Andy Seaborne wrote:
> Would you like to draft that?
> 
> I'm able to provide text that explains this I don't think I can to the level
> you seem to be asking for.
> 
> It is aliasing and follows from (1) replacing one named variable in an algebra
> expression, which is a data structure, with a fresh variable, (2) evaluation
> can not introduce named variables and (3) it's a project.
> 
> For explanation, saying that the project does not expose the unprojected
> variables, so the evaluation has the same results after renaming unprojected
> variables. Much of this come down to "fresh variable" i.e. no clash with an
> existing one.
> 
>     Andy
> 
> On 11/02/17 21:28, Peter F. Patel-Schneider wrote:
>> I don't know of any particular problem areas.  If I did, I would say so.
>>
>> I also don't know that there are no problem areas.  That's what the
>> demonstration is for  - to show that mapping to fresh variables doesn't affect
>> anything that it shouoldn't.
>>
>> It looks as if the parts may all be in place.  The next step is to write it
>> all down in a formal proof.   It may be that writing everything down formally
>> will expose some missing bits.
>>
>> peter
>>
>>
>> On 02/11/2017 07:36 AM, Andy Seaborne wrote:
>>>
>>> On 10/02/17 13:21, Peter F. Patel-Schneider wrote:
>>>> I think that you have to show that no form is sensitive to the variable name
>>>> in any other way, for example removing a solution because of some interaction
>>>> between a variable name and anything else.
>>>
>>> There is no way to get the name of a variable in expression evaluation.
>>>
>>> Nor is there a way to introduce a variable during execution except PrjMap when
>>> all variables are "fresh" - so making the "fresh" clear it means as you go
>>> along, not a static choice at the start of PrjMap (which I have already said
>>> is a good thing to say) means there is no clash.
>>>
>>> There is no quoting variables and no eval()-like functionality to create
>>> expressions during evaluation either.
>>>
>>>> And now that I think of it, I don't think that the analysis below is adequate
>>>> as it doesn't show that the variable name doesn't somehow end up in the value
>>>> side of the binding.
>>>
>>> The value side of a binding is an RDFTerm or an error. Named variables are not
>>> RDF terms.  A SPARQL function can not return a variable or expression, nor can
>>> the pattern binding operations create bindings from variable to a variable or
>>> expression.
>>>
>>> Peter - if you know of problem areas, can we have an example please?
>>>
>>>     Andy
>>>
>>>>
>>>> peter
>>>>
>>>>
>>>> On 02/10/2017 04:21 AM, Andy Seaborne wrote:
>>>>>
>>>>>
>>>>> On 09/02/17 15:15, Peter F. Patel-Schneider wrote:
>>>>>> OK, that's another piece of the puzzle.
>>>>>>
>>>>>> Any more pieces done?
>>>>>
>>>>> Are you referring to SHACL?
>>>>>
>>>>> For EXISTS, both your comments have answers.
>>>>>
>>>>>     Andy
>>>>>
>>>>>>
>>>>>> peter
>>>>>>
>>>>>> On 02/09/2017 06:08 AM, Andy Seaborne wrote:
>>>>>>> There are four forms that create bindings, all the rest recombine
>>>>>>> bindings in
>>>>>>> solution mapping/sequences into other solution mapping/sequences but not
>>>>>>> create or modify bindings.
>>>>>>>
>>>>>>>
>>>>>>>     Basic Graph Pattern Matching
>>>>>>>     Property Path Patterns
>>>>>>>     GRAPH ?variable
>>>>>>>     AS
>>>>>>>
>>>>>>> All of these only create bindings according to the variable names used in
>>>>>>> their form and do not introduce a free choice of by name.
>>>>>>>
>>>>>>> A systematic renaming therefore changes the form and it's outcome.
>>>>>>>
>>>>>>>    eval(rename_algebra(X)) is equivalent to  rename_solution(eval(X))
>>>>>>>
>>>>>>>     Andy
>>>>>>>
>>>>>>> On 08/02/17 18:20, Peter F. Patel-Schneider wrote:
>>>>>>>> So the idea appears to be to show that the only effect of variable
>>>>>>>> replacement
>>>>>>>> in an algebra expression is to systematically change the variable in the
>>>>>>>> resulting solution sequence.  Now this hypothesis has to be turned
>>>>>>>> into an
>>>>>>>> inductive hypothesis and proven to be true for all of the SPARQL
>>>>>>>> algebra.  If
>>>>>>>> this can be done, then it is easy to show the PrjMap(P,PV) is fine
>>>>>>>> because
>>>>>>>> the
>>>>>>>> variable doesn't show up in the resulting solution sequence.
>>>>>>>>
>>>>>>>> peter
>>>>>>>>
>>>>>>>> On 02/07/2017 03:30 AM, Andy Seaborne wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 06/02/17 11:48, Peter F. Patel-Schneider wrote:
>>>>>>>>>> On 02/06/2017 02:21 AM, Andy Seaborne wrote:
>>>>>>>>>>> Peter has some comments on the SHACL comments list that relate to
>>>>>>>>>>> EXISTS.
>>>>>>>>>>> [1]
>>>>>>>>>>>
>>>>>>>>>>>> There is no demonstration that the choice of fresh variables in the
>>>>>>>>>>>> definition of PrjMap(P,PV) is insignificant.
>>>>>>>>>>>
>>>>>>>>>>> I hope we can explain in the document to clarify this, but I'm not
>>>>>>>>>>> clear
>>>>>>>>>>> what
>>>>>>>>>>> you are looking for.
>>>>>>>>>>
>>>>>>>>>> That the result of the evaluation doesn't depend on the choice of free
>>>>>>>>>> variables.
>>>>>>>>>>
>>>>>>>>>>> What would constitute such a demonstration?
>>>>>>>>>>
>>>>>>>>>> That's a good question.  When I first noticed this problem I was
>>>>>>>>>> thinking
>>>>>>>>>> that
>>>>>>>>>> this was just a t that hadn't been dotted, but I'm really not sure how
>>>>>>>>>> to go
>>>>>>>>>> about showing that the choice doesn't matter.
>>>>>>>>>>
>>>>>>>>>>> Do you have a example where it is significant?
>>>>>>>>>>
>>>>>>>>>> No, at least not yet?  Do you have a demonstration that there are none?
>>>>>>>>>
>>>>>>>>> An explanation:
>>>>>>>>>
>>>>>>>>> Evaluation of a projection results in a solution sequence that can
>>>>>>>>> contains
>>>>>>>>> only variables of the projection and no others.
>>>>>>>>>
>>>>>>>>> For any algebra expression, replacing a variable systematically with a
>>>>>>>>> fresh
>>>>>>>>> variable has a visible effect as a change in the solution sequence
>>>>>>>>> binding for
>>>>>>>>> that variable.
>>>>>>>>>
>>>>>>>>> You can't find the name of a variable during the evaluation of a SPARQL
>>>>>>>>> query
>>>>>>>>> (because graph patterns and solution modifiers are not available as
>>>>>>>>> datastructures to access). It's call-by-value and even the special forms
>>>>>>>>> like
>>>>>>>>> IF, and COALESCE don't expose the variable name because they only change
>>>>>>>>> when
>>>>>>>>> arguments are evaluated, not pass down the argument expression itself.
>>>>>>>>>
>>>>>>>>> ((
>>>>>>>>> The best I can think of is expressions that associate a value with a
>>>>>>>>> variable
>>>>>>>>> like
>>>>>>>>>
>>>>>>>>>     IF ( bound(?x) , "x", "not x")
>>>>>>>>>
>>>>>>>>> but that's more of an alias, not the variable name itself. Renaming
>>>>>>>>> ?x is
>>>>>>>>> not
>>>>>>>>> observable and the alias is unchanged.
>>>>>>>>> ))
>>>>>>>>>
>>>>>>>>> For a projection, one can rename the unprojected variables of the
>>>>>>>>> expression
>>>>>>>>> over which the project operates because the renaming changes the
>>>>>>>>> solution
>>>>>>>>> sequence before projection only on variables that projection does not
>>>>>>>>> expose.
>>>>>>>>>
>>>>>>>>> It is not visible in the solution sequence result of the projection.
>>>>>>>>>
>>>>>>>>> Another way of thinking about it is that the binding due to unprojected
>>>>>>>>> variables are not accessible to operations that use the result of the
>>>>>>>>> projection.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> The result of PrjMap(X) depends on the order in which the projections
>>>>>>>>>>>> in X are chosen, but this order is not specified.
>>>>>>>>>>>
>>>>>>>>>>> Yes - it would be better to define the order and the outcome is order
>>>>>>>>>>> dependent with respect to replaced variables but does it make a
>>>>>>>>>>> difference? It
>>>>>>>>>>> is only variables restricted by scope that are changed.
>>>>>>>>>>
>>>>>>>>>> The mappings reach down into sub-expressions and change disconnected
>>>>>>>>>> variables
>>>>>>>>>> there so they violate the scoping of SPARQL.
>>>>>>>>>
>>>>>>>>> In fact, there is a design choice here - either choice is workable, both
>>>>>>>>> have
>>>>>>>>> use cases for different audiences. It's not a technical issue - it's a
>>>>>>>>> judgement.
>>>>>>>>>
>>>>>>>>> The other design is one where there is no remapping variables and
>>>>>>>>> then the
>>>>>>>>> EXISTS insertion of the current row would affect the disconnected
>>>>>>>>> variables.
>>>>>>>>>
>>>>>>>>> It violates the property of SPARQL evaluation that renaming inside
>>>>>>>>> project of
>>>>>>>>> disconnected variables does not matter anywhere else. Optimizers and
>>>>>>>>> parallel
>>>>>>>>> execution exploit that property. (I got a related question from someone
>>>>>>>>> about
>>>>>>>>> this last week - they are implementing some kind of optimized
>>>>>>>>> evaluation and
>>>>>>>>> wanted to discuss the details.)
>>>>>>>>>
>>>>>>>>>>> Do you have a case where it makes an observable difference?
>>>>>>>>>>
>>>>>>>>>> NO, at least not yet.  Do you have a demonstration that there are none?
>>>>>>>>>>
>>>>>>>>>>> Would a bottom-up replacement be suitable?
>>>>>>>>>>
>>>>>>>>>> I think so.  If you fixed the free variables for all the mappings
>>>>>>>>>> then I
>>>>>>>>>> think
>>>>>>>>>> that a bottom-up replacement schedule would produce a unique result.
>>>>>>>>>> This
>>>>>>>>>> remains to be demonstrated but shouldn't be too hard. As well, none of
>>>>>>>>>> the
>>>>>>>>>> mappings would affect disconnected variables, I think.
>>>>>>>>>
>>>>>>>>> bottom-up is the safest (the fresh variables must be fresh across all
>>>>>>>>> the
>>>>>>>>> renaming going on) and easiest to explain.
>>>>>>>>>
>>>>>>>>> The requirement is top-down one : to rename at first SELECT down every
>>>>>>>>> branch
>>>>>>>>> of the expression tree where the variable is hidden.  If done top-down,
>>>>>>>>> it is
>>>>>>>>> renamed once.
>>>>>>>>>
>>>>>>>>> It is minimal renaming if done for each variable of scope of the
>>>>>>>>> current row
>>>>>>>>> is considered separately. That makes it more complicated - it might be
>>>>>>>>> worth
>>>>>>>>> non-definitional text to say this but the more direct definition is a
>>>>>>>>> bottom-up walk; even left-to-right, bottom-up to give a unique walk
>>>>>>>>> order.
>>>>>>>>>
>>>>>>>>>> Of course, doing only this part doesn't solve the major problem.
>>>>>>>>>
>>>>>>>>> I'll leave the SHACL specific comments to the SHACL WG and comments
>>>>>>>>> list.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>     Andy
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It seems so obvious that the choice of variables does not matter, but
>>>>>>>>>> thinking
>>>>>>>>>> about how to demonstrate that this is so leads to lots of tricky bits.
>>>>>>>>>>
>>>>>>>>>> peter
>>>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
Received on Wednesday, 22 February 2017 20:28:02 UTC