Re: Related comments from Peter F. Patel-Schneider on 2017-02-26 (public-sparql-exists@w3.org from February 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Sun, 26 Feb 2017 12:16:19 -0800
To: Andy Seaborne <andy@apache.org>, public-sparql-exists@w3.org
Message-ID: <769b6a8b-7021-f05e-8dff-b1d72e404e7f@gmail.com>
On 02/26/2017 08:32 AM, Andy Seaborne wrote:
> With links this time.
> 
> On 22/02/17 20:27, Peter F. Patel-Schneider wrote:
>> All I am asking for is a real demonstration that inserting fresh variables is
>> a benign operation and that the choice of which fresh variable to use is not
>> significant.  I view this demonstration as necessary for this proposal.
> 
> I have said I will add text to explain the process - I haven't had time to do
> that yet.  I don't know what proof would satisfy you, particularly what can be
> assumed to be already known, and I don't want to write one only to have it
> rejected completely.

Well, that's a problem then.  I do sympathize, having been in the same
situation more than once.  However, you just have to put the work in an see
what happens.  I do think that things will work, or at least I did a couple of
days ago.

However, I noticed that the definition of the F function isn't going to work.
F appears to be defined on all variable, so there are no fresh variables
available.

So something has to be done to fix up F.


>> As
>> the results of a SPARQL query do depend on the variable names in it, the
>> demonstration is going to be different from the standard demonstration that
>> renaming variables in a regular programming language does not change the
>> result of a program.
> 
> In this thread we have that:
> 
> * The text for replacement needs to say that fresh variables are unique
> through later replacements as well
> 
> * The replacement process should be done bottom-up
> 
> * The name of a variable is not accessible to expressions. [1]
> 
> * Replacing a variable happens in only 4 places [2].
>   The name makes the name in the solutions change.
> 
> So a name change of a variable affects the solutions as a rename (c.f.
> relational algebra rename operation).
> 
> When applied to a projection, renaming unprojected variables is not observable.

Yes, I agree that this all looks OK so far.  However, again, I've been in this
situation before and when I wrote everything up, crossing all the i's and
dotting all the t's, I noticed that there were some holes that I had to patch.
 Fortunately this was doable.

What you appear to be asking for is my unconditional blessing in advance.
Sorry, but that's not going to be forthcoming.

peter


>> This is not my proposal.  I view it as inferior partly because it does depend
>> on this demonstration.  I also view it as inferior as it does not meet the
>> goal of not changing the meaning of SPARQL queries that are completely
>> unproblematic with the current definition of EXISTS.  For example, it
>> introduces a shared variable to both sides of MINUS constructs, potentially
>> changing their meaning.
> 
> Both proposals make changes around MINUS from what the SPARQL 1.1 spec defines
> - this is issue 4.
> 
> Proposal-A changes GRAPH, UNION (and MINUS) queries [3] so that existing
> EXISTS uses are changed. This is addressed by proposal-B by making the current
> row available throughout the EXISTS pattern evaluation.
> 
>     Andy
> 
> 
> [1]
> https://lists.w3.org/Archives/Public/public-sparql-exists/2017Feb/0002.html
> 
> [2]
> https://lists.w3.org/Archives/Public/public-sparql-exists/2017Feb/0004.html
> https://lists.w3.org/Archives/Public/public-sparql-exists/2017Feb/0005.html
> 
> [3]
> https://lists.w3.org/Archives/Public/public-sparql-exists/2016Sep/0027.html
> and examples
> https://lists.w3.org/Archives/Public/public-sparql-exists/2016Nov/0026.html
> 
> 
> 
> 
>>
>> I am not willing to further help with this proposal.
>>
>> Peter F. Patel-Schneider
>> Nuance Communications
>>
>>
>> On 02/14/2017 06:33 AM, Andy Seaborne wrote:
>>> Would you like to draft that?
>>>
>>> I'm able to provide text that explains this I don't think I can to the level
>>> you seem to be asking for.
>>>
>>> It is aliasing and follows from (1) replacing one named variable in an algebra
>>> expression, which is a data structure, with a fresh variable, (2) evaluation
>>> can not introduce named variables and (3) it's a project.
>>>
>>> For explanation, saying that the project does not expose the unprojected
>>> variables, so the evaluation has the same results after renaming unprojected
>>> variables. Much of this come down to "fresh variable" i.e. no clash with an
>>> existing one.
>>>
>>>     Andy
>>>
>>> On 11/02/17 21:28, Peter F. Patel-Schneider wrote:
>>>> I don't know of any particular problem areas.  If I did, I would say so.
>>>>
>>>> I also don't know that there are no problem areas.  That's what the
>>>> demonstration is for  - to show that mapping to fresh variables doesn't
>>>> affect
>>>> anything that it shouoldn't.
>>>>
>>>> It looks as if the parts may all be in place.  The next step is to write it
>>>> all down in a formal proof.   It may be that writing everything down formally
>>>> will expose some missing bits.
>>>>
>>>> peter
>>>>
>>>>
>>>> On 02/11/2017 07:36 AM, Andy Seaborne wrote:
>>>>>
>>>>> On 10/02/17 13:21, Peter F. Patel-Schneider wrote:
>>>>>> I think that you have to show that no form is sensitive to the variable
>>>>>> name
>>>>>> in any other way, for example removing a solution because of some
>>>>>> interaction
>>>>>> between a variable name and anything else.
>>>>>
>>>>> There is no way to get the name of a variable in expression evaluation.
>>>>>
>>>>> Nor is there a way to introduce a variable during execution except PrjMap
>>>>> when
>>>>> all variables are "fresh" - so making the "fresh" clear it means as you go
>>>>> along, not a static choice at the start of PrjMap (which I have already said
>>>>> is a good thing to say) means there is no clash.
>>>>>
>>>>> There is no quoting variables and no eval()-like functionality to create
>>>>> expressions during evaluation either.
>>>>>
>>>>>> And now that I think of it, I don't think that the analysis below is
>>>>>> adequate
>>>>>> as it doesn't show that the variable name doesn't somehow end up in the
>>>>>> value
>>>>>> side of the binding.
>>>>>
>>>>> The value side of a binding is an RDFTerm or an error. Named variables
>>>>> are not
>>>>> RDF terms.  A SPARQL function can not return a variable or expression,
>>>>> nor can
>>>>> the pattern binding operations create bindings from variable to a
>>>>> variable or
>>>>> expression.
>>>>>
>>>>> Peter - if you know of problem areas, can we have an example please?
>>>>>
>>>>>     Andy
>>>>>
>>>>>>
>>>>>> peter
>>>>>>
>>>>>>
>>>>>> On 02/10/2017 04:21 AM, Andy Seaborne wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 09/02/17 15:15, Peter F. Patel-Schneider wrote:
>>>>>>>> OK, that's another piece of the puzzle.
>>>>>>>>
>>>>>>>> Any more pieces done?
>>>>>>>
>>>>>>> Are you referring to SHACL?
>>>>>>>
>>>>>>> For EXISTS, both your comments have answers.
>>>>>>>
>>>>>>>     Andy
>>>>>>>
>>>>>>>>
>>>>>>>> peter
>>>>>>>>
>>>>>>>> On 02/09/2017 06:08 AM, Andy Seaborne wrote:
>>>>>>>>> There are four forms that create bindings, all the rest recombine
>>>>>>>>> bindings in
>>>>>>>>> solution mapping/sequences into other solution mapping/sequences but not
>>>>>>>>> create or modify bindings.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     Basic Graph Pattern Matching
>>>>>>>>>     Property Path Patterns
>>>>>>>>>     GRAPH ?variable
>>>>>>>>>     AS
>>>>>>>>>
>>>>>>>>> All of these only create bindings according to the variable names
>>>>>>>>> used in
>>>>>>>>> their form and do not introduce a free choice of by name.
>>>>>>>>>
>>>>>>>>> A systematic renaming therefore changes the form and it's outcome.
>>>>>>>>>
>>>>>>>>>    eval(rename_algebra(X)) is equivalent to  rename_solution(eval(X))
>>>>>>>>>
>>>>>>>>>     Andy
>>>>>>>>>
>>>>>>>>> On 08/02/17 18:20, Peter F. Patel-Schneider wrote:
>>>>>>>>>> So the idea appears to be to show that the only effect of variable
>>>>>>>>>> replacement
>>>>>>>>>> in an algebra expression is to systematically change the variable in
>>>>>>>>>> the
>>>>>>>>>> resulting solution sequence.  Now this hypothesis has to be turned
>>>>>>>>>> into an
>>>>>>>>>> inductive hypothesis and proven to be true for all of the SPARQL
>>>>>>>>>> algebra.  If
>>>>>>>>>> this can be done, then it is easy to show the PrjMap(P,PV) is fine
>>>>>>>>>> because
>>>>>>>>>> the
>>>>>>>>>> variable doesn't show up in the resulting solution sequence.
>>>>>>>>>>
>>>>>>>>>> peter
>>>>>>>>>>
>>>>>>>>>> On 02/07/2017 03:30 AM, Andy Seaborne wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 06/02/17 11:48, Peter F. Patel-Schneider wrote:
>>>>>>>>>>>> On 02/06/2017 02:21 AM, Andy Seaborne wrote:
>>>>>>>>>>>>> Peter has some comments on the SHACL comments list that relate to
>>>>>>>>>>>>> EXISTS.
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is no demonstration that the choice of fresh variables in the
>>>>>>>>>>>>>> definition of PrjMap(P,PV) is insignificant.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I hope we can explain in the document to clarify this, but I'm not
>>>>>>>>>>>>> clear
>>>>>>>>>>>>> what
>>>>>>>>>>>>> you are looking for.
>>>>>>>>>>>>
>>>>>>>>>>>> That the result of the evaluation doesn't depend on the choice of
>>>>>>>>>>>> free
>>>>>>>>>>>> variables.
>>>>>>>>>>>>
>>>>>>>>>>>>> What would constitute such a demonstration?
>>>>>>>>>>>>
>>>>>>>>>>>> That's a good question.  When I first noticed this problem I was
>>>>>>>>>>>> thinking
>>>>>>>>>>>> that
>>>>>>>>>>>> this was just a t that hadn't been dotted, but I'm really not sure
>>>>>>>>>>>> how
>>>>>>>>>>>> to go
>>>>>>>>>>>> about showing that the choice doesn't matter.
>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have a example where it is significant?
>>>>>>>>>>>>
>>>>>>>>>>>> No, at least not yet?  Do you have a demonstration that there are
>>>>>>>>>>>> none?
>>>>>>>>>>>
>>>>>>>>>>> An explanation:
>>>>>>>>>>>
>>>>>>>>>>> Evaluation of a projection results in a solution sequence that can
>>>>>>>>>>> contains
>>>>>>>>>>> only variables of the projection and no others.
>>>>>>>>>>>
>>>>>>>>>>> For any algebra expression, replacing a variable systematically with a
>>>>>>>>>>> fresh
>>>>>>>>>>> variable has a visible effect as a change in the solution sequence
>>>>>>>>>>> binding for
>>>>>>>>>>> that variable.
>>>>>>>>>>>
>>>>>>>>>>> You can't find the name of a variable during the evaluation of a
>>>>>>>>>>> SPARQL
>>>>>>>>>>> query
>>>>>>>>>>> (because graph patterns and solution modifiers are not available as
>>>>>>>>>>> datastructures to access). It's call-by-value and even the special
>>>>>>>>>>> forms
>>>>>>>>>>> like
>>>>>>>>>>> IF, and COALESCE don't expose the variable name because they only
>>>>>>>>>>> change
>>>>>>>>>>> when
>>>>>>>>>>> arguments are evaluated, not pass down the argument expression itself.
>>>>>>>>>>>
>>>>>>>>>>> ((
>>>>>>>>>>> The best I can think of is expressions that associate a value with a
>>>>>>>>>>> variable
>>>>>>>>>>> like
>>>>>>>>>>>
>>>>>>>>>>>     IF ( bound(?x) , "x", "not x")
>>>>>>>>>>>
>>>>>>>>>>> but that's more of an alias, not the variable name itself. Renaming
>>>>>>>>>>> ?x is
>>>>>>>>>>> not
>>>>>>>>>>> observable and the alias is unchanged.
>>>>>>>>>>> ))
>>>>>>>>>>>
>>>>>>>>>>> For a projection, one can rename the unprojected variables of the
>>>>>>>>>>> expression
>>>>>>>>>>> over which the project operates because the renaming changes the
>>>>>>>>>>> solution
>>>>>>>>>>> sequence before projection only on variables that projection does not
>>>>>>>>>>> expose.
>>>>>>>>>>>
>>>>>>>>>>> It is not visible in the solution sequence result of the projection.
>>>>>>>>>>>
>>>>>>>>>>> Another way of thinking about it is that the binding due to
>>>>>>>>>>> unprojected
>>>>>>>>>>> variables are not accessible to operations that use the result of the
>>>>>>>>>>> projection.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>> The result of PrjMap(X) depends on the order in which the
>>>>>>>>>>>>>> projections
>>>>>>>>>>>>>> in X are chosen, but this order is not specified.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes - it would be better to define the order and the outcome is
>>>>>>>>>>>>> order
>>>>>>>>>>>>> dependent with respect to replaced variables but does it make a
>>>>>>>>>>>>> difference? It
>>>>>>>>>>>>> is only variables restricted by scope that are changed.
>>>>>>>>>>>>
>>>>>>>>>>>> The mappings reach down into sub-expressions and change disconnected
>>>>>>>>>>>> variables
>>>>>>>>>>>> there so they violate the scoping of SPARQL.
>>>>>>>>>>>
>>>>>>>>>>> In fact, there is a design choice here - either choice is workable,
>>>>>>>>>>> both
>>>>>>>>>>> have
>>>>>>>>>>> use cases for different audiences. It's not a technical issue - it's a
>>>>>>>>>>> judgement.
>>>>>>>>>>>
>>>>>>>>>>> The other design is one where there is no remapping variables and
>>>>>>>>>>> then the
>>>>>>>>>>> EXISTS insertion of the current row would affect the disconnected
>>>>>>>>>>> variables.
>>>>>>>>>>>
>>>>>>>>>>> It violates the property of SPARQL evaluation that renaming inside
>>>>>>>>>>> project of
>>>>>>>>>>> disconnected variables does not matter anywhere else. Optimizers and
>>>>>>>>>>> parallel
>>>>>>>>>>> execution exploit that property. (I got a related question from
>>>>>>>>>>> someone
>>>>>>>>>>> about
>>>>>>>>>>> this last week - they are implementing some kind of optimized
>>>>>>>>>>> evaluation and
>>>>>>>>>>> wanted to discuss the details.)
>>>>>>>>>>>
>>>>>>>>>>>>> Do you have a case where it makes an observable difference?
>>>>>>>>>>>>
>>>>>>>>>>>> NO, at least not yet.  Do you have a demonstration that there are
>>>>>>>>>>>> none?
>>>>>>>>>>>>
>>>>>>>>>>>>> Would a bottom-up replacement be suitable?
>>>>>>>>>>>>
>>>>>>>>>>>> I think so.  If you fixed the free variables for all the mappings
>>>>>>>>>>>> then I
>>>>>>>>>>>> think
>>>>>>>>>>>> that a bottom-up replacement schedule would produce a unique result.
>>>>>>>>>>>> This
>>>>>>>>>>>> remains to be demonstrated but shouldn't be too hard. As well,
>>>>>>>>>>>> none of
>>>>>>>>>>>> the
>>>>>>>>>>>> mappings would affect disconnected variables, I think.
>>>>>>>>>>>
>>>>>>>>>>> bottom-up is the safest (the fresh variables must be fresh across all
>>>>>>>>>>> the
>>>>>>>>>>> renaming going on) and easiest to explain.
>>>>>>>>>>>
>>>>>>>>>>> The requirement is top-down one : to rename at first SELECT down every
>>>>>>>>>>> branch
>>>>>>>>>>> of the expression tree where the variable is hidden.  If done
>>>>>>>>>>> top-down,
>>>>>>>>>>> it is
>>>>>>>>>>> renamed once.
>>>>>>>>>>>
>>>>>>>>>>> It is minimal renaming if done for each variable of scope of the
>>>>>>>>>>> current row
>>>>>>>>>>> is considered separately. That makes it more complicated - it might be
>>>>>>>>>>> worth
>>>>>>>>>>> non-definitional text to say this but the more direct definition is a
>>>>>>>>>>> bottom-up walk; even left-to-right, bottom-up to give a unique walk
>>>>>>>>>>> order.
>>>>>>>>>>>
>>>>>>>>>>>> Of course, doing only this part doesn't solve the major problem.
>>>>>>>>>>>
>>>>>>>>>>> I'll leave the SHACL specific comments to the SHACL WG and comments
>>>>>>>>>>> list.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>     Andy
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It seems so obvious that the choice of variables does not matter, but
>>>>>>>>>>>> thinking
>>>>>>>>>>>> about how to demonstrate that this is so leads to lots of tricky
>>>>>>>>>>>> bits.
>>>>>>>>>>>>
>>>>>>>>>>>> peter
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
Received on Sunday, 26 February 2017 20:49:09 UTC