- From: Andy Seaborne <andy@apache.org>
- Date: Sun, 26 Feb 2017 16:32:25 +0000
- To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, public-sparql-exists@w3.org
With links this time. On 22/02/17 20:27, Peter F. Patel-Schneider wrote: > All I am asking for is a real demonstration that inserting fresh variables is > a benign operation and that the choice of which fresh variable to use is not > significant. I view this demonstration as necessary for this proposal. I have said I will add text to explain the process - I haven't had time to do that yet. I don't know what proof would satisfy you, particularly what can be assumed to be already known, and I don't want to write one only to have it rejected completely. > As > the results of a SPARQL query do depend on the variable names in it, the > demonstration is going to be different from the standard demonstration that > renaming variables in a regular programming language does not change the > result of a program. In this thread we have that: * The text for replacement needs to say that fresh variables are unique through later replacements as well * The replacement process should be done bottom-up * The name of a variable is not accessible to expressions. [1] * Replacing a variable happens in only 4 places [2]. The name makes the name in the solutions change. So a name change of a variable affects the solutions as a rename (c.f. relational algebra rename operation). When applied to a projection, renaming unprojected variables is not observable. > This is not my proposal. I view it as inferior partly because it does depend > on this demonstration. I also view it as inferior as it does not meet the > goal of not changing the meaning of SPARQL queries that are completely > unproblematic with the current definition of EXISTS. For example, it > introduces a shared variable to both sides of MINUS constructs, potentially > changing their meaning. Both proposals make changes around MINUS from what the SPARQL 1.1 spec defines - this is issue 4. Proposal-A changes GRAPH, UNION (and MINUS) queries [3] so that existing EXISTS uses are changed. This is addressed by proposal-B by making the current row available throughout the EXISTS pattern evaluation. Andy [1] https://lists.w3.org/Archives/Public/public-sparql-exists/2017Feb/0002.html [2] https://lists.w3.org/Archives/Public/public-sparql-exists/2017Feb/0004.html https://lists.w3.org/Archives/Public/public-sparql-exists/2017Feb/0005.html [3] https://lists.w3.org/Archives/Public/public-sparql-exists/2016Sep/0027.html and examples https://lists.w3.org/Archives/Public/public-sparql-exists/2016Nov/0026.html > > I am not willing to further help with this proposal. > > Peter F. Patel-Schneider > Nuance Communications > > > On 02/14/2017 06:33 AM, Andy Seaborne wrote: >> Would you like to draft that? >> >> I'm able to provide text that explains this I don't think I can to the level >> you seem to be asking for. >> >> It is aliasing and follows from (1) replacing one named variable in an algebra >> expression, which is a data structure, with a fresh variable, (2) evaluation >> can not introduce named variables and (3) it's a project. >> >> For explanation, saying that the project does not expose the unprojected >> variables, so the evaluation has the same results after renaming unprojected >> variables. Much of this come down to "fresh variable" i.e. no clash with an >> existing one. >> >> Andy >> >> On 11/02/17 21:28, Peter F. Patel-Schneider wrote: >>> I don't know of any particular problem areas. If I did, I would say so. >>> >>> I also don't know that there are no problem areas. That's what the >>> demonstration is for - to show that mapping to fresh variables doesn't affect >>> anything that it shouoldn't. >>> >>> It looks as if the parts may all be in place. The next step is to write it >>> all down in a formal proof. It may be that writing everything down formally >>> will expose some missing bits. >>> >>> peter >>> >>> >>> On 02/11/2017 07:36 AM, Andy Seaborne wrote: >>>> >>>> On 10/02/17 13:21, Peter F. Patel-Schneider wrote: >>>>> I think that you have to show that no form is sensitive to the variable name >>>>> in any other way, for example removing a solution because of some interaction >>>>> between a variable name and anything else. >>>> >>>> There is no way to get the name of a variable in expression evaluation. >>>> >>>> Nor is there a way to introduce a variable during execution except PrjMap when >>>> all variables are "fresh" - so making the "fresh" clear it means as you go >>>> along, not a static choice at the start of PrjMap (which I have already said >>>> is a good thing to say) means there is no clash. >>>> >>>> There is no quoting variables and no eval()-like functionality to create >>>> expressions during evaluation either. >>>> >>>>> And now that I think of it, I don't think that the analysis below is adequate >>>>> as it doesn't show that the variable name doesn't somehow end up in the value >>>>> side of the binding. >>>> >>>> The value side of a binding is an RDFTerm or an error. Named variables are not >>>> RDF terms. A SPARQL function can not return a variable or expression, nor can >>>> the pattern binding operations create bindings from variable to a variable or >>>> expression. >>>> >>>> Peter - if you know of problem areas, can we have an example please? >>>> >>>> Andy >>>> >>>>> >>>>> peter >>>>> >>>>> >>>>> On 02/10/2017 04:21 AM, Andy Seaborne wrote: >>>>>> >>>>>> >>>>>> On 09/02/17 15:15, Peter F. Patel-Schneider wrote: >>>>>>> OK, that's another piece of the puzzle. >>>>>>> >>>>>>> Any more pieces done? >>>>>> >>>>>> Are you referring to SHACL? >>>>>> >>>>>> For EXISTS, both your comments have answers. >>>>>> >>>>>> Andy >>>>>> >>>>>>> >>>>>>> peter >>>>>>> >>>>>>> On 02/09/2017 06:08 AM, Andy Seaborne wrote: >>>>>>>> There are four forms that create bindings, all the rest recombine >>>>>>>> bindings in >>>>>>>> solution mapping/sequences into other solution mapping/sequences but not >>>>>>>> create or modify bindings. >>>>>>>> >>>>>>>> >>>>>>>> Basic Graph Pattern Matching >>>>>>>> Property Path Patterns >>>>>>>> GRAPH ?variable >>>>>>>> AS >>>>>>>> >>>>>>>> All of these only create bindings according to the variable names used in >>>>>>>> their form and do not introduce a free choice of by name. >>>>>>>> >>>>>>>> A systematic renaming therefore changes the form and it's outcome. >>>>>>>> >>>>>>>> eval(rename_algebra(X)) is equivalent to rename_solution(eval(X)) >>>>>>>> >>>>>>>> Andy >>>>>>>> >>>>>>>> On 08/02/17 18:20, Peter F. Patel-Schneider wrote: >>>>>>>>> So the idea appears to be to show that the only effect of variable >>>>>>>>> replacement >>>>>>>>> in an algebra expression is to systematically change the variable in the >>>>>>>>> resulting solution sequence. Now this hypothesis has to be turned >>>>>>>>> into an >>>>>>>>> inductive hypothesis and proven to be true for all of the SPARQL >>>>>>>>> algebra. If >>>>>>>>> this can be done, then it is easy to show the PrjMap(P,PV) is fine >>>>>>>>> because >>>>>>>>> the >>>>>>>>> variable doesn't show up in the resulting solution sequence. >>>>>>>>> >>>>>>>>> peter >>>>>>>>> >>>>>>>>> On 02/07/2017 03:30 AM, Andy Seaborne wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 06/02/17 11:48, Peter F. Patel-Schneider wrote: >>>>>>>>>>> On 02/06/2017 02:21 AM, Andy Seaborne wrote: >>>>>>>>>>>> Peter has some comments on the SHACL comments list that relate to >>>>>>>>>>>> EXISTS. >>>>>>>>>>>> [1] >>>>>>>>>>>> >>>>>>>>>>>>> There is no demonstration that the choice of fresh variables in the >>>>>>>>>>>>> definition of PrjMap(P,PV) is insignificant. >>>>>>>>>>>> >>>>>>>>>>>> I hope we can explain in the document to clarify this, but I'm not >>>>>>>>>>>> clear >>>>>>>>>>>> what >>>>>>>>>>>> you are looking for. >>>>>>>>>>> >>>>>>>>>>> That the result of the evaluation doesn't depend on the choice of free >>>>>>>>>>> variables. >>>>>>>>>>> >>>>>>>>>>>> What would constitute such a demonstration? >>>>>>>>>>> >>>>>>>>>>> That's a good question. When I first noticed this problem I was >>>>>>>>>>> thinking >>>>>>>>>>> that >>>>>>>>>>> this was just a t that hadn't been dotted, but I'm really not sure how >>>>>>>>>>> to go >>>>>>>>>>> about showing that the choice doesn't matter. >>>>>>>>>>> >>>>>>>>>>>> Do you have a example where it is significant? >>>>>>>>>>> >>>>>>>>>>> No, at least not yet? Do you have a demonstration that there are none? >>>>>>>>>> >>>>>>>>>> An explanation: >>>>>>>>>> >>>>>>>>>> Evaluation of a projection results in a solution sequence that can >>>>>>>>>> contains >>>>>>>>>> only variables of the projection and no others. >>>>>>>>>> >>>>>>>>>> For any algebra expression, replacing a variable systematically with a >>>>>>>>>> fresh >>>>>>>>>> variable has a visible effect as a change in the solution sequence >>>>>>>>>> binding for >>>>>>>>>> that variable. >>>>>>>>>> >>>>>>>>>> You can't find the name of a variable during the evaluation of a SPARQL >>>>>>>>>> query >>>>>>>>>> (because graph patterns and solution modifiers are not available as >>>>>>>>>> datastructures to access). It's call-by-value and even the special forms >>>>>>>>>> like >>>>>>>>>> IF, and COALESCE don't expose the variable name because they only change >>>>>>>>>> when >>>>>>>>>> arguments are evaluated, not pass down the argument expression itself. >>>>>>>>>> >>>>>>>>>> (( >>>>>>>>>> The best I can think of is expressions that associate a value with a >>>>>>>>>> variable >>>>>>>>>> like >>>>>>>>>> >>>>>>>>>> IF ( bound(?x) , "x", "not x") >>>>>>>>>> >>>>>>>>>> but that's more of an alias, not the variable name itself. Renaming >>>>>>>>>> ?x is >>>>>>>>>> not >>>>>>>>>> observable and the alias is unchanged. >>>>>>>>>> )) >>>>>>>>>> >>>>>>>>>> For a projection, one can rename the unprojected variables of the >>>>>>>>>> expression >>>>>>>>>> over which the project operates because the renaming changes the >>>>>>>>>> solution >>>>>>>>>> sequence before projection only on variables that projection does not >>>>>>>>>> expose. >>>>>>>>>> >>>>>>>>>> It is not visible in the solution sequence result of the projection. >>>>>>>>>> >>>>>>>>>> Another way of thinking about it is that the binding due to unprojected >>>>>>>>>> variables are not accessible to operations that use the result of the >>>>>>>>>> projection. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> The result of PrjMap(X) depends on the order in which the projections >>>>>>>>>>>>> in X are chosen, but this order is not specified. >>>>>>>>>>>> >>>>>>>>>>>> Yes - it would be better to define the order and the outcome is order >>>>>>>>>>>> dependent with respect to replaced variables but does it make a >>>>>>>>>>>> difference? It >>>>>>>>>>>> is only variables restricted by scope that are changed. >>>>>>>>>>> >>>>>>>>>>> The mappings reach down into sub-expressions and change disconnected >>>>>>>>>>> variables >>>>>>>>>>> there so they violate the scoping of SPARQL. >>>>>>>>>> >>>>>>>>>> In fact, there is a design choice here - either choice is workable, both >>>>>>>>>> have >>>>>>>>>> use cases for different audiences. It's not a technical issue - it's a >>>>>>>>>> judgement. >>>>>>>>>> >>>>>>>>>> The other design is one where there is no remapping variables and >>>>>>>>>> then the >>>>>>>>>> EXISTS insertion of the current row would affect the disconnected >>>>>>>>>> variables. >>>>>>>>>> >>>>>>>>>> It violates the property of SPARQL evaluation that renaming inside >>>>>>>>>> project of >>>>>>>>>> disconnected variables does not matter anywhere else. Optimizers and >>>>>>>>>> parallel >>>>>>>>>> execution exploit that property. (I got a related question from someone >>>>>>>>>> about >>>>>>>>>> this last week - they are implementing some kind of optimized >>>>>>>>>> evaluation and >>>>>>>>>> wanted to discuss the details.) >>>>>>>>>> >>>>>>>>>>>> Do you have a case where it makes an observable difference? >>>>>>>>>>> >>>>>>>>>>> NO, at least not yet. Do you have a demonstration that there are none? >>>>>>>>>>> >>>>>>>>>>>> Would a bottom-up replacement be suitable? >>>>>>>>>>> >>>>>>>>>>> I think so. If you fixed the free variables for all the mappings >>>>>>>>>>> then I >>>>>>>>>>> think >>>>>>>>>>> that a bottom-up replacement schedule would produce a unique result. >>>>>>>>>>> This >>>>>>>>>>> remains to be demonstrated but shouldn't be too hard. As well, none of >>>>>>>>>>> the >>>>>>>>>>> mappings would affect disconnected variables, I think. >>>>>>>>>> >>>>>>>>>> bottom-up is the safest (the fresh variables must be fresh across all >>>>>>>>>> the >>>>>>>>>> renaming going on) and easiest to explain. >>>>>>>>>> >>>>>>>>>> The requirement is top-down one : to rename at first SELECT down every >>>>>>>>>> branch >>>>>>>>>> of the expression tree where the variable is hidden. If done top-down, >>>>>>>>>> it is >>>>>>>>>> renamed once. >>>>>>>>>> >>>>>>>>>> It is minimal renaming if done for each variable of scope of the >>>>>>>>>> current row >>>>>>>>>> is considered separately. That makes it more complicated - it might be >>>>>>>>>> worth >>>>>>>>>> non-definitional text to say this but the more direct definition is a >>>>>>>>>> bottom-up walk; even left-to-right, bottom-up to give a unique walk >>>>>>>>>> order. >>>>>>>>>> >>>>>>>>>>> Of course, doing only this part doesn't solve the major problem. >>>>>>>>>> >>>>>>>>>> I'll leave the SHACL specific comments to the SHACL WG and comments >>>>>>>>>> list. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Andy >>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> It seems so obvious that the choice of variables does not matter, but >>>>>>>>>>> thinking >>>>>>>>>>> about how to demonstrate that this is so leads to lots of tricky bits. >>>>>>>>>>> >>>>>>>>>>> peter >>>>>>>>>>> >>>>>>> >>>>>> >>>>> >>>>
Received on Sunday, 26 February 2017 17:00:47 UTC