Re: can subqueries be executed first in SPARQL?

My summary of my position: The current definition of EXISTS in SPARQL is
broken bad.  Fixing EXISTS is not just a matter of clarifying and extending
its definition but has to instead modify the definition in a way that
changes the behaviour of SPARQL as currently defined.  This will legitimize
parts of current SPARQL implementations that are currently non-conformant
with the SPARQL definition and also delegitimize parts of current SPARQL
implementations that are currently supported by the SPARQL definition.

My summary of james's position: The current definition of EXISTS in SPARQL
is incomplete and needs to be extended in some places to implement SPARQL.
Fixing these parts of the definition of EXISTS just requires selecting a set
of extensions as the ones that will be part of its extended definition and
modifying parts of SPARQL implementations that have made extensions that do
not conform with this extended definition of EXISTS.


[I previously accidentally replied without copying the mailing list.  I've
tried to extract the relevant parts of the private exchange and am posting
this to the mailing list with James's permission, extending the conversation
with my response, which leads to the summary of our two positions above.]

On 06/17/2016 08:19 AM, james anderson wrote:
>
>> On 2016-06-17, at 16:23, Peter F. Patel-Schneider wrote:

[...]

>>> james
>>>> peter
>>>>> james

[...]

The discussion is about what the SPARQL 1.1 Query specification says is the
meaning of

>>>>> 1 SELECT ?x WHERE {
>>>>> 2    ?x a ?y .
>>>>> 3    FILTER EXISTS {
>>>>> 4      SELECT ?z WHERE { ?z ?w ?v . FILTER sameTerm(?x,?y) }
>>>>> 5    }
>>>>> 6  }

which is translated into the algebra structure:

>>>>> 1 (base <http://example/base/>
>>>>> 2   (project (?x)
>>>>> 3     (filter (exists
>>>>> 4                (project (?z)
>>>>> 5                  (filter (sameTerm ?x ?y)
>>>>> 6                    (bgp (triple ?z ?w ?v)))))
>>>>> 7       (bgp (triple ?x
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?y)))))

The definition of exists is based on the definition of substitute, which is

>>>> **************************
>>>> Definition: Substitute
>>>>
>>>> Let μ be a solution mapping.
>>>>
>>>>   substitute(pattern, μ) = the pattern formed by replacing every occurrence
>>>> of a variable v in pattern by μ(v) for each v in dom(μ)
>>>
>>> sections five and eighteen describe a number of “patterns”. none of which
>>> includes “select”.
>>>
>>> which means i read the recommendation to leave the particular operation, which
>>> you and hernández are exploring, as undefined.

[james has argued that one reason for this reading is that other readings
produce results that run counter to a notion of variable scoping.]

>>> in which situation, why first set up the untenable straw man, rather than
>>> going straight to a useful definition?
>>
>> I don't see [reading this situation as undefined] as either the intent or the
>> formal meaning of substitute.
>>
>> As far as formal meaning goes, "pattern" is simply the argument to substitute,
>> whatever that is.  It doesn't have to be anything that is a pattern at all.
>> Of course, it should be something that is somehow a pattern, because otherwise
>> the name is misleading.
>
> following that approach, the entire sparql form is a "pattern" and the word is
> left with little consequence.

True, but so what?  Using "pattern" as the name of a formal variable
doesn't have any consequence.  It could have been any other name, like
"xyxyxy" without affecting the formal meaning of substitute.  Using pattern
in the description itself does allow for a bit of wiggle room.

However, the entre definition for substitute is given above.  If this
definition doesn't hold when the actual value for the formal variable or
the result of substitute is somehow not a pattern then the meaning of such
constructs will be unspecified.  This defeats the desirability of using
this wiggle room.

Further, the wiggle room completely goes away if the SELECT is wrapped
inside a graph pattern construct, as happens in
... EXISTS { { SELECT ... } }
Here both the argument to substitute and its value are Group constructs and
thus actual graph patterns in the SPARQL algebra.


> we understand the recommendation differently.
> in particular, from the document context, it should be a “graph pattern”.

You appear to be saying that definition of substitute should be

*************
Definition: Substitute

Let μ be a solution mapping
  substitute(pattern, μ) = the pattern formed by replacing every occurrence
    of a variable v in pattern by μ(v) for each v in dom(μ), for pattern a
    SPARQL algebra construct that is a graph pattern
  substitute(xyxy, μ) = xyxy, otherwise
*************

This has several problems.  In particular, it substitutes into Projects that
are inside Joins, as would result from
  FILTER EXISTS { BIND ( "A" as ?y ) { SELECT ... } }
so it violates the same scope rules as before.

Or maybe you are saying that it should be

*************
Definition: Substitute

Let μ be a solution mapping
  substitute(pattern, μ) = the pattern formed by replacing every in-sope
    occurrence of a variable v in pattern by μ(v) for each v in dom(μ), for
    pattern a SPARQL algebra construct that is a graph pattern
  substitute(xyxy, μ) = xyxy, otherwise
*************

This at least adds the scope rules from SPARQL into the picture.
Unfortunately the scope rules in SPARQL don't help here, for multiple
reasons, including that they do not distinguish between what might be
considered different variables with the same name.

Or maybe substitute should drop variables from the solution mapping as soon
as the variable goes out of scope.  That's sounding better, but needs
careful consideration of just when to drop variables and what to do with,
for example, Projects whose variable has been substituted.  As well, there
is no notion of scope in the SPARQL algebra so this has to be either
defined there or scope dragged into the algebra during translation or some
other method for determining when to drop variables devised that matches
scoping.  A correct definition of this version of substitute is going to be
very different from the definition in the specification.

I'm actually strongly in favour of changing EXISTS to work somewhat like
this, but that's a decided change from the SPARQL 1.1 Query specification
and certainly not a clarification of something that is ambiguous there or a
minor obvious fix.

>> Intent is harder to decipher, but it appears that at least Virtuoso implements
>> substitute by substituting inside project constructs, which are what
>> subselects turn into.  (I am arguing that this is the wrong meaning but there
>> is no good evidence in the SPARQL 1.1 Query specification to support the claim
>> that there is any other intent than one that is supported by the formal
>> meaning.)
>
> i empathise with them, but would not have followed that approach as it
> violates otherwise clear scoping rules.

I'm not actually sure what "otherwise clear scoping rules" you mean here.
SPARQL has a notion of in-scope, but that doesn't match scoping of
variables in programming languages.  Perhaps you actually mean to use
something like programming language scope, but I don't see any evidence
that SPARQL uses that notion at all.

> my approach, as indicated earlier, has been to employ dynamic bindings, with
> the intent follow the incompletely defined intent of substitution, to the
> extent is can be reconciled with lexical contours, as established by binding
> forms.

Well I suppose that that is a possible way to implement something, but I
don't think that that ends up conforming to the definition of SPARQL.

>> In the example above the syntactic argument to EXISTS is, as always, a
>> GroupGraphPattern.  One of the options for GroupGraphPattern is '{' SubSelect
>> '}', which is the case above.
>
> i have always interpreted this situation to be a consequence of the general
> sloppiness of the specification.
> that is,
> - GroupGraphPattern appears in that situation in 1.0,
> - we need to include selects there,
> - let us do that and it does not matter that the clause is still called a
> “pattern".
> i would never have ventured to impute intent from that editorial change.

That's an editorial change?  I don't think that changes to the grammar can
be considered to be editorial.

>>  And SubSelect is a somewhat restricted version
>> of a SELECT query.  So from syntactic grounds the syntactic argument to EXISTS
>> is something that is a kind of pattern and this is support for using "pattern"
>> here.
>
> we disagree.

OK, we disagree.

>> So I don't see how it can be argued that the SPARQL 1.1 Query document says
that
>>
>> substitute((project (?z) (filter (sameTerm ?x ?y) (bgp (triple ?z ?w ?v))))),
>>            {(x,"A"), (y,"A")})
>>
>> is anything other than
>>
>> (project (?z) (filter (sameTerm "A" "A") (bgp (triple ?z ?w ?v))))
>
> we disagree.
>
> in general, you are arguing from failure to define and sloppy exposition to an
> incorrect intent.

Not at all.  I am arguing from the formal definition of substitute in the
SPARQL 1.1 Query specification,
https://www.w3.org/TR/2013/REC-sparql11-query-20130321/
This may be a bad definition - it does end up producing terms whose meaning
is unspecified in lots of cases and has counter-intuitive results - but
there is no way to read the particular application of substitute above in
any other way that the way I state.

> i do not see benefit to that approach and suggest it would be more worthwhile
> to argue directly for clear and adequate definitions.

I agree that there needs to be an adequate definition of EXISTS.

I further believe that the current definition of EXISTS, and in particular
the current definition of substitute, is broken bad.  So fixing it is not
just a matter of clarifying and extending the definition but has to instead
replace the definition in a way that changes the behaviour of SPARQL.

To argue otherwise is to argue that some current implementations have made
unfounded assumptions about EXISTS that don't conform to the correct
clarification of the specification and thus should be changed.  Instead the
argument has to be that it is necessary to change the SPARQL spec in a way
that delegitimizes some current implementations that are currently
legitimate and legitimizes other current implementations that are currently
illegitimate.  My hope is that this latter argument will be made
successfully!

[...]

peter

Received on Friday, 17 June 2016 18:38:35 UTC