Re: ISSUE-68: Updated definition from Holger Knublauch on 2016-03-14 (public-data-shapes-wg@w3.org from March 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 14 Mar 2016 14:37:18 +1000
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <56E63FFE.5060808@topquadrant.com>
On 11/03/2016 16:20, Peter F. Patel-Schneider wrote:
>
> On 03/10/2016 10:15 PM, Holger Knublauch wrote:
>> On 11/03/2016 15:27, Peter F. Patel-Schneider wrote:
>>> On 03/10/2016 08:37 PM, Holger Knublauch wrote:
>>>> On 11/03/2016 13:22, Peter F. Patel-Schneider wrote:
>>>>> On 03/10/2016 06:04 PM, Holger Knublauch wrote:
>>>>>> On 10/03/2016 1:17, Peter F. Patel-Schneider wrote:
>>>>>>> On 03/09/2016 12:46 AM, Holger Knublauch wrote:
>>>>>>>> On 9/03/2016 18:17, Peter F. Patel-Schneider wrote:
>>>>>>>>> I'm pretty sure that this fails in a number of places.
>>> [...]
>>>>>>>>> The substitution can modify variables from different scopes, which will
>>>>>>>>> change results.
>>>>>>>> Do you have an example for this?
>>>>>>> SELECT ?this ?that
>>>>>>> WHERE { ?this ex:a ex:b
>>>>>>>       SELECT ?that WHERE { ?this ex:a ?that } }
>>>>>> The definition states that substitution also happens in nested SELECTs. I
>>>>>> believe this meets user expectations, and would be needed for cases like
>>>>>> sh:minCount that use a nested SELECT. I don't quite see a problem with the
>>>>>> above. Do you have data to illustrate why this would cause problems?
>>>>> Because the ?this in the inner SELECT is a different variable.  Before
>>>>> substitution it would return any ?that that is the object of any ex:a triple.
>>>>> After substitution it returns only those that have the substituted value as a
>>>>> subject.
>>>> Yes, but that is exactly the desired outcome.
>>>>
>>>> Holger
>>> I'm not a SPARQL expert, but I don't think so.
>>>
>>>
>>> Let's pre-bind  ?this to ex:c (this is a very easy case of pre-binding, so
>>> what to do is pretty obvious) in
>>>
>>> SELECT ?this ?that
>>> WHERE { ?this ex:a ex:b .
>>>          SELECT ?that WHERE { ?this ex:a ?that } }
>>>
>>> against graph
>>>
>>> ex:c ex:a ex:b .
>>> ex:d ex:a ex:f .
>>>
>>> The result set is
>>> ?this = ex:c, ?that = ex:b
>>> ?this = ex:c, ?that = ex:f
>> Yes that's without binding the inner ?this.
>>
>>> Let's substitute ?this by ex:c to get (roughly)
>>>
>>> SELECT (ex:c AS ?this) ?that
>>> WHERE { ex:c ex:a ex:b .
>>> SELECT ?that WHERE { ex:c ex:a ?that } }
>>>
>>> which results in only one solution
>>> ?this = ex:c, ?that = ex:b
>> Yes, that's the intended result. I don't see the problem (yet?)
>>
>> Holger
> They are different.  You proposed that substitution into embedded queries
> would implement pre-binding.  It doesn't, at least as I understand this simple
> case of pre-binding.
>
> If pre-binding doesn't work the way I think it should, then the old intuition
> (values statements) was wrong.

Yes, I tend to think that the old intuition (VALUES statement) was
indeed misleading. VALUES can often be used and will return the same
outcome, but the most consistent behavior for the spec would be to do a
full substitution, because we need to get the scoping consistent. This
applies here with nested SELECTs but also with MINUS, where I no longer
think there is a problem either. The intuition "text substitution"
sounds more suitable to me.
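
To make the difference concrete, here is a minimal Python sketch (not
SPARQL, and not a proposal for spec text) that evaluates Peter's query
over his two-triple graph both ways. The helper names (match, join,
project, values_style, substitution_style) are made up for illustration,
the toy matcher only handles single triple patterns, and it assumes the
inner SELECT is wrapped in its own group.

```python
# Peter's example graph.
GRAPH = [("ex:c", "ex:a", "ex:b"), ("ex:d", "ex:a", "ex:f")]

def match(pattern, graph):
    """Solutions of one triple pattern; terms starting with '?' are variables."""
    solutions = []
    for triple in graph:
        binding, ok = {}, True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    ok = False
            elif term != value:
                ok = False
        if ok:
            solutions.append(binding)
    return solutions

def join(left, right):
    """Keep merged pairs of solutions that agree on their shared variables."""
    return [{**l, **r} for l in left for r in right
            if all(l[k] == r[k] for k in l.keys() & r.keys())]

def project(solutions, variables):
    """Sub-SELECT projection: variables not listed are invisible outside."""
    return [{v: s[v] for v in variables if v in s} for s in solutions]

def values_style(this):
    """Old intuition: join a VALUES-like row with the outer pattern only."""
    values = [{"?this": this}]
    outer = join(values, match(("?this", "ex:a", "ex:b"), GRAPH))
    # The inner ?this is not projected, so it stays unconstrained.
    inner = project(match(("?this", "ex:a", "?that"), GRAPH), ["?that"])
    return sorted(s["?that"] for s in join(outer, inner))

def substitution_style(this):
    """Text substitution: ?this is replaced everywhere, including the sub-SELECT."""
    outer = match((this, "ex:a", "ex:b"), GRAPH)
    inner = project(match((this, "ex:a", "?that"), GRAPH), ["?that"])
    return sorted(s["?that"] for s in join(outer, inner))

print(values_style("ex:c"))        # ['ex:b', 'ex:f']
print(substitution_style("ex:c"))  # ['ex:b']
```

The first call reproduces the two-row result set from Peter's message,
the second the single row obtained after substitution.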

The remaining questions are about how to express this best for the 
purpose of this specification. I see two issues:

1) If we substitute variables in certain expressions, then these can 
become invalid syntax. Going through the use of "Var" in the Grammar [1] 
of the SPARQL 1.1 spec, I believe this affects
- SELECT ?var: pre-bound results must be added to the result set by 
other means
- bound(?var): must be substituted with true if ?var is bound
- AS ?var: syntax error
- GROUP BY ?var -> GROUP BY (value)
- ORDER BY ?var -> ORDER BY (value)
There are some places where pre-binding a variable with a bnode or a 
literal would be invalid, e.g. GRAPH ?var but these are easy to detect 
too and would equally lead to runtime errors anyway.

So far, as long as the data is about literals and IRIs only, this should 
be safe.
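
As a rough illustration of the cases in 1), the following Python sketch
applies them to the query text with regular expressions. The function
name prebind is hypothetical. It does not parse SPARQL, it assumes the
WHERE keyword is written out, and it ignores projection expressions, the
$var spelling, and bnode or literal values; it only shows the shape of
the rewriting.

```python
import re

def prebind(query, var, value):
    """Textually replace ?var by value (an IRI or prefixed name).

    Hypothetical helper: regex-based, does not parse SPARQL.
    """
    v = re.escape("?" + var) + r"\b"

    # AS ?var: assigning to a pre-bound variable is treated as an error.
    if re.search(r"\bAS\s+" + v, query, flags=re.I):
        raise ValueError("?%s is pre-bound and cannot be assigned with AS" % var)

    # bound(?var) -> true
    query = re.sub(r"\bbound\s*\(\s*" + v + r"\s*\)", "true", query, flags=re.I)

    # SELECT ?var: drop it from the projection; the caller has to add the
    # pre-bound value back to each result row by other means.
    query = re.sub(
        r"\bSELECT\b(.*?)\bWHERE\b",
        lambda m: "SELECT" + re.sub(v + r"\s*", "", m.group(1)) + "WHERE",
        query, flags=re.I | re.S)

    # GROUP BY ?var / ORDER BY ?var -> GROUP BY (value) / ORDER BY (value)
    query = re.sub(
        r"\b(?:GROUP|ORDER)\s+BY\b[^{}\n]*",
        lambda m: re.sub(v, lambda _: "(" + value + ")", m.group(0)),
        query, flags=re.I)

    # Everything else: plain substitution.
    return re.sub(v, lambda _: value, query)

q = ("SELECT ?this ?that WHERE { ?this ex:a ?that . "
     "FILTER (bound(?this)) } ORDER BY ?this")
print(prebind(q, "this", "ex:c"))
# SELECT ?that WHERE { ex:c ex:a ?that . FILTER (true) } ORDER BY (ex:c)
```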

2) What to do with bnodes. Skolemization could be used, although it is
clear that this is not realistic implementation advice. The main issue
here, as you point out, is that some functions behave differently when
they get bnodes vs IRIs. The question is how much detail we really need
to go into here. We could either enumerate these functions (there are
not too many) or simply state "Built-in functions such as isIRI and str
must return the same results for skolemized bnodes as if they were real
bnodes". In the case of str(?var) we can produce an unbound variable if
the value is a bnode, eliminating the invalid cases at pre-binding
time.

An alternative to skolemization might be to operate at the Algebra
level, avoiding the syntax problems too. This may lead to a much more
compact definition of pre-binding with just a few general sentences. I
do not know whether a specification of pre-binding really MUST rely on
the SPARQL textual syntax only. As long as we just want to describe the
*effect*, why not use the Algebra directly?

Do you have further thoughts on this topic?

Thanks,
Holger

[1] https://www.w3.org/TR/sparql11-query/#grammar
Received on Monday, 14 March 2016 04:37:53 UTC