Re: rq23 def'n "Pattern Solution" wrong? (and more on BGP') from Pat Hayes on 2006-02-28 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 28 Feb 2006 17:02:22 -0600
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230901c02a745f40c6@[10.100.0.25]>
>[ACTION: Enrico to explain definitions in reply to LeeF's questions]
>
>On 14 Feb 2006, at 07:53, Lee Feigenbaum wrote:
>>I ran across this while finally having a chance today to spend
>>several hours going over the discussion between Enrico, Sergio, Jos, and
>>Pat about the issues surrounding BGP' in the rq23 E-matching definitions.
>>To the best that I can tell, the use of BGP' in these definitions does not
>>break anything at all; Pat seems to argue that the presence of BGP' breaks
>>CONSTRUCT queries because it allows answer bindings to be introduced with
>>bnodes that clash between BGP and G'. Pat says a few times that this might
>>create a problem when substituting Si(BGP) to create the CONSTRUCT output
>>graph, but as Sergio pointed out, the CONSTRUCT output graph is made up of
>>substitutions into the Ti template pieces, which are not related to BGP.
>>In fact, once BGP has morphed into BGP', answers are never again
>>substituted into BGP, which is one reason that I feel that the current
>>definitions are correct. Pat, I'd appreciate it if you could explain to me
>>(online or offline), if you think that there is still a technical
>>deficiency in the current definitions... A full example would be most
>>helpful, as the example in the existing threads is fragmented and
>>incomplete.
>>
>>That being said, it is only since the January 26th decision that Pat has
>>elucidated the correctness justifications for the simpler definitions (w/o
>>BGP' or OrderedMerge, simply requiring that G E-entail (G' union S(BGP))).
>>I find Pat's arguments concerning the distinction between the mathematical
>>bnode objects (which have no scope but global) and bnode IDs (which we
>>must carefully scope between documents) to be convincing, and I have not
>>found an example or general justification from Enrico or Sergio (or Jos)
>>that in particular addresses Pat's recent claims that the simpler
>>definitions work (and that claims to the contrary are due to a conflation
>>of bnodes and bnode IDs).
>>
>>All this is to say that while I am OK with the current definitions because
>>I do not believe them to be broken, I would prefer the simpler definitions
>>in lieu of evidence that the simpler definitions do not work correctly.
>>Enrico, Sergio, Jos, or anyone else, as with Pat above, I'd appreciate if
>>you could explain to me (online or offline) why you feel that the simpler
>>definitions (the so-called "Pat H definitions") require the addition of
>>the BGP' construct.
>
>First of all, I agree that we (and I mean the whole WG) forgot about 
>the abstract syntax while writing the definitions in rq23 2.5. So, 
>whenever in 2.5 we use "blank node names" or "blank node labels", we 
>should actually just use "blank node". I understand that everybody 
>agrees with this, so this should be done by Jos.
>
>Considering this change that emphasises the abstract syntax aspect, 
>the current definitions are still obviously correct.
>
>From a pure semantic point of view, the argument saying that BGP' is 
>"useless" (but keeping G') has the very same validity as a similar 
>argument saying that introducing G' is "useless" (but keeping BGP').

The point of having G' (which was originally G itself) in the 
definition is to ensure that there is a single scope for multiple 
answer bindings. The scope of G' in the definitions is the entire 
answer set, while the scope of BGP' is a single answer binding. So 
one does not get the same result from either choice: and in fact, to 
eliminate G' while keeping BGP' would not get the correct behavior in 
the answer document, since it would in effect mean that bnode scopes 
would be limited to individual answers; but the document scope for 
bnodeIDs is the entire document.
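
A minimal sketch of the scoping difference described above, in Python. All names here (`BNode`, `label_terms`, `write_document_scope`, `write_answer_scope`) are hypothetical and come from no SPARQL implementation; the sketch only assumes that an answer document assigns bnodeIDs from a label table, and contrasts one table for the whole answer set with one table per answer.

```python
from itertools import count


class BNode:
    """Hypothetical stand-in for a blank node; identity is object identity."""


def label_terms(row, labels, counter):
    """Replace each BNode in one answer row with a bnodeID taken from `labels`."""
    out = {}
    for var, term in row.items():
        if isinstance(term, BNode):
            if term not in labels:
                labels[term] = f"_:b{next(counter)}"
            out[var] = labels[term]
        else:
            out[var] = term
    return out


def write_document_scope(answers):
    """One label table shared by the whole answer set (the role played by G')."""
    labels, counter = {}, count()
    return [label_terms(row, labels, counter) for row in answers]


def write_answer_scope(answers):
    """A fresh label table per answer: co-reference across answers is lost."""
    counter = count()
    return [label_terms(row, {}, counter) for row in answers]


# The same bnode is bound in two different answers.
x = BNode()
answers = [
    {"who": x, "name": "Alice"},
    {"who": x, "mbox": "mailto:alice@example.org"},
]
doc = write_document_scope(answers)
per_answer = write_answer_scope(answers)
```

Under these assumptions, `doc` gives `who` the same bnodeID in both rows, while `per_answer` gives it two different bnodeIDs, so a reader of the answer document could no longer tell that both rows concern the same node.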

If you meant using G rather than G', we have already rehearsed why 
this is not appropriate, since all bnodes in the answer would be 
told-bnodes: in effect, the scopes of the target graph and answer 
document would be identical. Either way, eliminating G' is not an 
option.

>This is true since both BGP' and G' are used in the context of 
>distinct sets of triples: in this case, the bnode interpretation of 
>BGP' and the bnode interpretation of G' are independent from the 
>original bnode identities anyway (see "Semantic conditions for blank 
>nodes" in section 1.5 of RDF-MT). So, if there is an argument not to 
>have BGP' in the definition (but keeping G') on the ground of this 
>observation, the same argument should be applied in order to get rid 
>of G' (but keeping BGP') as an alternative.

(1) that argument would be incorrect, see above, but anyway
(2) nobody has ever suggested having BGP' without G', so to introduce 
this alternative now is pointless. The only issue we have to decide 
is between having G' alone, or having G' together with BGP'.

>Note that you need at least one of the two in order to be able to 
>state the bnode-disjointness condition between G' (G) and BGP (BGP').

That is one of the reasons, but there are other reasons also, most 
particularly the need to ensure that answer bindings to bnodes from 
the scoping graph are treated uniformly across multiple answers.

>  In our original definition using the ordered-merge we needed 
>neither G' nor BGP', since the ordered-merge took care automatically 
>of the disjointness.
>
>However, as we already pointed out several times, even from the 
>abstract syntax point of view, if we don't introduce either G' or 
>BGP' in the definition, we limit the abstract (and therefore any 
>concrete) syntax not to enjoy important properties:
>
>- if we don't have BGP', the (abstract syntax representation of the) 
>answer set can not use bnodes which appear in the (abstract syntax 
>representation of the) query;

That is true, but the alternative (that it *should* use those bnodes) 
has never been seriously contemplated by the WG as a design option 
for SPARQL, and there is no provision for it in any surface notation 
(it would involve having partially overlapping bnode scopes between 
the query and answer documents) so this point seems to be irrelevant.

>- if we don't have G', the (abstract syntax representation of the) 
>answer set can not use bnodes which do not appear in the (abstract 
>syntax representation of the) data.

Quite.

>
>Please note again that the above restrictions do not affect the 
>semantics, which would be OK in both cases anyway, since, as noted 
>above, the bnodes are always evaluated locally independently of 
>their abstract syntax identity.

They DO affect the semantics. In fact, these definitions ARE the 
semantics of SPARQL, in effect. RDF graph syntax treats bnodes as 
syntactic objects with a global identity. There is no notion of 
'local evaluation' of a bnode in RDF.
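
The difference between a bnode (an object of the graph syntax) and a bnodeID (a label scoped to one document) can be sketched as follows, in Python. This is an illustration under stated assumptions, not any parser's actual behaviour: `BNode` and `read_document` are hypothetical names, and the input is a toy format of one whitespace-separated triple per line.

```python
class BNode:
    """Hypothetical stand-in for a blank node; identity is object identity."""


def read_document(lines):
    """Read toy triples; every '_:' label is resolved in a table local to this call."""
    table = {}  # bnodeID -> BNode, scoped to this one document
    graph = []
    for line in lines:
        terms = []
        for token in line.split():
            if token.startswith("_:"):
                if token not in table:
                    table[token] = BNode()
                terms.append(table[token])
            else:
                terms.append(token)
        graph.append(tuple(terms))
    return graph


# Two documents that happen to use the same label.
query_doc = read_document(["_:a ex:knows ex:bob"])
answer_doc = read_document(["_:a ex:knows ex:bob", "_:a ex:name ex:alice"])
```

Here the label `_:a` denotes one node throughout `answer_doc`, but the node it denotes in `query_doc` is a different object: the label is local to a document, while the node it picks out is not local to anything.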

>On the other hand, either of the above restrictions seems to us a 
>strong limitation on the abstract syntax of the answer set, that has 
>to be reflected on any concrete linearisation of the abstract syntax.

They already are so reflected. The definition using G' and BGP, with 
the condition that they be bnode-disjoint, G' be the same for all 
answers and graph-equivalent to G, and all answer bindings come from 
G', captures exactly the abstract-syntactic conditions corresponding 
to the scoping conditions we have already imposed for bnodeIDs in the 
various documents. The query document and the answer document are 
each scoped locally, so that it is impossible for a bnodeID in one of 
them to refer to the same bnode as a bnodeID in the other: this is 
why no purpose is served by allowing BGP (the query) and G' (the 
source of answer bindings) to share bnodes. All answer bindings come 
from a single RDF object, G', because the single answer document 
contains all answers inside a common scope. These are the only 
constraints which arise from the document scoping of bnodeIDs. 
However, no conditions are placed on G' and G, to allow for a 
possible told-bnode usage where the target graph and an answer 
document might be understood to have overlapping bnodeID scopes; 
hence, G' is required to be equivalent to G but is neither required 
to be G itself nor required to be bnode-disjoint from G, although 
that would be the likeliest case.
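
As a rough sketch of the two syntactic conditions just stated (G' and BGP bnode-disjoint; every bnode in an answer binding drawn from G'), in Python. The names `BNode`, `bnodes` and `scoping_conditions_hold` are hypothetical, triples are plain tuples, and the check deliberately leaves out the entailment condition and the equivalence of G' to G.

```python
class BNode:
    """Hypothetical stand-in for a blank node; identity is object identity."""


def bnodes(triples):
    """Collect the blank nodes occurring anywhere in a set of triples."""
    return {term for triple in triples for term in triple if isinstance(term, BNode)}


def scoping_conditions_hold(g_prime, bgp, solutions):
    """Check bnode-disjointness of G' and BGP, and that bound bnodes come from G'."""
    scope = bnodes(g_prime)
    disjoint = scope.isdisjoint(bnodes(bgp))
    bound = {t for s in solutions for t in s.values() if isinstance(t, BNode)}
    return disjoint and bound <= scope


# G' holds one bnode; the query pattern uses a different one.
b, q = BNode(), BNode()
g_prime = [(b, "foaf:name", "Alice")]
bgp = [("?x", "foaf:name", q)]

ok = scoping_conditions_hold(g_prime, bgp, [{"?x": b}])
shared = scoping_conditions_hold(g_prime, [(b, "foaf:name", "?n")], [{"?x": b}])
foreign = scoping_conditions_hold(g_prime, bgp, [{"?x": q}])
```

With these inputs only `ok` satisfies both conditions: `shared` fails because the pattern reuses a bnode of G', and `foreign` fails because the binding is to a bnode that does not occur in G'.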

>So, if we keep the "union" in the definition, we have to use both G' and BGP'.

Not only do we not have to, the definition would be greatly improved 
if we left it the way it was originally written, without BGP', as it 
would then be as complicated as it needs to be but not any more 
complicated, and the three parts of it (G, G' and BGP) would then 
correspond precisely to the three distinct document scopes in the 
surface syntax (respectively target graph, answer document and 
query); whereas in the current definition, there is nothing in the 
operational picture which corresponds to the four-way distinction in 
the semantic definition.

Pat

>
>cheers
>--e.


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 28 February 2006 23:02:43 UTC