Re: Editorial thread for BGP matching from Enrico Franconi on 2006-01-23 (public-rdf-dawg@w3.org from January to March 2006)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Tue, 24 Jan 2006 00:02:51 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <9912E9FF-D933-48F8-A786-13CBB410A424@inf.unibz.it>
On 23 Jan 2006, at 23:23, Pat Hayes wrote:
>> So again, why don't you like my proposal of having precise  
>> definitions with the simple wording explaining them?
>
> Both styles of wording are equally precise.

No, since the relationship between G' and BGP is left to the wording
rather than to the formulas: this happens for simple entailment, when
you say that G' and BGP don't share any term. Unluckily enough, this
becomes even worse when we have to define told-bnode simple
entailment, when we have to (again verbally) revise this part of the
definition, and add (again verbally) that bnodes marked as told in
BGP, if they appear in G', must not be renamed. You see the mess?
(And the same happens if you rename bnodes in BGP - just the dual
situation). With orderedMerge, everything is solved where it should
be: in the relationship between G' and BGP, in both cases.
I REALLY HOPE THAT YOU SEE THIS.

> The choice is, I agree, purely aesthetic at this point.

Absolutely not, see above.

> I find the definition using union simpler and easier to understand,  
> partly because it does not require following the implicit reasoning  
> behind the directionality of the merge, but mostly because it seems  
> to keep separate issues better separated. To be honest, I don't  
> think I fully understood the ordered-merge definition myself until  
> I figured out that it defined essentially the same as the simpler one.

It may be easier to understand, and that's why I proposed to have it  
as an explanation in the current text. Look for it, and propose a  
better wording of the explanation.

> There is the issue of how to keep the three bnode scopes (dataset,  
> query and answer set) clearly distinct, and we can do that, given  
> that we have the scoping set G' available to be defined, simply by  
> requiring that G' and BGP share no bnodes. We should say this  
> explicitly when defining the scoping graph, as part of the  
> definition. Since the exact bnodeID vocabulary of G' is otherwise  
> unconstrained, there is no loss of generality in this requirement;  
> it is quite precise; it is easy to understand (and its motivation);  
> and it is already familiar to anyone dealing with multiple RDF  
> documents.

See above: if you state this requirement in the definition, then you
have to revise it when defining the told-bnode simple entailment. By
doing so, the told-bnode simple entailment will no longer be compatible
with the SPARQL spec, and by allowing this, we will implicitly say that
everybody is allowed to be incompatible with the SPARQL spec.
And this is exactly what I want to prevent.

> One often needs to take such care over bnodeID scopes when dealing  
> with multiple sources of RDF content. Now, given this bnode  
> separation, the two definitions are indeed equivalent: but now it  
> is simply shorter, easier to write and to understand (G' union S 
> (BGP)) than S(G' order-merge BGP). The former doesn't require  
> defining order-merge.

See above.
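
To make the two formulations under discussion concrete, here is a
minimal sketch in Python. It is an illustration only, not the wording
of any draft: graphs are modelled as sets of triples of strings, bnodes
are strings starting with "_:", variables start with "?", and
ordered_merge is assumed to keep the bnodes of its first argument fixed
while renaming any clashing bnodes of the second. The names substitute,
ordered_merge and the example data are all hypothetical.

```python
from itertools import count


def bnodes(graph):
    """Return the set of bnode labels (terms starting with '_:') in a graph."""
    return {t for triple in graph for t in triple if t.startswith("_:")}


def substitute(graph, mapping):
    """Apply a term mapping (e.g. a solution S) to every triple of a graph."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}


def ordered_merge(g_prime, bgp):
    """Assumed reading of ordered merge: G' is left untouched, and every
    bnode of BGP that also occurs in G' is renamed to a fresh label."""
    used = bnodes(g_prime) | bnodes(bgp)
    counter = count()
    renaming = {}
    for b in sorted(bnodes(g_prime) & bnodes(bgp)):
        new = f"_:r{next(counter)}"
        while new in used:  # skip labels that are already taken
            new = f"_:r{next(counter)}"
        used.add(new)
        renaming[b] = new
    return g_prime | substitute(bgp, renaming)


# Hypothetical scoping graph and solution mapping.
g_prime = {("_:a", "ex:knows", "_:b")}
s = {"?x": "_:a"}

# Case 1: G' and BGP share no bnodes.
bgp_apart = {("?x", "ex:knows", "_:c")}
union_apart = g_prime | substitute(bgp_apart, s)
merge_apart = substitute(ordered_merge(g_prime, bgp_apart), s)

# Case 2: BGP reuses the label _:b that also occurs in G'.
bgp_clash = {("?x", "ex:knows", "_:b")}
union_clash = g_prime | substitute(bgp_clash, s)
merge_clash = substitute(ordered_merge(g_prime, bgp_clash), s)
```

Under these assumptions the two graphs coincide in case 1, where the
bnodes are already separated, and differ in case 2, where the union
identifies the two uses of _:b while the ordered merge renames the one
coming from BGP. The sketch says nothing about which formulation
handles told bnodes better; that is the point in dispute.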

> To fully understand the latter requires a reader to understand why  
> the S needs to be applied to a G' which contains no variables  
> (puzzle#1), why merging needs to be done at all (puzzle #2) and why  
> it needs to be ordered (puzzle #3). Whereas, if anyone is puzzled  
> why the first wording uses union rather than merge, the answer is  
> also the answer to the question why G' is needed at all (puzzle #4,  
> for both wordings), viz. that we want S(BGP) to be in the scope of  
> G' because G' defines the scope of answer bnodes (bnodes in answer  
> bindings): that is the very reason for having it there, to ensure  
> that bnodeIDs are used in the answer set in ways that conform with  
> their uses in other answers in the answer set. This also makes  
> intuitive sense of the identification of the bnode scope of G' with  
> that of the answer sequence document, since one can think of G' as  
> a 'virtual copy' of the data graph G which is 'virtually included'  
> in the answer document; and this intuitive picture then implies all  
> the rest of the structure that one needs to understand. The ordered- 
> merge machinery was needed when we were including G rather than G'  
> in the definition, since we cannot guarantee that G and BGP are  
> standardized apart. But the use of G' gives us enough slack to  
> require the necessary bnodeID separations in the actual definition  
> of G': and then we don't need to have that special machinery in the  
> definitions to handle this (now non-existent) case.

If you also want to leave out the scoping set B, the situation is
much worse: RDF and RDFS (and OWL, of course) entailments also
cannot be defined without being incompatible with the SPARQL spec!

So, your proposal is going to define SPARQL in a way that is
incompatible with *any* of its extensions.

--e.
Received on Monday, 23 January 2006 23:03:14 UTC