Re: Editorial thread for BGP matching from Pat Hayes on 2006-01-23 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 23 Jan 2006 16:23:56 -0600
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p0623090abffb0023b43a@[10.100.0.23]>
>On 23 Jan 2006, at 18:17, Pat Hayes wrote:
>>>"In the case of simple entailment, if the scoping graph G' is such 
>>>that it does not share blank nodes with BGP, then the above 
>>>definition can be simplified to take the union between G' and BGP, 
>>>instead of an OrderedMerge."
>>
>>This works for any kind of entailment, which is why I prefer the 
>>simpler wording: and then we don't need this odd notion of ordered 
>>merging at all. In fact, I'd suggest that the simpler wording 
>>should be normative, as it expresses the intended meaning more 
>>directly and straightforwardly. Also, it keeps distinct issues 
>>separate. Exactly how an engine handles bnode scoping (what gets 
>>re-written, or maybe use hash-tables, whatever) really is an 
>>implementation decision. We shouldn't build into the entailment 
>>clause what is in effect an implicit decision about how to 
>>implement bnode scoping.
>
>This is not about irrelevant stuff, this is about being precise and 
>non-ambiguous.
>
>We have to be as precise as possible when writing down a definition, 
>and not leave it to the verbal part.

We have to be precise, but the spec document is not pure mathematics. 
The actual text is often centrally important, particularly when it 
uses words like "must" :-)  But in any case, the alternatives are 
equally precise.

>That's why I am proposing to have *my* precise definitions, and 
>adding a verbal part explaining with *your* simpler wording: since 
>they are equivalent for simple entailment, everybody is happy.

The ownership is less important than the clarity of the wording.

>I repeat that we need all the ingredients in the spec, since they 
>allow us to introduce a terminology that future users have to refer 
>to in order to make their choices - if they want to say that they 
>are (backward) compatible with SPARQL. Future implementors have to 
>declare, for example, what is their kind of entailment *and* their 
>scoping set B, since they are in the spec.
>
>So again, why don't you like my proposal of having precise 
>definitions with the simple wording explaining them?

Both styles of wording are equally precise. The choice is, I agree, 
purely aesthetic at this point. I find the definition using union 
simpler and easier to understand, partly because it does not require 
following the implicit reasoning behind the directionality of the 
merge, but mostly because it seems to keep separate issues better 
separated. To be honest, I don't think I fully understood the 
ordered-merge definition myself until I figured out that it defined 
essentially the same thing as the simpler one.

There is the issue of how to keep the three bnode scopes (dataset, 
query and answer set) clearly distinct, and we can do that, given 
that we have the scoping graph G' available to be defined, simply by 
requiring that G' and BGP share no bnodes. We should say this 
explicitly when defining the scoping graph, as part of the 
definition. Since the exact bnodeID vocabulary of G' is otherwise 
unconstrained, there is no loss of generality in this requirement; it 
is quite precise; it is easy to understand (and its motivation); and 
it is already familiar to anyone dealing with multiple RDF documents. 
One often needs to take such care over bnodeID scopes when dealing 
with multiple sources of RDF content. Now, given this bnode 
separation, the two definitions are indeed equivalent, but 
(G' union S(BGP)) is simply shorter, and easier to write and to 
understand, than S(G' order-merge BGP). The former doesn't require 
defining order-merge. To fully understand the latter requires a 
reader to understand why the S needs to be applied to a G' which 
contains no variables (puzzle #1), why merging needs to be done at 
all (puzzle #2) 
and why it needs to be ordered (puzzle #3). Whereas, if anyone is 
puzzled why the first wording uses union rather than merge, the 
answer is also the answer to the question why G' is needed at all 
(puzzle #4, for both wordings), viz. that we want S(BGP) to be in the 
scope of G' because G' defines the scope of answer bnodes (bnodes in 
answer bindings): that is the very reason for having it there, to 
ensure that bnodeIDs are used in the answer set in ways that conform 
with their uses in other answers in the answer set. This also makes 
intuitive sense of the identification of the bnode scope of G' with 
that of the answer sequence document, since one can think of G' as a 
'virtual copy' of the data graph G which is 'virtually included' in 
the answer document; and this intuitive picture then implies all the 
rest of the structure that one needs to understand. The ordered-merge 
machinery was needed when we were including G rather than G' in the 
definition, since we cannot guarantee that G and BGP are standardized 
apart. But the use of G' gives us enough slack to require the 
necessary bnodeID separations in the actual definition of G': and 
then we don't need to have that special machinery in the definitions 
to handle this (now non-existent) case.
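
To make the equivalence concrete, here is a minimal sketch in Python. 
It is only an illustration under stated assumptions, not the wording 
proposed for the spec: graphs are modelled as sets of 3-tuples of 
strings, bnodes as strings starting with "_:", variables as strings 
starting with "?", and ordered_merge is a hypothetical reading of 
order-merge (keep the bnode labels of G', rename any clashing bnode 
in BGP), since the definition itself is not given in this message.

```python
from itertools import count


def is_bnode(term):
    # Assumption: bnode labels are written "_:label".
    return term.startswith("_:")


def bnodes(graph):
    return {t for triple in graph for t in triple if is_bnode(t)}


def ordered_merge(g_prime, bgp):
    """Hypothetical order-merge: G' keeps its labels, clashing
    bnodes in BGP are renamed to fresh ones."""
    taken = bnodes(g_prime) | bnodes(bgp)
    fresh = count()
    mapping = {}
    for b in sorted(bnodes(g_prime) & bnodes(bgp)):
        new = f"_:r{next(fresh)}"
        while new in taken:
            new = f"_:r{next(fresh)}"
        mapping[b] = new
        taken.add(new)
    renamed = {tuple(mapping.get(t, t) for t in tr) for tr in bgp}
    return g_prime | renamed


def substitute(graph, solution):
    """Apply the solution S: replace each bound variable by its value."""
    return {tuple(solution.get(t, t) for t in tr) for tr in graph}


g_prime = {("_:a", "ex:knows", "ex:bob")}   # scoping graph, no variables
solution = {"?x": "_:a"}                    # S binds ?x to an answer bnode

# Case 1: BGP shares no bnodes with G', as the definition of G' requires.
bgp = {("?x", "ex:knows", "_:q")}
with_union = g_prime | substitute(bgp, solution)
with_merge = substitute(ordered_merge(g_prime, bgp), solution)
print(with_union == with_merge)

# Case 2: BGP reuses the label _:a, so the two wordings come apart.
clashing = {("?x", "ex:knows", "_:a")}
clash_union = g_prime | substitute(clashing, solution)
clash_merge = substitute(ordered_merge(g_prime, clashing), solution)
print(clash_union == clash_merge)
```

In the first case the two results are the same graph, because S 
leaves G' untouched and nothing needs renaming. The second case is 
exactly the one that requiring G' and BGP to share no bnodes rules 
out.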

Pat

>--e.


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 23 January 2006 22:24:22 UTC