Re: Editorial changes in Section 2.5 from Pat Hayes on 2006-02-01 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 1 Feb 2006 12:12:46 -0600
To: tessaris <tessaris@inf.unibz.it>
Cc: Enrico Franconi <franconi@gmail.com>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230911c006a07fbc63@[10.100.0.23]>
>Pat Hayes wrote:
>>
>>>  On 31 Jan 2006, at 18:31, Pat Hayes wrote:
>>
>>  Do you understand the point about bnodes vs. bnode identifiers in
>>  documents? And do you see how this makes it unnecessary to introduce
>>  BGP', since the simpler definition is already fully general? And do you
>>  see that allowing BGP (not BGP') and G' to share bnodes can introduce
>>  unintended scope errors when one tries to form a CONSTRUCT graph by
>>  applying the answer binding (introducing bnodes from G') to copies of BGP?
>
>Pat, I understand that the current formulation doesn't explicitly
>forbid a bnode in a query BGP from appearing in the answer set. But
>this is harmfull

/harmless, I presume. But it is in fact harmful, read on.

>, because bnodes inside a WHERE don't have any scope
>other than the BGP in which they appear (to me, this is consistent with the
>RDF(S) view of the universe).

No, it's not consistent. Ignore the bnode IDs for the moment, OK? 
BnodeIDs are lexical units in some lexicalization, and the scoping 
rules that apply to those IDs depend on how that lexicalization is 
defined. We (DAWG, that is) are defining the lexicalization for 
returning answer sets, so it is up to us to decide how the bnodeIDs 
in answer set *documents* are determined. But in the RDF abstract 
(graph) syntax, bnodes are real mathematical things, with a 
mathematical identity. There is no notion, strictly speaking, of 
blank node scope: actual blank nodes - not blank node IDs, but the 
blank nodes themselves - have a global 'scope'; they are just 
entities in some huge Platonic set which is disjoint from IRIs and 
literals. This is why we can define the abstract syntax in purely 
mathematical language. So if two RDF graphs (or query patterns) share 
a bnode, then they really, really do have that actual bnode in 
common, as a mathematical fact. There is no protective notion of 
bnode scope to allow us to pretend that they don't have the bnode in 
common, under this circumstance. If one were to render the two graphs 
(or query patterns) as node-arc diagrams - which by the way is a 
perfectly valid way to render RDF content -  they would be a single 
connected diagram, linked by that blank node that they share. 
Basically, scoping is a notion that is inherently lexical, having to 
do with lexical labels: but the abstract graph syntax of RDF is 
mathematical, not lexical.
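
As an illustration only (this is not from the specs), here is a
minimal Python sketch of that distinction. The BNode class, the
serialize helper and the example.org IRIs are all hypothetical names
made up for the example. Blank nodes are modelled as objects with
identity and no name; labels such as _:b0 are invented only when a
graph is written out as a document.

```python
class BNode:
    """A blank node: an object with identity but no name (hypothetical class)."""
    __slots__ = ()


def bnodes(graph):
    """Return the set of blank nodes occurring in a set of triples."""
    return {t for triple in graph for t in triple if isinstance(t, BNode)}


def serialize(graph):
    """Render a graph as N-Triples-like lines.

    Labels are invented here, per document; they are not part of the graph.
    """
    labels = {}

    def term(t):
        if isinstance(t, BNode):
            if t not in labels:
                labels[t] = f"_:b{len(labels)}"
            return labels[t]
        return t

    return sorted(" ".join(term(t) for t in triple) + " ." for triple in graph)


x = BNode()
g1 = {(x, "<http://example.org/p>", '"a"')}
g2 = {(x, "<http://example.org/q>", '"b"')}        # shares the node x with g1
g3 = {(BNode(), "<http://example.org/q>", '"b"')}  # same shape, different node

shared_12 = bnodes(g1) & bnodes(g2)  # g1 and g2 really have x in common
shared_13 = bnodes(g1) & bnodes(g3)  # g1 and g3 have nothing in common
```

Under these assumptions g2 and g3 serialize to identical documents,
yet only g2 shares a node with g1. The sharing is a fact about the
graphs, not about the labels.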

We set RDF up this way deliberately, and tried to explain it in
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-syntax
and http://www.w3.org/TR/rdf-mt/#graphsyntax. It says there "While 
node identifiers such as '_:xxx' serve to identify blank nodes in the 
surface syntax, these expressions are not considered to be the label 
of the graph node they identify; they are not names, and do not occur 
in the actual graph. In particular, the RDF graphs described by two 
N-Triples documents which differ only by re-naming their node 
identifiers will be understood to be equivalent. This re-naming 
convention should be understood as applying only to whole documents, 
since re-naming the node identifiers in part of a document may result 
in a document describing a different RDF graph."
Note the careful distinction between graph and document.
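
To make the whole-document point concrete, here is a small Python
sketch. It assumes a toy N-Triples-like syntax with whitespace-free
terms, and the parse and equivalent helpers are hypothetical. The
equivalence check is brute force over all relabelings, so it is only
meant for tiny examples.

```python
from itertools import permutations


def parse(doc):
    """Parse toy N-Triples-like lines into a set of (s, p, o) string triples."""
    return {tuple(line.rstrip(" .").split(" ", 2))
            for line in doc.strip().splitlines()}


def labels(graph):
    """Sorted list of blank node identifiers used in a parsed document."""
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})


def equivalent(g1, g2):
    """True if some one-to-one relabeling of bnode identifiers maps g1 onto g2."""
    l1, l2 = labels(g1), labels(g2)
    if len(l1) != len(l2):
        return False
    for perm in permutations(l2):
        m = dict(zip(l1, perm))
        if {tuple(m.get(t, t) for t in triple) for triple in g1} == g2:
            return True
    return False


original = parse("_:a <p> _:b .\n_:b <q> _:a .")
# Identifiers renamed consistently across the whole document.
whole = parse("_:x <p> _:y .\n_:y <q> _:x .")
# Identifiers renamed in the first line only.
partial = parse("_:x <p> _:y .\n_:b <q> _:a .")

same_graph = equivalent(original, whole)
changed_graph = equivalent(original, partial)
```

Here same_graph comes out True and changed_graph comes out False: the
partial renaming describes a graph with four blank nodes instead of
two.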

The abstract graph syntax gives us a purely mathematical way to talk 
about 'scoping' just by talking about membership in sets, without 
getting all involved with issues of bound versus free names, name 
binders, etc. All we have to do, when talking about RDF graphs, is to 
refer to sharing (or not) of actual bnodes, like the difference 
between union and merge. This is in fact quite liberating, if one is 
familiar (as I'm sure you guys are) with the usual need to be so 
finicky about not accidentally capturing free names with intermediate 
name binders, etc. The abstract syntax also means that widely 
different surface forms can all be considered to be the same syntax 
at some useful level. So for example, RDF/XML, Turtle and COE concept 
maps are all perfectly valid renderings of one single abstract syntax.
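
The union/merge difference can be sketched in a few lines of Python,
again with blank nodes as nameless objects. The BNode class and both
function names are hypothetical and chosen only for this example.

```python
class BNode:
    """A blank node: identity only, no label (hypothetical class)."""
    __slots__ = ()


def bnodes(graph):
    """Return the set of blank nodes occurring in a set of triples."""
    return {t for triple in graph for t in triple if isinstance(t, BNode)}


def union(g1, g2):
    """Plain set union: any bnode that occurs in both graphs stays shared."""
    return g1 | g2


def merge(g1, g2):
    """Replace each bnode of g2 with a fresh one, then take the union."""
    fresh = {}

    def rename(t):
        if isinstance(t, BNode):
            if t not in fresh:
                fresh[t] = BNode()
            return fresh[t]
        return t

    return g1 | {tuple(rename(t) for t in triple) for triple in g2}


x = BNode()
g1 = {(x, "<http://example.org/p>", '"a"')}
g2 = {(x, "<http://example.org/q>", '"b"')}

u = union(g1, g2)  # one blank node carrying both properties
m = merge(g1, g2)  # two blank nodes, one property each
```

No name binders or capture rules appear anywhere: the only question
is whether the two sets of triples contain the same node object.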

>My feeling is that you confuse the BGPs in the WHERE body with the
>template in the CONSTRUCT. It's true that a template looks like a BGP,
>but is not used as a BGP at all. In particular, bnodes in templates are
>*always* renamed:

Not at present. Read on...

>"""
>Definition: CONSTRUCT
>
>Let Q = (GP, DS, SM, CONSTRUCT T) where
>
>     * GP is a graph pattern
>     * DS is an RDF Dataset
>     * SM is a set of solution modifiers
>     * T is a set of triple patterns
>
>then, if QS is the set of solutions formed by matching dataset DS with
>graph pattern GP, then write SM(QS) = { Si | i = 1,2 ... n }.
>
>Let Ti be a sequence of sets of triple patterns, such that each Ti is
>basic graph pattern equivalent to T and no Ti have a blank node in common.

...what is to prevent one of these being T itself? This does not 
exclude that possibility. It is easy to exclude it, but it should not 
be necessary. In the (common) case where there is only one solution 
modifier, why should it be necessary to replace the blank nodes in 
that one answer?

But the real point is that if one understands blank node scoping 
properly, there was no reason to introduce BGP' in the first place. 
The previous, simpler, version of the definition, using BGP, has a 
simple, clear and accurate intuition behind it. There are three 
distinct bnodeID scopes, all associated with distinct document 
boundaries: the dataset graph, the query, and the answer set. These 
correspond exactly to three structures in the definition: G, BGP and 
G'. The exclusion condition between G' and BGP corresponds to the 
only document boundary (between query and answer set) which is 
visible to the user, and mirrors exactly the scoping condition which 
we have imposed on answer set formats, and the commonality of G' 
across all the answers corresponds exactly to the document scope of 
the answer set. The unnecessary complication arising from the 
addition of BGP' destroys this nice picture (there is now no 
mathematical object corresponding to the bnode scope of the answer 
document), does not correspond to the RDF abstract syntax model, and 
provides for no extra generality.

>Let Ri be the RDF graph formed from SC(Si, Ti).
>
>The CONSTRUCT result is the RDF graph formed by the union of Ri.
>"""
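
For concreteness, the quoted definition can be sketched roughly as
follows in Python. This is an illustration, not the spec's algorithm.
BNode, Var, fresh_copy, substitute and construct are hypothetical
names, solutions are modelled as plain dicts, and dropping template
triples that still contain an unbound variable is an assumption of
the sketch. Note that the sketch always builds each Ti from new blank
nodes, which is stricter than the quoted wording: as written, the
definition does not exclude some Ti being T itself.

```python
class BNode:
    """A blank node: identity only, no label (hypothetical class)."""
    __slots__ = ()


class Var(str):
    """A query variable, identified by its name (hypothetical class)."""


def fresh_copy(template):
    """Build one Ti: a copy of T in which every blank node is replaced by a new one."""
    mapping = {}

    def rename(t):
        if isinstance(t, BNode):
            if t not in mapping:
                mapping[t] = BNode()
            return mapping[t]
        return t

    return [tuple(rename(t) for t in triple) for triple in template]


def substitute(solution, template):
    """SC(Si, Ti): replace variables by their bindings.

    Assumption: triples that still contain an unbound variable are dropped.
    """
    out = set()
    for triple in template:
        inst = tuple(solution.get(t, t) if isinstance(t, Var) else t
                     for t in triple)
        if not any(isinstance(t, Var) for t in inst):
            out.add(inst)
    return out


def construct(solutions, template):
    """Union of the graphs Ri = SC(Si, Ti), one fresh Ti per solution."""
    result = set()
    for s in solutions:
        result |= substitute(s, fresh_copy(template))
    return result


b = BNode()
T = [(b, "<http://example.org/name>", Var("n"))]
solutions = [{Var("n"): '"Alice"'}, {Var("n"): '"Bob"'}]
result = construct(solutions, T)
subjects = {s for (s, _, _) in result}
```

With two solutions the result contains two triples whose subjects are
two distinct blank nodes, and neither of them is the template's own
blank node b.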
>
>Do you agree, or am I missing something?

See above. You decide :-)

Pat

>
>
>BTW Andy: T_i shouldn't have any bnode in common with bnodes used in
>SM(QS) either; otherwise, unwanted co-references might appear in the
>resulting graph (this, regardless of the BGP/BGP' business).
>
>--sergio


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 1 February 2006 18:13:09 UTC