Re: Blank nodes, leaning, and the OWA from Pat Hayes on 2011-03-28 (semantic-web@w3.org from March 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 28 Mar 2011 17:12:14 -0500
To: Gregg Reynolds <dev@mobileink.com>
Cc: SW-forum Web <semantic-web@w3.org>
Message-Id: <E63378AE-42B4-47D7-8F1B-952EEB04C90D@ihmc.us>
On Mar 28, 2011, at 3:57 AM, Gregg Reynolds wrote:

> 
> 
> On Mon, Mar 28, 2011 at 12:12 AM, Pat Hayes <phayes@ihmc.us> wrote:
> 
> On Mar 27, 2011, at 10:16 PM, Gregg Reynolds wrote:
> 
>> On Sun, Mar 27, 2011 at 10:16 AM, Pat Hayes <phayes@ihmc.us> wrote:
>> 
>> On Mar 27, 2011, at 12:13 AM, Gregg Reynolds wrote:
>> ... 
>>> [1] a)  <ex:Pedro ex:owns _:x>, <_:x rdf:type ex:Donkey>, <_:x ex:name ex:Daisy>
>>>      b)  <ex:Pedro ex:owns _:y>, <_:y rdf:type ex:Donkey>, <_:y ex:name ex:Maisy>
>> 
>> ... 
>>> Is the graph of [1] lean?  It seems to me that under the OWA the answer must be that we do not know,
>> 
>> 
>> It is lean, and we do know this. Leanness is a syntactic property of the graph. You can determine it algorithmically. 
>> ... 
>> You apparently do not understand it. Check out the definitions in the specs, they are given quite unambiguously. A graph is lean when it has no instance which is a proper subgraph of itself. The graph [1] does not have such an instance, so it is lean. Nothing to do with models!
>> 
>> Trust me, I've spent more hours than I care to count trying to decipher the specs.  Whatever they are intended to convey may be unambiguous; what the text actually says is another matter.  For example, I find no syntactic rules that allow me to map a piece of concrete syntax to the abstract syntax.
> 
> See http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/#ntriples and the table given there.
> 
> The table that maps N-Triple productions to RDF *Concepts*, and nodeID to "Identifier for a blank node"?  Concepts are not syntax

"Concepts" is the title of the document which defines the RDF abstract graph syntax, among other things. 

> , and blank nodes in RDF graphs do not have identifiers.

Well, strictly speaking, they can have. They are simply defined as some set disjoint from URIs and literals. As the spec says, "Otherwise, this set of blank nodes is arbitrary." What one should not do is to identify a blank node with its identifier, but this is not to say it cannot have one. In the Ntriples syntax, for example, blank nodes do have identifiers.
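
To make the distinction between a blank node and its identifier concrete, here is a minimal sketch in Python. It is an illustration only, not anything taken from the specs: the class BlankNode, the function parse and the whitespace-based line splitting are all assumptions of the sketch, and it handles only simple lines like the ones quoted in this thread. Each nodeID label in one document is mapped to exactly one opaque node, and the node itself carries no label.

```python
class BlankNode:
    """An opaque node. It has no label of its own (hypothetical helper class)."""
    __slots__ = ()


def parse(lines):
    """Read simple N-Triples-like lines into a set of triples.

    Assumption of this sketch: terms contain no whitespace, so a plain
    split() is enough. A real N-Triples parser must do much more.
    """
    labels = {}   # nodeID label -> BlankNode, scoped to this one document
    graph = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[3] != ".":
            raise ValueError("not a simple triple line: %r" % line)
        triple = tuple(
            # The same label always yields the same node; distinct labels
            # yield distinct nodes.
            labels.setdefault(p, BlankNode()) if p.startswith("_:") else p
            for p in parts[:3]
        )
        graph.add(triple)
    return graph, labels


graph, labels = parse([
    "<ex:a> <ex:p> _:x .",
    "_:y <ex:p> _:x .",
])
```

Here labels records which identifier the document used for which node, but the triples in graph contain only the nodes themselves. Discarding labels loses nothing about the graph.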

>> Nothing that says explicitly that e.g. every _:x should map to one blank node, nor that distinct bnode IDs cannot map to one bnode.
> 
> Is that not utterly obvious? Why else would one call it an "identifier for a blank node" ?
> 
> I wish it were as obvious as it looks.  But since the specs are inconsistent in their use of "blank node" and "blank node identifier" (among other things), what looks "obvious" is not to be trusted.

I do not believe they are inconsistent. If you can locate an inconsistency, please document it, as the current WG is tasked (among other things) with fixing errata in the current RDF documents. 

In particular, if you can think of a way to explain the idea of a blank node identifier in less ambiguous or clearer language, I would be delighted to read it. I have to say, you are the first reader to have reported this particular difficulty, as far as I know. 

>>  For that matter, nothing that says the mapping must preserve URIs.
> 
> The URIs used in the n-triples notation (used throughout the semantics document) are *identical* to the URIs in the abstract syntax. I fail to see how they can fail to be 'preserved' under these circumstances.  
> 
> See note on "obvious" above. 
> 
>>  Also no *syntactic* rule that allows me to map
>> 
>> <ex:a> <ex:p> _:x .
>> _:y <ex:p> _:x .
>> 
>> to <ex:a> <ex:p> _:x .  This looks like it should be some kind of syntactic reduction step, but I find nothing to justify elimination of the second clause except semantic considerations.
> 
> It follows from the fact that a graph is defined to be a set of triples. The set {a, a} is the same as the set {a}. 
> 
> Ok.  It's not what I think of when I see the term "abstract syntax", but I grant that it works (modulo instantiation, see below). 
> 
>>  The definition of instance upon which the definition of leaning depends only mentions "replacing some or all blank nodes"; it doesn't say which ones to replace
> 
> You can replace any (or indeed none) of them, and you still have an instance. 
> 
>> , it places no constraints on the replacement (except that they be bnodes, literals, or URI refs), and it says nothing about *removing* nodes.
> 
> Indeed, nodes do not get removed by an instantiation. 
> 
>>  In fact as I read it getting from a graph to an "instance which is a proper subgraph" is not even possible syntactically, since we have no syntactic rule for eliminating triples.
> 
> A graph is a set of triples. Instantiation is defined as substituting a (blank node, URI or literal) for a blank node. Take the above graph and substitute the URI <ex:a> for the blank node  _:y. This does not affect the second triple, which does not contain that blank node, but it makes the first triple identical to the second triple. The resulting set therefore contains a single triple:
> 
> <ex:a> <ex:p> _:x .
> 
> which is a subgraph of the original graph. 
> 
> By the definition of instantiation, it is also legal to replace _:y by *any* URI.  

Indeed, replacing a blank node with any URI would yield an instance, by the definition of "instance". 
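
The substitution step discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: terms are plain strings, a string starting with "_:" stands for a blank node, and the names is_bnode and instance are invented for the sketch. Because the result is built as a set, two triples that become identical after substitution count once.

```python
def is_bnode(term):
    # Assumption of this sketch: blank nodes are written as "_:label".
    return term.startswith("_:")


def instance(graph, mapping):
    """Replace blank nodes according to mapping; unmapped terms stay as they are."""
    return frozenset(
        tuple(mapping.get(t, t) if is_bnode(t) else t for t in triple)
        for triple in graph
    )


g = frozenset({
    ("ex:a", "ex:p", "_:x"),
    ("_:y", "ex:p", "_:x"),
})

# Substitute the URI ex:a for the blank node _:y.
inst = instance(g, {"_:y": "ex:a"})

# Both triples are now identical, so the set holds a single triple,
# and that set is a proper subset of the original graph.
collapsed = inst == frozenset({("ex:a", "ex:p", "_:x")})
proper_subgraph = inst < g
```

Nothing is removed by the substitution itself. The smaller graph arises only because a set cannot contain the same triple twice.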

> In particular, it is legal to use <ex:a> to replace _:x in
> 
> <ex:a> <ex:p> _:x .
> _:x <ex:p> _:x .
> 
> yielding <ex:a ex:p ex:a> .

True. This is also an instance of the original graph. 

>  It may not be a proper subgraph of the original graph, but it is a legal mapping of it.

I am not sure what you mean by 'legal' here. None of these definitions specify anything to be legal or illegal, they simply define notions like instance, subgraph, etc., which are used throughout the specification documents. 

But to get back to the example, it is an instance of the original graph but not a subgraph of it. What is your point in drawing attention to this?
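
For readers who want to check these claims mechanically, here is a brute-force sketch of the leanness test in Python. It is an illustration, not an algorithm from the specs. The assumptions are: terms are plain strings, "_:label" stands for a blank node, and the name is_lean is invented here. It relies on the observation that an instance which is a subgraph can only contain terms already occurring in the graph, so only those terms need to be tried as replacements. The search is exponential in the number of blank nodes, which is fine for examples of this size.

```python
from itertools import product


def is_bnode(term):
    # Assumption of this sketch: blank nodes are written as "_:label".
    return term.startswith("_:")


def is_lean(graph):
    """True when no instance of graph is a proper subgraph of graph."""
    graph = frozenset(graph)
    terms = sorted({t for triple in graph for t in triple})
    bnodes = [t for t in terms if is_bnode(t)]
    # Try every way of sending each blank node to some term of the graph.
    for images in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, images))
        inst = frozenset(
            tuple(mapping.get(t, t) for t in triple) for triple in graph
        )
        if inst < graph:   # proper subset: the graph is not lean
            return False
    return True


# Graph [1] from earlier in the thread.
donkeys = {
    ("ex:Pedro", "ex:owns", "_:x"), ("_:x", "rdf:type", "ex:Donkey"),
    ("_:x", "ex:name", "ex:Daisy"),
    ("ex:Pedro", "ex:owns", "_:y"), ("_:y", "rdf:type", "ex:Donkey"),
    ("_:y", "ex:name", "ex:Maisy"),
}
# The two-triple graph that collapses when _:y is replaced by ex:a.
collapsing = {("ex:a", "ex:p", "_:x"), ("_:y", "ex:p", "_:x")}
# The graph whose instance <ex:a ex:p ex:a> is not a subgraph.
looped = {("ex:a", "ex:p", "_:x"), ("_:x", "ex:p", "_:x")}

results = (is_lean(donkeys), is_lean(collapsing), is_lean(looped))
```

Under these assumptions the check uses only the triples themselves, with no reference to interpretations or models.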

>  Maybe a definition of graph normalization would be useful.
> 
> Make sense now?
> 
> Yes, I see the ideas behind the prose, and thank you for taking the time.  Do you see the ambiguities?  

No, I do not.

> Strictly speaking, RDF Abstract Syntax is not syntax, since it has no symbol for blank nodes.  No syntax without symbols.

This is a slogan with very little content. If you prefer, think of each blank node as being a distinct symbol. 

>  But even if we overlook that, RDF's notion of "abstract syntax" is unorthodox. At least some of your readers (and when I say some, I mean at least one, i.e. me) will think of concrete v. abstract syntax in terms of parse trees and abstract syntax trees (cf. Aho, Sethi and Ullman's Dragon book), which is not how RDF uses the terms.

No indeed. Our usage is much simpler than that. And we do define all our terms, so a reference to a work which defines things differently for a different purpose would not seem to be particularly relevant. The term 'abstract syntax' dates back at least to McCarthy's unpublished but influential memorandum http://www-formal.stanford.edu/jmc/towards/node12.html (there dated 1996, but I know it was about a decade old when I read it in the early 1970s).

But in any case, you would not go far wrong by thinking of the RDF graph as analogous to a parse tree. People often refer to the process of inputting an RDF/XML document and generating a triple store as 'parsing'. It is not necessarily a *tree*, of course, but then parsers are not obliged to produce trees. 

>  RDF abstract syntax looks much more like semantics to me.  Do you see why I say that?

No, I'm afraid I do not. 

>  Again, this is a question of how the text works, not the ideas motivating the text.  Every reader comes to the text with ideas about what certain terms mean; usage that deviates from "standard" (or at least widespread) usage should be explicitly motivated.

I would say that, when reading as 'formal' a document as a technical specification, a reader should be prepared to follow the definitions and motivations of the document itself, rather than presume that the authors of the document will have followed the particular conventions and presumptions that he or she, the reader, might have brought with them. 

The authors of a document like the RDF specs are in an impossible situation if they must accommodate every set of ideas that a reader might bring to bear on the text, because there is a huge variety of such expectations and presumptions about what is "standard", and they are often widely incompatible with one another. There is no single "standard" set of presumptions shared by all readers of such a document. This is why the documents give their own, perhaps idiosyncratic, definitions of terms in such exhausting detail. 

Pat

> 
> Thanks,
> 
> Gregg
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 28 March 2011 22:12:53 UTC