Re: Blank nodes, leaning, and the OWA from Pat Hayes on 2011-03-28 (semantic-web@w3.org from March 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 27 Mar 2011 20:47:36 -0500
To: Adrian Walker <adriandwalker@gmail.com>
Cc: SW-forum Web <semantic-web@w3.org>
Message-Id: <16F6B6CB-BE76-4816-B4C9-9B4DB64B9735@ihmc.us>
On Mar 27, 2011, at 11:15 AM, Adrian Walker wrote:

> Hi Pat & All,
> 
> Pat wrote:
> 
> The OWA is simply that whatever you already know (however much RDF you have found) there could be more of it out there. You can never be sure that you have exhausted the totality of things that are being said about some topic. So you can't infer that something is false just because you haven't been told it.
> 
> Quite so, the web is always changing.  Yet applying the above blindly leads to all sorts of difficulties** -- e.g. you can never count how many papers Pat has published!
> 
> There's a commonsense way out of this tar pit.  Use CWA, but be careful to qualify results with a time stamp --  e.g. "Pat Hayes has 503 published papers as of 201103271208"

Hah, I wish. But this is still dangerous, in general. It really turns on whether or not you have a good reason to believe that your information is complete. If you do, then the CWA makes sense; if not, it doesn't. Databases usually are complete in this sense, for the data that they record (which is why people make them, of course). Random scraping of Web information usually is not. A Google search for my publications on the Web is probably an edge case.
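
To make the completeness point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Source` class, its `complete` flag, and the sample records are invented for illustration, and it assumes that distinct record ids denote distinct papers. A count is reported as exact only when the source is believed to be complete; otherwise it is only a lower bound.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A hypothetical collection of publication records."""
    name: str
    records: list      # each record: {"id": ..., "author": ...}
    complete: bool     # assumption: someone vouches that nothing is missing


def count_papers(source, author):
    """Return (count, exact). 'exact' is True only for a complete source."""
    ids = {r["id"] for r in source.records if r["author"] == author}
    return len(ids), source.complete


records = [
    {"id": "paper-1", "author": "Pat Hayes"},
    {"id": "paper-2", "author": "Pat Hayes"},
]

# Same records, different grounds for believing they are all there is.
curated_db = Source("curated database", records, complete=True)
web_scrape = Source("web scrape", records, complete=False)

for src in (curated_db, web_scrape):
    n, exact = count_papers(src, "Pat Hayes")
    qualifier = "exactly" if exact else "at least"
    print(f"{src.name}: {qualifier} {n}")
```

The data is identical in both cases; only the justification for closing the world differs.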

Pat


> 
> HTH,  -- Adrian
> 
> Internet Business Logic
> A Wiki and SOA Endpoint for Executable Open Vocabulary English Q/A over SQL and RDF
> Online at www.reengineeringllc.com    
> Shared use is free, and there are no advertisements
> 
> Adrian Walker
> Reengineering
> 
> ** as Gregg and many others have described in many forums over the years.
> 
> On Sun, Mar 27, 2011 at 11:16 AM, Pat Hayes <phayes@ihmc.us> wrote:
> 
> On Mar 27, 2011, at 12:13 AM, Gregg Reynolds wrote:
> 
>> I'm having trouble reconciling RDF's handling of blank nodes and the Open World Assumption.  I suppose I'm still not entirely grokking leaning and/or OWA.  I searched the archives and didn't find anything addressing my questions.  My reasoning follows; where are the flaws?
>> 
>> As I understand the OWA, if we have a node, all we know is that we have a node; we do not know what properties it may have.
> 
> Not exactly. First, it's not the node that has a property, it is the thing the node denotes. So when you see an RDF triple
> 
> x:A x:prop x:B .
> 
> what it is saying is that some *thing* called 'A' has a property called 'prop' with the value called 'B'.
> 
> But leaving that aside.... The OWA is simply that whatever you already know (however much RDF you have found) there could be more of it out there. You can never be sure that you have exhausted the totality of things that are being said about some topic. So you can't infer that something is false just because you haven't been told it.
> 
>>  If we also know that it has property A, we cannot infer that it does not also have property B for any B.
> 
> True. (Although OWL sometimes allows you to draw conclusions like this, plain RDF does not.)
> 
>>  Followed to its logical conclusion, this line of reasoning leads to the conclusion that there can only ever be one blank node in any graph.
> 
> No, it does not. 
> 
>> 
>> For example, suppose 
>> 
>> [1] a)  <ex:Pedro ex:owns _:x>, <_:x rdf:type ex:Donkey>, <_:x ex:name ex:Daisy>
>>      b)  <ex:Pedro ex:owns _:y>, <_:y rdf:type ex:Donkey>, <_:y ex:name ex:Maisy>
>> 
>> then we've made two assertions (ok, six), but we have not necessarily asserted that Pedro owns two donkeys.  
> 
> True. Maybe Maisy is Daisy. In fact, maybe Maisy is Pedro, for that matter. 
> 
>> By the OWA we cannot infer _:x != _:y.  It could be that Pedro owns two donkeys, named Daisy and Maisy, respectively, but it also could be that Pedro owns one donkey with two names.
>> 
>> Is the graph of [1] lean?  It seems to me that under the OWA the answer must be that we do not know,
> 
> It is lean, and we do know this. Leanness is a syntactic property of the graph. You can determine it algorithmically. 
> 
>> just as we don't know if Pedro owns one or two donkeys.  Under RDF semantics the answer could be yes or no, depending on which model we choose.  But the principle of leaning as written (if I understand it) compels us to treat it as non-lean, since a model with a single node named both Daisy and Maisy works for both [1] a) and [1] b);
> 
> You apparently do not understand it. Check out the definitions in the specs, they are given quite unambiguously. A graph is lean when it has no instance which is a proper subgraph of itself. The graph [1] does not have such an instance, so it is lean. Nothing to do with models!
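
Since leanness is syntactic, the definition can be checked by brute force. The following is a minimal Python sketch, not an efficient algorithm: it assumes triples are tuples of strings, that blank nodes are the strings starting with "_:", and all function names are invented here. It tries every mapping of blank nodes to terms that already occur in the graph (a mapping to any other term cannot produce a subgraph) and asks whether the resulting instance is a proper subgraph.

```python
from itertools import product


def is_blank(term):
    return term.startswith("_:")


def instance(graph, mapping):
    """Replace blank nodes according to mapping; other terms are kept."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}


def is_lean(graph):
    """True if no instance of graph is a proper subgraph of graph."""
    graph = set(graph)
    all_terms = sorted({t for triple in graph for t in triple})
    bnodes = [t for t in all_terms if is_blank(t)]
    for values in product(all_terms, repeat=len(bnodes)):
        image = instance(graph, dict(zip(bnodes, values)))
        if image < graph:          # proper subset, i.e. proper subgraph
            return False
    return True


# Graph [1] from the message.
g1 = {
    ("ex:Pedro", "ex:owns", "_:x"),
    ("_:x", "rdf:type", "ex:Donkey"),
    ("_:x", "ex:name", "ex:Daisy"),
    ("ex:Pedro", "ex:owns", "_:y"),
    ("_:y", "rdf:type", "ex:Donkey"),
    ("_:y", "ex:name", "ex:Maisy"),
}

# Graph [2]: the proposed "leaned" version.
g2 = {
    ("ex:Pedro", "ex:owns", "_:x"),
    ("_:x", "rdf:type", "ex:Donkey"),
    ("_:x", "ex:name", "ex:Daisy"),
    ("_:x", "ex:name", "ex:Maisy"),
}

print("is [1] lean?", is_lean(g1))
print("is [2] an instance of [1]?", instance(g1, {"_:y": "_:x"}) == g2)
print("is [2] a subgraph of [1]?", g2 <= g1)
```

The last two checks separate the two notions that were being run together: [2] is obtained from [1] by mapping _:y to _:x, but it contains a triple that [1] does not, so it cannot witness non-leanness.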
> 
>> the leaned version would look like:
>> 
>> [2]  <ex:Pedro ex:owns _:x>, <_:x rdf:type ex:Donkey>, <_:x ex:name ex:Daisy>, <_:x ex:name ex:Maisy>
> 
> That is a different graph, which asserts that Daisy and Maisy are the same donkey. That is a different assertion, not equivalent to the first graph. It is not a leaning of the first graph. (It is an instance of it, but it is not a subgraph of it.)
> 
>> Similar considerations would apply to all blank node IDs in a graph: a model with a single (blank) node (with appropriate properties) would work for each such blank node ID.
>> 
>> If this is not the case -- if graphs like [1] are construed as lean, with _:x != _:y -- then it looks to me like leaning involves an implicit Closed World Assumption.  I.e. if _:x is named Daisy it is not also named Maisy.
> 
> Not so. It simply keeps that possibility open. 
> 
>> Now consider
>> 
>> [3] <ex:Pedro ex:owns _:x>, <ex:Pedro ex:owns _:y>
>> 
>> According to the RDF semantics [1] is not lean,
> 
> I assume you mean [3] is not lean
> 
>> so it can be reduced to a single triple <ex:Pedro ex:owns _:z>.  That's because the model theoretic semantics mean that a single node can satisfy both clauses of [1].
> 
> Not really, and this is not the right way to think about it. It is because [3] and the single triple (call it [3a]) *entail* each other, i.e. *every* model of one also satisfies the other. And this, in turn, is because [3] does not assert that there are two things Pedro owns, and [3a] does not deny the possibility of Pedro owning two (or more) things. They both just say that Pedro owns something, but [3a] says it more economically than [3] does.
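
The mutual entailment can also be checked mechanically. This minimal Python sketch relies on the interpolation lemma from RDF Semantics: a graph G simply entails a graph E exactly when some instance of E is a subgraph of G. It covers simple entailment only (no RDFS or OWL reasoning), assumes triples are tuples of strings with blank nodes written "_:name", and the function name `simple_entails` is invented here.

```python
from itertools import product


def simple_entails(g, e):
    """True if some instance of e is a subgraph of g (interpolation lemma)."""
    g, e = set(g), set(e)
    e_bnodes = sorted({t for tr in e for t in tr if t.startswith("_:")})
    g_terms = sorted({t for tr in g for t in tr})
    # Try every way of mapping e's blank nodes onto terms occurring in g.
    for values in product(g_terms, repeat=len(e_bnodes)):
        mapping = dict(zip(e_bnodes, values))
        image = {tuple(mapping.get(t, t) for t in tr) for tr in e}
        if image <= g:
            return True
    return False


g3 = {("ex:Pedro", "ex:owns", "_:x"), ("ex:Pedro", "ex:owns", "_:y")}
g3a = {("ex:Pedro", "ex:owns", "_:z")}

print("[3] entails [3a]:", simple_entails(g3, g3a))
print("[3a] entails [3]:", simple_entails(g3a, g3))

# A ground triple that neither graph states is not entailed.
daisy = {("ex:Pedro", "ex:owns", "ex:Daisy")}
print("[3a] entails 'Pedro owns Daisy':", simple_entails(g3a, daisy))
```

Both directions succeed for [3] and [3a], which is what makes them equivalent; the ground triple fails because nothing in [3a] mentions ex:Daisy.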
> 
>>  But under OWA, we cannot infer that no properties have been asserted of _:x and _:y; which must mean that the node satisfying [1] can have any properties or no properties.
> 
> What it CAN have is irrelevant to what is in this ACTUAL graph. The OWA just says you might find some more RDF somewhere. But given the RDF you actually have in your graph, one can ask questions about that RDF in particular: what it entails, whether it is lean, and so forth. This graph [3] is, in fact, not lean as it stands. But you could add triples to it to get a new graph which is lean. For example
> 
> [4] <ex:Pedro ex:owns _:x>, <ex:Pedro ex:owns _:y>, <_:x rdf:type ex:Donkey>, <_:y rdf:type ex:Penguin>
> 
> is now lean, intuitively because it says something about each of _:x and _:y that it does not say about the other, which is enough to distinguish them. (Typing only _:y would not be enough: everything said about _:x would then also be said about _:y, so mapping _:x to _:y would give an instance that is a proper subgraph.) Of course, we might find out still more:
> 
> [5] <ex:Pedro ex:owns _:x>, <ex:Pedro ex:owns _:y>, <_:x rdf:type ex:Donkey>, <_:y rdf:type ex:Penguin>, <_:x rdf:type ex:Penguin>
> 
> and now the graph is non-lean again, since there is again nothing that distinguishes _:y from _:x: everything the graph says about _:y it also says about _:x, so mapping _:y to _:x gives an instance that is a proper subgraph.
> 
> So, you might well ask, what is the content of the OWA, if this is all it means? Well, contrast it with the CWA. Under the CWA, if you know, say, the RDF in your example [1], then you automatically know that Pedro, for example, is *not* an employee of CitiBank and *not* a citizen of Venezuela, simply because that graph does not say he is. This works well when consulting databases of employees and citizens, but it obviously is not a good rule to use on the open Web.
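
The contrast can be sketched as two ways of answering the same yes/no question about a ground triple. This is a deliberately simplified Python illustration: it tests membership rather than entailment, the function names are invented, and the property ex:employeeOf is a hypothetical name that does not appear in the thread.

```python
def ask_cwa(graph, triple):
    """Closed world: whatever the graph does not state is taken to be false."""
    return triple in graph


def ask_owa(graph, triple):
    """Open world: a stated triple is true, an unstated one is unknown (None)."""
    return True if triple in graph else None


# Graph [1] from the message.
g1 = {
    ("ex:Pedro", "ex:owns", "_:x"),
    ("_:x", "rdf:type", "ex:Donkey"),
    ("_:x", "ex:name", "ex:Daisy"),
    ("ex:Pedro", "ex:owns", "_:y"),
    ("_:y", "rdf:type", "ex:Donkey"),
    ("_:y", "ex:name", "ex:Maisy"),
}

question = ("ex:Pedro", "ex:employeeOf", "ex:CitiBank")

print("CWA:", ask_cwa(g1, question))   # the graph is silent, so the answer is "no"
print("OWA:", ask_owa(g1, question))   # the graph is silent, so the answer is "don't know"
```

The two functions agree on everything the graph states and differ only on what it leaves out, which is the whole of the difference between the two assumptions.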
> 
>>  The same principle must apply wherever blank node IDs occur, which again leads to the conclusion that all blank node IDs in a graph can be collapsed as it were to a single node.  In other words, it looks to me like the principle of leaning, if valid, implies a maximum of one blank node per graph.
> 
> It really does not mean that at all, and this does not even remotely follow from the definition of lean graph in the specs. You would do better to actually read the specs carefully.
> 
>> 
>> I'm afraid my language is a little awkward but I hope you can see what I mean.
>> 
>> A related question:  RDF Semantics says this is lean:
>> 
>> [4]  <ex:a> <ex:p> _:x .
>>        _:x <ex:p> _:x .
>> 
>> But <ex:a ex:p ex:a> seems to fit the definition of proper subgraph as used to define "lean"
> 
> Obviously not. That triple does not occur in that graph. Again, follow the definitions as given. 
> 
>> , and semantically to satisfy [4]
> 
> Not that this is relevant to questions of leanness, but this statement does not make sense. An interpretation may satisfy a graph, but a triple cannot satisfy anything. 
> 
> Hope this all helps.
> 
> Pat
> 
> 
>> , so [4] would not be lean.
>> 
>> Thanks,
>> 
>> Gregg
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Monday, 28 March 2011 01:48:14 UTC