Re: Blank nodes, leaning, and the OWA from Adrian Walker on 2011-03-28 (semantic-web@w3.org from March 2011)

From: Adrian Walker <adriandwalker@gmail.com>
Date: Mon, 28 Mar 2011 08:08:40 -0400
To: Pat Hayes <phayes@ihmc.us>
Cc: SW-forum Web <semantic-web@w3.org>
Message-ID: <AANLkTinU5Tu5Ddhmk6uyDSPTHC=AVcNXmMcQ+zri9fLv@mail.gmail.com>

Hi Pat,

You wrote...

But this is still dangerous, in general. It really turns on whether or not
you have a good reason to believe that your information is complete. If you
do, then the CWA makes sense; if not, it doesn't. Databases usually are
complete in this sense, for data that they record (which is why people make
them, of course.)  Random scraping of Web information is not usually
complete. A google search of my publications on the Web is probably an edge
case.

So, just hedge the answer a bit more, e.g.

*"Based on a possibly incomplete web search, it appears that Pat Hayes has
503 published papers as of 201103271208"*

This is easy to do in the executable English system [1], and in many other
systems.

The OWA alternative -- not being able to count -- seems to cripple Q/A over
RDF.  But perhaps there is a useful middle way?  Using local
circumscription, and making that part of the explanation of an answer?
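
As a rough illustration of the hedged answer above, here is a minimal Python sketch. The function name `hedged_count`, the tuple representation of triples, and the `authorOf` predicate are assumptions made up for this example; it is not how the system in [1] works internally.

```python
from datetime import datetime

def hedged_count(triples, subject, predicate, noun, source,
                 complete=False, as_of=None):
    """Count distinct objects of (subject, predicate, ?o) and phrase the result.

    The count is a closed-world count over the triples at hand. Unless the
    caller vouches for completeness, the answer carries an explicit caveat.
    """
    as_of = as_of or datetime.now()
    count = len({o for (s, p, o) in triples if s == subject and p == predicate})
    stamp = as_of.strftime("%Y%m%d%H%M")
    claim = f"{subject} has {count} {noun} as of {stamp}"
    if complete:
        return claim
    return f"Based on a possibly incomplete {source}, it appears that {claim}"

# Hypothetical data: three triples found by scraping, one of them a duplicate.
found = [
    ("Pat Hayes", "authorOf", "paper1"),
    ("Pat Hayes", "authorOf", "paper2"),
    ("Pat Hayes", "authorOf", "paper2"),
]
print(hedged_count(found, "Pat Hayes", "authorOf", "published papers",
                   "web search", as_of=datetime(2011, 3, 27, 12, 8)))
```

The `complete` flag is where a local closed-world claim would be recorded: the caveat is dropped only when someone asserts that the source is complete for that subject and predicate.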

                              Cheers,  -- Adrian


[1] Internet Business Logic
A Wiki and SOA Endpoint for Executable Open Vocabulary English Q/A over SQL
and RDF
Online at www.reengineeringllc.com
Shared use is free, and there are no advertisements

Adrian Walker
Reengineering

On Sun, Mar 27, 2011 at 9:47 PM, Pat Hayes <phayes@ihmc.us> wrote:

>
> On Mar 27, 2011, at 11:15 AM, Adrian Walker wrote:
>
> Hi Pat & All,
>
> Pat wrote:
>
> *The OWA is simply that whatever you already know (however much RDF you
> have found) there could be more of it out there. You can never be sure that
> you have exhausted the totality of things that are being said about some
> topic. So you can't infer that something is false just because you haven't
> been told it. *
>
> Quite so, the web is always changing.  Yet applying the above blindly leads
> to all sorts of difficulties** -- e.g. you can never count how many papers
> Pat has published!
>
> There's a commonsense way out of this tar pit.  Use CWA, but be careful to
> qualify results with a time stamp --  e.g. "Pat Hayes has 503 published
> papers as of 201103271208"
>
>
> Hah, I wish. But this is still dangerous, in general. It really turns on
> whether or not you have a good reason to believe that your information is
> complete. If you do, then the CWA makes sense; if not, it doesn't. Databases
> usually are complete in this sense, for data that they record (which is why
> people make them, of course.)  Random scraping of Web information is not
> usually complete. A google search of my publications on the Web is probably
> an edge case.
>
> Pat
>
>
>
> HTH,  -- Adrian
>
>
> ** as Gregg and many others have described in many forums over the years.
>
> On Sun, Mar 27, 2011 at 11:16 AM, Pat Hayes <phayes@ihmc.us> wrote:
>
>>
>> On Mar 27, 2011, at 12:13 AM, Gregg Reynolds wrote:
>>
>> I'm having trouble reconciling RDF's handling of blank nodes and the Open
>> World Assumption.  I suppose I'm still not entirely grokking leaning and/or
>> OWA.  I searched the archives and didn't find anything addressing my
>> questions.  My reasoning follows; where are the flaws?
>>
>> As I understand the OWA, if we have a node, all we know is that we have a
>> node; we do not know what properties it may have.
>>
>>
>> Not exactly. First, it's not the node that has a property, it is the thing
>> the node denotes. So when you see an RDF triple
>>
>> x:A x:prop x:B .
>>
>> what it is saying is that some *thing* called 'A' has a property called
>> 'prop' with the value called 'B'.
>>
>> But leaving that aside.... The OWA is simply that whatever you already
>> know (however much RDF you have found) there could be more of it out there.
>> You can never be sure that you have exhausted the totality of things that
>> are being said about some topic. So you can't infer that something is false
>> just because you haven't been told it.
>>
>>  If we also know that it has property A, we cannot infer that it does not
>> also have property B for any B.
>>
>>
>> True. (Although OWL sometimes allows you to draw conclusions like this,
>> plain RDF does not.)
>>
>>  Followed to its logical conclusion, this line of reasoning leads to the
>> conclusion that there can only ever be one blank node in any graph.
>>
>>
>> No, it does not.
>>
>>
>> For example, suppose
>>
>> [1] a)  <ex:Pedro ex:owns _:x>, <_:x rdf:type ex:Donkey>,
>>         <_:x ex:name ex:Daisy>
>>     b)  <ex:Pedro ex:owns _:y>, <_:y rdf:type ex:Donkey>,
>>         <_:y ex:name ex:Maisy>
>>
>> then we've made two assertions (ok, six), but we have not necessarily
>> asserted that Pedro owns two donkeys.
>>
>>
>> True. Maybe Maisy is Daisy. In fact, maybe Maisy is Pedro, for that
>> matter.
>>
>> By the OWA we cannot infer _:x != _:y.  It could be that Pedro owns two
>> donkeys, named Daisy and Maisy, respectively, but it also could be that
>> Pedro owns one donkey with two names.
>>
>> Is the graph of [1] lean?  It seems to me that under the OWA the answer
>> must be that we do not know,
>>
>>
>> It is lean, and we do know this. Leanness is a syntactic property of the
>> graph. You can determine it algorithmically.
>>
>> just as we don't know if Pedro owns one or two donkeys.  Under RDF
>> semantics the answer could be yes or no, depending on which model we choose.
>>  But the principle of leaning as written (if I understand it) compels us to
>> treat it as non-lean, since a model with a single node named both Daisy and
>> Maisy works for both [1] a) and [1] b);
>>
>>
>> You apparently do not understand it. Check out the definitions in the
>> specs; they are given quite unambiguously. A graph is lean when it has no
>> instance which is a proper subgraph of itself. The graph [1] does not have
>> such an instance, so it is lean. Nothing to do with models!
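
The definition just quoted can be checked mechanically. Below is a minimal brute-force sketch in Python, under some assumptions of mine: a graph is a set of (subject, predicate, object) string tuples, a blank node is any term starting with `_:`, and the function name `is_lean` is made up for illustration. It tries every mapping of blank nodes to terms of the graph, so it is exponential and only suitable for toy graphs.

```python
from itertools import product

def is_lean(graph):
    """Return True if no instance of the graph is a proper subgraph of it."""
    graph = set(graph)
    blanks = sorted({t for triple in graph for t in triple if t.startswith("_:")})
    terms = sorted({t for triple in graph for t in triple})
    # An instance replaces blank nodes by other terms. For the instance to be
    # a subgraph, the replacements must already occur in the graph, so it is
    # enough to try the graph's own terms as targets.
    for image in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, image))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in graph}
        if instance < graph:  # proper subset: a witness that the graph is not lean
            return False
    return True

# The six triples of graph [1] above.
graph_1 = {
    ("ex:Pedro", "ex:owns", "_:x"), ("_:x", "rdf:type", "ex:Donkey"),
    ("_:x", "ex:name", "ex:Daisy"),
    ("ex:Pedro", "ex:owns", "_:y"), ("_:y", "rdf:type", "ex:Donkey"),
    ("_:y", "ex:name", "ex:Maisy"),
}
print(is_lean(graph_1))  # True
```

Note that the check looks only at the triples themselves; no interpretation or model is consulted anywhere.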
>>
>> the leaned version would look like:
>>
>> [2]  <ex:Pedro ex:owns _:x>, <_:x rdf:type ex:Donkey>,
>>      <_:x ex:name ex:Daisy>, <_:x ex:name ex:Maisy>
>>
>>
>> That is a different graph, which asserts that Daisy and Maisy are the same
>> donkey. That is a different assertion, not equivalent to the first graph. It
>> is not a leaning of the first graph. (It is an instance of it, but it is not
>> a subgraph of it.)
>>
>> Similar considerations would apply to all blank node IDs in a graph: a
>> model with a single (blank) node (with appropriate properties) would work
>> for each such blank node ID.
>>
>> If this is not the case -- if graphs like [1] are construed as lean, with
>> _:x != _:y -- then it looks to me like leaning involves an implicit Closed
>> World Assumption.  I.e. if _:x is named Daisy it is not also named Maisy.
>>
>>
>> Not so. It simply keeps that possibility open.
>>
>> Now consider
>>
>> [3] <ex:Pedro ex:owns _:x>, <ex:Pedro ex:owns _:y>
>>
>> According to the RDF semantics [1] is not lean,
>>
>>
>> I assume you mean [3] is not lean
>>
>> so it can be reduced to a single triple <ex:Pedro ex:owns _:z>.  That's
>> because the model theoretic semantics mean that a single node can satisfy
>> both clauses of [1].
>>
>>
>> Not really, and this is not the right way to think about it. It is because
>> [3] and the single triple (call it [3a]) *entail* each other, i.e. *every*
>> model of one also satisfies the other. And this, in turn, is because [3]
>> does not assert that there are two things Pedro owns, and [3a] does not deny
>> the possibility of Pedro owning two (or more) things. They both just say
>> that Pedro owns something, but [3a] says it more economically than [3] does.
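
The mutual entailment of [3] and [3a] can be illustrated with a small Python sketch. It relies on the interpolation lemma from RDF Semantics (a graph G simply entails E exactly when some instance of E is a subgraph of G) and covers simple entailment only, not RDFS or OWL. The tuple representation and the name `simply_entails` are assumptions made up for this example.

```python
from itertools import product

def simply_entails(g, e):
    """Return True if some instance of e is a subgraph of g."""
    g, e = set(g), set(e)
    blanks = sorted({t for triple in e for t in triple if t.startswith("_:")})
    terms = sorted({t for triple in g for t in triple})
    # Try every way of replacing e's blank nodes by terms that occur in g.
    for image in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, image))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in e}
        if instance <= g:
            return True
    return False

graph_3 = {("ex:Pedro", "ex:owns", "_:x"), ("ex:Pedro", "ex:owns", "_:y")}
graph_3a = {("ex:Pedro", "ex:owns", "_:z")}

print(simply_entails(graph_3, graph_3a))  # True
print(simply_entails(graph_3a, graph_3))  # True
```

Both directions succeed because neither graph says how many things Pedro owns; each only says that he owns something.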
>>
>>  But under OWA, we cannot infer that no properties have been asserted of
>> _:x and _:y; which must mean that the node satisfying [1] can have any
>> properties or no properties.
>>
>>
>> What it CAN have is irrelevant to what is in this ACTUAL graph. The OWA
>> just says you might find some more RDF somewhere. But given the RDF you
>> actually have in your graph, one can ask questions about that RDF in
>> particular: what it entails, whether it is lean, and so forth. This graph
>> [3] is, in fact, not lean as it stands. But you could add more triples to
>> it to get a new graph which is lean. For example
>>
>> [4] <ex:Pedro ex:owns _:x>, <ex:Pedro ex:owns _:y>,
>>     <_:x rdf:type ex:Donkey>, <_:y rdf:type ex:Penguin>
>>
>> is now lean, intuitively because it says something about _:x that it does
>> not say about _:y, and vice versa, so neither can stand in for the other.
>> Of course, we might find out still more:
>>
>> [5] <ex:Pedro ex:owns _:x>, <ex:Pedro ex:owns _:y>,
>>     <_:x rdf:type ex:Donkey>, <_:y rdf:type ex:Penguin>,
>>     <_:x rdf:type ex:Penguin>
>>
>> and now the graph is non-lean again, since there is again nothing to
>> distinguish _:y from _:x: everything the graph says about _:y it also says
>> about _:x (if indeed it is another).
>>
>> So, you might well ask, what is the content of the OWA, if this is all it
>> means? Well, contrast it with the CWA. Under the CWA, if you know, say, the
>> RDF in your example [1], then you automatically know that Pedro, for
>> example, is *not* an employee of CitiBank and *not* a citizen of Venezuela,
>> simply because that graph does not say he is. This works well when
>> consulting databases of employees and citizens, but it obviously is not a
>> good rule to use on the open Web.
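
The contrast between the two assumptions can be sketched in a few lines of Python. This is only an illustration under assumptions of mine: the function `ask`, the `ex:employeeOf` predicate, and the use of `None` for "unknown" are made up for the example, and the query is limited to ground triples.

```python
def ask(graph, triple, closed_world=False):
    """Answer a ground yes/no query against a set of triples.

    Returns True if the triple is present. If it is absent, the closed-world
    reading answers False, while the open-world reading answers None,
    meaning "not known from the RDF at hand".
    """
    if triple in graph:
        return True
    return False if closed_world else None

graph = {
    ("ex:Pedro", "ex:owns", "_:x"),
    ("_:x", "rdf:type", "ex:Donkey"),
}
query = ("ex:Pedro", "ex:employeeOf", "ex:CitiBank")

print(ask(graph, query, closed_world=True))  # False
print(ask(graph, query))                     # None
```

The closed-world branch is appropriate only when there is good reason to believe the graph is complete for the predicate being asked about, as with a database of employees.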
>>
>>  The same principle must apply wherever blank node IDs occur, which again
>> leads to the conclusion that all blank node IDs in a graph can be collapsed
>> as it were to a single node.  In other words, it looks to me like the
>> principle of leaning, if valid, implies a maximum of one blank node per
>> graph.
>>
>>
>> It really does not mean that at all, and this does not even remotely
>> follow from the definition of lean graph in the specs. You would do better
>> to actually read the specs carefully.
>>
>>
>> I'm afraid my language is a little awkward but I hope you can see what I
>> mean.
>>
>> A related question:  RDF Semantics says this is lean:
>>
>> [4]  <ex:a> <ex:p> _:x .
>>        _:x <ex:p> _:x .
>>
>> But <ex:a ex:p ex:a> seems to fit the definition of proper subgraph as
>> used to define "lean"
>>
>>
>> Obviously not. That triple does not occur in that graph. Again, follow the
>> definitions as given.
>>
>> , and semantically to satisfy [4]
>>
>>
>> Not that this is relevant to questions of leanness, but this statement
>> does not make sense. An interpretation may satisfy a graph, but a triple
>> cannot satisfy anything.
>>
>> Hope this all helps.
>>
>> Pat
>>
>>
>> , so [4] would not be lean.
>>
>> Thanks,
>>
>> Gregg
>>
>>
>>  ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>
>>
>>
>>
>>
Received on Monday, 28 March 2011 12:09:15 UTC