- From: Adrian Walker <adriandwalker@gmail.com>
- Date: Mon, 28 Mar 2011 08:08:40 -0400
- To: Pat Hayes <phayes@ihmc.us>
- Cc: SW-forum Web <semantic-web@w3.org>
- Message-ID: <AANLkTinU5Tu5Ddhmk6uyDSPTHC=AVcNXmMcQ+zri9fLv@mail.gmail.com>
Hi Pat,

You wrote...

> But this is still dangerous, in general. It really turns on whether or not
> you have a good reason to believe that your information is complete. If you
> do, then the CWA makes sense; if not, it doesn't. Databases usually are
> complete in this sense, for data that they record (which is why people make
> them, of course.) Random scraping of Web information is not usually
> complete. A Google search of my publications on the Web is probably an edge
> case.

So, just hedge the answer a bit more, e.g.

*"Based on a possibly incomplete web search, it appears that Pat Hayes has 503 published papers as of 201103271208"*

This is easy to do in the executable English system [1], and in many other systems.

The OWA alternative -- not being able to count -- seems to cripple Q/A over RDF. But perhaps there is a useful middle way? Using local circumscription, and making that part of the explanation of an answer?

Cheers, -- Adrian

[1] Internet Business Logic
A Wiki and SOA Endpoint for Executable Open Vocabulary English Q/A over SQL and RDF
Online at www.reengineeringllc.com
Shared use is free, and there are no advertisements

Adrian Walker
Reengineering

On Sun, Mar 27, 2011 at 9:47 PM, Pat Hayes <phayes@ihmc.us> wrote:

>
> On Mar 27, 2011, at 11:15 AM, Adrian Walker wrote:
>
> Hi Pat & All,
>
> Pat wrote:
>
> *The OWA is simply that whatever you already know (however much RDF you
> have found) there could be more of it out there. You can never be sure that
> you have exhausted the totality of things that are being said about some
> topic. So you can't infer that something is false just because you haven't
> been told it.*
>
> Quite so, the web is always changing. Yet applying the above blindly leads
> to all sorts of difficulties** -- e.g. you can never count how many papers
> Pat has published!
>
> There's a commonsense way out of this tar pit. Use CWA, but be careful to
> qualify results with a time stamp -- e.g.
> "Pat Hayes has 503 published papers as of 201103271208"
>
>
> Hah, I wish. But this is still dangerous, in general. It really turns on
> whether or not you have a good reason to believe that your information is
> complete. If you do, then the CWA makes sense; if not, it doesn't. Databases
> usually are complete in this sense, for data that they record (which is why
> people make them, of course.) Random scraping of Web information is not
> usually complete. A Google search of my publications on the Web is probably
> an edge case.
>
> Pat
>
>
>
> HTH, -- Adrian
>
> ** as Gregg and many others have described in many forums over the years.
>
> On Sun, Mar 27, 2011 at 11:16 AM, Pat Hayes <phayes@ihmc.us> wrote:
>
>>
>> On Mar 27, 2011, at 12:13 AM, Gregg Reynolds wrote:
>>
>> I'm having trouble reconciling RDF's handling of blank nodes and the Open
>> World Assumption. I suppose I'm still not entirely grokking leaning and/or
>> OWA. I searched the archives and didn't find anything addressing my
>> questions. My reasoning follows; where are the flaws?
>>
>> As I understand the OWA, if we have a node, all we know is that we have a
>> node; we do not know what properties it may have.
>>
>>
>> Not exactly. First, it's not the node that has a property, it is the thing
>> the node denotes. So when you see an RDF triple
>>
>> x:A x:prop x:B .
>>
>> what it is saying is that some *thing* called 'A' has a property called
>> 'prop' with the value called 'B'.
>>
>> But leaving that aside.... The OWA is simply that whatever you already
>> know (however much RDF you have found) there could be more of it out there.
>> You can never be sure that you have exhausted the totality of things that
>> are being said about some topic.
>> So you can't infer that something is false just because you haven't been
>> told it.
>>
>> If we also know that it has property A, we cannot infer that it does not
>> also have property B for any B.
>>
>>
>> True. (Although OWL sometimes allows you to draw conclusions like this,
>> plain RDF does not.)
>>
>> Followed to its logical conclusion, this line of reasoning leads to the
>> conclusion that there can only ever be one blank node in any graph.
>>
>>
>> No, it does not.
>>
>>
>> For example, suppose
>>
>> [1] a) <ex:Pedro ex:owns _:x>, <_:x rdf:type ex:Donkey>, <_:x ex:name ex:Daisy>
>>     b) <ex:Pedro ex:owns _:y>, <_:y rdf:type ex:Donkey>, <_:y ex:name ex:Maisy>
>>
>> then we've made two assertions (ok, six), but we have not necessarily
>> asserted that Pedro owns two donkeys.
>>
>>
>> True. Maybe Maisy is Daisy. In fact, maybe Maisy is Pedro, for that
>> matter.
>>
>> By the OWA we cannot infer _:x != _:y. It could be that Pedro owns two
>> donkeys, named Daisy and Maisy, respectively, but it also could be that
>> Pedro owns one donkey with two names.
>>
>> Is the graph of [1] lean? It seems to me that under the OWA the answer
>> must be that we do not know,
>>
>>
>> It is lean, and we do know this. Leanness is a syntactic property of the
>> graph. You can determine it algorithmically.
>>
>> just as we don't know if Pedro owns one or two donkeys. Under RDF
>> semantics the answer could be yes or no, depending on which model we choose.
>> But the principle of leaning as written (if I understand it) compels us to
>> treat it as non-lean, since a model with a single node named both Daisy and
>> Maisy works for both [1] a) and [1] b);
>>
>>
>> You apparently do not understand it. Check out the definitions in the
>> specs, they are given quite unambiguously. A graph is lean when it has no
>> instance which is a proper subgraph of itself. The graph [1] does not have
>> such an instance, so it is lean. Nothing to do with models!
>>
>> the leaned version would look like:
>>
>> [2] <ex:Pedro ex:owns _:x>, <_:x rdf:type ex:Donkey>, <_:x ex:name ex:Daisy>, <_:x ex:name ex:Maisy>
>>
>>
>> That is a different graph, which asserts that Daisy and Maisy are the same
>> donkey. That is a different assertion, not equivalent to the first graph. It
>> is not a leaning of the first graph. (It is an instance of it, but it is not
>> a subgraph of it.)
>>
>> Similar considerations would apply to all blank node IDs in a graph: a
>> model with a single (blank) node (with appropriate properties) would work
>> for each such blank node ID.
>>
>> If this is not the case -- if graphs like [1] are construed as lean, with
>> _:x != _:y -- then it looks to me like leaning involves an implicit Closed
>> World Assumption. I.e. if _:x is named Daisy it is not also named Maisy.
>>
>>
>> Not so. It simply keeps that possibility open.
>>
>> Now consider
>>
>> [3] <ex:Pedro ex:owns _:x>, <ex:Pedro ex:owns _:y>
>>
>> According to the RDF semantics [1] is not lean,
>>
>>
>> I assume you mean [3] is not lean
>>
>> so it can be reduced to a single triple <ex:Pedro ex:owns _:z>. That's
>> because the model theoretic semantics mean that a single node can satisfy
>> both clauses of [1].
>>
>>
>> Not really, and this is not the right way to think about it. It is because
>> [3] and the single triple (call it [3a]) *entail* each other, i.e. *every*
>> model of one also satisfies the other. And this, in turn, is because [3]
>> does not assert that there are two things Pedro owns, and [3a] does not deny
>> the possibility of Pedro owning two (or more) things. They both just say
>> that Pedro owns something, but [3a] says it more economically than [3] does.
>>
>> But under OWA, we cannot infer that no properties have been asserted of
>> _:x and _:y; which must mean that the node satisfying [1] can have any
>> properties or no properties.
>>
>>
>> What it CAN have is irrelevant to what is in this ACTUAL graph.
>> The OWA just says you might find some more RDF somewhere. But given the
>> RDF you actually have in your graph, one can ask questions about that RDF
>> in particular: what it entails, whether it is lean, and so forth. This graph
>> [3] is, in fact, not lean as it stands. But you could add another triple to
>> it to get a new graph which is lean. For example
>>
>> [4] <ex:Pedro ex:owns _:x>, <ex:Pedro ex:owns _:y>, <_:y rdf:type ex:Penguin>
>>
>> is now lean, intuitively because it says enough about _:y and not about
>> _:x to distinguish them. Of course, we might find out still more:
>>
>> [5] <ex:Pedro ex:owns _:x>, <ex:Pedro ex:owns _:y>, <_:y rdf:type ex:Penguin>, <_:x rdf:type ex:Penguin>
>>
>> and now the graph is non-lean again, since now there is again nothing to
>> distinguish _:x from _:y: the graph says exactly as much about one as about
>> the other (if indeed it is another).
>>
>> So, you might well ask, what is the content of the OWA, if this is all it
>> means? Well, contrast it with the CWA. Under the CWA, if you know, say, the
>> RDF in your example [1], then you automatically know that Pedro, for
>> example, is *not* an employee of CitiBank and *not* a citizen of Venezuela,
>> simply because that graph does not say he is. This works well when
>> consulting databases of employees and citizens, but it obviously is not a
>> good rule to use on the open Web.
>>
>> The same principle must apply wherever blank node IDs occur, which again
>> leads to the conclusion that all blank node IDs in a graph can be collapsed
>> as it were to a single node. In other words, it looks to me like the
>> principle of leaning, if valid, implies a maximum of one blank node per
>> graph.
>>
>>
>> It really does not mean that at all, and this does not even remotely
>> follow from the definition of lean graph in the specs. You would do better
>> to actually read the specs carefully.
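The contrast between the two assumptions, and the hedged answer suggested at the top of this message, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples of strings, only ground-triple lookup is modelled (no inference), and the names `holds_cwa`, `holds_owa`, `hedged_count` and the predicate `ex:employeeOf` are hypothetical, not taken from any RDF library or from the thread.

```python
# Graph [1] from the thread, as a set of (subject, predicate, object) triples.
GRAPH_1 = {
    ("ex:Pedro", "ex:owns", "_:x"),
    ("_:x", "rdf:type", "ex:Donkey"),
    ("_:x", "ex:name", "ex:Daisy"),
    ("ex:Pedro", "ex:owns", "_:y"),
    ("_:y", "rdf:type", "ex:Donkey"),
    ("_:y", "ex:name", "ex:Maisy"),
}


def holds_cwa(graph, triple):
    """Closed world: a ground triple that is not recorded is taken to be false."""
    return triple in graph


def holds_owa(graph, triple):
    """Open world: a missing ground triple is unknown (None), never False."""
    return True if triple in graph else None


def hedged_count(graph, subject, predicate, source_is_complete, as_of):
    """Count recorded values, qualifying the answer if the source may be incomplete.

    This counts distinct terms, not distinct things: two terms may still
    denote the same thing.
    """
    n = len({o for s, p, o in graph if s == subject and p == predicate})
    claim = f"{subject} has {n} recorded {predicate} values as of {as_of}"
    if source_is_complete:
        return claim
    return "Based on possibly incomplete data, it appears that " + claim


# Hypothetical triple that graph [1] does not contain.
employee = ("ex:Pedro", "ex:employeeOf", "ex:CitiBank")
print(holds_cwa(GRAPH_1, employee))
print(holds_owa(GRAPH_1, employee))
print(hedged_count(GRAPH_1, "ex:Pedro", "ex:owns", False, "201103271208"))
```

The closed-world lookup answers "no" for the employee triple, while the open-world lookup declines to answer; the count is only presented without qualification when the caller vouches that the source is complete.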
>>
>>
>> I'm afraid my language is a little awkward but I hope you can see what I
>> mean.
>>
>> A related question: RDF Semantics says this is lean:
>>
>> [4] <ex:a> <ex:p> _:x .
>>     _:x <ex:p> _:x .
>>
>> But <ex:a ex:p ex:a> seems to fit the definition of proper subgraph as
>> used to define "lean"
>>
>>
>> Obviously not. That triple does not occur in that graph. Again, follow the
>> definitions as given.
>>
>> , and semantically to satisfy [4]
>>
>>
>> Not that this is relevant to questions of leanness, but this statement
>> does not make sense. An interpretation may satisfy a graph, but a triple
>> cannot satisfy anything.
>>
>> Hope this all helps.
>>
>> Pat
>>
>>
>> , so [4] would not be lean.
>>
>> Thanks,
>>
>> Gregg
>>
>>
>> ------------------------------------------------------------
>> IHMC (850)434 8903 or (650)494 3973
>> 40 South Alcaniz St. (850)202 4416 office
>> Pensacola (850)202 4440 fax
>> FL 32502 (850)291 0667 mobile
>> phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
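Since leanness is described above as a syntactic property that can be determined algorithmically, here is a minimal brute-force sketch in Python that applies the quoted definition literally: a graph is lean when it has no instance which is a proper subgraph of itself. The encoding is an assumption of the sketch (triples as tuples of strings, blank nodes as strings starting with `_:`), and `is_blank` and `is_lean` are hypothetical helper names, not part of any RDF toolkit. The search is exponential in the number of blank nodes, so it is only meant for small examples like the ones in this thread.

```python
from itertools import product


def is_blank(term):
    """Blank node IDs are strings starting with '_:' (an assumption of this sketch)."""
    return term.startswith("_:")


def is_lean(graph):
    """Return True when no instance of `graph` is a proper subgraph of `graph`."""
    graph = frozenset(graph)
    nodes = sorted({term for s, _, o in graph for term in (s, o)})
    blanks = [term for term in nodes if is_blank(term)]
    # An instance can only be a subgraph if every blank node is mapped to a
    # term that already occurs in the graph, so those are the only candidates.
    for image in product(nodes, repeat=len(blanks)):
        mapping = dict(zip(blanks, image))
        instance = frozenset(
            (mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in graph
        )
        if instance < graph:  # strict subset: a proper subgraph
            return False
    return True


OWNS_X = ("ex:Pedro", "ex:owns", "_:x")
OWNS_Y = ("ex:Pedro", "ex:owns", "_:y")

graph_1 = {
    OWNS_X, ("_:x", "rdf:type", "ex:Donkey"), ("_:x", "ex:name", "ex:Daisy"),
    OWNS_Y, ("_:y", "rdf:type", "ex:Donkey"), ("_:y", "ex:name", "ex:Maisy"),
}
graph_3 = {OWNS_X, OWNS_Y}
penguin_4 = graph_3 | {("_:y", "rdf:type", "ex:Penguin")}
penguin_5 = penguin_4 | {("_:x", "rdf:type", "ex:Penguin")}
# Variant that says something different about each blank node.
distinct_types = penguin_4 | {("_:x", "rdf:type", "ex:Donkey")}
# The example quoted from RDF Semantics.
semantics_example = {("ex:a", "ex:p", "_:x"), ("_:x", "ex:p", "_:x")}

for name, g in [
    ("[1]", graph_1), ("[3]", graph_3), ("penguin [4]", penguin_4),
    ("[5]", penguin_5), ("distinct types", distinct_types),
    ("RDF Semantics example", semantics_example),
]:
    print(name, "lean" if is_lean(g) else "not lean")
```

Under this literal reading, [1] and the RDF Semantics example come out lean, and [3] and [5] come out not lean. The check also reports the three-triple penguin graph [4] as not lean, because mapping `_:x` to `_:y` yields an instance that is a proper subgraph; the `distinct_types` variant, which types `_:x` as a donkey and `_:y` as a penguin, is lean.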
Received on Monday, 28 March 2011 12:09:15 UTC