Re: Is Model theory the only way to get predictability? (was Re: rdf inclusion)

Guha, to Bill Andersen:
>I did not say that Cyc has a model theory. Nor did I say that it is 
>predictable in its current form. However, it can be made 
>predictable, etc. with good software engineering, i.e., without a 
>model theory. There are lots of very complex software artifacts, 
>which have nice predictable properties which don't have a model 
>theory.
>
>Google is nice and predictable because of good software engineering 
>and not model theory. Somehow, your message implicitly equates model 
>theory with nice and predictable. You have to make the case for that.

Let me try to make out a case. It depends on the assumption that SW 
'engines' will be both drawing conclusions and publishing them for 
use by other engines.

As Guha says, many complex pieces of software don't have a model 
theory. (There are those who would argue that predictable software is 
always written in a programming language that has a clear 
*semantics*, even if that semantics isn't strictly model-theoretic; 
but that claim is controversial and I don't want to get involved in 
that debate here.) But the issue, it seems to me, isn't whether the 
*software* has an MT, but whether the *content language* which is 
used to exchange information on the SW has a model theory. The software
can do what it likes: it can make random guesses or use invalid means 
of inference or whatever. It can be written in Java or LISP or Perl 
or assembler; none of that matters, it's all inside the engine's 
black box, and there is no way to legislate it in any case. All that 
matters is that the inputs to, and outputs from, the software are 
written in a public content interchange language (CIL) which has a 
publicly agreed semantics. The reason the CIL needs a semantics is 
that the publisher has no idea what use the reader is going to make 
of it.  So it can't be thought of as code, to be run; it has to be 
thought of as content, to be used (in some way that cannot even be 
predicted). And a semantics of content is a model theory, pretty much
by definition.

One might argue that there is no need for a semantics at all. The 
case for the CIL having a semantics is that the SW will achieve 
inferential depth whether we like it or not. Even if each engine is 
only doing some shallow reasoning that can be done on an ad hoc basis 
(e.g. Google), if its conclusions get published and then used by other
engines, and their shallow conclusions get published, and so on, then 
the SW will be covered in assertions whose derivations may be 
arbitrarily long, and which may be arbitrarily far away from their 
secure antecedents (even supposing that they have secure 
antecedents); and this is exactly the kind of deep inference search 
that requires the maximum clarity in assigning meaning, since small 
errors in meaning can propagate arbitrarily far. As soon as the web 
becomes full of deep inferences - that is, conclusions that have been 
derived by long chains of inference - then the need for a common, 
precise, semantics is paramount. And it will inevitably become full 
of deep inferences as soon as it has shallow inferences (well, maybe 
about a week later) since the very process of web publication will 
introduce the inferential depth even if the inference engines are 
themselves very shallow.  Even as simple a language as RDF can have 
surprising conclusions which the writer of the RDF may not have 
considered. In fact, were it not so, there would be very little point 
in having SW markup: one of the main use cases involves using 
inference to derive connections which were not immediately obvious, 
in order to improve the focus of web searching.
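The mechanism being argued for here can be made concrete with a toy 
sketch (my own illustration in Python; it models no real SW engine, 
and the class names are invented): every "engine" performs only one 
shallow RDFS-style step, composing a pair of visible subClassOf 
assertions, yet re-publishing each round's conclusions for other 
engines to consume produces assertions that no author ever wrote and 
whose derivations are arbitrarily deep.

```python
# Toy illustration (not any real SW engine): each "engine" does one
# shallow inference step -- if (A subClassOf B) and (B subClassOf C)
# are both visible on the "web", it publishes (A subClassOf C).
# Feeding published conclusions back in yields arbitrarily deep
# derivations even though every individual engine is one-step shallow.

def shallow_engine(published):
    """One round of shallow publication: add every single-step
    composition of subClassOf pairs already visible."""
    new = set()
    for (a, b1) in published:
        for (b2, c) in published:
            if b1 == b2 and (a, c) not in published:
                new.add((a, c))
    return published | new

# Hand-written "markup": a chain of five subclass assertions.
web = {(f"C{i}", f"C{i+1}") for i in range(5)}

rounds = 0
while True:
    bigger = shallow_engine(web)
    if bigger == web:
        break
    web = bigger
    rounds += 1

# ("C0", "C5") was never asserted by any author, yet it is now
# on the "web", three publication rounds later:
print(("C0", "C5") in web)  # True
```

The point of the sketch is that the inferential depth lives in the 
publication cycle, not in any one engine; a reader encountering 
("C0", "C5") has no way, from the assertion alone, to tell how far it 
is from its secure antecedents.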

BTW, to refer to another thread between Guha and me, I tend to think 
more and more that the primary medium of interchange should be 
thought of as derivations rather than simply assertions. A simple 
assertion is then a trivial derivation (or, if you prefer, an empty 
derivation). This would mean that part of the publication process
would be making public the *reasoning* that underlies the conclusion. 
Now, of course, a reader is free to ignore this, but that would be 
taking a risk; in fact, it would be trusting the publisher to have 
checked its own sources adequately. Notice that what counts as an 
inference step in such a derivation hasn't been specified, and it
could include things like '<uriOfAgent> asserts P' which is an 
implicit trusting of the agent in question. If A uses such trust in B 
to derive its conclusions, it probably should make this clear when it 
publishes them, since C may not trust B as much as A does. Another 
kind of inference step could be just a claim that the conclusion is 
entailed by the antecedents (relative to a certain notion of 
entailment), which could be used to summarize a much longer 
derivation, and could be checked by a completely different method 
than the one that was used to derive it.
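A minimal sketch of this idea (my own illustrative data structure in 
Python, not a proposed standard; the field names and the example URI 
are invented) might distinguish the three kinds of step mentioned 
above -- a trivial assertion, a trust step of the form '<uriOfAgent> 
asserts P', and an entailment claim that a reader may re-check by its 
own methods:

```python
# Sketch of "derivations as the medium of interchange": a published
# conclusion carries its supporting steps, and a reader can inspect
# exactly which agents' say-so the derivation leans on.

from dataclasses import dataclass, field

@dataclass
class Step:
    kind: str                  # "assert" | "trusts" | "entails"
    conclusion: str
    antecedents: list = field(default_factory=list)
    agent: str = ""            # filled in only for "trusts" steps

@dataclass
class Derivation:
    steps: list

    def trusted_agents(self):
        """Agents whose assertions this derivation relies on -- exactly
        what publisher A should make visible, since reader C may not
        trust B as much as A does."""
        return {s.agent for s in self.steps if s.kind == "trusts"}

# A simple assertion is a trivial derivation:
trivial = Derivation([Step("assert", "P")])

# A derivation that leans on agent B's word plus an entailment claim,
# which a reader could verify by a completely different method:
d = Derivation([
    Step("trusts", "Q", agent="http://example.org/agentB"),
    Step("entails", "R", antecedents=["P", "Q"]),
])
print(d.trusted_agents())  # {'http://example.org/agentB'}
```

Nothing here fixes what notion of entailment the "entails" step 
appeals to; as in the text, that is left open for the reader to check 
relative to whatever semantics the CIL publicly agrees on.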

Pat Hayes

>BTW, for the sake of history and the many people who gave up 
>significant portions of their lives for Cyc, the architecture of 
>CycL, as you describe it, is documented in [1] and was the collaborative 
>work of many, including David Wallace, Mark Derthick, Dexter Pratt, 
>John Huffman and me. Keith is the implementor of the current 
>version. It is important to be correct about these attributions.
>
>guha
>
>[1] Lenat, D. B. and R. V. Guha. "Enabling Agents to Work Together." 
>Communications of the ACM 37, no. 7 (July 1994).
>
>
>Bill Andersen wrote:
>
>>On 5/23/02 13:29, "R.V.Guha" <guha@guha.com> wrote:
>>
>>
>>
>>>[to Jeff Heflin]
>>>
>>>About the issue of RDF & RDFS being hard to extend --- let us be
>>>*very*  clear on this. RDF & RDFS were designed to be Cyc like systems
>>>[1]. They were *not* designed to be DL like systems. You are finding it
>>>hard to reconcile the two. Cyc-like systems are extensible and have been
>>>extended, though not in a fashion that is consistent with DL
>>>model-theories. Yes, the clothes  don't fit the person. Maybe the
>>>problem is with the clothes and not the person.
>>>    
>>>
>>
>>Hi all..
>>
>>It's unclear what Cyc's model theory is at all.  So if you pick some model
>>theory T, it's a fair bet that Cyc's model theory, whatever it is, is
>>inconsistent with T.  At its base, the Cyc engine is a resolution theorem
>>prover augmented with special purpose modules (many of which have fixpoint
>>semantics) and the argumentation system for NM reasoning, so you have some
>>minimal model stuff thrown in.  I would defy anyone, even including Keith
>>Goolsbey who wrote the thing, to tell me what all of that *combined* means.
>>In my view, this is nothing to crow about.
>>
>>Cyc is an amazing system - it does lots of incredible things.  But what is
>>unclear is what it doesn't do or what it gets wrong, or how long it takes to
>>do some given inference, etc.  All of which are undesirable properties,
>>IMHO, for the Semantic Web.  When I go to Google, I have some reasonable
>>expectation that what its crawler has seen, I will find, assuming I use the
>>right terms.  I would like something of the same assurance with the Semantic
>>Web and I wouldn't bank on a Cyc-like system giving that to me.
>>
>>  .bill
>>


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

Received on Wednesday, 29 May 2002 11:38:21 UTC