Re: Grist for layering discussion from Pat Hayes on 2002-01-15 (www-archive@w3.org from January 2002)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Tue, 15 Jan 2002 11:21:19 -0600
To: Sandro Hawke <sandro@w3.org>
Cc: hendler@cs.umd.edu, timbl@w3.org, las@olin.edu, connolly@w3.org, w3c-semweb-ad@w3.org, www-archive@w3.org, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
Message-Id: <p05101033b869084fd210@[65.212.118.208]>
....
>>  >>
>>  >>  Now is everything OK?  NO!  There are two problems:
>>  >>
>>  >>  1/ Because RDFO is an extension of RDFS RDFO retrieval will also produce
>>  >>
>>  >>	_:x married Susan .
>>  >>
>>  >>     from IB2.  In fact, every RDFO disjunction also creates several
>>  >>     extraneous consequences.
>>  >>
>>  >>     Well you might argue that the
>>  >>
>>  >>	John rdfn:or _:x .
>>  >>
>>  >>     consequence is benign because it mentions the special RDFO property.
>>  >>     However, the other consequences do not mention any special RDFO
>>  >>     properties and they are definitely not benign.  RDFO has failed
>>  >>     in its goal of capturing related disjunctions.
>>  >
>>  >Right.  Clearly that was a bad design.  The exact design principle
>>  >violated here I have not seen clearly stated, but it's important.  I
>>  >think TimBL didn't see this problem when he started n3 logic, which
>>  >makes it kind of broken in this same way.  But I believe he does
>>  >understand it now.   We usually talk about it in the realm of
>>  >open/closed world and "that's all there is",
>>
>>  That is a different issue. Don't get them confused. The 'that's all'
>>  problem is to do with how to encode finite structures in purely
>>  descriptive format. [The honest answer is that it can't be done
>>  (because of the second recursion theorem), but there are ways to hack
>>  it if your tastes run to hacking.]  But that has nothing (directly)
>>  to do with the central issue being talked about here, which is that
>>  when you DO describe the syntax, you are making assertions that are
>>  different from what that syntax was making.
>
>But the assertions are merely assertions about syntactic structures.

Ah, no; wait. You know that, and maybe I know that, if we were 
involved in making those assertions. But nothing in the RDF says 
this. That is my central worry.  RDF is supposed to be used to 
publish content on web pages. Something reads the RDF, and extracts 
as much content as it can manage to get out of it, presumably by 
using RDF-valid processes of some kind. But it is (literally) 
impossible to extract from a piece of RDF anything about what the 
user who wrote that RDF *intended* it to mean or not mean, unless 
that information is somehow also encoded in RDF. And RDF has no way 
to encode something that means "this piece of RDF isn't intended to 
be actually understood as being real RDF, of course; it's just
assertions about syntactic structures in some other language."
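
A minimal sketch of that worry, in Python rather than in any real RDF toolkit. The triples are the IB3 example quoted further down in this message; the two reader functions and their names are hypothetical, invented here for illustration, and the second one assumes each node has at most one value per property. Both readers receive exactly the same triples, and nothing in the data says which reading the author intended.

```python
# IB3 from this thread, written as plain (subject, predicate, object) tuples.
IB3 = {
    ("John", "rdfor:or", "_:l1"),
    ("_:l1", "rdfor:fact", "_:f1"),
    ("_:l1", "rdfor:rest", "_:l2"),
    ("_:l2", "rdfor:fact", "_:f2"),
    ("_:l2", "rdfor:rest", "rdfor:nil"),
    ("_:f1", "rdfor:predicate", "married"),
    ("_:f1", "rdfor:object", "Susan"),
    ("_:f2", "rdfor:predicate", "friend"),
    ("_:f2", "rdfor:object", "John"),
}


def plain_rdf_reading(triples):
    """Hypothetical reader that knows only RDF: every triple is an assertion."""
    return set(triples)


def rdfor_reading(triples):
    """Hypothetical reader that treats the rdfor: vocabulary as an encoding.

    For each subject of rdfor:or, walk the fact/rest chain and collect the
    (predicate, object) alternatives. Assumes one value per (node, property).
    """
    index = {(s, p): o for (s, p, o) in triples}
    result = {}
    for (s, p, o) in triples:
        if p != "rdfor:or":
            continue
        alternatives, node = [], o
        while node != "rdfor:nil" and node is not None:
            fact = index.get((node, "rdfor:fact"))
            alternatives.append(
                (index.get((fact, "rdfor:predicate")),
                 index.get((fact, "rdfor:object")))
            )
            node = index.get((node, "rdfor:rest"))
        result[s] = alternatives
    return result
```

The first reader gets nine assertions about lists and facts; the second gets one disjunction about John. The choice between them lives in the code, not in the triples.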

>They don't say anything about anything except about these objects in
>the domain of discourse which happen to be syntactic structures of
>other languages.

How does it say that some of the objects are syntactic structures of 
other languages, and which are not? How is that notion of 'syntactic 
structure' itself formalized in RDF? How is the notion of 'other 
language' conveyed by RDF?

>And one of them probably says one of those structures
>is true.

Oh, I really do hope it doesn't try to say that. That is like hoping 
there is a lit fuse in a firework factory.

But more seriously, there is an issue here: true *in what language*? 
The point is that truth in RDF might not amount to the same thing as 
truth in WOL. But it seems to be asking a lot of RDF to expect it to 
assert truth in some other language altogether.

>>  >but perhaps it's a
>>  >different piece of the same problem.   (Which, as Pat pointed out, is
>>  >one of the heavy prices of moving from a serial syntax to a graph
>>  >syntax.   No debate there.)
>>  >
>>  >>  2/ RDFO is non-monotonic.  The retrievals from IB2 include
>>  >>     information that cannot be retrieved from IB1.
>>  >>
>>  >>  It is possible to overcome these problems, at least partly, by
>>  >>  exploiting the reflective properties of RDF.  We can encode the
>>  >>  disjunctions using a special construction, something like
>>  >>
>>  >>  IB3	John rdfor:or _:l1 .
>>  >>	_:l1 rdfor:fact _:f1 .
>>  >>	_:l1 rdfor:rest _:l2 .
>>  >>	_:l2 rdfor:fact _:f2 .
>>  >>	_:l2 rdfor:rest rdfor:nil .
>>  >>	_:f1 rdfor:predicate married .
>>  >>	_:f1 rdfor:object Susan .
>>  >>	_:f2 rdfor:predicate friend .
>>  >>	_:f2 rdfor:object John .
>>  >
>>  >Yeah, this works.   It could be seen as going back to a serial syntax,
>>  >of course.
>>
>>  No, it does NOT work. If you publish that as RDF, you are not saying
>>  what the RDFOR was saying. There is no way to get around this fact,
>>  since the RDF spec itself includes the RDF MT, so the meaning of that
>>  RDF is *required by the W3C spec* to be what it is. You might have
>>  something else in mind; but that is irrelevant, since once those RDF
>>  triples are set loose on the web, your state of mind or intentions
>>  are lost; something reading this only has the triples and the RDF
>>  spec to go on. And with RDF in its current state, there is no way to
>>  indicate *in RDF* that you mean it to say anything other than what it
>>  says in RDF.
>>
>>  >>  Now retrieval for RDFOR can (probably) be designed so that
>>  >>  1/ All the extra consequences involve the special RDFOR
>>  >>     constructs, and so can be regarded as benign.
>>  >>  2/ RDFOR is monotonic.
>>  >>
>>  >>  Have we succeeded?  Partly, but at three prices, two that show up right
>>  >>  away and one that shows up in other extensions.
>>  >>
>>  >>  The first price is that the construction is much more complicated than
>>  >>  a syntax extension.
>>  >
>>  >Alas, yes, but that's just because you're looking at it in N-Triples.
>>  >LISP syntax is rather elegant unless you put all the dotted pairs back
>>  >in, then it's about this ugly.
>>
>>  Not quite, in fact. But look: it *is* Ntriples. There isn't any
>>  'other way' to look at it (except RDF/XML, ie). This is the level at
>>  which the meaning is attached. You may have in mind that it is being
>>  used for some other purpose, but that's not what the published spec.
>>  says.
>>
>>  >  Any object representation system is
>>  >pretty ugly at the bit level.
>>  >
>>  >>  The second price is that the construction adds a lot of extra
>>  >>  consequences.
>>  >>  These consequences can be considered to be benign, but they are still
>>  >>  there.  To make the formalism work correctly in the presence of these
>>  >>  consequences requires a lot of work (and may not be possible,
>>  >>  even here).
>>  >
>>  >Yes.   It will take some work to make sure they stay benign.
>>
>>  I don't see how this can be possible. Who knows what consequences
>>  they might have in some other context, eg when added to some other
>>  set of triples from some other source? There certainly could be no
>>  way to guarantee that they might not, for example, accidentally
>>  combine to be an encoding of some other higher-level syntax (RDFX or
>>  RDFY) which could mean something else completely. When lists are
>>  decomposed into sets of triples, and any set entails all its subsets,
>>  and subsets can be combined freely, the whole idea of using triple
>>  stores as datastructure encodings strikes me as highly dubious.
>>
>>  >
>>  >>  The third price is that we have introduced a form of reification and a
>>  >>  construct that can assert the truth of reification constructs.  This
>>  >>  (probably) doesn't cause any problems here because the extension is so
>>  >>  expressively limited.  However, for more powerful extensions reification
>>  >>  produces paradoxes, and thus cannot be used.
>>  >
>>  >Two answers here.
>>  >
>>  >1.  I've heard some people say, "Who Cares?"  Operationally, what's
>>  >the problem with a paradox?
>>
>>  Oh dear God, I am inclined to give up at this point. Why don't y'all
>>  try making a semantic web which is freely paradoxical, and we go away
>>  and make one that preserves meaning, and we just see which of them is
>>  more use?
>
>How about you make one by publishing model theories and we make one
>with layman's language and running code?

Right, I might have expected that response. You seem to have an 
amateurs-of-the-world-unite attitude to model theory, which 
completely misses the point. I'm not arguing AGAINST running code and 
layman's language; but no amount of for-idiots introductions are 
going to provide the kind of security of global coherence that is
going to be vital for the success of the SW. (Evidence, if you want 
any: look at the confusion and nonterminating debates which 
surrounded the idea of 'anonymous nodes' in RDF, until a model theory 
was available.)  All that running code has to deal with the RDF that 
it finds, right? The point is only that the software designer has 
access only to the language spec to guide the design of the code. 
The software that processes some RDF has no access to the intentions 
of the writer of that RDF. If the SW depends on some RDF being 
interpreted one way and other RDF being interpreted another way, and 
there is no way to record *in RDF* what the difference is, then the 
whole thing isn't going to work properly. The SW can't depend on 
people making telephone calls to find out how to interpret the RDF on 
web pages. The point of a model theory is that it provides a single 
gold standard of meaning that is independent of the RDF writer's 
state of mind.

>  Seriously, I appreciate
>your efforts, and I'm sure you know I (and others) are trying to find
>ways we can reach a consensus design that's better than any of us
>could do alone.
>
>>  >  My guess is it will show up as infinite
>>  >loops and/or bottomless recursion, which is unpleasant but can be
>>  >managed as a resource-management problem.
>>
>>  No. It will show up as two pieces of software accessing the same DB
>>  but one deciding that you owe the bank $2000 and the other deciding
>>  that the bank owes you $3000, and they are *both right*. That is,
>>  they are both conforming to published specs which are supposed to
>>  guarantee that meaning is preserved, and they both use logically
>>  secure methods, they both have checked their proofs, and both proofs
>>  are guaranteed to be correct, and they use the same premises; but
>>  they disagree. That is what happens when DB reasoning hits paradoxes.
>>  That could happen right now if one of them uses RDF and the other
>>  uses DAML+OIL and they follow the letter of the published specs.
>>
>>  The point is that we are not here worrying about whether or not the
>>  software terminates. This isn't to do with computability; it's to do
>>  with what the actual data *means*.
>>
>>  >  That is, in theory there's
>>  >a huge difference between a paradox and a problem that will simply
>>  >take 4 hours to terminate, but operationally they're both just systems
>>  >that go off into the weeds.  The user presses "stop" and everything's
>>  >fine again.
>>
>>  There is so much wrong with this that I'm at a loss to cover it all.
>>  First, please stop talking in kiddie metaphors ("weeds" can mean
>>  nontermination, bad data, bad reasoning, goodness knows what else).
>
>Sorry, around these parts, I think it's a well understood term meaning
>(as Microsoft would say) the system (or application) stops responding.

OK, I see. But this only makes sense when applied to a situation 
where a user is interacting with an application. The SW isn't going 
to be like the PC situation, however: it's going to consist of agents
running without human intervention, taking significant - 
millions-of-dollars kind of - decisions, at electronic speeds. There 
simply won't be time to even see if there are any weeds there or not.

>It's a user-experience term.  Software can fail in two ways for a
>user: not responding or wrong results (if we lump in reporting an
>error when it should not, and crashing, as sorts of wrong results).
>You're right that wrong results is worse than not responding.

In my vision (which I absorbed from Tim B-L and Jim Hendler, chiefly) 
of the main business of the SW, the users aren't even around most of 
the time. They are like a kind of supreme court, who only get 
involved in the business under the most extreme conditions. Most of 
the activity happens too fast for any user to even know about it.

>>  Second, paradox doesn't mean nontermination, cf above. Third, there
>>  is no user to press "stop" on the SW, right? (Isn't that the whole
>>  point of the SW, as opposed to the WWW? That's why the 'A' in 'DAML'
>>  is from 'agent'.)
>
>Agent behavior can be connected back to users in many situations.

Maybe this is the main point of disagreement. Of course all agent 
behavior can be 'connected back' to some human user in some sense or 
other, but the actual run-time activity of most agents isn't going to 
involve any human users at all. It cannot possibly do so; there 
simply isn't going to be enough time. That is what the SW is *for*; 
to enable software to do some of our business for us. That is the 
only way that a new economy can work. The incremental profit from any 
one of these transactions is going to be minuscule; fractions of a
cent. The only way to make big bucks is to be able to do so many of 
them so fast that those tiny increments add up to large numbers, so 
that profitability is directly linked to communication speed (and to 
Moore's law, hopefully).

>>  Fourth, the issue is not stopping, but what to do
>>  with contradictory information that isn't in fact a real
>>  contradiction but arises from a contradictory specification.
>
>Yes.  I understand that problem.  I fail to see why it's unavoidable
>with my design.
>
>>  >2.  I don't really like systems going off into the weeds.  And I don't
>>  >see why they have to, if we're careful about the feedback loop.
>>
>>  What feedback loop? (What the hell are you talking about?)
>>
>>  >  That
>>  >is: reasoners should not look for their inputs in their own outputs.
>>
>>  1. Who said anything about this??
>
>I'm trying to have a discussion about operational, observable aspects
>of a system we are trying to design.  I find that kind of discussion
>much more productive in reaching consensus, and the history of the W3C
>and IETF supports this approach.
>
>>  2. Why should they not, in any
>>  case? If a reasoner uses valid reasoning processes to draw a
>>  conclusion, then why should it not go on to use that conclusion to
>>  draw other conclusions? (Ever hear of forward reasoning?)
>
>You know I've heard of forward reasoners, and I think we've even
>talked about some forward and backwards reasoners I've written.

Right, I was being sarcastic. Apologies.

>Here's the loop I'm talking about, which is different from normal
>chaining:

Indeed, this is a very different kind of 'loop'. I wouldn't even 
describe this as a loop, myself.

>
>    1. RDF triples describe FOL syntactic structures
>    2. Those syntactic structures are extracted and conjoined
>       with the simple structures in the RDF itself.

What does that mean? The FOL isn't RDF, right? So how can it be 
conjoined in the RDF (??) (Or do you mean, the FOL transcription of 
the RDF is conjoined (in FOL) to the extracted FOL?)

>    3. FOL reasoning is performed   [ with FOL it's a little unclear
>       what the goal might be, but I think that's not relevant here ]
>    4. That certain new RDF triples are true may be inferred; these
>       triples are made available to querying clients.   BUT IF YOU PUT
>       THEM BACK IN (1), where they are scanned again for new FOL
>       syntactic structures, THEN you raise the spectre of the truth
>       predicate and paradoxes.

I see what you mean, but that isn't the basic point. The basic point 
is that the published specs for RDF and for DAML both assign meanings 
to the same pieces of RDF, and these meanings disagree, which is a 
bug in the published specs.

But to return to your example, how are you going to ensure that they 
do not get 'back in' to the inference process? In general, once some 
assertions are made or some conclusions published, they stand on 
their own. The publisher has no control over how they will be used by 
any other reasoner, and can expect no more than that any conforming 
reasoner will perform valid inferences on them. There is no way to 
say in RDF 'do not insert these into an RDF reasoner again'.

There is an old slogan often attributed to Bob Kowalski: algorithm =
logic plus control.  The point for us is that RDF is all logic and no
control, so it is fundamentally unlike sending code around. You can't 
say what shall or shall not be done with it.

In any case, why should one not put them back in? After all, they 
were validly inferred, right? And if they describe FOL syntax, then
those FOL expressions were validly inferred (in FOL). So on what 
basis do you insist that further conclusions should not be drawn from 
them?
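
To make the point concrete, here is a rough sketch of ordinary forward chaining, assuming a single transitivity rule for rdfs:subClassOf. The class names A to D and both function names are hypothetical. The loop does nothing exotic: it simply treats each round's conclusions as premises for the next round.

```python
def subclass_step(triples):
    """One application of a transitivity rule for rdfs:subClassOf."""
    sub = [(s, o) for (s, p, o) in triples if p == "rdfs:subClassOf"]
    return {
        (a, "rdfs:subClassOf", d)
        for (a, b) in sub
        for (c, d) in sub
        if b == c
    }


def forward_chain(triples, step):
    """Apply `step` until nothing new appears; conclusions become premises."""
    known = set(triples)
    while True:
        new = step(known) - known
        if not new:
            return known
        known |= new


facts = {
    ("A", "rdfs:subClassOf", "B"),
    ("B", "rdfs:subClassOf", "C"),
    ("C", "rdfs:subClassOf", "D"),
}

one_pass = facts | subclass_step(facts)       # conclusions not fed back
closure = forward_chain(facts, subclass_step)  # conclusions fed back
```

A single pass misses `A rdfs:subClassOf D`; it only appears once the first round's conclusions are put back in. A reasoner that refused to reuse its own valid conclusions would be incomplete, not safer.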

BTW, another point about step 4 is that in FOL, one might be able to 
infer other things than that RDF triples are true. For example, you 
might be able to infer that they are false; or that some disjunction
of them is true, or that some of them imply some others. How will
you report those kinds of results back as RDF triples?

>>  >Can that loop be avoided if there are two reasoners...?  Hm.  I think
>>  >so, but it might be expensive.
>>  >
>>  >If you still say that won't work, is there some system I can construct
>>  >(in code or just detailed specification) to demonstrate it will?  Like
>>  >finishing up my FOL-encoded-in-RDF system?  If I can have RDFS and FOL
>>  >reasoners properly attached to the same database, would that be
>>  >convincing?
>>
>>  That would convince me that I was right, since those two reasoners
>>  couldn't possibly draw the same conclusions from the DB.
>
>I'm saying such a system could be created that would not produce
>incorrect results.   (Incorrect in layman's terms, like the
>$2000/$3000 error.)

And no matter what RDF-valid operations were performed on it, 
possibly together with any piece of RDF from anywhere else?
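
A sketch of the kind of case that question has in mind, under some loud assumptions: the decoder below is hypothetical, and the blank nodes of IB3 have been replaced by made-up ex: names, since blank nodes from different sources would be kept apart on merging. Taking a subgraph and merging with triples from elsewhere are both RDF-valid operations, yet together they change what the decoder reads.

```python
NIL = "rdfor:nil"


def decode_or(triples, subject):
    """Hypothetical strict decoder: return the (predicate, object)
    alternatives attached to `subject`, or None if the chain is malformed."""
    def objects(s, p):
        return [o for (s2, p2, o) in triples if s2 == s and p2 == p]

    heads = objects(subject, "rdfor:or")
    if len(heads) != 1:
        return None
    alternatives, node, seen = [], heads[0], set()
    while node != NIL:
        if node in seen:
            return None  # cyclic chain
        seen.add(node)
        facts = objects(node, "rdfor:fact")
        rests = objects(node, "rdfor:rest")
        if len(facts) != 1 or len(rests) != 1:
            return None
        preds = objects(facts[0], "rdfor:predicate")
        objs = objects(facts[0], "rdfor:object")
        if len(preds) != 1 or len(objs) != 1:
            return None
        alternatives.append((preds[0], objs[0]))
        node = rests[0]
    return alternatives


# IB3 with the blank nodes replaced by hypothetical ex: names.
ib3 = {
    ("ex:John", "rdfor:or", "ex:l1"),
    ("ex:l1", "rdfor:fact", "ex:f1"),
    ("ex:l1", "rdfor:rest", "ex:l2"),
    ("ex:l2", "rdfor:fact", "ex:f2"),
    ("ex:l2", "rdfor:rest", NIL),
    ("ex:f1", "rdfor:predicate", "ex:married"),
    ("ex:f1", "rdfor:object", "ex:Susan"),
    ("ex:f2", "rdfor:predicate", "ex:friend"),
    ("ex:f2", "rdfor:object", "ex:John"),
}

# A subgraph of ib3 (entailed by it), merged with one triple from elsewhere.
subgraph = ib3 - {("ex:l1", "rdfor:rest", "ex:l2")}
elsewhere = {("ex:l1", "rdfor:rest", NIL)}
merged = subgraph | elsewhere

full = decode_or(ib3, "ex:John")
partial = decode_or(subgraph, "ex:John")
recombined = decode_or(merged, "ex:John")
```

Here `full` has two alternatives, `partial` is rejected as malformed, and `recombined` has exactly one alternative. Read as a disjunction, that last result amounts to the claim that John married Susan, which IB3 never asserted.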

>So if I did it, the burden would be on you to provide some input which
>would produce obviously wrong outputs.

No, the burden would be on you to show that I couldn't possibly do 
that. I want to KNOW that the system will not produce wrong answers. 
Empirical evidence or lack of imagination is not good enough for B2B 
transactions that may involve millions of dollars per second.

>  And you are sure you could do
>it easily.  Right?

As a matter of fact, I think I probably could; but even if not, I 
wouldn't trust my bank account to it.

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Tuesday, 15 January 2002 12:20:28 UTC