Re: added diagrams to "Using XML Schema Data Types..." from pat hayes on 2001-02-11 (www-rdf-logic@w3.org from February 2001)

From: pat hayes <phayes@ai.uwf.edu>
Date: Sat, 10 Feb 2001 19:37:58 -0600
To: Dan Connolly <connolly@w3.org>
Cc: www-rdf-logic@w3.org
Message-Id: <v04210101b6a9d452184a@[205.160.76.240]>
>
>[...]
> > >Another way to state this principle is that
> > >the knowledge contained in two documents, X and Y,
> > >is always the conjunction of the knowledge in X with
> > >the knowledge in Y. To allow X to change what Y says
> > >in some non-monotonic way doesn't seem scalable/workable
> > >to me.
> >
> > two points:
> >
> > (1) you need to get clear about what counts as a document and what
> > you take X and Y to be *about*. (content language or meta-language,
> > see above.) If 'documents' are any amount of information about
> > anything, then this is an impossible requirement. (Suppose I want to
> > say 'either X or Y'. If I conjoin X or Y then I've said too much; if
> > I don't include them then nobody is going to know what I'm talking
> > about.)
> >
> > (2) Jim Hendler is right. If the meaning of Y depends in part on the
> > meaning of X, that doesn't automatically produce nonmonotonicity. It
> > just means that one has to use X to figure out the meaning of Y. The
> > nonmonotonicity would be produced, maybe, if one insisted on making
> > assumptions about the meaning of Y without knowing X, which might
> > later turn out to be wrong. But sometimes you can wait until you do
> > know, or report back to something else that you can't make progress
> > until you are told, or whatever. I agree these complexities make for
> > boring reading, but I don't think that there is any way out, in
> > general: we are going to have to deal with things like this whether
> > we want to or not, ultimately because INFORMATION is like this. It
> > just is, however we encode it. Life is not a monotonic conjunction of
> > cherries.
>
>I am not convinced; I'm not even quite sure we're talking
>about the same thing.

Maybe not (wouldn't be the first time :-)

>I'm not talking about any deep
>sense of what the formulas mean; I'm just talking about
>knowing what formula you're looking at without having
>to look anywhere else.

Ah, OK: but that's not the same as saying that all you can do is 
combine by conjunction. But even for this case I don't think your 
principle is going to work out in practice.

>Let me try a few mundane examples
>from programming languages to illustrate...
>
>Consider ex1.c:
>
>	#include "otherThingy.h"
>	#include <stdio.h>
>
>	main(int argc, char **argv)
>	{
>		if(OTHER_THINGY_CONSTANT < 10){
>			printf("x");
>		}else{
>			printf("y");
>		}
>	}
>
>The meaning of ex1.c -- what program it contains, or how
>to compile it to ex1.o -- depends on otherThingy.h. It's
>either a program that prints "x" or a program that prints "y", but
>we're not sure which without looking in otherThingy.h.
>On the other hand, consider ex2.c:
>
>	int printf(const char*, ...);
>	int otherThing();
>
>	main(int argc)
>	{
>		if(otherThing() < 10){
>			printf("a");
>		}else{
>			printf("b");
>		}
>	}
>
>we can tell what this compilation unit means; we can
>compile it to ex2.o. It's a program that calls the otherThing()
>function, and depending on the result, prints "a" or "b".

Sorry if I'm missing something blindingly obvious, but I don't see any 
clear difference between these two. In both cases, what happens 
depends on some piece of information from 'elsewhere' that has to be 
somehow got at before a conditional can be evaluated. What's the 
important difference?
[Later. Thinking about it more, maybe the point is that the 
*compiler* would be forced to go off and find other data in the first 
case, but not in the second (where it just compiles a runtime call, 
so that the *interpreter* will have to go off and find some other 
data when it runs, but that is its problem.) Is that the point?

If so, I am interested in why you think the compile/interpret 
distinction is so important here. Suppose we just call it all 
'processing', so that the overall process starts with the actual 
program text, and ends ultimately with something useful 
happening, with various stages in between, some of which require 
information from elsewhere. Some of these stages can't be done until 
others are completed, so there is a partial order on the set of the 
things that need to get done; but there isn't any principled 
distinction, in this picture, between compilation, interpretation, 
running code, optimisation, etc.; they are all processes that might 
need access to other information. You seem to be drawing a sharp line 
between some of these and others, but I don't see any deep reason to 
do that. As long as the overall process needs to access information 
from elsewhere, we are in the same basic pickle. Either we wait until 
we have it, or we guess and run the risk of guessing wrong.]

>I'm happy to have one RDF/DAML+OIL document interact with
>another in ex2 style, but I'm not happy for the syntactic
>meaning of an RDF/DAML+OIL document to be dependent on another
>in the way that ex1.c depends on otherThingy.h.
>
>I don't see how you can claim that there is something fundamental
>about information that forces our language to include
>garbage like the C preprocessor. Surely we can design our
>language so that the formulas contained in each document
>are syntactically evident, no?

In the logic world, you can as long as you don't have any semantic 
sorting: that is, as long as the vocabulary can be given a kind of 
universal categorisation. This is the standard way of doing logic in 
textbooks, where there is a global distinction between relation 
symbols, constant symbols and function symbols, and every symbol can 
be classified just by looking at it. No further declarations needed. 
But as soon as you want to be able to introduce new symbols 
relatively freely and *declare* what kind of use they are supposed to 
have (like in CycL, as in your example below) then how the parser is 
going to treat them at one place will depend on how they are declared 
at another place.
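
To make that dependence concrete, here is a minimal sketch in C, the language of the examples quoted above. It is not taken from any real parser: the names `classify`, `struct decl` and the `KIND_*` constants are invented for illustration. It only shows that the category a parser assigns to a symbol is a function of a declaration table that may live somewhere else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol categories; the names are illustrative only. */
enum kind { KIND_UNKNOWN, KIND_RELATION, KIND_FUNCTION, KIND_CONNECTIVE };

/* One declaration: a symbol name and the use it is declared to have. */
struct decl { const char *name; enum kind kind; };

/* Look a symbol up in a table of declarations, which may have been
   loaded from some other document. Returns KIND_UNKNOWN when the
   table has no entry for the symbol. */
enum kind classify(const struct decl *table, size_t n, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(table[i].name, name) == 0)
			return table[i].kind;
	return KIND_UNKNOWN;
}
```

Given one table that declares `both` as a connective and another that declares it as a relation, `classify` returns different answers for the same symbol in the same formula text, which is exactly the situation under discussion.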

>KIF has this property, after all. You either write
>
>	(size x "10")
>or
>	(size x 10)
>
>and there's nothing that can be written in some other
>file (or some other part of this file, for that matter)
>that will change the formula that a given piece
>of KIF represents.

Well yes, but only because KIF is unsorted. (So in KIF you can write 
both of those and it will just draw its weird conclusions quite 
happily.) And even then there are some oddities. What if you write
(size x "10") in one place but
(size x "10" "7") in another? KIF in fact allows that, but most 
logics don't. But in any case, a lot of the oddities of KIF actually 
arise from this syntactic looseness. It even allows you to use the 
same name as a relation and as a function. But it is a very peculiar 
language in this regard. Most logical languages are much more like 
CycL.
The revised KIF will have an unsorted 'core' and a sorted 
'extension'. I suspect many people will want to use the sorted 
version, and its parser will need to know the sort declarations for 
the vocabulary being used. I bet that people will do things like make 
files of useful declarations and then refer to them in the headings 
of ontologies, for example, just as a useful way to keep multiple 
axiom sets lexically consistent.

Even a language with implicit declarations (ie you just use the 
symbol in some way and the parser can figure out what it is supposed 
to be) will give rise to problems, since the same symbol might be 
used one way in one place and a different way in another, and when 
this kind of clash is found then something like a parser needs to be 
involved in the subsequent debate, since what the parser does in 
future will depend on the outcome.
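
A rough sketch of how such a clash might be detected, again in C. Everything here is an assumption made for illustration (the names `record_use`, `struct usage_table`, `ROLE_*`, and the fixed table size); it is not how KIF or any other system actually does it. The first use of a symbol fixes its role and arity, and any later use that disagrees is reported rather than silently accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 64	/* arbitrary limit for the sketch */

enum role { ROLE_RELATION, ROLE_FUNCTION };

/* How a symbol was first used: its role and number of arguments. */
struct use { const char *name; enum role role; int arity; };

struct usage_table { struct use seen[MAX_SYMS]; size_t count; };

/* Record one use of a symbol. The name pointer is stored, not copied,
   so the caller must keep the string alive.
   Returns 0 if this is the first use or agrees with the first use,
   1 if it clashes with the first use, -1 if the table is full. */
int record_use(struct usage_table *t, const char *name,
               enum role role, int arity)
{
	size_t i;

	assert(t != NULL && name != NULL);
	for (i = 0; i < t->count; i++) {
		if (strcmp(t->seen[i].name, name) == 0)
			return (t->seen[i].role == role &&
			        t->seen[i].arity == arity) ? 0 : 1;
	}
	if (t->count == MAX_SYMS)
		return -1;
	t->seen[t->count].name = name;
	t->seen[t->count].role = role;
	t->seen[t->count].arity = arity;
	t->count++;
	return 0;
}
```

With a zero-initialised table, recording `size` as a two-place relation and then as a three-place relation makes the second call return 1. What happens after that (reject, rename, ask someone) is the 'subsequent debate', and the sketch deliberately says nothing about it.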

>CycL is an example of making the choice the
>other way: the syntax of formulas depends on context;
>this
>	(both (tall Fred) (green TheCar))
>might be a two-place predicate applied to two
>function terms, or a new logical connective
>applied to two predicates, depending on whether
>(isa both Connective) is true when it's parsed.
>cf http://www.cyc.com/cycl.html#logical_connectives
>
>A related issue came up in the design of
>XML namespaces. It might seem more convenient
>if you could just open a bunch of namespaces
>and use names from any of them ala:
>
>	<aDoc xmlnss="http://example/fruits
>			http://example/vegetables
>			http://example/minerals">
>	<apple/>
>	<tomatoe/>
>	<sandstone/>
>	</aDoc>
>
>but that's no good because it's not syntactically
>evident whether tomatoe is a fruit or a vegetable
>(or... if you're a dumb computer: a mineral).

Well, I agree that is kind of dumb, since a name could be in more 
than one namespace.

>This was part of a whole requirements document
>about extensible languages:
>
>  Lack of ambiguity
>
>  Some programming languages allow one to introduce
>  identifiers from new name spaces in such a way that
>  it is not possible to know which namespace a local identifier
>  belongs to without accessing both the module interface
>  specifications and checking which one has with the highest
>  priority, or  most recently in the document, redefined a given
>  local identifier.
>
>  This may have some uses in a programming language such
>  as Java[Java], but it has a serious flaw in that when one
>  module changes (without the knowledge of the designers of the
>  other module), it can unwittingly redefine a local identifier
>  used by the second module, completely changing the meaning of
>  a previously written document. Clearly, in the Web world in
>  which modules evolve but documents must have clearly defined
>  meanings, this is unacceptable.  Contrast with Modula-3,
>  where all names are either lexically scoped or fully
>  qualified [SPwM3].
>
>  -- http://www.w3.org/TR/NOTE-webarch-extlang#Ambiguity

You didn't quote the next two lines:

The syntax must unambiguously associate an identifier in a document 
with the related schema without requiring inspection of that or
  another schema.

Yes. It must associate the identifier with the appropriate schema; 
but then the *meaning* of the identifier might depend on what is in 
that schema, no?

BTW, this introduces yet another issue. The key word in the above is 
'unwittingly', right? That is, a 'reasonable' change to one module 
might have consequences for another module which weren't intended by 
the agent making the change. But you seem (?) to be saying that you 
want it to be *impossible* for a change in one module to alter the 
meaning (?parsing?) of things in another module in any way at all. 
That's a much stronger requirement, and I think unworkable.

But again, I don't see any way around this in general, other than 
either making the overall language so loosely coupled that nothing 
depends on anything else (which means that all we can say are 
conjunctions of ground atoms) (hmmmm, like RDF....) or else insisting 
that the modules contain all the information necessary to interpret 
everything in them (and then who needs a web?)

> > > > In particular, it seems to me that your proposal has exactly the same
> > > > problem.  You also depend on external information on how properties
> > > > should work.
> > >
> > >But the various bits of information accumulate in
> > >the normal monotonic fashion; I don't have
> > >the situation where I initially parsed it as
> > >a string, but then I discover I was wrong or something
> > >and I have to undo stuff.
> >
> > If you don't parse until you know how to parse, you won't get into
> > this position. So what you really want, I think, is to know (locally
> > and absolutely, ie monotonically with respect to new info. from
> > elsewhere) whether or not you do have enough information to parse
> > monotonically. (In many languages, for example, a parser on finding
> > an identifier will look it up in a table of declared forms and if it
> > isn't there, will post an error condition and refuse to compile.) But
> > that's not the same as requiring that you must have, locally, enough
> > information to parse. Same point applies to other things as well as
> > parsing, of course.
>
>Yes, we could go there without introducing the problem
>I'm trying to avoid, but I don't see sufficient reason to.

Well, how about allowing a syntactically typed ontology language, 
where someone can post a large collection of vocabulary-typing 
information somewhere, give it a URI, and tell people that for 
purposes of X they should use the vocabulary defined there? I'm sure 
this kind of thing is going to happen on the semantic web, just like 
everywhere else in the data-standards world. A logical parser will 
need to be able to retrieve the info at that URI before it can 
completely parse something that is compliant in this way, but why 
would that give you heartburn?
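
As a sketch of what 'retrieve before you parse' could look like, assuming the vocabulary at the URI has already been fetched into a plain list of declared names (the fetching itself is left out, and `first_missing` and `is_declared` are hypothetical names, not part of any real tool): the parser checks whether every symbol it is about to use has a declaration, and if not it reports which one is missing instead of guessing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name appears in the list of declared symbols, else 0. */
static int is_declared(const char *const *declared, size_t n,
                       const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(declared[i], name) == 0)
			return 1;
	return 0;
}

/* Returns the first used symbol that has no declaration yet, or NULL
   when every used symbol is declared and parsing can go ahead. */
const char *first_missing(const char *const *declared, size_t n,
                          const char *const *used, size_t m)
{
	size_t i;

	assert(used != NULL);
	for (i = 0; i < m; i++)
		if (!is_declared(declared, n, used[i]))
			return used[i];
	return NULL;
}
```

The answer is monotonic in the sense discussed earlier: adding declarations can only turn a 'missing' answer into 'ready', never the reverse, so nothing already parsed has to be undone.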

Pat Hayes

PS. Another reason to think that this kind of thing is going to be 
widespread is that there are now a number of approaches to 'ontology 
merging' based on category-theory ideas ('infomorphisms' between 
axiomatic theories and suchlike, eg check out 'specware' 
http://www.kestrel.edu/home/projects.html ) This kind of description 
assumes very richly sorted logics, since the mappings often depend on 
the sort declarations, eg specware will distinguish 0 from 0.0 
because otherwise its mappings get twisted (an oversimplification, 
but you get the point.)

---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Saturday, 10 February 2001 20:35:04 UTC