- From: pat hayes <phayes@ai.uwf.edu>
- Date: Fri, 18 May 2001 22:16:58 -0500
- To: Stefan Decker <stefan@db.stanford.edu>
- Cc: www-rdf-logic@w3.org
>Hi Pat,
>
>>>Rather we focus on small subsets and worry how to make them interoperable.
>>
>>What exactly does 'interoperable' mean? Does it imply mutual
>>consistency, for example? (If not, what does it mean?) If so, then
>>it would seem to presume that the people/agents/thingies in these
>>small subsets are at least using a language to communicate with one
>>another that has a clear notion of mutual consistency. And that
>>requires a semantics.
>
>You are arguing in the abstract. Let's look into a more concrete example,

OK, but I will hold you to that. Read on.

>eg. the scenario that Tim, Jim and Ora constructed in the Scientific
>American example:
>http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html
>
>"At the doctor's office, Lucy instructed her Semantic Web agent through her
>handheld Web browser. The agent promptly retrieved information about
>Mom's prescribed treatment from the doctor's agent, looked up several
>lists of providers, and checked for the ones in-plan for Mom's insurance
>within a 20-mile radius of her home and with a rating of excellent
>or very good on trusted rating services."
>
>Let's translate this to the actual data flow on the web:

OK, but the data flow doesn't interest me; I was talking about the content being transmitted and how it is encoded.

>Lucy's agent contacted the doctor's agent.
>The doctor's agent is a webservice the doctor offers on the Web.
>The webservice understands a certain query language and delivers
>the resulting data in
>a single, simple data schema (read: Ontology).

OK; now let's think about what that query language needs to be able to say. It can talk about distances from locations, insurance providers, prescribed treatments, and ratings, for a start. Presumably these concepts are not going to be incorporated into the very syntax of the scheme - if they were, these agents wouldn't be able to talk about anything else.
So the notation through which this information is conveyed must be capable of supporting inferences involving some facts about quite a rich variety of entities. It must be capable of expressing disjunction (to be able to say that any of the providers in the list will do), quantification over numerical ranges (to get the 20 miles right), arithmetic (to compare distances and costs), and negation (to be able to infer that some providers are not in-plan). Maybe some of these can be hacked around in various ways, but overall this seems like enough of a semantic burden to take us well beyond current DAML+OIL expressiveness, say.

>Then Lucy's agent looked up several lists of providers, probably from a
>Yellow Pages webservice, which again understands a certain, simple
>query language
>and provides data in a simple data format.

A Yellow Pages service must be capable of receiving queries about almost any topic under the sun. I would hesitate before calling this a 'simple data format'. This would be a large-scale research challenge, well beyond the current state of the art in ontology design.

>Then each provider is contacted. Same game: each provider
>understands a simple
>query language and provides data in a simple data format.

What makes these so 'simple', in your view? I see hard problems everywhere here. If all the medical providers have agreed on a medical-provider-ML format, then of course things might be relatively simple for the agents, but the queries are likely to be pretty complex in any case, and something has to be able to translate from more general-purpose formats to this hypothetical medical-provider format. None of this is 'simple'.

>We are not talking about large, sophisticated ontologies - we are
>talking about
>domain models for small domains and services.

General-purpose reasoning, even about small domains and services, is NOT simple, and it does require sophisticated ontologies.
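To make the point concrete, here is a minimal sketch of the condition Lucy's agent would have to evaluate. All the field names, plan names, ratings, and coordinates are invented for illustration; the point is only that even this toy version of the query bundles together set membership, negation (excluding out-of-plan providers), arithmetic over a numeric range (the 20-mile radius), and disjunction (excellent OR very good):

```python
import math

# Hypothetical provider records; the schema is invented for illustration,
# not taken from any real medical-provider format.
providers = [
    {"name": "A", "plans": {"AcmeCare"}, "rating": "excellent", "lat": 30.45, "lon": -87.20},
    {"name": "B", "plans": {"OtherPlan"}, "rating": "very good", "lat": 30.40, "lon": -87.25},
    {"name": "C", "plans": {"AcmeCare"}, "rating": "fair", "lat": 31.90, "lon": -88.10},
]

home = (30.42, -87.22)  # Mom's home, as (latitude, longitude)

def miles(p, q):
    # Crude equirectangular distance in miles; adequate for a radius check.
    dlat = (p[0] - q[0]) * 69.0
    dlon = (p[1] - q[1]) * 69.0 * math.cos(math.radians(p[0]))
    return math.hypot(dlat, dlon)

def acceptable(prov):
    in_plan = "AcmeCare" in prov["plans"]                       # negation: out-of-plan providers excluded
    close = miles((prov["lat"], prov["lon"]), home) <= 20.0     # arithmetic + quantification over a numeric range
    well_rated = prov["rating"] in ("excellent", "very good")   # disjunction over ratings
    return in_plan and close and well_rated

print([p["name"] for p in providers if acceptable(p)])  # prints ['A']
```

Of course this hard-codes every concept into the program; the Semantic Web proposal is precisely that such conditions be stated in a declarative exchange language, which is where the expressiveness burden lands.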
I think you are falling into a well-known fallacy that AI has learned to avoid: the idea that things that people find easy must be relatively easy to hack up in simple terms.

>The challenge is now: we have 1 Billion different simple query
>languages and data structures.
>What is the common ground that we relate each data set to each other?
>We are not really talking about semantics here - that is another question.

No, you are talking about semantics. How do you think that a billion data formats are going to be made consistent without considering semantics? (Perhaps 'semantics' means something different in database land?)

>We are talking about the foundation that is necessary
>to relate 1 Billion webservices to each other

That "relate" means SEMANTIC relationships. If all we need to do is connect them without caring what they mean, they can all just use HTML. The point is to connect their CONTENT.

>and that saves one from writing a
>converter from each of the 1 Billion webservices to each other.
>The solution is to come up with a joint data model - and guess what
>- yes, graphs.

Graphs are merely a notational device. They are not a data model (in any useful sense). They can represent anything, as we have all known for a very long time, but (for all but very simple data) the conventions that define the meanings of those representations are not themselves in the graph: they are encoded in the labellings of the nodes and arcs.

>The database community has given the answer a couple of years ago and
>came to graphs as a data representation mechanism.

Congratulations. You were about a century late, but I'm glad y'all finally made it.

>The basic idea is
>that every kind of data can be represented as graphs. Thus this provides
>a common ground and allows integration algorithms to work easily with
>multiple sources. Of course this is NOT a solution to resolve semantic
>differences - it is just the necessary first step to provide a common
>infrastructure.
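The point that a labelled graph is a trivial common currency - it reduces to a table of textual triplets at linear cost, with all the meaning carried by the labels - can be made concrete in a few lines. The node and arc labels here are invented for illustration:

```python
# A tiny labelled graph: each arc is (subject, arc-label, object).
# Labels are invented for illustration; nothing here fixes their meaning.
graph = {
    ("Lucy", "hasAgent", "agent42"),
    ("agent42", "queried", "doctorService"),
    ("doctorService", "returned", "treatmentPlan"),
}

def to_triples(g):
    # One line of text per labelled arc: subject <TAB> label <TAB> object.
    # Linear in the number of arcs (plus the cost of sorting for stable output).
    return sorted("\t".join(edge) for edge in g)

for line in to_triples(graph):
    print(line)
```

The encoding is mechanical in both directions, which is exactly why it settles nothing semantic: whether "hasAgent" licenses any inference lives entirely outside the graph.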
>Have I made clear, that I'm not talking about semantics here?

If you are not, then that makes what you say irrelevant, since without semantic connections it is *trivial* to create a "common infrastructure". Plain ASCII text could be a "common infrastructure", if we don't care what it means. (Hey, it was good enough for Gutenberg, why not?) Or arbitrary graphs, as you seem to prefer, though they provide no great advantage over text strings. Any graph can be encoded as a table of textual triplets, after all, with only linear cost.

>Should I repeat it?

It won't get any better the second time.

......

>
>From database research, it is well known that semi-structured data
>(a graph form) is useful for mapping between heterogeneous datasources.
>(see eg.

Thanks for the pointers, which I have (rapidly) scanned. As far as I can see, the following is mostly concerned with data in the sense that it is retrieved for human use, so that for example data formats should be 'self-explaining' in ordinary-language text, i.e. self-documented. But this is of no interest for the Semantic Web, surely, which is supposed to be allowing interoperability between mechanical inference engines, not human readers.

>Papakonstantinou, Y.; Garcia-Molina, H.; Widom, J.
>Object Exchange Across Heterogeneous Information Sources
>1994, ICDE '95
>http://dbpubs.stanford.edu/pub/1994-8

....

>This following paper shows that there is much more to do in this space.
>There is a lot of fine structure here that needs to get exploited, which
>really helps to resolve semantic differences in a cost-effective way.
>Also this paper hardly scratches the surface.
>
>A Layered Approach to Information Modeling and Interoperability on the Web
>by Sergey Melnik, Stefan Decker
>ECDL 2000 Workshop on the Semantic Web
>21 September 2000, Lisbon, Portugal

As far as I can tell from reading this paper, we are in broad agreement.
You also refer to the translations of more complex logical forms (such as lists) as "implementations" in triples, at any rate. However, you say a number of things in this paper that you really shouldn't have said. For example:

"Terms and expressions in these languages are first-class objects that can be manipulated on the object layer. In this way, applications can dynamically learn the semantics of previously unknown languages."

which is just fantasy. (But now I have some idea where Tim B-L gets some of his wilder ideas from, maybe?)

"Reification of links and associations..... These two kinds of reification provide the necessary prerequisites for computational reflection, i.e. the capability for a computational process to reason about itself [Smi96]."

Well, necessary, but nowhere even close to sufficient, so this is very misleading. You need, in addition, at least upward and downward reflection and a truth-predicate, an ability to quantify over the reified syntax, an ability to describe the structure of the reified syntax, and probably some way to combine a least-fixed-point semantics for reflection termination with a model theory for the Krep language (a theoretical task that, as far as I know, is beyond even the Scott-Plotkin semantics for the lambda-calculus). As far as I know, this has never been implemented in any working system. LISP is probably the closest, but it is purely a functional evaluation language, and doesn't have quantifiers, so to call it 'reasoning' is stretching the terminology.

Pat

---------------------------------------------------------------------
IHMC                          (850)434 8903   home
40 South Alcaniz St.          (850)202 4416   office
Pensacola, FL 32501           (850)202 4440   fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes
Received on Friday, 18 May 2001 23:16:59 UTC