Where is the SW going? (was Re: Expressiveness of RDF as Rule Conclusion Language (was Re: What is an RDF Query? ))

>(+cc:eric miller)
>

Hi Dan

>On Fri, 12 Oct 2001, Pat Hayes wrote:
>
>>  >On Fri, 12 Oct 2001, Peter F. Patel-Schneider wrote:
>>  >
>>  >>  And just how are RDF applications supposed to determine when to do this
>>  >>  merging?
>>  >>
>>  >>  peter
>>  >
>>  >By using all that DAML+OIL good stuff you've been slaving over, of
>>  >course :)
>>  >
>>  >All DAML+OIL instance data is RDF,
>>
>>  Actually I would say not, though it's only a matter of terminology.
>
>Yes, it's a matter of terminology. I'm happy you didn't say branding,
>though doubtless that's involved too. But as well, it goes to the heart of
>some misunderstandings about the RDF project, about what we've been trying
>to do with the RDF and SemWeb effort, and about the relationship between
>the RDF core specs and the specs that will join it to flesh out the RDF
>family of specifications.

Wow, "RDF family"? That's a new term in my lexicon. Sounds like a TV series.

>For some, "RDF" is just the triples stuff, our
>"pipsqueak of a language"; for others, it is this whole (possibly insane)
>project of rolling out an (increasingly expressive) framework for
>describing stuff in the Web.

That last point of view seems crazy to me. Or rather, if that is what 
"RDF" means, then everything that has been done that has been called 
'RDF' is meaningless. With this grandiose understanding of 'RDF', 
there is no such thing as a grammar for RDF, a parser for RDF, a 
model theory for RDF, etc. RDF, in this view, isn't a formalism or 
something that could possibly be standardized; it's a kind of grand 
aspiration of the human spirit, or something. Whatever you are 
talking about, it isn't the RDF that I'm working on.

>  I'm firmly in the latter camp, perhaps
>because my ambitions for RDF have long included the things we're now
>calling "Semantic Web" technology, and perhaps because I prefer the phrase
>"resource description framework" to our new slogan "Semantic Web". For
>one, it lends itself better to nouning: "an RDF file" versus "a Semantic
>Web file".

There is no such thing as a 'semantic web file'; and with your 
interpretation of 'RDF', there's no such thing as an RDF file either, 
seems to me. Or better, there's no way to tell if a file is an 'RDF 
file' or not, in this way of talking.

>It is easier and more informative to say "all DAML+OIL files
>are RDF files"

It's easier, but it is dangerously misleading if not immediately qualified.

>than to say "all DAML+OIL files are Semantic Web files";

That doesn't mean anything.

What's wrong with saying they are DAML+OIL files? That is short, 
accurate, and informative.

>we
>need some umbrella terminology that goes beyond branding to say something
>about what all these components (of the description framework) have in
>common. For me, they're all RDF, and they all share the intentionally
>simplistic RDF worldview of resources, relationships, URIs etc.

Hold on. Relationships are used by virtually every formalism, graph 
syntax was invented by C.S.Peirce around 1880, and 'resource' just 
seems to be a W3C name for 'entity' or 'individual', so the only 
really distinctive thing about RDF is the use of urirefs. Is that 
what constitutes the 'worldview' you are referring to, or is there 
more to it?

More to the point, why must they all have 'something in common'? 
Nobody wants to know what HTTP and FTP have in common, because it 
doesn't matter. All the SW needs is ways to *translate* between 
different formalisms, not that they all be the same under the hood. 
They aren't the same, in any case, eg RDF1.0 and RDF/XML are distinct 
languages already, not to mention N3.

>But I can
>see that others are using the acronym differently. I guess it is for W3C
>to clear up the confusion; an update to the RDF FAQ is looming, as is the
>RDF Core primer.
>
>(anyway, here's my view...)
>
>
>>  It is encoded in RDF syntax, but its meaning isn't specified by RDF.
>
>We still call it RDF.

I don't!  And this is absolutely central. If the RDF specs don't 
specify a meaning, then that meaning is NOT in RDF. That's what 
'being in RDF' means. Now, a particular piece of RDF might in some 
broader sense 'have' some meaning that is invisible to an RDF 
processor - ie something that interprets RDF graphs according to 
their RDF-model-theory meanings and draws RDF-valid conclusions from 
them, say -  but it is very important not to say that this meaning is 
'in RDF', because in the only sense of meaning being 'in' a language 
*that is available to a mechanical process*, it isn't. At best you 
might say that it is RDF-encrypted, or something; it's there, but 
completely hidden from the RDF layer.
One problem with saying that these 'hidden' meanings are 'in RDF' is 
that this phrase then becomes meaningless in isolation, since 
*anything* can be 'in RDF' in this sense. It can also be 'in PSML', 
where PSML is the language defined by the following BNF:

    <psml> ::= <unicode-char>|<psml>*

(Proof: serialize your favorite notation into some subset of unicode 
and record the serialization as a character string. QED) So this is 
not a useful notion. Saying that some meaning is "in L", where L is 
some formal language with a formal semantics, is usually taken to 
mean that that meaning is accessible to an engine that knows (only) 
the semantic rules of L. You ought to be able to figure out the 
meaning from the L-expressions plus what you can learn from reading 
the L manual. If you need to go beyond what it says in the L manual 
to figure out the meaning, it's not "in L".
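
To make the point concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the "RDF layer" is modelled as a bare set of triples, and the names `rdf_visible_facts`, `ex:doc1`, `ex:payload` and the rule syntax are all made up for the example. It shows a rule being "carried" in a literal while remaining invisible to anything that knows only the triple rules.

```python
import base64

def rdf_visible_facts(graph):
    """Toy stand-in for an RDF processor: it sees the triples and nothing else."""
    return set(graph)

# A statement in some other notation (a made-up rule syntax, not RDF).
hidden_rule = "forall x. Person(x) -> Mortal(x)"

# Serialize it to a character string and drop it into a literal.
payload = base64.b64encode(hidden_rule.encode("utf-8")).decode("ascii")
graph = {("ex:doc1", "ex:payload", payload)}

facts = rdf_visible_facts(graph)

# The triple processor has exactly one fact: doc1 has some opaque string.
print(len(facts))

# Recovering the rule needs knowledge that is not in the "L manual":
# that the literal is base64, and how to read the rule syntax.
recovered = base64.b64decode(payload).decode("utf-8")
print(recovered == hidden_rule)
```

The rule is in the graph only in the PSML sense: nothing the triple processor is entitled to conclude mentions persons or mortality.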

There is a very basic, almost philosophical, point underlying this. 
In a very real sense, on the SW, there IS NO CONTENT. There is only 
language; and for the SW that must be processable by software, there 
is only formal language. The "content" is what the writers and 
readers of the languages intend, but there is no way to send an 
intention along a wire. Now, people can intend all kinds of stuff, 
since people are very smart and very subtle. But when, as in the SW, 
at least some of the readers and writers are programs, they have no 
chance at all of guessing at all the subtleties that a human might 
have intended. All they can do is use the rules they have built into 
them to extract as much meaning from the marks we send them as they 
can. If we humans encode other stuff into those marks that go beyond 
the rules which were used to build the software agents, they haven't 
got a chance of knowing about it: we might as well ask them to be 
telepathic. So we have to be very careful what we say about which 
rules are supposed to be being used to interpret the formalisms. RDF 
and DAML+OIL are based on different assumptions; they are not the 
same, and there is no way to encode the latter in the former. (There 
is a way to *extend* the former to the latter, of course, but it's a 
real extension. DAML+OIL goes beyond RDF. In fact, RDFS goes beyond 
RDF, which is why the semantic conditions on an RDFS interpretation 
need to be stated separately in the model theory.)

>I have some very simple pieces of content, both
>instance level and schema (see example below) whose meaning isn't
>captured by DAML+OIL.

OK, let me take you up on that. How IS it captured, then? (It has to 
be captured *somehow*, right?)

>For eg., we may want to use RDF/DAML to talk about
>"a util:Document whose dc:title is 'foo' and whose dc:creator is the
>foaf:Person whose foaf:mbox is mailto:webmaster@example.com". DAML+OIL
>can't distinguish between the cases where (at any one point in time) there
>is at most one entity with a given personal mailbox; and the case where
>across-all-time that property can only ever have a single value. But RDF
>tools, including but not limited to those that understand DAML+OIL, 
>can still do
>useful things with this kind of data, even if there are aspects of its
>meaning that are not captured in the RDF or DAML+OIL formalisms.

Oh, sure, of course. It may well be that *part* of a DAML+OIL 
formalization is RDF-accessible.  But that doesn't make the whole 
thing into RDF, any more than my quoting Pascal makes my entire essay 
French.

I would say that in this case, if I follow your example, that neither 
the RDF nor the DAML captures the intended meaning, since they both 
assume that urirefs denote like simple names. So although that 
time-relative 'meaning' might be in some human user's mind, it is not 
in fact in the RDF/DAML. If the human thinks it is, he or she is 
liable to be disappointed by the performance of the software.

>
>For "its meaning isn't specified by RDF" you may as well say "its meaning
>isn't specified by DAML+OIL" in many many cases. So we shrug and 
>admit that yeah
>sure, all aspects of meaning are not easily formalised at this stage of
>history.

It's more fundamental than that. We can PROVE that some aspects of 
meaning can NEVER be captured by very simple languages like RDF. (Eg 
you can't express disjunction or implication in RDF. There are 
extensions of RDF in which you can, of course. )

>  And we don't pin this on RDF, nor on DAML, it's just the way
>things are: meaning is only partially captured by the mechanisms we're
>playing with here.

I would put it differently. The formalisms say what they say, and we 
can study that, and those are the only 'meanings' that we actually 
have available. In a very real sense, there is no other 'meaning' in 
the formalism. What you are calling 'meanings' are a kind of 
aspiration; those are things that we would *like* to be able to 
express in a machine-accessible way. But again, you can't send a 
research agenda along a wire.

The trouble with your way of speaking is that it suggests that 
'meaning' and 'content' are real stuff that can be somehow captured 
and put into a box, and the task is to get hold of more of it. I 
think that is seriously misleading. There is no end to 'meaning' in 
your sense. Whenever some aspect or part of it is formalized, it is 
always possible to think of some other aspect that is missing, 
because we are here really talking about something like human 
creativity.

>Despite all this, we need to deploy meaningful
>documents in the Web ASAP, without putting everything on hold while we
>wait for a formalism that can capture all of that meaning. We need a
>framework for getting incrementally better

Wait. That "incrementally better" sounds like progress towards some 
kind of ultimate goal. What is that goal? To capture ALL of meaning? 
Forgeddaboutit. To capture the same amount of meaning that a human 
could get by reading the web page? If so, then: 1. you are doing AI, 
see above, and I would advise against setting out to get AI done in 
the near future; 2. why bother? Humans are cheap these days anyway, 
and if the software was this smart then it could read  HTML; 3. 
surely what we want are things that are a bit less smart than humans 
but also a lot faster, less likely to get bored, more willing to do 
our bidding and not have ideas of their own, etc.; in fact, 'agents'.

>at describing resources in the
>Web; that is the 'resource description framework', RDF. The RDF approach
>to this has always been to sneak up on the problem bit by bit. The graph
>model provides a useful cartoon world view ("objects, types, properties,
>relationships; identifiers") that can be shared by more expressive parts
>of the system that get designed later.

Pity y'all didn't include connectives and quantifiers.

>DAML+OIL takes the RDF Schema world
>view of classes, properties and constraints, and it adds in a bunch of
>richness that reflects into the formalism things that RDF could
>previously carry but didn't explicitly acknowledge.

What does "could" mean? Are you saying that the RDF authors screwed 
up, or that they had DAML+OIL in mind all along, but just kind of 
forgot to mention all the picky little details?

(You know, to call the ideas of class, property and constraint "the 
RDF Schema world" is kind of silly. Surely nobody thinks that RDF 
*invented* these ideas, do they?)

>Now my point is just that DAML is in the exact same situation, there are
>meaningful constructs that can be carried through DAML without DAML
>realising the full meaning.

In a sense of 'carried through', that is correct; but it's a trivial 
sense, because in this sense, any content can be carried through 
almost any language. All the language needs is strings and it can 
carry through anything by embedding PSML into it.

>And I'm not talking here about the
>reference/naming/denotation aspects of
>meaning that I've talked about before, though something similar can be
>said about that. Rather, I'm talking about aspects of the meaning of our
>content (eg. temporal issue raised below) which one might imagine _are_ in
>scope for some fancier Model Theory or Axioms to engage with. Just 
>in DAML 1.x we
>don't try. Does this mean that my friend-of-a-friend RDFWeb application,
>which uses the property whose URI is 'http://xmlns.com/foaf/0.1/mbox' is
>neither an RDF application nor a DAML application but something else yet
>to be named. Surely not.

If it relies on temporal changes in URI references, then yes, I would 
indeed say that it really is using something not yet named, and not 
using RDF (though I concede that this is a very strict interpretation 
that I might be willing to relax in practice :-). What it would be 
really doing is something best described as MISusing RDF, ie using it 
in ways that are not sanctioned by its official meaning (and 
therefore are liable to be misunderstood by another RDF engine) but 
are nevertheless useful.

Let me hasten to add that I wouldn't want to stop people doing things 
like this; on the contrary, experimenting outside the box in this way 
is a very good way to discover the real limitations of any formalism 
and to get started on the process of designing a better one. But if, 
for example, your friend's application were to break because of those 
aspects that lie outside the current RDF formal model, and he were to 
sue the W3C on the grounds that RDF doesn't do what we said it would 
do, I would say that he had no case.

>Of course there are aspects of meaning, and
>specifically the meaning of that property, which neither RDF nor DAML
>captures. Nevertheless we can use the Web right now to successfully deploy
>and use descriptions of resources in RDF that employ the foaf:mbox
>property. Those descriptions don't cease to be RDF because the property is
>a particularly interesting one, or because there are rules that one might
>(eventually) formalise about its use which can't be written down in a
>Semantic Web language yet. That's why we called it a (description)
>Framework not a  (file) Format: it's a deployment strategy for easing all
>this stuff out of research labs

Well, this stuff has been out of the research labs for quite a long 
time now. Ever hear of 'databases' ;-) ?

>and into mainstream Web technology.  Slowly
>but surely... ;-)
>
>So when I say "all DAML+OIL instance data is RDF data", I mean
>(colloquially) that it has basically the same cartoon worldview: of objects
>identifiable by URIs, having URI-named classes and URI-named relationships
>to one another.

Oh, if that is all, then OK. But if "RDF" just refers to the use of 
URIs as names, then RDF1.0 isn't "RDF", since it allows anonymous 
resources.

>
>It is of course possible to produce such tangled representations
>(encodings of rules, queries etc for example) in RDF that the
>object/property/value worldview loses much of its utility. For that matter
>I could run something like "tar -c mailbox/* | binhex | gpg --encode >
>mail.txt", put that into a literal string in an RDF graph, then go around
>claiming that I had an RDF representation of my mailbox. I could, but I'd
>be silly rather than wrong. Sure at one level my mailbox is represented
>in, or carried through, RDF. It's just a rather useless representation.
>
>Similarly, there are representations-in-RDF (such as the person/mailbox
>thing below) which (a) draw on aspects of Schema/ontology meaning that are
>yet to be formalised and (b) nevertheless make perfect sense as useful
>chunks of RDF instance data, couched in the objects/properties/values
>cartoon worldview.

Maybe I haven't been following you. If they make sense as RDF data, 
then *with that understanding of what they mean*, they are RDF. Sure, 
no problem with that. But that doesn't mean that DAML+OIL *is* RDF; 
it just means that a piece of DAML+OIL makes some kind of sense to an 
RDF engine. We set it up that way.  But the sense that the RDF engine 
gets out of it isn't the same sense that a DAML+OIL engine would get 
out of it.
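
A small Python sketch of the difference, using the foaf:mbox example quoted further down. This is a toy under loud assumptions: graphs are plain sets of string triples, the two "engines" are hypothetical functions written for this illustration, and the DAML-aware one knows exactly one rule (subjects sharing a value for an UnambiguousProperty are the same individual) applied in a single pass.

```python
RDF_TYPE = "rdf:type"
UNAMBIGUOUS = "daml:UnambiguousProperty"
MBOX = "foaf:mbox"

triples = {
    (MBOX, RDF_TYPE, UNAMBIGUOUS),
    ("_:a", MBOX, "mailto:foo@example.com"),
    ("_:a", "foaf:name", "Foo"),
    ("_:b", MBOX, "mailto:foo@example.com"),
    ("_:b", "foaf:homepage", "http://example.com/"),
}

def rdf_only_view(graph):
    """Plain RDF reading: daml:UnambiguousProperty is just another name."""
    return set(graph)

def daml_aware_view(graph):
    """Reading that also applies one DAML+OIL rule: merge subjects that
    share a value for a property typed as UnambiguousProperty."""
    unambiguous = {s for (s, p, o) in graph if p == RDF_TYPE and o == UNAMBIGUOUS}
    representative = {}  # (property, value) -> first subject seen
    rename = {}          # subject -> subject it is merged into
    for s, p, o in sorted(graph):
        if p in unambiguous:
            rename[s] = representative.setdefault((p, o), s)
    # Single pass only; chains of merges would need a union-find.
    return {(rename.get(s, s), p, rename.get(o, o)) for (s, p, o) in graph}

plain = rdf_only_view(triples)
merged = daml_aware_view(triples)

print(sorted({s for (s, _, _) in plain if s.startswith("_:")}))
print(sorted({s for (s, _, _) in merged if s.startswith("_:")}))
```

Both readers accept the same triples, but only the second one concludes that the individual named Foo has that homepage. The first is not wrong; it is simply working from a different manual.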

Now in fact, DAML+OIL is rather better integrated with RDF than this 
suggests, in that their model theories also line up rather well, so 
that one can view RDF(S) (without reification) as being a sublanguage 
of DAML+OIL, rather than just a formalism into which DAML+OIL content 
is embedded in some opaque way. That means that anything that an RDF 
engine would do would in fact also make DAML+OIL sense (though still 
not the reverse). But this took a lot of work and care; it doesn't 
come easily or naturally, so don't expect that this is going to be 
the normal case.  Temporal sensitivity isn't going to be that easy, 
for example; I think it is going to require re-doing RDF from the 
ground up, rather than extending it.

It really is impossible to pre-guess all the things anyone is going 
to want to say, and invent a single basic notation that will never 
need to be modified, only extended. Even adding context-sensitive 
datatyping to RDF will involve extending the Ntriples syntax, for 
example. The nearest anyone has come to such a thing is conventional 
FOL, which is a very stable region in the space of all expressive 
assertional languages. But for some purposes, FOL isn't really what 
you want either.

>
>>  >and RDF apps that are built
>>  >to know about even a subset of DAML+OIL can make good use of that when
>>  >doing data merging.
>>
>>  Well, they can if they are DAML-savvy, but then why don't you call
>>  them DAML apps rather than RDF apps?
>>
>>  >
>>  >For eg., consider the property http://xmlns.com/foaf/0.1/mbox
>>  >from the namespace http://xmlns.com/foaf/0.1/
>>  >
>>  >	[[
>>  >	FOAF is expressed as an RDF Schema, annotated with DAML to express the
>>  >	fact that a foaf:mbox uniquely picks out an individual.
>>  >	]]
>>  >
>>  >Excerpting from that schema:
>>  >
>>  >       <rdf:Property rdf:about="http://xmlns.com/foaf/0.1/mbox"
>>  >	rdfs:label="Personal Mailbox"
>>  >	rdfs:comment="A web-identifiable Internet mailbox associated
>>  >with  exactly one owner.
>>  >	This property is a 'unique property' in the DAML+OIL sense, in  that
>>  > 	there is at most one individual that has any particular personal
>>  >	mailbox.">
>>  >
>>  >	<rdfs:domain rdf:resource="http://xmlns.com/foaf/0.1/Person" />
>>  >	<rdfs:range
>>  >rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource" />
>>  > 	<rdf:type
>>  >rdf:resource="http://www.daml.org/2001/03/daml+oil#UnambiguousProperty"/>
>>  >         <rdfs:isDefinedBy rdf:resource="http://xmlns.com/foaf/0.1/" />
>>  >        </rdf:Property>
>>  >
>>  >Since we say the property is of type
>>  >http://www.daml.org/2001/03/daml+oil#UnambiguousProperty
>>  >we can use this knowledge in RDF-based applications
>>
>>  How does the RDF application know what the DAML expressions mean?
>>  (Should it know about all the other extensions to RDF that haven't even
>>  been invented yet?)
>>
>>  >-- for example merging
>>  >blank nodes where each node has a property with the exact same resource as
>>  >its value. In this example, merging nodes that stand for the individual
>>  >whose personal mailbox is mailto:foo@example.com, perhaps.
>>  >
>>  >Aside: I could complain here that DAML+OIL gives us no mechanism for
>>  >guaranteeing
>>  >that the at-most-one-ness remains static in the face of time and change,
>>  >but that's probably a can of worms best opened in a separate thread.
>>  >DAML+OIL's "worldview" isn't one that explicitly acknowledges time and
>>  >change, and there are good reasons for this being the case. How this
>>  >relates to the need to deploy DAML+OIL ontologies in the Web is something
>>  >that looms rapidly, imho.
>>
>>  Maybe, but you can hardly pin this on DAML; *nobody* has really
>>  tackled this issue yet, AFAIK.
>
>It's not a matter of blame, it's a matter of layering. This whole thing
>we're building, the graph stuff, the simple schema stuff, the fancier
>ontology language, perhaps a rules language... what we're assembling is a
>framework for describing resources in the Web. RDF. That picture is 
>only coming
>together slowly, and DAML+OIL is a key component. There will be others.

I would agree with all this except for the degree of integration 
implied by that word "component". Imagine someone in about 1957 
saying "this programming stuff is really coming together; we have 
several of the layers sorted out; there's the 704 assembly code (that 
youngster Minsky has proven that it is a universal machine) and 
there's FORTRAN and IPL ... Pretty soon we will have all the 
components for the Machine Programming Framework, and then things 
will really start humming." In a sense they would have been right, 
but they also would clearly have been missing something important. 
(In fact I suspect that many folks at IBM did have something close to 
this attitude, deep down, which is why Bill Gates was able to con 
them so easily.)

>So going back to my original claim that all DAML+OIL instance data is RDF
>instance data: what I'm getting at is that to deploy this stuff for real,
>on Web sites, in browsers, palm pilots, everywhere, we need some
>stability even when the complete resource description framework is not yet
>finalised.

I don't grok this notion of 'finalised'. That sounds like finalising 
evolution, or something. The whole thing seems more open-ended to me. 
We put out tools and people start to use them, then people put out 
better tools, and other people use them, and so on. God alone knows 
what will happen, but it is going to be more like a stampede than like 
herding cattle.

>Maybe the RDF project never will be finalised, but always
>pushing to get things out of the lab and into the Web mainstream. I hope
>so, fwiw.
>
>
>The 'description logic meets temporal logic' work is still
>in the lab, but please lets not plan to tell the world that there are RDF
>instance files, and DAML/OIL/WebOnt instance files, and
>WebOnt-PlusTemporalLogic instance files and who knows what follows
>after.

Why not? That is exactly what the world needs to know. If we don't 
tell them this we would be dishonest, because this is the truth. 
There ARE all these formalisms, and they all have their uses and 
limitations.

I think you have a vision of the SW as a kind of single integrated 
system where content is flowing smoothly along pipes, all encoded in 
W3C-sanctioned RDF. I have a very different vision, more like a kind 
of bustling market or bazaar, where agents are busily brokering 
meanings, and many different languages are being spoken. I see a kind 
of market economy of meanings, all happening at electronic speeds. I 
can imagine all kinds of new economic opportunities in this virtual 
semantic web world. For example, for a (small) fee per thousand bytes 
of unicode, I ("I" here is a program, of course, but like a 
well-trained truffle-hound, it gives all its earnings to its human 
owner) will undertake to translate anything in any of these 
notations... into any of these other notations.... Or, I will (for no 
fee, but in return for some small favor, eg that you undertake to 
transmit some small cookie for me to every other agent you know) 
undertake to find you a service which can translate your notation 
into some other notation. Or, I will (for a very significant fee) 
read anything written in one of these notations, check it for 
internal consistency and agreement with any US federal database, and 
then warrant that it has been so checked, and maybe (for a truly 
astonishing fee) accept any risk arising from such warrant. Or, I 
have a new notation which you can use (for a fee schedule that we can 
negotiate) and it will provide you with all these advantages...

Now, for this to work, I want to see every file out there branded 
with something that tells me - and I'm a piece of software, remember 
-  *exactly* what semantic rules I am supposed to use to interpret 
it. That way, everyone knows whose fault it is when things go wrong. 
If I use the semantics it referred to, then it's someone else's fault; 
if I used some other one, I have only myself to blame.

(Some questions and answers.
Q. What if I don't know those rules? A. Well, then you need to get 
the content translated into some rule format you do know. Find a 
translation service, or ask the other site if it can translate for 
you.
Q. What if I know some better rules? A. Well, go ahead. Maybe you 
know more about that agent's rules than it knows; but you are taking 
a risk.
Q. What if I know that my rules are less powerful than its rules? A: 
Then you are safe; but you have a responsibility to not mess up any 
content that you might not understand, particularly if you are going 
to tell anyone *else* that you derived a conclusion from that agent's 
sources. )
Q. What if it doesn't have a brand, but I find that I can read it and 
make sense of it? A. Well you are free to do so, of course, but if 
something goes wrong as a result, it's not clear whose fault it was. 
(My guess is that this kind of question will eventually end up being 
decided by case law in civil tort cases, and that in any case a kind 
of 'reasonable practice' code of usage will develop in order to 
enable e-commerce to work properly. )
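
As a rough sketch of what such branding buys a piece of software, here is a few lines of Python. Everything in it is hypothetical: the `semantics` field, the registry of interpreters and the responsibility labels are invented for the illustration and correspond to no existing spec.

```python
def interpret_rdf(content):
    """Placeholder for an interpreter that applies only the RDF rules."""
    return f"RDF reading of {content!r}"

# Hypothetical registry: brand -> the interpreter this agent has built in.
INTERPRETERS = {"RDF": interpret_rdf}

def read(document, interpreters):
    """Return (result, who answers for it) according to the file's brand."""
    brand = document.get("semantics")
    if brand is None:
        # Unbranded: we may guess, but nobody promised us anything.
        return None, "unclear"
    if brand not in interpreters:
        # We do not know these rules: find a translation service first.
        return None, "needs translation"
    # We used exactly the rules the file named; errors are not ours.
    return interpreters[brand](document["content"]), "publisher"

print(read({"semantics": "RDF", "content": "<graph>"}, INTERPRETERS))
print(read({"semantics": "DAML+OIL", "content": "<graph>"}, INTERPRETERS))
print(read({"content": "<graph>"}, INTERPRETERS))
```

The interesting part is the second element of the pair: with a brand, an old RDF-only agent can tell that a DAML+OIL file is not addressed to it instead of silently misreading it.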

>If and when for example the description-logic-meets-temporal stuff
>gets more fully baked, and perhaps submitted to W3C, we'll likely 
>have some way
>of annotating my RDF Schema (or Web Ontology) at
>http://xmlns.com/foaf/0.1/ to better represent the meaning of the classes
>and properties I name there. Does that mean that you'll want me to call my
>instance data files something other than "RDF files" (DAML/OIL/Webont
>files...).

Yes, most definitely. If you call them RDF files and my old, dumb, 
RDF inference engine can't understand them, or misinterprets them, I 
may be *very* unhappy with you. Your lawyers may hear from my lawyers.

>Or maybe they're not even RDF (DAML etc) files today, since the
>semantics are not fully captured by any Semantic Web schema/ontology/rule
>language that I know of. Or we could just worry about something more
>interesting than categorising the flavours of instance data, and get used
>to calling all this stuff "RDF".

Why not just call it "Semantic Web Stuff" and forget about publishing 
specs? That provides about the same amount of useful information to 
someone trying to write code.

Pat

PS, I just saw this wonderful quote from John Milton:
"When there is much desire to learn,
there of necessity will be much arguing, much writing, many opinions; 
for opinion in good men is but knowledge in the making."

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

Received on Monday, 15 October 2001 16:02:53 UTC