- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Mon, 15 Oct 2001 15:02:30 -0500
- To: Dan Brickley <danbri@w3.org>
- Cc: <www-rdf-rules@w3.org>, <em@w3.org>
- Message-Id: <p05101003b7f0ab89717d@[205.160.76.193]>
>(+cc:eric miller)
>
Hi Dan
>On Fri, 12 Oct 2001, Pat Hayes wrote:
>
>> >On Fri, 12 Oct 2001, Peter F. Patel-Schneider wrote:
>> >
>> >> And just how are RDF applications supposed to determine when to do this
>> >> merging?
>> >>
>> >> peter
>> >
>> >By using all that DAML+OIL good stuff you've been slaving over, of
>> >course :)
>> >
>> >All DAML+OIL instance data is RDF,
>>
>> Actually I would say not, though it's only a matter of terminology.
>
>Yes, its a matter of terminology. I'm happy you didn't say branding,
>though doubtless that's involved too. But as well, it goes to the heart of
>some misunderstandings about the RDF project, about what we've been trying
>to do with the RDF and SemWeb effort, and about the relationship between
>the RDF core specs and the specs that will join it to flesh out the RDF
>family of specifications.
Wow, "RDF family"? That's a new term in my lexicon. Sounds like a TV series.
>For some, "RDF" is just the triples stuff, our
>"pipsqueak of a language"; for others, it is this whole (possibly insane)
>project of rolling out an (increasingly expressive) framework for
>describing stuff in the Web.
That last point of view seems crazy to me. Or rather, if that is what
"RDF" means, then everything that has been done that has been called
'RDF' is meaningless. With this grandiose understanding of 'RDF',
there is no such thing as a grammar for RDF, a parser for RDF, a
model theory for RDF, etc. RDF, in this view, isn't a formalism or
something that could possibly be standardized; it's a kind of grand
aspiration of the human spirit, or something. Whatever you are
talking about, it isn't the RDF that I'm working on.
> I'm firmly in the latter camp, perhaps
>because my ambitions for RDF have long included the things we're now
>calling "Semantic Web" technology, and perhaps because I prefer the phrase
>"resource description framework" to our new slogan "Semantic Web". For
>one, it lends itself better to nouning: "an RDF file" versus "a Semantic
>Web file".
There is no such thing as a 'semantic web file'; and with your
interpretation of 'RDF', there's no such thing as an RDF file either,
seems to me. Or better, there's no way to tell if a file is an 'RDF
file' or not, in this way of talking.
>It is easier and more informative to say "all DAML+OIL files
>are RDF files"
It's easier, but it is dangerously misleading if not immediately qualified.
>than to say "all DAML+OIL files are Semantic Web files";
That doesn't mean anything.
What's wrong with saying they are DAML+OIL files? That is short,
accurate, and informative.
>we
>need some umbrella terminology that goes beyond branding to say something
>about what all these components (of the description framework) have in
>common. For me, they're all RDF, and they all share the intentionally
>simplistic RDF worldview of resources, relationships, URIs etc.
Hold on. Relationships are used by virtually every formalism, graph
syntax was invented by C.S.Peirce around 1880, and 'resource' just
seems to be a W3C name for 'entity' or 'individual', so the only
really distinctive thing about RDF is the use of urirefs. Is that
what constitutes the 'worldview' you are referring to, or is there
more to it?
More to the point, why must they all have 'something in common'?
Nobody wants to know what HTTP and FTP have in common, because it
doesn't matter. All the SW needs is ways to *translate* between
different formalisms, not that they all be the same under the hood.
They aren't the same, in any case, eg RDF1.0 and RDF/XML are distinct
languages already, not to mention N3.
>But I can
>see that others are using the acronym differently. I guess it is for W3C
>to clear up the confusion; an update to the RDF FAQ is looming, as is the
>RDF Core primer.
>
>(anyway, here's my view...)
>
>
>> It is encoded in RDF syntax, but its meaning isn't specified by RDF.
>
>We still call it RDF.
I don't! And this is absolutely central. If the RDF specs don't
specify a meaning, then that meaning is NOT in RDF. That's what
'being in RDF' means. Now, a particular piece of RDF might in some
broader sense 'have' some meaning that is invisible to an RDF
processor - ie something that interprets RDF graphs according to
their RDF-model-theory meanings and draws RDF-valid conclusions from
them, say - but it is very important not to say that this meaning is
'in RDF', because in the only sense of meaning being 'in' a language
*that is available to a mechanical process*, it isn't. At best you
might say that it is RDF-encrypted, or something; it's there, but
completely hidden from the RDF layer.
One problem with saying that these 'hidden' meanings are 'in RDF' is
that this phrase then becomes meaningless in isolation, since
*anything* can be 'in RDF' in this sense. It can also be 'in PSML',
where PSML is the language defined by the following BNF:
<psml> ::= <unicode-char>|<psml>*
(Proof: serialize your favorite notation into some subset of unicode
and record the serialization as a character string. QED) So this is
not a useful notion. Saying that some meaning is "in L", where L is
some formal language with a formal semantics, is usually taken to
mean that that meaning is accessible to an engine that knows (only)
the semantic rules of L. You ought to be able to figure out the
meaning from the L-expressions plus what you can learn from reading
the L manual. If you need to go beyond what it says in the L manual
to figure out the meaning, it's not "in L".
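
As a rough illustration of the difference between content being "carried through" a language and being "in" it, here is a minimal Python sketch. Everything in it is made up for the purpose: the triple-as-tuple representation, the `http://example.org/payload` property, and the choice of base64 as the serialization are assumptions, not anything from the RDF specs.

```python
import base64

# Hypothetical representation: a triple is (subject, property, object).
PAYLOAD = "http://example.org/payload"  # made-up property URI


def carry_in_literal(content: str) -> tuple:
    """Serialize arbitrary content into a literal string inside one triple."""
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    return ("_:doc", PAYLOAD, encoded)


def triple_level_view(triple: tuple) -> dict:
    """What an engine that knows only the triple structure can report.

    It sees a node, a property, and an opaque string. Nothing in the
    triple-level rules tells it how to look inside the literal.
    """
    subject, prop, literal = triple
    return {"subject": subject, "property": prop, "literal_length": len(literal)}


def out_of_band_reader(triple: tuple) -> str:
    """A reader that knows the extra convention (base64) agreed outside the triple."""
    return base64.b64decode(triple[2]).decode("utf-8")


triple = carry_in_literal("P or Q")
print(triple_level_view(triple))   # structure only, no access to the content
print(out_of_band_reader(triple))  # recovers the content via the extra convention
```

The content is recoverable only by a reader that has the extra convention, which is exactly the sense in which it is not "in" the carrier language.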
There is a very basic, almost philosophical, point underlying this.
In a very real sense, on the SW, there IS NO CONTENT. There is only
language; and for the SW that must be processable by software, there
is only formal language. The "content" is what the writers and
readers of the languages intend, but there is no way to send an
intention along a wire. Now, people can intend all kinds of stuff,
since people are very smart and very subtle. But when, as in the SW,
at least some of the readers and writers are programs, they have no
chance at all of guessing at all the subtleties that a human might
have intended. All they can do is use the rules they have built into
them to extract as much meaning from the marks we send them as they
can. If we humans encode other stuff into those marks that go beyond
the rules which were used to build the software agents, they haven't
got a chance of knowing about it: we might as well ask them to be
telepathic. So we have to be very careful what we say about which
rules are supposed to be being used to interpret the formalisms. RDF
and DAML+OIL are based on different assumptions; they are not the
same, and there is no way to encode the latter in the former. (There
is a way to *extend* the former to the latter, of course, but it's a
real extension. DAML+OIL goes beyond RDF. In fact, RDFS goes beyond
RDF, which is why the semantic conditions on an RDFS interpretation
need to be stated separately in the model theory.)
>I have some very simple pieces of content, both
>instance level and schema (see example below) whose meaning isn't
>captured by DAML+OIL.
OK, let me take you up on that. How IS it captured, then? (It has to
be captured *somehow*, right?)
>For eg., we may want to use RDF/DAML to talk about
>"a util:Document whose dc:title is 'foo' and whose dc:creator is the
>foaf:Person whose foaf:mbox is mailto:webmaster@example.com". DAML+OIL
>can't distinguish between the cases where (at any one point in time) there
>is at most one entity with a given personal mailbox; and the case where
>across-all-time that property can only ever have a single value. But RDF
>tools, including but not limited to those that understand DAML+OIL, can
>still do useful things with this kind of data, even if there are aspects
>of its meaning that are not captured in the RDF or DAML+OIL formalisms.
Oh, sure, of course. It may well be that *part* of a DAML+OIL
formalization is RDF-accessible. But that doesn't make the whole
thing into RDF, any more than my quoting Pascal makes my entire essay
French.
I would say that in this case, if I follow your example, neither
the RDF nor the DAML captures the intended meaning, since they both
assume that urirefs denote like simple names. So although that
time-relative 'meaning' might be in some human user's mind, it is not
in fact in the RDF/DAML. If the human thinks it is, he or she is
liable to be disappointed by the performance of the software.
>
>For "its meaning isn't specified by RDF" you may as well say "its meaning
>isn't specified by DAML+OIL" in many many cases. So we shrug and admit
>that yeah sure, all aspects of meaning are not easily formalised at this
>stage of history.
It's more fundamental than that. We can PROVE that some aspects of
meaning can NEVER be captured by very simple languages like RDF. (E.g.
you can't express disjunction or implication in RDF. There are
extensions of RDF in which you can, of course.)
> And we don't pin this on RDF, nor on DAML, it's just the way
>things are: meaning is only partially captured by the mechanisms we're
>playing with here.
I would put it differently. The formalisms say what they say, and we
can study that, and those are the only 'meanings' that we actually
have available. In a very real sense, there is no other 'meaning' in
the formalism. What you are calling 'meanings' are a kind of
aspiration; those are things that we would *like* to be able to
express in a machine-accessible way. But again, you can't send a
research agenda along a wire.
The trouble with your way of speaking is that it suggests that
'meaning' and 'content' are real stuff that can be somehow captured
and put into a box, and the task is to get hold of more of it. I
think that is seriously misleading. There is no end to 'meaning' in
your sense. Whenever some aspect or part of it is formalized, it is
always possible to think of some other aspect that is missing,
because we are here really talking about something like human
creativity.
>Despite all this, we need to deploy meaningful
>documents in the Web ASAP, without putting everything on hold while we
>wait for a formalism that can capture all of that meaning. We need a
>framework for getting incrementally better
Wait. That "incrementally better" sounds like progress towards some
kind of ultimate goal. What is that goal? To capture ALL of meaning?
Forgeddaboutit. To capture the same amount of meaning that a human
could get by reading the web page? If so, then: 1. you are doing AI,
see above, and I would advise against setting out to get AI done in
the near future; 2. why bother? Humans are cheap these days anyway,
and if the software was this smart then it could read HTML; 3.
surely what we want are things that are a bit less smart than humans
but also a lot faster, less likely to get bored, more willing to do
our bidding and not have ideas of their own, etc.; in fact, 'agents'.
>at describing resources in the
>Web; that is the 'resource description framework', RDF. The RDF approach
>to this has always been to sneak up on the problem bit by bit. The graph
>model provides a useful cartoon world view ("objects, types, properties,
>relationships; identifiers") that can be shared by more expressive parts
>of the system that get designed later.
Pity y'all didn't include connectives and quantifiers.
>DAML+OIL takes the RDF Schema world
>view of classes, properties and constraints, and it adds in a bunch of
>richness that reflects into the formalism things that RDF could
>previously carry but didn't explicitly acknowledge.
What does "could" mean? Are you saying that the RDF authors screwed
up, or that they had DAML+OIL in mind all along, but just kind of
forgot to mention all the picky little details?
(You know, to call the ideas of class, property and constraint "the
RDF Schema world" is kind of silly. Surely nobody thinks that RDF
*invented* these ideas, do they?)
>Now my point is just that DAML is in the exact same situation, there are
>meaningful constructs that can be carried through DAML without DAML
>realising the full meaning.
In a sense of 'carried through', that is correct; but it's a trivial
sense, because in this sense, any content can be carried through
almost any language. All the language needs is strings and it can
carry through anything by embedding PSML into it.
>And I'm not talking here about the
>reference/naming/denotation aspects of
>meaning that I've talked about before, though something similar can be
>said about that. Rather, I'm talking about aspects of the meaning of our
>content (eg. temporal issue raised below) which one might imagine _are_ in
>scope for some fancier Model Theory or Axioms to engage with. Just in
>DAML 1.x we
>don't try. Does this mean that my friend-of-a-friend RDFWeb application,
>which uses the property whose URI is 'http://xmlns.com/foaf/0.1/mbox' is
>neither an RDF application nor a DAML application but something else yet
>to be named. Surely not.
If it relies on temporal changes in URI references, then yes, I would
indeed say that really is using something not yet named, and not
using RDF (though I concede that this is a very strict interpretation
that I might be willing to relax in practice :-). What it would be
really doing is something best described as MISusing RDF, ie using it
in ways that are not sanctioned by its official meaning (and
therefore are liable to be misunderstood by another RDF engine) but
are nevertheless useful.
Let me hasten to add that I wouldn't want to stop people doing things
like this; on the contrary, experimenting outside the box in this way
is a very good way to discover the real limitations of any formalism
and to get started on the process of designing a better one. But if,
for example, your friend's application were to break because of those
aspects that lie outside the current RDF formal model, and he were to
sue the W3C on the grounds that RDF doesn't do what we said it would
do, I would say that he had no case.
>Of course there are aspects of meaning, and
>specifically the meaning of that property, which neither RDF nor DAML
>captures. Nevertheless we can use the Web right now to successfully deploy
>and use descriptions of resources in RDF that employ the foaf:mbox
>property. Those descriptions don't cease to be RDF because the property is
>a particularly interesting one, or because there are rules that one might
>(eventually) formalise about its use which can't be written down in a
>Semantic Web language yet. That's why we called it a (description)
>Framework not a (file) Format: it's a deployment strategy for easing all
>this stuff out of research labs
Well, this stuff has been out of the research labs for quite a long
time now. Ever hear of 'databases' ;-) ?
>and into mainstream Web technology. Slowly
>but surely... ;-)
>
>So when I say "all DAML+OIL instance data is RDF data", I mean
>(colloquially) that it has basically the same cartoon worldview: of objects
>identifiable by URIs, having URI-named classes and URI-named relationships
>to one another.
Oh, if that is all, then OK. But if "RDF" just refers to the use of
URIs as names, then RDF1.0 isn't "RDF", since it allows anonymous
resources.
>
>It is of course possible to produce such tangled representations
>(encodings of rules, queries etc for example) in RDF that the
>object/property/value worldview loses much of its utility. For that matter
>I could run something like "tar -c mailbox/* | binhex | gpg --encode >
>mail.txt", put that into a literal string in an RDF graph, then go around
>claiming that I had an RDF representation of my mailbox. I could, but I'd
>be silly rather than wrong. Sure at one level my mailbox is represented
>in, or carried through, RDF. It's just a rather useless representation.
>
>Similarly, there are representations-in-RDF (such as the person/mailbox
>thing below) which (a) draw on aspects of Schema/ontology meaning that are
>yet to be formalised and (b) nevertheless make perfect sense as useful
>chunks of RDF instance data, couched in the objects/properties/values
>cartoon worldview.
Maybe I haven't been following you. If they make sense as RDF data,
then *with that understanding of what they mean*, they are RDF. Sure,
no problem with that. But that doesn't mean that DAML+OIL *is* RDF;
it just means that a piece of DAML+OIL makes some kind of sense to an
RDF engine. We set it up that way. But the sense that the RDF engine
gets out of it isn't the same sense that a DAML+OIL engine would get
out of it.
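
To make the contrast concrete, here is a minimal Python sketch of the foaf:mbox case quoted further down. It is an illustration under stated assumptions, not a real RDF or DAML+OIL implementation: triples are plain tuples, blank nodes are strings starting with `_:`, the function names are invented, and the merge is a single simple pass rather than a full equality reasoner.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
UNAMBIGUOUS = "http://www.daml.org/2001/03/daml+oil#UnambiguousProperty"
MBOX = "http://xmlns.com/foaf/0.1/mbox"


def rdf_only_merge(triples):
    """An engine that knows only the triple structure.

    To it, the UnambiguousProperty typing is just one more triple,
    so it has no grounds for identifying any two blank nodes.
    """
    return set(triples)


def daml_aware_merge(triples):
    """An engine that also applies the rule for UnambiguousProperty:
    two blank nodes sharing a value for such a property are merged.
    """
    triples = set(triples)
    unambiguous = {s for (s, p, o) in triples if p == RDF_TYPE and o == UNAMBIGUOUS}
    canonical = {}  # (property, value) -> first blank node seen with it
    rename = {}     # later blank node -> the node it is merged into
    for s, p, o in sorted(triples):
        if p in unambiguous and s.startswith("_:"):
            first = canonical.setdefault((p, o), s)
            if first != s:
                rename[s] = first

    def resolve(node):
        # Follow merge chains; each step moves to an earlier node, so this ends.
        while node in rename:
            node = rename[node]
        return node

    return {(resolve(s), p, resolve(o)) for (s, p, o) in triples}


data = {
    (MBOX, RDF_TYPE, UNAMBIGUOUS),
    ("_:a", MBOX, "mailto:foo@example.com"),
    ("_:b", MBOX, "mailto:foo@example.com"),
    ("_:b", "http://xmlns.com/foaf/0.1/name", "Foo"),
}
print(len(rdf_only_merge(data)))    # the graph is unchanged
print(len(daml_aware_merge(data)))  # _:a and _:b have been identified
```

The same marks go in; what comes out depends on which rules the engine was built with.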
Now in fact, DAML+OIL is rather better integrated with RDF than this
suggests, in that their model theories also line up rather well, so
that one can view RDF(S) (without reification) as being a sublanguage
of DAML+OIL, rather than just a formalism into which DAML+OIL content
is embedded in some opaque way. That means that anything that an RDF
engine would do would in fact also make DAML+OIL sense (though still
not the reverse). But this took a lot of work and care; it doesn't
come easily or naturally, so don't expect that this is going to be
the normal case. Temporal sensitivity isn't going to be that easy,
for example; I think it is going to require re-doing RDF from the
ground up, rather than extending it.
It really is impossible to pre-guess all the things anyone is going
to want to say, and invent a single basic notation that will never
need to be modified, only extended. Even adding context-sensitive
datatyping to RDF will involve extending the Ntriples syntax, for
example. The nearest anyone has come to such a thing is conventional
FOL, which is a very stable region in the space of all expressive
assertional languages. But for some purposes, FOL isn't really what
you want either.
>
>> >and RDF apps that are built
>> >to know about even a subset of DAML+OIL can make good use of that when
>> >doing data merging.
>>
>> Well, they can if they are DAML-savvy, but then why don't you call
>> them DAML apps rather than RDF apps?
>>
>> >
>> >For eg., consider the property http://xmlns.com/foaf/0.1/mbox
>> >from the namespace http://xmlns.com/foaf/0.1/
>> >
>> > [[
>> > FOAF is expressed as an RDF Schema, annotated with DAML to express the
>> > fact that a foaf:mbox uniquely picks out an individual.
>> > ]]
>> >
>> >Excerpting from that schema:
>> >
>> > <rdf:Property rdf:about="http://xmlns.com/foaf/0.1/mbox"
>> >     rdfs:label="Personal Mailbox"
>> >     rdfs:comment="A web-identifiable Internet mailbox associated with exactly one owner.
>> >     This property is a 'unique property' in the DAML+OIL sense, in that
>> >     there is at most one individual that has any particular personal
>> >     mailbox.">
>> >
>> >   <rdfs:domain rdf:resource="http://xmlns.com/foaf/0.1/Person" />
>> >   <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource" />
>> >   <rdf:type rdf:resource="http://www.daml.org/2001/03/daml+oil#UnambiguousProperty"/>
>> >   <rdfs:isDefinedBy rdf:resource="http://xmlns.com/foaf/0.1/" />
>> > </rdf:Property>
>> >
>> >Since we say the property is of type
>> >http://www.daml.org/2001/03/daml+oil#UnambiguousProperty
>> >we can use this knowledge in RDF-based applications
>>
>> How does the RDF application know what the DAML expressions mean?
>> (Should it know about all the other extensions to RDF that haven't even
>> been invented yet?)
>>
>> >-- for example merging
>> >blank nodes where each node has a property with the exact same resource as
>> >its value. In this example, merging nodes that stand for the individual
>> >whose personal mailbox is mailto:foo@example.com, perhaps.
>> >
>> >Aside: I could complain here that DAML+OIL gives us no mechanism for
>> >guaranteeing that the at-most-one-ness remains static in the face of time
>> >and change, but that's probably a can of worms best opened in a separate
>> >thread.
>> >DAML+OIL's "worldview" isn't one that explicitly acknowledges time and
>> >change, and there are good reasons for this being the case. How this
>> >relates to the need to deploy DAML+OIL ontologies in the Web is something
>> >that looms rapidly, imho.
>>
>> Maybe, but you can hardly pin this on DAML; *nobody* has really
>> tackled this issue yet, AFAIK.
>
>It's not a matter of blame, it's a matter of layering. This whole thing
>we're building, the graph stuff, the simple schema stuff, the fancier
>ontology language, perhaps a rules language... what we're assembling is a
>framework for describing resources in the Web. RDF. That picture is only
>coming together slowly, and DAML+OIL is a key component. There will be
>others.
I would agree with all this except for the degree of integration
implied by that word "component". Imagine someone in about 1957
saying "this programming stuff is really coming together; we have
several of the layers sorted out; there's the 704 assembly code (that
youngster Minsky has proven that it is a universal machine) and
there's FORTRAN and IPL ... Pretty soon we will have all the
components for the Machine Programming Framework, and then things
will really start humming." In a sense they would have been right,
but they also would clearly have been missing something important.
(In fact I suspect that many folks at IBM did have something close to
this attitude, deep down, which is why Bill Gates was able to con
them so easily.)
>So going back to my original claim that all DAML+OIL instance data is RDF
>instance data: what I'm getting at is that to deploy this stuff for real,
>on Web sites, in browsers, palm pilots, everywhere, we need some
>stability even when the complete resource description framework is not yet
>finalised.
I don't grok this notion of 'finalised'. That sounds like finalising
evolution, or something. The whole thing seems more open-ended to me.
We put out tools and people start to use them, then people put out
better tools, and other people use them, and so on. God alone knows
what will happen, but it is going to be more like a stampede than like
herding cattle.
>Maybe the RDF project never will be finalised, but always
>pushing to get things out of the lab and into the Web mainstream. I hope
>so, fwiw.
>
>
>The 'description logic meets temporal logic' work is still
>in the lab, but please lets not plan to tell the world that there are RDF
>instance files, and DAML/OIL/WebOnt instance files, and
>WebOnt-PlusTemporalLogic instance files and who knows what follows
>after.
Why not? That is exactly what the world needs to know. If we don't
tell them this we would be dishonest, because this is the truth.
There ARE all these formalisms, and they all have their uses and
limitations.
I think you have a vision of the SW as a kind of single integrated
system where content is flowing smoothly along pipes, all encoded in
W3C-sanctioned RDF. I have a very different vision, more like a kind
of bustling market or bazaar, where agents are busily brokering
meanings, and many different languages are being spoken. I see a kind
of market economy of meanings, all happening at electronic speeds. I
can imagine all kinds of new economic opportunities in this virtual
semantic web world. For example, for a (small) fee per thousand bytes
of unicode, I ("I" here is a program, of course, but like a
well-trained truffle-hound, it gives all its earnings to its human
owner) will undertake to translate anything in any of these
notations... into any of these other notations.... Or, I will (for no
fee, but in return for some small favor, eg that you undertake to
transmit some small cookie for me to every other agent you know)
undertake to find you a service which can translate your notation
into some other notation. Or, I will (for a very significant fee)
read anything written in one of these notations, check it for
internal consistency and agreement with any US federal database, and
then warrant that it has been so checked, and maybe (for a truly
astonishing fee) accept any risk arising from such warrant. Or, I
have a new notation which you can use (for a fee schedule that we can
negotiate) and it will provide you with all these advantages...
Now, for this to work, I want to see every file out there branded
with something that tells me - and I'm a piece of software, remember
- *exactly* what semantic rules I am supposed to use to interpret
it. That way, everyone knows whose fault it is when things go wrong.
If I use the semantics it referred to, then it's someone else's fault;
if I used some other one, I have only myself to blame.
(Some questions and answers.
Q. What if I don't know those rules? A. Well, then you need to get
the content translated into some rule format you do know. Find a
translation service, or ask the other site if it can translate for
you.
Q. What if I know some better rules? A. Well, go ahead. Maybe you
know more about that agent's rules than it knows; but you are taking
a risk.
Q. What if I know that my rules are less powerful than its rules? A.
Then you are safe; but you have a responsibility to not mess up any
content that you might not understand, particularly if you are going
to tell anyone *else* that you derived a conclusion from that agent's
sources.)
Q. What if it doesn't have a brand, but I find that I can read it and
make sense of it? A. Well you are free to do so, of course, but if
something goes wrong as a result, it's not clear whose fault it was.
(My guess is that this kind of question will eventually end up being
decided by case law in civil tort cases, and that in any case a kind
of 'reasonable practice' code of usage will develop in order to
enable e-commerce to work properly. )
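
The branding idea and the questions above can be sketched as a small decision procedure. This is only a sketch: the `Document` type, the `decide` function, the rule-set names, and the action strings are all hypothetical, and where the text above does not say who bears the risk the sketch says so rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    content: str
    brand: Optional[str]  # declared semantic rules, or None if unbranded


def decide(doc, known_rules, translators):
    """Return (action, blame) for a hypothetical agent reading `doc`.

    known_rules: set of rule-set names this agent implements.
    translators: set of (source, target) pairs some service can translate.
    """
    if doc.brand is None:
        # No brand, but the agent can read it: allowed, responsibility unclear.
        return ("interpret at own risk", "unclear")
    if doc.brand in known_rules:
        # The agent uses exactly the semantics the file referred to.
        return ("interpret with declared rules", "someone else")
    for target in sorted(known_rules):
        if (doc.brand, target) in translators:
            # Unknown rules, but a translation into known rules is on offer.
            return (f"translate {doc.brand} -> {target}", "not settled here")
    return ("look for a translation service", "not settled here")


doc = Document(content="...", brand="notation-a")
print(decide(doc, {"notation-a"}, set()))
print(decide(doc, {"notation-b"}, {("notation-a", "notation-b")}))
print(decide(Document(content="...", brand=None), {"notation-a"}, set()))
```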
>If and when for example the description-logic-meets-temporal stuff
>gets more fully baked, and perhaps submitted to W3C, we'll likely have
>some way of annotating my RDF Schema (or Web Ontology) at
>http://xmlns.com/foaf/0.1/ to better represent the meaning of the classes
>and properties I name there. Does that mean that you'll want me to call my
>instance data files something other than "RDF files" (DAML/OIL/Webont
>files...).
Yes, most definitely. If you call them RDF files and my old, dumb,
RDF inference engine can't understand them, or misinterprets them, I
may be *very* unhappy with you. Your lawyers may hear from my lawyers.
>Or maybe they're not even RDF (DAML etc) files today, since the
>semantics are not fully captured by any Semantic Web schema/ontology/rule
>language that I know of. Or we could just worry about something more
>interesting than categorising the flavours of instance data, and get used
>to calling all this stuff "RDF".
Why not just call it "Semantic Web Stuff" and forget about publishing
specs? That provides about the same amount of useful information to
someone trying to write code.
Pat
PS, I just saw this wonderful quote from John Milton:
"When there is much desire to learn,
there of necessity will be much arguing, much writing, many opinions;
for opinion in good men is but knowledge in the making."
--
---------------------------------------------------------------------
IHMC (850)434 8903 home
40 South Alcaniz St. (850)202 4416 office
Pensacola, FL 32501 (850)202 4440 fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes
Received on Monday, 15 October 2001 16:02:53 UTC