Re: Meaning of URIRefs (new test case, comments on Concepts draft)

>***** 1. New Introduction and Summary
>
>In the editor's draft of RDF-CONCEPTS [0], you've added a lot of text
>about the meaning of a URIRef coming from the web-content available at
>its URI-part.  It's an excellent and much-needed addition.
>
>I want to underscore how important it is by pointing out that
>social meaning is self-reinforcing.  If people start to doubt the
>importance of using URIRefs as they are defined (and begin to
>experiment with their own incompatible meanings), the RDF specs are
>likely to lose any authority in the matter.  People need tremendous
>confidence in the language in which they write their contracts if
>they are to be held to those contracts.  There must be very little
>window for people to argue about what the definition of "is" is.
>
>With that in mind, and with an eye towards prospects of automated
>reasoning, I'd like to propose this test case:
>
><?xml version="1.0"?>
><!DOCTYPE rdf:RDF [
><!ENTITY animals "http://www.w3.org/2002/10/meaning/animals">
><!ENTITY rdf     "http://www.w3.org/1999/02/22-rdf-syntax-ns">
>]>
><rdf:RDF xmlns:rdf="&rdf;#"
>          xmlns:animals="&animals;#">
>   <rdf:Description rdf:ID="spot">
>      <rdf:type rdf:resource="&animals;#Dog" />
>   </rdf:Description>
></rdf:RDF>
>     
>(I moved the hash-mark out of the entity for reasons which will be
>clear later.)
>
>This parses as:
>
>_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
><http://www.w3.org/2002/10/meaning/animals#Dog> .
>
>and it should entail
>
>_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
><http://www.w3.org/2002/10/meaning/animals#Mammal> .
>
>How?  Because the document at "http://www.w3.org/2002/10/meaning/animals"
>says that #Dog is an rdfs:subClassOf #Mammal.

Oooh, this reminds me of the argument we had at the recent webont F2F 
about 'imports'. There is a real problem (noticed first by Dan C.) in 
saying that A entails B because of what is in C. "A entails B" 
already has a meaning: it means that any interpretation which 
satisfies A also satisfies B. So if A entails B because of what C 
says, then A's *truth* must depend on what C says. Is that really 
what you want to say here? That the truth of

>_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
><http://www.w3.org/2002/10/meaning/animals#Dog> .

depends on the interpretation of the document at

"http://www.w3.org/2002/10/meaning/animals" ?

This is a very odd notion of 'truth', since it recurses along URLs. 
In order to find out what a webpage is saying, now, I have to locate 
the transitive closure of all the implicit links from that page, and 
I have to consider them all as being asserted by the webpage itself. 
Not only does this seem implausible, it seems unnecessary. After all, 
they are already being asserted, right? That's how I am able to find 
them, by their being published on the Web. If they weren't being 
asserted then I would get 404 errors. So why do I need to assert them 
*again*?
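
To make the recursion concrete, here is a minimal sketch of what a 
reader would have to compute under that rule. The link table and the 
example.org URIs are hypothetical, invented only for illustration; 
nothing here is taken from the drafts.

```python
from collections import deque

def asserted_closure(start, links):
    """Return every document reachable from `start` by following URI references.

    `links` maps a document URI to the set of document URIs it mentions.
    Documents missing from `links` are treated as mentioning nothing.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        for target in links.get(doc, ()):
            if target not in seen:  # guard against cycles between documents
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical link structure, for illustration only.
links = {
    "http://example.org/spot": {"http://www.w3.org/2002/10/meaning/animals"},
    "http://www.w3.org/2002/10/meaning/animals": {"http://example.org/biology"},
    "http://example.org/biology": {"http://example.org/spot"},
}
closure = asserted_closure("http://example.org/spot", links)
```

On the proposed reading, everything in 'closure' would count as 
asserted by the one-triple document about spot, which is the part 
that seems implausible to me.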

>
>Let me back up a little and clarify: we have three kinds of
>entailment:
>
>   (1) RDF simple entailment, as in the MT [2], which says
>       things like every RDF graph entails its subgraphs.
>       This kind of entailment pays no attention to URIRefs.
>   (2) Entailment with the "rdf" and "rdfs" vocabulary terms
>       reserved, as in MT [2].
>   (3) Entailment where every URIRef is constrained in meaning

No, it's not *constrained*. It *asserts* that other content (again). 
But of course other people might be asserting other things; nothing 
constrains them from doing that.

>       according to the web content available at its URI part.
>
>Of course DAML+OIL defines its own entailment, as does OWL, as do my
>various layered logic languages [6], but these should all be seen as
>special cases of (3).  The terms used by Dublin Core, RSS, Creative
>Commons, and various other efforts may not define their meanings with
>model theories or first-order axioms, but their terms are also
>carefully defined, and in some cases their misuse would be
>intolerable (and in the case of CC, perhaps even illegal!).

Fine, but there is no need to incorporate that meaning into a notion 
of entailment. All we need to do is to say that *if* they mean 
something, in some sense of 'mean', by virtue of whatever 
social/legal/contextual process is considered to attach meanings to 
them, then those 'mean'ings are also inherited by their entailments. 
But entailment itself is a crisp notion defined in terms of the MT: 
no need to mess with that (and better not to try, in case it all 
falls apart.)

BTW, if these issues ever get to be of real importance, then I bet 
the legal system will invent its own criteria no matter what we 
say about it, and I bet it will be messy and full of vagueness and 
case law. For example, suppose that the inference path from what I 
publish to something insulting is very long and recondite, and I 
argue that I didn't foresee those entailments and didn't intend to 
assert them; I bet that some jury will buy some such argument 
eventually. (Or suppose I set up an RDF website - actually it will 
need to be OWL - which has double meanings, where one set of meanings 
is very obvious and transparent, but the others are very indirect and 
unintuitive and hard to decipher without running special reasoners, 
and then I sue you for what you have 'said' by using one of my terms 
in a way that reflects only its obvious meaning.)

>
>Type (2) entailment above should also be subsumed into type (3), by
>putting normative pointers at the rdf and rdfs namespace addresses to
>the appropriate Recs (when the Recs happen).  In fact, the MT should
>be more clear in distinguishing between (1), (2), and (3).  (2) should
>probably be in a separate document.  Perhaps (1) and (3) should also
>be separated, but they remain to describe the meaning inherent in all
>RDF documents, regardless of any URIRefs which occur in it.
>
>The point here is that an RDF document must be taken to assert the
>truth of all the documents it names in the URI parts of its
>node-labeling URIRefs.

I disagree. It might in some sense presuppose them, and it (or its 
author) might be responsible for what *it* says when they are taken 
into account; but neither of those is the same thing as it 
re-asserting them.

>  If those documents are available to a reader,
>and the reader is capable of understanding them, the reader is fully
>entitled to infer facts from the conjunction of the author's documents
>and all the definitional documents.

The reader is already entitled to do that. No need to say anything to 
make this possible. If I read one thing in the Times and another 
thing in the Mirror, then I am entitled to draw my own conclusions.

>  Moreover, the reader can
>attribute these conclusions to the author; the author is responsible
>for choosing terms (eg comic, clown) whose definitions he accepts.

Right, but we need to be careful. I think the way to say this is, the 
author is responsible for choosing terms whose consequences (from 
what he says and what the owner of the term asserts) he is willing to 
stand by. But that is not (necessarily) to endorse everything that 
the owner asserts. For example, I might use a term in a dictionary 
without necessarily thereby agreeing that *every* definition in the 
dictionary is correct. I am only assenting to the part that I am 
using; and even there, I am not re-asserting it, only agreeing to it.

>
>There are many more details, below.  I first approached this topic
>without noticing the new text in the editor's draft, and spent more
>time arguing why using the URI for the semantics was important.  I'm
>going to leave that text here, because some people are still probably
>not convinced.   If you are convinced, feel free to skip sections 3
>and 4.
>
>**** 2. A Few Notes on RDF-CONCEPTS [0]
>
>I think you overplay the difference between formal and natural languages in
>2.3.3 in the example with
>
>   B:oneOfThem rdfs:comment "This means the same as rdfs:subClassOf".
>
>If we take rdfs:comment to provide normative natural language
>information about the subject (and if it doesn't we need some other
>property which does), then in fact C is still to blame for the insult
>to C:JohnSmith.  The failure of RDFS class reasoning to reach the
>insult does not mean the insult is not style-3 entailed, in this case
>via B:oneOfThem.

Well then that is a critical fault with your 'style-3' notion. Any 
notion of entailment which requires one to interpret NL comments is 
simply untenable. There really is a sharp distinction between NL and 
formal languages here, and we cannot allow it to get blurred: if we 
do, we are dead in the water before we even get started.

>
>I think 2.3.4 is wrong: the predicate needs no special status.  The
>situation you're trying to prevent here is prevented by accepting the
>namespace/URI owner as authoritative in defining the terms there.
>(see my definition of definition in section 5.y).

The issue is not about authority, but about what counts as a 'definition'.

>
>Section 2.3.5 is also misleading: there is RDF-Simple-Entailment ("1"
>above) and RDF-URI-Based-Entailment ("3" above), and that pretty much
>covers it.  At some URIs (eg OWL, RDF/RDFS, LX) you should find
>appeals to natural language and/or mathematical definitions which are
>not directly usable by machines, but the terms defined there can be
>used to define other terms in a way which *is* amenable to automated
>reasoning.  One could try to distinguish between natural language
>definitions and formal language definitions, but I'm not sure how that
>would help, since automated reasoners vary so much in what kind of
>formal languages they can handle.

You seem to be repeating a common beginner's mistake. Sorry to be so 
blunt, but it's important not to let this error run unchecked. Of 
course a *spec* is written in NL: it's intended for human developers 
to read. But a spec is not a *translation*. The MTs for RDFS and OWL 
are written in what might be called 'NL math'; but the specifications 
they provide (of entailment and satisfaction) are completely 
unambiguous and exact, which is why they can be used as a guide to 
developers of inference systems. The fact that the METAlanguage of a 
formal language is readable by human beings does not imply that the 
language ITSELF is somehow tainted with informal or social meaning.

>
>***** 3. Older Introduction
>
>If I receive and believe an RDF document, D, saying that D:spot has
>rdf:type animals:Dog, and the animals schema says that animals:Dog is
>a subclass of animals:Mammal, would it be right of me to infer that
>D:spot has rdf:type animals:Mammal?

I would say that there is no answer to that question. It depends on 
what you accept. If you accept both documents, then yes, you should 
feel right about making the inference. If not, then maybe not. All 
that RDF can tell you is what follows from what: but what you choose 
to believe is up to you.

Logic 101: don't confuse 'valid argument' with 'correct conclusion'. 
Logic doesn't tell you which conclusions are correct or true: it only 
gives you ways to infer one thing from another.

>
>Your answer might be "never", "sometimes", or "always."  If you say
>"never," then I think you've missed the point of RDF and XML, with all
>these URIs and namespaces.  If you say "sometimes," then we need to
>talk about the qualities of those times.  If you say "always", we have
>some consequences which might be problematic.  (I will argue that the
>correct answer is "always" and that the problems are manageable.)
>
>In any case, I don't think the current working drafts are clear on
>this issue.

They should not be, any more than they should try to answer the Secret of Life.

>  RDF-CONCEPTS section 2.3 [1] suggests to me the answer is
>probably "always" and RDF-MT section 1.2 [2] says "sometimes" and that
>it depends which vocabulary you are reserving.  Such an answer from
>the MT, while true in a sense, is fairly useless.

Tough shit. I mean, it DOES depend on that. If that bothers you, get 
used to it, because this isn't going to change.

>  I need to know when
>I'm entitled to make the Dogs-are-Mammals inference,

You are always entitled to make that *inference*. The issue is 
whether or not you believe the antecedents. The rule is valid, but 
it's up to you to decide whether or not to trigger it. RDF can't help 
you there. All it can do is convey the content from one place to 
another, but it can't guarantee verity.

>and I don't think
>out-of-band negotiation of the "reserved" vocabulary for each RDF
>document is practical.
>
>I'd like to apologize for raising this issue so late in the process,
>but my understanding of it has only become clear in the past week.
>Previously, I had some vague notion that we could "float" the meaning
>of RDF identifiers, but I no longer think that is practical.  I am
>indebted to Pat Hayes, Jeff Heflin, David Booth, Larry Masinter, Dan
>Connolly, and especially Tim Berners-Lee for recent conversations
>helping me understand these issues (even when they disagreed with me).
>
>Last week at the DAML-PI meeting [3], TimBL said that we are not ready
>to "float the currency" of identifier meanings yet, and won't be for
>perhaps fifty years.

I'd say less, maybe ten. If the 'wild' free-wheeling social SW ever 
takes off, it will start to happen fairly quickly, I'm sure, whether 
we like it or not.

>For now, he argued, we need to stay on the gold
>standard, where namespace owners have the non-negotiable right to
>dictate the meanings of the terms in their namespace.  This is like
>the US Government saying a US "dollar" is worth 1/35th of a Troy ounce
>of gold; it defines the US dollar in terms of other well-known
>concepts.  This makes sense when introducing a term; it makes less
>sense when everyone has developed a strong sense of what the term
>means.  Tim's point, I think, was that we're a long way from computers
>being able to navigate in a world of vague meaning.

Indeed.

>
>***** 4. Argument For Entailment
>
>Let's return to my Dog/Mammal example.  Let's bind the namespace
>"animals" to "http://www.w3.org/2002/10/meaning/animals#".  The
>document at that address (without the hash) is some RDF saying in RDFS
>that animals:Cat is, in fact, a subclass of animals:Mammal.
>
>Does this mean that the triple
>    _:x rdf:type animals:Cat.
>entails
>    _:x rdf:type animals:Mammal.

No. It means that the triple plus the owner's assertion about what it 
'means' together entail that. Isn't that enough?
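
Here is a minimal sketch of the difference, assuming a toy 
representation of graphs as Python sets of (subject, predicate, 
object) strings and only the one RDFS rule that propagates rdf:type 
along rdfs:subClassOf. Treating the blank node _:x as a plain name is 
a simplification, and the function name is my own invention.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
ANIMALS = "http://www.w3.org/2002/10/meaning/animals#"

def subclass_closure(graph):
    """Repeat one rule until nothing new appears:
    if (x type C) and (C subClassOf D) are present, add (x type D)."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        for (x, p, c) in list(closed):
            if p != RDF_TYPE:
                continue
            for (c2, q, d) in list(closed):
                if q == SUBCLASS and c2 == c and (x, RDF_TYPE, d) not in closed:
                    closed.add((x, RDF_TYPE, d))
                    changed = True
    return closed

# What the author wrote, and what the owner of the term published.
author = {("_:x", RDF_TYPE, ANIMALS + "Cat")}
owner = {(ANIMALS + "Cat", SUBCLASS, ANIMALS + "Mammal")}
goal = ("_:x", RDF_TYPE, ANIMALS + "Mammal")

alone = goal in subclass_closure(author)             # author's triple only
together = goal in subclass_closure(author | owner)  # both graphs merged
```

With the author's triple alone the rule never fires; the conclusion 
appears only once the owner's graph is merged in. That is all I mean 
by saying the two *together* entail it.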

>?
>
>There are some issues here about connectivity, trust, and
>change-over-time, but let's defer them for the moment.  Assume a
>static, always connected, always trustworthy web.
>
>Now, I claim that (following the "gold standard") the second triple
>follows logically from the first.

Again: "entails" is an English word with a meaning. Apply that 
meaning, and you are saying that the first triple *asserts* that cats 
are mammals. But it doesn't seem to me that it does assert that: and 
if it does, why do we even need to use the document at the unhashed 
address? According to your criteria, it is irrelevant.

>The author of the first chose to
>use the "animals" namespace, and by doing so acknowledged the
>definitions therein.

Acknowledging is not the same as asserting.

>  The author could have used some other namespace,
>or no namespace, but chose to use "animals" (by which I mean the
>longer URI above).  The author almost certainly chose to use the
>"animals" namespace so that others, doing later queries or merges,
>would connect his expressions with other expressions about animals.
>He wanted us to be able to infer that _:x was a mammal.

Quite possibly, but not by saying something that entailed it; rather, 
by saying something new which could be combined with something he 
referred to, so that *together* they entailed this.

>
>Did he want us to follow the gold standard, or did he want us to have
>to think carefully about which definition of animals to use?  He
>probably wanted us to use the gold standard, to use the definitions at
>the namespace address, because otherwise there's a chance we'd believe
>some foolish claim about cats being fishes, and totally misunderstand
>him.

Ah, but even your gold standard doesn't provide any security against 
that. Suppose that Joe asserted the 'real' definition, and some other 
schmuck, Mike, publishes some nonsensical claim *using Joe's term*. I 
might still stumble across Mike's nonsense and believe it; nothing 
that you or Joe can say or do is enough to prevent me from being 
misled by Mike. NOTHING.

This point also came up in the 'imports' discussion, and I think it 
reveals a basic flaw in a lot of reasoning about 'trust'. Let me 
shout this from the rooftops: PROVIDING A WAY TO INDICATE TRUST DOES 
NOT PROVIDE ANY WAY TO INDICATE MISTRUST. The only way in which this 
can happen is if there is a basic, universal assumption of mistrust - 
do not believe ANYTHING you read in RDF, it's ALL LIES - unless it is 
explicitly overridden by an 'imports' or an explicit use of a term, 
as in your 'gold standard' rule. But surely that kind of 
institutionalized paranoia isn't the right way to start building a 
semantic web, is it? I wouldn't want to use a paranoid version of 
Google.

>
>So yes, granted the issues about connectivity, trust, and
>change-over-time, the above entailment should hold.  Now, let's
>address those issues:
>
>***** 5. Answers to Problems
>
>1.  Connectivity.  Connectivity does not affect entailment.  Whether
>     or not someone can get a copy of the "animals" definition document
>     does not change the fact that that document is the primary source
>     for the definitions of all the terms in the animals: namespace.
>     If you can't fetch the definitions, then your knowledge of the
>     terms is incomplete and your reasoning about them will be
>     incomplete.  Incomplete reasoning can be a problem, but it's
>     hardly a new problem or one which only arises when we bring in
>     connectivity issues.  If you can't fetch the document (and don't
>     have a current cached copy) then you know that you're missing some
>     information.  The monotonicity guarantee of RDF, however, allows
>     you to proceed with your partial information, which might be good
>     enough.
>
>2.  Trust (except for change-over-time).  This gold standard means
>     that the claims of an RDF document (which [1] says should have
>     legal weight) depend on the contents of other documents.  This is
>     more stable than saying such claims depend on social consensus,
>     but it still involves trust.  If I say my dog has rdf:type
>     animals:Dog and the animals document says that an animals:Dog was
>     once kicked by Ebenezer Scrooge, can I really be held to be saying
>     that Scrooge committed such an act?

No, because there is no inference path to that from anything that YOU 
have asserted (unless it was your dog he kicked, of course). 
Entailment is actually quite useful when you morph it into proof 
theory, since it provides a much sharper tool to figure out what 
follows from what. For example, let me commend the Craig 
interpolation lemma to your attention. It says that if B follows from 
A (and neither of them is a logical truth or a logical contradiction) 
then there must be an 'interpolant' C which only contains names from 
the *intersection* of the namespaces of A and B such that A entails C 
entails B. From which it follows for example that if A and B have no 
names in common, then A cannot entail B (unless A is a contradiction 
or B is a tautology). That deals with your Scrooge example.
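
As a rough sketch of how cheaply that consequence can be checked, 
assuming graphs are Python sets of (subject, predicate, object) 
strings and blank nodes are the labels starting with "_:". The 
example.org URIs and the function names are hypothetical. Note that 
this is only the necessary condition: shared names do not show that 
an entailment holds, but no shared names rules it out.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ANIMALS = "http://www.w3.org/2002/10/meaning/animals#"

def names(graph):
    """Collect the names used in a graph, ignoring blank node labels."""
    return {term for triple in graph for term in triple
            if not term.startswith("_:")}

def shares_names(a, b):
    """True if the two graphs have at least one name in common."""
    return bool(names(a) & names(b))

# Hypothetical premise and conclusion, for illustration only.
premise = {("http://example.org/me#myDog", RDF_TYPE, ANIMALS + "Dog")}
conclusion = {("http://example.org/fiction#Scrooge",
               "http://example.org/fiction#kicked",
               "_:d")}

possible = shares_names(premise, conclusion)
```

Here 'possible' comes out false, so no interpolant can exist and the 
premise cannot entail the conclusion, whatever else is on the web.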

>  I think so; I haven't found a
>     solid line marking the parts of a definition which have bearing
>     solely on other things.  Perhaps the animals document means the
>     Scrooge clause to be the necessary and sufficient condition for
>     doghood!  So, a bit hesitantly, we have to say that all statements
>     in the definition document are asserted by any use of terms from
>     the document.
>
>     We can address the Scrooge issue by saying that using terms from a
>     document is a lot like signing it.  Don't do it unless you have
>     read the document and agree with it.  Of course you need to do
>     this recursively, following the definitions of any terms it uses.
>
>x.  (x is for extra) This brings up the issue of URIRefs "grounding
>     out" in natural language text (which may well make use of
>     mathematical notation).  Our "animals" document constrains the
>     meaning of animals:Dog (very slightly) by using the term
>     rdfs:subClassOf.  That term needs to be constrained by the
>     document at the rdfs namespace [4], which it sort of is.

No, it's constrained by the *language spec*. Even if that spec was not 
on the web, it would still do its work by virtue of being the 
specification. It doesn't need to be read by active web agents in 
order to be a normative language specification. The meanings of 
rdf:type and rdfs:subClassOf are not PROVIDED at the URL 
http://www.w3.org/2000/01/rdf-schema; that uriref serves only to 
*identify* the RDF(S) spec. The actual meanings are defined in the 
documents which (we hope) will be read by software developers, not by 
the comments on a web page. I would expect any RDF-savvy software to 
be able to recognize the normative RDFS URI, but I wouldn't expect it 
to follow the link in order to *find out* what rdf:type means. It 
ought to just 'know' that,  in much the same sense that I 'know' how 
to breathe.

>  To
>     follow the gold standard, that document must make normative
>     reference to "http://www.w3.org/TR/rdf-schema/" which it currently
>     does not.  (We could exempt RDF and RDFS from this policy,
>     understanding that their meanings are acknowledged by the very use
>     of the RDF/XML data format.  There is little reason for this
>     special dispensation.)

I believe that any well-formed RDF/XML will contain such a normative 
reference in its header, right?

>
>     I don't see a proper way in the current spec to make this kind of
>     normative reference from an RDF/XML document to a human-readable
>     one.

Why does it have to be a reference to a human-readable document? For 
that matter, why does it have to be a reference to ANY document? 
Suppose that some web attack disables the W3C server so that this URL 
produces a 404 error: would all RDF reasoners on the planet grind to 
a halt because they wouldn't know what rdf:type meant?

>Perhaps it is sufficient for an rdfs:comment or
>     dc:description to claim, in its natural-language text, that it is
>     in fact normative.   That's a little loopy, but natural language
>     can probably handle it.    Better would be to make sure the RDFS
>     namespace document said that rdfs:comment contained true
>     natural-language statements about the subject.

It can say that all it wants, but it can't say it in RDF, so an RDF 
engine isn't going to be able to understand it.

>
>3.  Change-over-time is a special case of the "stewardship" issues.  It
>     doesn't necessarily involve time; it's possible for a web server
>     to offer one definition document to people who seem to be in France
>     and another to people who seem to be in England. 
>
>     Stewardship issues arise often: should one define one's input as
>     being Unicode 3.2 characters, or as being whatever character set
>     is the latest approved by the Unicode Consortium?  Do you
>     advertize your program as running on "OS Version 9.1" or "OS
>     Version 9.1 or later"?  It all depends on whether you trust the
>     stewardship of the organization which controls the underlying
>     components. 
>
>     The solutions here are typical security solutions, because these
>     are fairly typical security problems.

[big snip] [Security issues are interesting but irrelevant to the 
current discussion]

>y.  (why not add an extra (rather philosophical) point?) I've been a
>     little vague about what a "definition" is.  I mean a "definition"
>     to be some declarative statement which uses the term and is true
>     only for certain meanings of that term.  An asserted (included,
>     imported) definition thus limits the possible valid
>     interpretations (models) of statements which use the term.

Fine, so a definition is just another assertion. I like that idea. 
You might add, that only the owner of a term is allowed to claim that 
any of the assertions made using it are definitional (definitive?) in 
the required sense.

>
>     A "strong" definition is a work of art which constrains
>     interpretation to the point where no observable differences
>     emerge.  For artificial terms, even stronger "perfect" definitions
>     can be written.  These are definitions in the mathematical sense,
>     "Let us define f to be...".  Compared to that, natural language
>     definitions and ontologies are usually mere descriptions.

Well, no, lots of NL depends on ostensive definitions which are 
rooted in contexts of use, eg almost all proper names are like this, 
as are natural kind terms like 'wet' or 'overcast' or 'wood'. These 
aren't descriptions in the usual sense.  NL is really *very* 
complicated; way too complicated to toss casually into a discussion 
like this.

>  Still,
>     I call them definitional documents in accordance with their intent
>     and common usage.
>
>     Definitions do not have to be perfect, or even strong, of course.
>     They can be "thin" ontologies like my Dog/Mammal one, which merely
>     offer a little helpful description.  The essence of the gold
>     standard is that, no matter whether a definition is thin, strong,
>     or perfect, you at least know which one everyone is supposed to
>     use.

Everyone is *supposed* to use? According to what authority?? Isn't 
that a bit like saying, the newspaper that everyone is supposed to 
read?

>
>
>***** 6. Older Conclusion
>
>I've tried hard to be clear and concise here, and I apologize for any
>failures.  I understand you're working under a looming deadline, but
>this issue is crucial to address as soon as possible, in this version
>of RDF.   I don't think this is a change in the basic intent of RDF,
>but if you Recommend the MT in its current form, you will have given RDF
>URIRefs only floating semantics. 
>
>I doubt the change from floating semantics back to namespace-document
>semantics can be made compatibly.  With floating semantics, people and
>machines reading RDF are required to use their own judgement in
>deciding which definitions to use.  Once they start doing that,
>authors will become used to it, and will no longer be obligated to
>adhere to original definitions.  Obligations cannot be imposed
>retroactively (in this kind of a free environment), so if
>namespace-document semantics are added later, they will have to be
>added in a language which is marked as having different semantics.
>But the difference is easy to miss; it's the difference that "now you
>have to use the terms as defined!" and if there's a reasonable doubt
>about authors understanding this change, then they really have no
>obligation (such as might stand up in court), and the change has not
>actually been made.
>
>Since floating semantics are not amenable to automated reasoning,

?? If I follow the above, then 'floating semantics' is the ONLY kind 
that is amenable to automated reasoning. Allowing meaning to depend 
on NL comments makes the SW unworkable for the forseeable future, and 
in any case makes the SW indistinguishable from the WWW, so why the 
hell are we wasting all this time and effort? The WWW exists, so we 
can just declare victory and go home.

>  if
>you pass on this issue now, you will have kept RDF (in its present
>form and probably all similar future forms) from being a viable
>Semantic Web language.

I disagree strongly. If we try to go the way you are urging, we will 
have defined the SW out of existence.

Pat


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola              			(850)202 4440   fax
FL 32501           				(850)291 0667    cell
phayes@ai.uwf.edu	          http://www.coginst.uwf.edu/~phayes

Received on Thursday, 24 October 2002 21:08:51 UTC