Re: Meaning of URIRefs (new test case, comments on Concepts draft)

Sandro,

Wow!  That's a heavy message!

In short, I think you touch on a number of issues that are beyond the scope 
of the current RDF specifications.  I did note a couple of areas that might 
be improved in response to these comments.  I've summarized my take on 
these issues in the document issue description at:
    http://www.ninebynine.org/wip/DocIssues/RDF-Concepts/021-MeaningOfURIRefs.html

Have I missed any vital ingredients here?

#g
--

At 01:05 PM 10/24/02 -0400, Sandro Hawke wrote:


>***** 1. New Introduction and Summary
>
>In the editor's draft of RDF-CONCEPTS [0], you've added a lot of text
>about the meaning of a URIRef coming from the web-content available at
>its URI-part.  It's an excellent and much-needed addition.
>
>I want to underscore how important it is by pointing out that
>social meaning is self-reinforcing.  If people start to doubt the
>importance of using URIRefs as they are defined (and begin to
>experiment with their own incompatible meanings), the RDF specs are
>likely to lose any authority in the matter.  People need tremendous
>confidence in the language in which they write their contracts if
>they are to be held to those contracts.  There must be very little
>window for people to argue about what the definition of "is" is.
>
>With that in mind, and with an eye towards prospects of automated
>reasoning, I'd like to propose this test case:
>
><?xml version="1.0"?>
><!DOCTYPE rdf:RDF [
><!ENTITY animals "http://www.w3.org/2002/10/meaning/animals">
><!ENTITY rdf     "http://www.w3.org/1999/02/22-rdf-syntax-ns">
>]>
><rdf:RDF xmlns:rdf="&rdf;#"
>          xmlns:animals="&animals;#">
>   <rdf:Description rdf:ID="spot">
>      <rdf:type rdf:resource="&animals;#Dog" />
>   </rdf:Description>
></rdf:RDF>
>
>(I moved the hash-mark out of the entity for reasons which will be
>clear later.)
>
>This parses as:
>
>_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
><http://www.w3.org/2002/10/meaning/animals#Dog> .
>
>and it should entail
>
>_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
><http://www.w3.org/2002/10/meaning/animals#Mammal> .
>
>How?  Because the document at "http://www.w3.org/2002/10/meaning/animals"
>says that #Dog is an rdfs:subClassOf #Mammal.
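
A minimal sketch of the Dog/Mammal inference above, in Python.  It assumes
triples are plain (subject, predicate, object) tuples and that the animals
document contains exactly the one subClassOf statement described; the
function name and the schema contents are illustrative, not taken from any
spec or from the real document at that URI.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
ANIMALS = "http://www.w3.org/2002/10/meaning/animals#"


def subclass_closure(triples):
    """Add (x type D) whenever (x type C) and (C subClassOf D) are present.

    Repeats until no new triple appears, so chains of subclasses are followed.
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        sub = {(s, o) for (s, p, o) in closed if p == RDFS_SUBCLASSOF}
        for (x, p, c) in list(closed):
            if p != RDF_TYPE:
                continue
            for (c2, d) in sub:
                if c2 == c and (x, RDF_TYPE, d) not in closed:
                    closed.add((x, RDF_TYPE, d))
                    changed = True
    return closed


# The parsed test case.
document = {("_:x", RDF_TYPE, ANIMALS + "Dog")}
# Assumed content of the animals document, per the description in the text.
schema = {(ANIMALS + "Dog", RDFS_SUBCLASSOF, ANIMALS + "Mammal")}

goal = ("_:x", RDF_TYPE, ANIMALS + "Mammal")
with_schema = goal in subclass_closure(document | schema)
without_schema = goal in subclass_closure(document)
print(with_schema, without_schema)
```

The goal triple only appears once the schema triples are merged in; the
document on its own does not yield it.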
>
>Let me back up a little and clarify: we have three kinds of
>entailment:
>
>   (1) RDF simple entailment, as in the MT [2], which says
>       things like every RDF graph entails its subgraphs.
>       This kind of entailment pays no attention to URIRefs.
>   (2) Entailment with the "rdf" and "rdfs" vocabulary terms
>       reserved, as in MT [2].
>   (3) Entailment where every URIRef is constrained in meaning
>       according to the web content available at its URI part.
>
>Of course DAML+OIL defines its own entailment, as does OWL, as do my
>various layered logic languages [6], but these should all be seen as
>special cases of (3).  The terms used by Dublin Core, RSS, Creative
>Commons, and various other efforts may not define their meanings with
>model theories or first-order axioms, but their terms are also
>carefully defined, and in some cases their misuse would be
>intolerable (and in the case of CC, perhaps even illegal!).
>
>Type (2) entailment above should also be subsumed into type (3), by
>putting normative pointers at the rdf and rdfs namespace addresses to
>the appropriate Recs (when the Recs happen).  In fact, the MT should
>be more clear in distinguishing between (1), (2), and (3).  (2) should
>probably be in a separate document.  Perhaps (1) and (3) should also
>be separated, but they remain to describe the meaning inherent in all
>RDF documents, regardless of any URIRefs which occur in them.
>
>The point here is that an RDF document must be taken to assert the
>truth of all the documents it names in the URI parts of its
>node-labeling URIRefs.  If those documents are available to a reader,
>and the reader is capable of understanding them, the reader is fully
>entitled to infer facts from the conjunction of the author's documents
>and all the definitional documents.  Moreover, the reader can
>attribute these conclusions to the author; the author is responsible
>for choosing terms (e.g. comic, clown) whose definitions he accepts.
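
To make "the documents it names in the URI parts of its URIRefs" concrete,
here is a small Python sketch.  It assumes triples are tuples of strings,
that blank nodes are written "_:name", that there are no literals, and that
every URIRef in a triple counts (predicates included); the function name is
hypothetical.

```python
from urllib.parse import urldefrag


def definitional_documents(triples):
    """Return the set of URI parts (URIRef minus fragment) used in the triples."""
    docs = set()
    for triple in triples:
        for term in triple:
            if term.startswith("_:"):
                continue  # blank nodes name no document
            docs.add(urldefrag(term).url)
    return docs


triples = [
    ("_:x",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://www.w3.org/2002/10/meaning/animals#Dog"),
]
docs = definitional_documents(triples)
for uri in sorted(docs):
    print(uri)
```

For the test case this yields the rdf namespace document and the animals
document, which are the two documents the author would be taken to assert.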
>
>There are many more details, below.  I first approached this topic
>without noticing the new text in the editor's draft, and spent more
>time arguing why using the URI for the semantics was important.  I'm
>going to leave that text here, because some people are still probably
>not convinced.   If you are convinced, feel free to skip sections 3
>and 4.
>
>**** 2. A Few Notes on RDF-CONCEPTS [0]
>
>I think you overplay the difference between formal and natural languages in
>2.3.3 in the example with
>
>   B:oneOfThem rdfs:comment "This means the same as rdfs:subClassOf".
>
>If we take rdfs:comment to provide normative natural language
>information about the subject (and if it doesn't we need some other
>property which does), then in fact C is still to blame for the insult
>to C:JohnSmith.  The failure of RDFS class reasoning to reach the
>insult does not mean the insult is not style-3 entailed, in this case
>via B:oneOfThem.
>
>I think 2.3.4 is wrong: the predicate needs no special status.  The
>situation you're trying to prevent here is prevented by accepting the
>namespace/URI owner as authoritative in defining the terms there.
>(see my definition of definition in section 5.y).
>
>Section 2.3.5 is also misleading: there is RDF-Simple-Entailment ("1"
>above) and RDF-URI-Based-Entailment ("3" above), and that pretty much
>covers it.  At some URIs (eg OWL, RDF/RDFS, LX) you should find
>appeals to natural language and/or mathematical definitions which are
>not directly usable by machines, but the terms defined there can be
>used to define other terms in a way which *is* amenable to automated
>reasoning.  One could try to distinguish between natural language
>definitions and formal language definitions, but I'm not sure how that
>would help, since automated reasoners vary so much in what kind of
>formal languages they can handle.
>
>***** 3. Older Introduction
>
>If I receive and believe an RDF document, D, saying that D:spot has
>rdf:type animals:Dog, and the animals schema says that animals:Dog is
>a subclass of animals:Mammal, would it be right of me to infer that
>D:spot has rdf:type animals:Mammal?
>
>Your answer might be "never", "sometimes", or "always."  If you say
>"never," then I think you've missed the point of RDF and XML, with all
>these URIs and namespaces.  If you say "sometimes," then we need to
>talk about the qualities of those times.  If you say "always", we have
>some consequences which might be problematic.  (I will argue that the
>correct answer is "always" and that the problems are manageable.)
>
>In any case, I don't think the current working drafts are clear on
>this issue.  RDF-CONCEPTS section 2.3 [1] suggests to me the answer is
>probably "always" and RDF-MT section 1.2 [2] says "sometimes" and that
>it depends which vocabulary you are reserving.  Such an answer from
>the MT, while true in a sense, is fairly useless.  I need to know when
>I'm entitled to make the Dogs-are-Mammals inference, and I don't think
>out-of-band negotiation of the "reserved" vocabulary for each RDF
>document is practical.
>
>I'd like to apologize for raising this issue so late in the process,
>but my understanding of it has only become clear in the past week.
>Previously, I had some vague notion that we could "float" the meaning
>of RDF identifiers, but I no longer think that is practical.  I am
>indebted to Pat Hayes, Jeff Heflin, David Booth, Larry Masinter, Dan
>Connolly, and especially Tim Berners-Lee for recent conversations
>helping me understand these issues (even when they disagreed with me).
>
>Last week at the DAML-PI meeting [3], TimBL said that we are not ready
>to "float the currency" of identifier meanings yet, and won't be for
>perhaps fifty years.  For now, he argued, we need to stay on the gold
>standard, where namespace owners have the non-negotiable right to
>dictate the meanings of the terms in their namespace.  This is like
>the US Government saying a US "dollar" is worth 1/35th of a Troy ounce
>of gold; it defines the US dollar in terms of other well-known
>concepts.  This makes sense when introducing a term; it makes less
>sense when everyone has developed a strong sense of what the term
>means.  Tim's point, I think, was that we're a long way from computers
>being able to navigate in a world of vague meaning.
>
>***** 4. Argument For Entailment
>
>Let's return to my Dog/Mammal example.  Let's bind the namespace
>"animals" to "http://www.w3.org/2002/10/meaning/animals#".  The
>document at that address (without the hash) is some RDF saying in RDFS
>that animals:Cat is, in fact, a subclass of animals:Mammal.
>
>Does this mean that the triple
>    _:x rdf:type animals:Cat.
>entails
>    _:x rdf:type animals:Mammal.
>?
>
>There are some issues here about connectivity, trust, and
>change-over-time, but let's defer them for the moment.  Assume a
>static, always connected, always trustworthy web.
>
>Now, I claim that (following the "gold standard") the second triple
>follows logically from the first.  The author of the first chose to
>use the "animals" namespace, and by doing so acknowledged the
>definitions therein.  The author could have used some other namespace,
>or no namespace, but chose to use "animals" (by which I mean the
>longer URI above).  The author almost certainly chose to use the
>"animals" namespace so that others, doing later queries or merges,
>would connect his expressions with other expressions about animals.
>He wanted us to be able to infer that _:x was a mammal.
>
>Did he want us to follow the gold standard, or did he want us to have
>to think carefully about which definition of animals to use?  He
>probably wanted us to use the gold standard, to use the definitions at
>the namespace address, because otherwise there's a chance we'd believe
>some foolish claim about cats being fishes, and totally misunderstand
>him.
>
>So yes, granted the issues about connectivity, trust, and
>change-over-time, the above entailment should hold.  Now, let's
>address those issues:
>
>***** 5. Answers to Problems
>
>1.  Connectivity.  Connectivity does not affect entailment.  Whether
>     or not someone can get a copy of the "animals" definition document
>     does not change the fact that that document is the primary source
>     for the definitions of all the terms in the animals: namespace.
>     If you can't fetch the definitions, then your knowledge of the
>     terms is incomplete and your reasoning about them will be
>     incomplete.  Incomplete reasoning can be a problem, but it's
>     hardly a new problem or one which only arises when we bring in
>     connectivity issues.  If you can't fetch the document (and don't
>     have a current cached copy) then you know that you're missing some
>     information.  The monotonicity guarantee of RDF, however, allows
>     you to proceed with your partial information, which might be good
>     enough.
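
The connectivity point in item 1 can be sketched in Python: when the
definitions cannot be fetched, the reader reasons with what it has, and
because adding triples never retracts a conclusion, everything it concludes
offline is still concluded once the definitions arrive.  The fetch functions,
the prefixed names used as strings, and the single schema triple are all
assumptions made for illustration.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def type_closure(triples):
    """Propagate rdf:type along rdfs:subClassOf until nothing new appears."""
    closed = set(triples)
    while True:
        new = {(x, TYPE, d)
               for (x, p, c) in closed if p == TYPE
               for (c2, q, d) in closed if q == SUBCLASS and c2 == c}
        if new <= closed:
            return closed
        closed |= new


def conclusions(document, fetch_definitions):
    """Reason over the document plus whatever definitions can be fetched."""
    try:
        definitions = fetch_definitions()
    except OSError:
        definitions = set()  # unreachable: proceed with partial information
    return type_closure(set(document) | definitions)


def online():
    # Assumed content of the animals document.
    return {("animals:Dog", SUBCLASS, "animals:Mammal")}


def offline():
    raise OSError("namespace document unreachable")


document = {("_:x", TYPE, "animals:Dog")}
partial = conclusions(document, offline)
full = conclusions(document, online)
print(partial <= full)
```

The offline result is a subset of the online one: incomplete, but never
contradicted by the fuller picture.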
>
>2.  Trust (except for change-over-time).  This gold standard means
>     that the claims of an RDF document (which [1] says should have
>     legal weight) depend on the contents of other documents.  This is
>     more stable than saying such claims depend on social consensus,
>     but it still involves trust.  If I say my dog has rdf:type
>     animals:Dog and the animals document says that an animals:Dog was
>     once kicked by Ebenezer Scrooge, can I really be held to be saying
>     that Scrooge committed such an act?  I think so; I haven't found a
>     solid line marking the parts of a definition which have bearing
>     solely on other things.  Perhaps the animals document means the
>     Scrooge clause to be the necessary and sufficient condition for
>     doghood!  So, a bit hesitantly, we have to say that all statements
>     in the definition document are asserted by any use of terms from
>     the document.
>
>     We can address the Scrooge issue by saying that using terms from a
>     document is a lot like signing it.  Don't do it unless you have
>     read the document and agree with it.  Of course you need to do
>     this recursively, following the definitions of any terms it uses.
>
>x.  (x is for extra) This brings up the issue of URIRefs "grounding
>     out" in natural language text (which may well make use of
>     mathematical notation).  Our "animals" document constrains the
>     meaning of animals:Dog (very slightly) by using the term
>     rdfs:subClassOf.  That term needs to be constrained by the
>     document at the rdfs namespace [4], which it sort of is.  To
>     follow the gold standard, that document must make normative
>     reference to "http://www.w3.org/TR/rdf-schema/" which it currently
>     does not.  (We could exempt RDF and RDFS from this policy,
>     understanding that their meanings are acknowledged by the very use
>     of the RDF/XML data format.  There is little reason for this
>     special dispensation.)
>
>     I don't see a proper way in the current spec to make this kind of
>     normative reference from an RDF/XML document to a human-readable
>     one.  Perhaps it is sufficient for an rdfs:comment or
>     dc:description to claim, in its natural-language text, that it is
>     in fact normative.   That's a little loopy, but natural language
>     can probably handle it.    Better would be to make sure the RDFS
>     namespace document said that rdfs:comment contained true
>     natural-language statements about the subject.
>
>3.  Change-over-time is a special case of the "stewardship" issues.  It
>     doesn't necessarily involve time; it's possible for a web server
>     to offer one definition document to people who seem to be in France
>     and another to people who seem to be in England.
>
>     Stewardship issues arise often: should one define one's input as
>     being Unicode 3.2 characters, or as being whatever character set
>     is the latest approved by the Unicode Consortium?  Do you
>     advertise your program as running on "OS Version 9.1" or "OS
>     Version 9.1 or later"?  It all depends on whether you trust the
>     stewardship of the organization which controls the underlying
>     components.
>
>     The solutions here are typical security solutions, because these
>     are fairly typical security problems.  How do you know the
>     definitions are the same ones you agreed to?  (Secure hash
>     functions are a good approach.)  If you agreed to ongoing updates
>     by some steward, how do you know the updates are actually coming
>     from that entity?  (Public keys are a good approach.)
>
>     There's some interesting engineering to do here.   The simplest
>     solution would be for each RDF document to give the SHA1 checksums
>     of each of its namespace documents -- if a checksum is missing or
>     does not match, the definition is considered to be unfetchable.
>     Something like:
>
>       <?xml version="1.0"?>
>       <!DOCTYPE rdf:RDF [
>        <!ENTITY animals "http://www.w3.org/2002/10/meaning/animals">
>        <!ENTITY rdf     "http://www.w3.org/1999/02/22-rdf-syntax-ns">
>       ]>
>       <rdf:RDF xmlns:rdf="&rdf;#"
>                xmlns="&animals;#">
>         <rdf:Description rdf:ID="spot">
>           <rdf:type rdf:resource="&animals;#Dog" />
>         </rdf:Description>
>         <rdf:Description rdf:about="&animals;">
>           <rdf:sha1>953365afbc5c24ecfe590c350ab1345bee2f7aee</rdf:sha1>
>         </rdf:Description>
>       </rdf:RDF>
>
>     It's not pretty; maybe someone has some better ideas.
>
>     The meaning of the SHA1 triple is a little tricky.  It does not
>     mandate importing the URI's contents as one might imagine, because
>     (I propose) the RDF Specs already mandate it.  Rather, it *allows*
>     it.  Without the SHA1 triple, you would know there was some web
>     content which gave you further true information about the subject
>     at hand, but you would not be allowed to read it.  With the SHA1
>     triple, if the content matches, you can go ahead and read and use
>     the additional content.   Perhaps authors who don't want to bother
>     with SHA1 could add an alternate triple saying, in effect, I trust
>     any definitions you get from the URIs I use.   (This might
>     be sufficient in any RDF document which is not cryptographically
>     signed.)
>
>     Eric Prud'hommeaux suggested the use of an HTTP header could allow
>     several documents to be served from the same URI, distinguished by
>     the SHA1 hash sent in the header.  This would allow author and
>     namespace owner to negotiate (at read-time) on the exact
>     definition text to use, facilitating migration and
>     content-negotiation.  This might be a nice feature, but it's not
>     necessary.  This proposal, as is, works for entirely-static
>     definitions which is all we really need.  (Since I calculated the
>     above checksum, I've twice resisted the urge to change the
>     definition in minor ways.)  If we want to allow continuing
>     stewardship, some additional mechanism (such as a public key & URI
>     in the hashed static document) will be needed.
>
>     Another approach is my sdh proposal [5], but that's a bigger
>     change for RDF, and is not necessary if the official definition of
>     RDF is updated to include these semantics that the definitions of
>     terms are considered to be asserted.
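
The checksum rule from item 3 (a missing or mismatched SHA1 makes the
definition "unfetchable") might be checked along these lines.  This is only
a sketch: the function name is hypothetical, the sample bytes stand in for a
fetched namespace document, and the digest is computed on the spot rather
than taken from the example above.

```python
import hashlib


def definitions_usable(content, declared_sha1):
    """Return True only if a checksum was declared and the content matches it."""
    if declared_sha1 is None:
        return False  # no checksum given: treat the definitions as unfetchable
    actual = hashlib.sha1(content).hexdigest()
    return actual == declared_sha1.lower()


# Stand-in for the bytes served at the namespace URI.
fetched = b"<rdf:RDF>...animals definitions...</rdf:RDF>"
# What the author would have recorded when writing the RDF document.
declared = hashlib.sha1(fetched).hexdigest()

print(definitions_usable(fetched, declared))
print(definitions_usable(fetched + b" ", declared))
print(definitions_usable(fetched, None))
```

Only the first call permits the reader to use the definitions; a changed
document or an absent checksum both leave the reader with the RDF document
alone.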
>
>y.  (why not add an extra (rather philosophical) point?) I've been a
>     little vague about what a "definition" is.  I mean a "definition"
>     to be some declarative statement which uses the term and is true
>     only for certain meanings of that term.  An asserted (included,
>     imported) definition thus limits the possible valid
>     interpretations (models) of statements which use the term.
>
>     A "strong" definition is a work of art which constrains
>     interpretation to the point where no observable differences
>     emerge.  For artificial terms, even stronger "perfect" definitions
>     can be written.  These are definitions in the mathematical sense,
>     "Let us define f to be...".  Compared to that, natural language
>     definitions and ontologies are usually mere descriptions.  Still,
>     I call them definitional documents in accordance with their intent
>     and common usage.
>
>     Definitions do not have to be perfect, or even strong, of course.
>     They can be "thin" ontologies like my Dog/Mammal one, which merely
>     offer a little helpful description.  The essence of the gold
>     standard is that, no matter whether a definition is thin, strong,
>     or perfect, you at least know which one everyone is supposed to
>     use.
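
The idea in item y that a definition "limits the possible valid
interpretations" can be shown on a toy scale.  The sketch below assumes a
two-element universe and treats an interpretation as nothing more than a
choice of extension for Dog and for Mammal; the individuals and names are
invented for illustration.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of the universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


universe = {"spot", "tweety"}

# Every way of assigning an extension to Dog and to Mammal.
interpretations = [(dog, mammal)
                   for dog in subsets(universe)
                   for mammal in subsets(universe)]

# The "thin" definition: Dog is a subclass of Mammal.
allowed = [(dog, mammal)
           for (dog, mammal) in interpretations
           if dog <= mammal]

print(len(interpretations), len(allowed))  # prints "16 9"
```

A thin definition still rules out 7 of the 16 candidate interpretations
here; a stronger definition would leave fewer.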
>
>
>***** 6. Older Conclusion
>
>I've tried hard to be clear and concise here, and I apologize for any
>failures.  I understand you're working under a looming deadline, but
>this issue is crucial to address as soon as possible, in this version
>of RDF.   I don't think this is a change in the basic intent of RDF,
>but if you Recommend the MT in its current form, you will have given RDF
>URIRefs only floating semantics.
>
>I doubt the change from floating semantics back to namespace-document
>semantics can be made compatibly.  With floating semantics, people and
>machines reading RDF are required to use their own judgement in
>deciding which definitions to use.  Once they start doing that,
>authors will become used to it, and will no longer be obligated to
>adhere to original definitions.  Obligations cannot be imposed
>retroactively (in this kind of a free environment), so if
>namespace-document semantics are added later, they will have to be
>added in a language which is marked as having different semantics.
>But the difference is easy to miss; it amounts to "now you
>have to use the terms as defined!" and if there's a reasonable doubt
>about authors understanding this change, then they really have no
>obligation (such as might stand up in court), and the change has not
>actually been made.
>
>Since floating semantics are not amenable to automated reasoning, if
>you pass on this issue now, you will have kept RDF (in its present
>form and probably all similar future forms) from being a viable
>Semantic Web language.   That would be unfortunate.
>
>If there is any further way I can assist in this matter, please let me
>know.
>
>     -- sandro                         http://www.w3.org/People/Sandro/
>
>[0] http://www.ninebynine.org/wip/RDF-concepts/2002-10-18/rdf-concepts.html
>[1] http://www.w3.org/TR/rdf-concepts/#section-Meaning
>[2] http://www.w3.org/TR/rdf-mt/#urisandlit
>[3] http://www.daml.org/meetings/2002/10/pi/
>[4] http://www.w3.org/2000/01/rdf-schema
>[5] http://www.w3.org/2002/09/sdh/
>[6] http://www.w3.org/2002/08/LX

-------------------
Graham Klyne
<GK@NineByNine.org>

Received on Friday, 25 October 2002 13:41:25 UTC