- From: Graham Klyne <GK@ninebynine.org>
- Date: Fri, 25 Oct 2002 19:06:22 +0100
- To: Sandro Hawke <sandro@w3.org>
- Cc: www-rdf-comments@w3.org
Sandro,
Wow! That's a heavy message!
In short, I think you touch on a number of issues that are beyond the scope
of the current RDF specifications. I did note a couple of areas that might
be improved in response to these comments. I've summarized my take on
these issues in the document issue description at:
http://www.ninebynine.org/wip/DocIssues/RDF-Concepts/021-MeaningOfURIRefs.html
Have I missed any vital ingredients here?
#g
--
At 01:05 PM 10/24/02 -0400, Sandro Hawke wrote:
>***** 1. New Introduction and Summary
>
>In the editor's draft of RDF-CONCEPTS [0], you've added a lot of text
>about the meaning of a URIRef coming from the web-content available at
>its URI-part. It's an excellent and much-needed addition.
>
>I want to underscore how important it is by pointing out that
>social meaning is self-reinforcing. If people start to doubt the
>importance of using URIRefs as they are defined (and begin to
>experiment with their own incompatible meanings), the RDF specs are
>likely to lose any authority in the matter. People need tremendous
>confidence in the language in which they write their contracts if
>they are to be held to those contracts. There must be very little
>window for people to argue about what the definition of "is" is.
>
>With that in mind, and with an eye towards prospects of automated
>reasoning, I'd like to propose this test case:
>
><?xml version="1.0"?>
><!DOCTYPE rdf:RDF [
><!ENTITY animals "http://www.w3.org/2002/10/meaning/animals">
><!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns">
>]>
><rdf:RDF xmlns:rdf="&rdf;#"
> xmlns:animals="&animals;#">
> <rdf:Description rdf:ID="spot">
> <rdf:type rdf:resource="&animals;#Dog" />
> </rdf:Description>
></rdf:RDF>
>
>(I moved the hash-mark out of the entity for reasons which will be
>clear later.)
>
>This parses as:
>
>_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
><http://www.w3.org/2002/10/meaning/animals#Dog> .
>
>and it should entail
>
>_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
><http://www.w3.org/2002/10/meaning/animals#Mammal> .
>
>How? Because the document at "http://www.w3.org/2002/10/meaning/animals"
>says that #Dog is rdfs:subClassOf #Mammal.
>
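A minimal sketch of the Dog/Mammal inference above, in Python. It assumes the definitional document has already been fetched and parsed into triples. The in-memory sets and the helper name `entail_types` are hypothetical, and only the rdfs:subClassOf rule is applied, not the full RDFS closure.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
ANIMALS = "http://www.w3.org/2002/10/meaning/animals#"


def entail_types(graph, definitions):
    """Return the merged triples plus every rdf:type triple implied by
    rdfs:subClassOf statements (hypothetical helper, subclass rule only)."""
    merged = set(graph) | set(definitions)
    changed = True
    while changed:  # repeat until no new triple is derived
        changed = False
        for (s, p, o) in list(merged):
            if p != RDF_TYPE:
                continue
            for (c, q, d) in list(merged):
                if q == RDFS_SUBCLASSOF and c == o:
                    derived = (s, RDF_TYPE, d)
                    if derived not in merged:
                        merged.add(derived)
                        changed = True
    return merged


# The author's document: _:x rdf:type animals:Dog
document = {("_:x", RDF_TYPE, ANIMALS + "Dog")}

# Stand-in for the content at the animals URI (assumed, per the text above)
animals_schema = {(ANIMALS + "Dog", RDFS_SUBCLASSOF, ANIMALS + "Mammal")}

closure = entail_types(document, animals_schema)
```

With the schema merged in, `closure` contains the `_:x rdf:type animals:Mammal` triple. Without it, the same call leaves the document unchanged, which is the gap between entailment kinds (1) and (3).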
>Let me back up a little and clarify: we have three kinds of
>entailment:
>
> (1) RDF simple entailment, as in the MT [2], which says
> things like every RDF graph entails its subgraphs.
> This kind of entailment pays no attention to URIRefs.
> (2) Entailment with the "rdf" and "rdfs" vocabulary terms
> reserved, as in MT [2].
> (3) Entailment where every URIRef is constrained in meaning
> according to the web content available at its URI part.
>
>Of course DAML+OIL defines its own entailment, as does OWL, as do my
>various layered logic languages [6], but these should all be seen as
>special cases of (3). The terms used by Dublin Core, RSS, Creative
>Commons, and various other efforts may not define their meanings with
>model theories or first-order axioms, but their terms are also
>carefully defined, and in some cases their misuse would be
>intolerable (and in the case of CC, perhaps even illegal!).
>
>Type (2) entailment above should also be subsumed into type (3), by
>putting normative pointers at the rdf and rdfs namespace addresses to
>the appropriate Recs (when the Recs happen). In fact, the MT should
>be more clear in distinguishing between (1), (2), and (3). (2) should
>probably be in a separate document. Perhaps (1) and (3) should also
>be separated, but they remain to describe the meaning inherent in all
>RDF documents, regardless of any URIRefs which occur in them.
>
>The point here is that an RDF document must be taken to assert the
>truth of all the documents it names in the URI parts of its
>node-labeling URIRefs. If those documents are available to a reader,
>and the reader is capable of understanding them, the reader is fully
>entitled to infer facts from the conjunction of the author's documents
>and all the definitional documents. Moreover, the reader can
>attribute these conclusions to the author; the author is responsible
>for choosing terms (eg comic, clown) whose definitions he accepts.
>
>There are many more details, below. I first approached this topic
>without noticing the new text in the editor's draft, and spent more
>time arguing why using the URI for the semantics was important. I'm
>going to leave that text here, because some people are still probably
>not convinced. If you are convinced, feel free to skip sections 3
>and 4.
>
>***** 2. A Few Notes on RDF-CONCEPTS [0]
>
>I think you overplay the difference between formal and natural languages in
>2.3.3 in the example with
>
> B:oneOfThem rdfs:comment "This means the same as rdfs:subClassOf".
>
>If we take rdfs:comment to provide normative natural language
>information about the subject (and if it doesn't we need some other
>property which does), then in fact C is still to blame for the insult
>to C:JohnSmith. The failure of RDFS class reasoning to reach the
>insult does not mean the insult is not style-3 entailed, in this case
>via B:oneOfThem.
>
>I think 2.3.4 is wrong: the predicate needs no special status. The
>situation you're trying to prevent here is prevented by accepting the
>namespace/URI owner as authoritative in defining the terms there.
>(see my definition of definition in section 5.y).
>
>Section 2.3.5 is also misleading: there is RDF-Simple-Entailment ("1"
>above) and RDF-URI-Based-Entailment ("3" above), and that pretty much
>covers it. At some URIs (eg OWL, RDF/RDFS, LX) you should find
>appeals to natural language and/or mathematical definitions which are
>not directly usable by machines, but the terms defined there can be
>used to define other terms in a way which *is* amenable to automated
>reasoning. One could try to distinguish between natural language
>definitions and formal language definitions, but I'm not sure how that
>would help, since automated reasoners vary so much in what kind of
>formal languages they can handle.
>
>***** 3. Older Introduction
>
>If I receive and believe an RDF document, D, saying that D:spot has
>rdf:type animals:Dog, and the animals schema says that animals:Dog is
>a subclass of animals:Mammal, would it be right of me to infer that
>D:spot has rdf:type animals:Mammal?
>
>Your answer might be "never", "sometimes", or "always." If you say
>"never," then I think you've missed the point of RDF and XML, with all
>these URIs and namespaces. If you say "sometimes," then we need to
>talk about the qualities of those times. If you say "always", we have
>some consequences which might be problematic. (I will argue that the
>correct answer is "always" and that the problems are manageable.)
>
>In any case, I don't think the current working drafts are clear on
>this issue. RDF-CONCEPTS section 2.3 [1] suggests to me the answer is
>probably "always" and RDF-MT section 1.2 [2] says "sometimes" and that
>it depends which vocabulary you are reserving. Such an answer from
>the MT, while true in a sense, is fairly useless. I need to know when
>I'm entitled to make the Dogs-are-Mammals inference, and I don't think
>out-of-band negotiation of the "reserved" vocabulary for each RDF
>document is practical.
>
>I'd like to apologize for raising this issue so late in the process,
>but my understanding of it has only become clear in the past week.
>Previously, I had some vague notion that we could "float" the meaning
>of RDF identifiers, but I no longer think that is practical. I am
>indebted to Pat Hayes, Jeff Heflin, David Booth, Larry Masinter, Dan
>Connolly, and especially Tim Berners-Lee for recent conversations
>helping me understand these issues (even when they disagreed with me).
>
>Last week at the DAML-PI meeting [3], TimBL said that we are not ready
>to "float the currency" of identifier meanings yet, and won't be for
>perhaps fifty years. For now, he argued, we need to stay on the gold
>standard, where namespace owners have the non-negotiable right to
>dictate the meanings of the terms in their namespace. This is like
>the US Government saying a US "dollar" is worth 1/35th of a Troy ounce
>of gold; it defines the US dollar in terms of other well-known
>concepts. This makes sense when introducing a term; it makes less
>sense when everyone has developed a strong sense of what the term
>means. Tim's point, I think, was that we're a long way from computers
>being able to navigate in a world of vague meaning.
>
>***** 4. Argument For Entailment
>
>Let's return to my Dog/Mammal example. Let's bind the namespace
>"animals" to "http://www.w3.org/2002/10/meaning/animals#". The
>document at that address (without the hash) is some RDF saying in RDFS
>that animals:Cat is, in fact, a subclass of animals:Mammal.
>
>Does this mean that the triple
> _:x rdf:type animals:Cat.
>entails
> _:x rdf:type animals:Mammal.
>?
>
>There are some issues here about connectivity, trust, and
>change-over-time, but let's defer them for the moment. Assume a
>static, always connected, always trustworthy web.
>
>Now, I claim that (following the "gold standard") the second triple
>follows logically from the first. The author of the first chose to
>use the "animals" namespace, and by doing so acknowledged the
>definitions therein. The author could have used some other namespace,
>or no namespace, but chose to use "animals" (by which I mean the
>longer URI above). The author almost certainly chose to use the
>"animals" namespace so that others, doing later queries or merges,
>would connect his expressions with other expressions about animals.
>He wanted us to be able to infer that _:x was a mammal.
>
>Did he want us to follow the gold standard, or did he want us to have
>to think carefully about which definition of animals to use? He
>probably wanted us to use the gold standard, to use the definitions at
>the namespace address, because otherwise there's a chance we'd believe
>some foolish claim about cats being fishes, and totally misunderstand
>him.
>
>So yes, granted the issues about connectivity, trust, and
>change-over-time, the above entailment should hold. Now, let's
>address those issues:
>
>***** 5. Answers to Problems
>
>1. Connectivity. Connectivity does not affect entailment. Whether
> or not someone can get a copy of the "animals" definition document
> does not change the fact that that document is the primary source
> for the definitions of all the terms in the animals: namespace.
> If you can't fetch the definitions, then your knowledge of the
> terms is incomplete and your reasoning about them will be
> incomplete. Incomplete reasoning can be a problem, but it's
> hardly a new problem or one which only arises when we bring in
> connectivity issues. If you can't fetch the document (and don't
> have a current cached copy) then you know that you're missing some
> information. The monotonicity guarantee of RDF, however, allows
> you to proceed with your partial information, which might be good
> enough.
>
>2. Trust (except for change-over-time). This gold standard means
> that the claims of an RDF document (which [1] says should have
> legal weight) depend on the contents of other documents. This is
> more stable than saying such claims depend on social consensus,
> but it still involves trust. If I say my dog has rdf:type
> animals:Dog and the animals document says that an animals:Dog was
> once kicked by Ebenezer Scrooge, can I really be held to be saying
> that Scrooge committed such an act? I think so; I haven't found a
> solid line marking the parts of a definition which have bearing
> solely on other things. Perhaps the animals document means the
> Scrooge clause to be the necessary and sufficient condition for
> doghood! So, a bit hesitantly, we have to say that all statements
> in the definition document are asserted by any use of terms from
> the document.
>
> We can address the Scrooge issue by saying that using terms from a
> document is a lot like signing it. Don't do it unless you have
> read the document and agree with it. Of course you need to do
> this recursively, following the definitions of any terms it uses.
>
>x. (x is for extra) This brings up the issue of URIRefs "grounding
> out" in natural language text (which may well make use of
> mathematical notation). Our "animals" document constrains the
> meaning of animals:Dog (very slightly) by using the term
> rdfs:subClassOf. That term needs to be constrained by the
> document at the rdfs namespace [4], which it sort of is. To
> follow the gold standard, that document must make normative
> reference to "http://www.w3.org/TR/rdf-schema/" which it currently
> does not. (We could exempt RDF and RDFS from this policy,
> understanding that their meanings are acknowledged by the very use
> of the RDF/XML data format. There is little reason for this
> special dispensation.)
>
> I don't see a proper way in the current spec to make this kind of
> normative reference from an RDF/XML document to a human-readable
> one. Perhaps it is sufficient for an rdfs:comment or
> dc:description to claim, in its natural-language text, that it is
> in fact normative. That's a little loopy, but natural language
> can probably handle it. Better would be to make sure the RDFS
> namespace document said that rdfs:comment contained true
> natural-language statements about the subject.
>
>3. Change-over-time is a special case of the "stewardship" issues. It
> doesn't necessarily involve time; it's possible for a web server
> to offer one definition document to people who seem to be in France
> and another to people who seem to be in England.
>
> Stewardship issues arise often: should one define one's input as
> being Unicode 3.2 characters, or as being whatever character set
> is the latest approved by the Unicode Consortium? Do you
> advertize your program as running on "OS Version 9.1" or "OS
> Version 9.1 or later"? It all depends on whether you trust the
> stewardship of the organization which controls the underlying
> components.
>
> The solutions here are typical security solutions, because these
> are fairly typical security problems. How do you know the
> definitions are the same ones you agreed to? (Secure hash
> functions are a good approach.) If you agreed to ongoing updates
> by some steward, how do you know the updates are actually coming
> from that entity? (Public keys are a good approach.)
>
> There's some interesting engineering to do here. The simplest
> solution would be for each RDF document to give the SHA1 checksums
> of each of its namespace documents -- if a checksum is missing or
> does not match, the definition is considered to be unfetchable.
> Something like:
>
> <?xml version="1.0"?>
> <!DOCTYPE rdf:RDF [
> <!ENTITY animals "http://www.w3.org/2002/10/meaning/animals">
> <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns">
> ]>
> <rdf:RDF xmlns:rdf="&rdf;#"
> xmlns="&animals;#">
> <rdf:Description rdf:ID="spot">
> <rdf:type rdf:resource="&animals;#Dog" />
> </rdf:Description>
> <rdf:Description rdf:about="&animals;">
> <rdf:sha1>953365afbc5c24ecfe590c350ab1345bee2f7aee</rdf:sha1>
> </rdf:Description>
> </rdf:RDF>
>
> It's not pretty; maybe someone has some better ideas.
>
> The meaning of the SHA1 triple is a little tricky. It does not
> mandate importing the URI's contents as one might imagine, because
> (I propose) the RDF Specs already mandate it. Rather, it *allows*
> it. Without the SHA1 triple, you would know there was some web
> content which gave you further true information about the subject
> at hand, but you would not be allowed to read it. With the SHA1
> triple, if the content matches, you can go ahead and read and use
> the additional content. Perhaps authors who don't want to bother
> with SHA1 could add an alternate triple saying, in effect, I trust
> any definitions you get from the URIs I use. (This might
> be sufficient in any RDF document which is not cryptographically
> signed.)
>
> Eric Prud'hommeaux suggested the use of an HTTP header could allow
> several documents to be served from the same URI, distinguished by
> the SHA1 hash sent in the header. This would allow author and
> namespace owner to negotiate (at read-time) on the exact
> definition text to use, facilitating migration and
> content-negotiation. This might be a nice feature, but it's not
> necessary. This proposal, as is, works for entirely-static
> definitions which is all we really need. (Since I calculated the
> above checksum, I've twice resisted the urge to change the
> definition in minor ways.) If we want to allow continuing
> stewardship, some additional mechanism (such as a public key & URI
> in the hashed static document) will be needed.
>
> Another approach is my sdh proposal [5], but that's a bigger
> change for RDF, and is not necessary if the official definition of
> RDF is updated to include these semantics that the definitions of
> terms are considered to be asserted.
>
>y. (why not add an extra (rather philosophical) point?) I've been a
> little vague about what a "definition" is. I mean a "definition"
> to be some declarative statement which uses the term and is true
> only for certain meanings of that term. An asserted (included,
> imported) definition thus limits the possible valid
> interpretations (models) of statements which use the term.
>
> A "strong" definition is a work of art which constrains
> interpretation to the point where no observable differences
> emerge. For artificial terms, even stronger "perfect" definitions
> can be written. These are definitions in the mathematical sense,
> "Let us define f to be...". Compared to that, natural language
> definitions and ontologies are usually mere descriptions. Still,
> I call them definitional documents in accordance with their intent
> and common usage.
>
> Definitions do not have to be perfect, or even strong, of course.
> They can be "thin" ontologies like my Dog/Mammal one, which merely
> offer a little helpful description. The essence of the gold
> standard is that, no matter whether a definition is thin, strong,
> or perfect, you at least know which one everyone is supposed to
> use.
>
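Point 2 above says that using terms from a document is like signing it, and that the check has to be done recursively. One way to sketch that recursion in Python, assuming each definitional document has already been parsed into the list of URIRefs it uses. The helper name `documents_to_review` and the sample contents of `terms_used_by` are hypothetical, not the actual contents of those namespace documents.

```python
from urllib.parse import urldefrag


def documents_to_review(start_urirefs, terms_used_by):
    """Collect the URI part of every definitional document reachable from
    the given URIRefs. terms_used_by maps a document URI to the URIRefs
    that document uses (hypothetical, pre-parsed input)."""
    pending = [urldefrag(u).url for u in start_urirefs]  # strip "#fragment"
    seen = set()
    while pending:
        doc = pending.pop()
        if doc in seen:
            continue  # already visited; also protects against cycles
        seen.add(doc)
        for term in terms_used_by.get(doc, ()):
            pending.append(urldefrag(term).url)
    return seen


ANIMALS_DOC = "http://www.w3.org/2002/10/meaning/animals"
RDFS_DOC = "http://www.w3.org/2000/01/rdf-schema"
RDF_DOC = "http://www.w3.org/1999/02/22-rdf-syntax-ns"

# Illustrative contents only
terms_used_by = {
    ANIMALS_DOC: [RDFS_DOC + "#subClassOf"],
    RDFS_DOC: [RDF_DOC + "#type"],
}

to_review = documents_to_review([ANIMALS_DOC + "#Dog"], terms_used_by)
```

Under these assumed contents, an author who uses animals:Dog would need to read and accept three documents: the animals document, the rdfs namespace document, and the rdf namespace document.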
>
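The checksum rule from point 3 (a missing or mismatched SHA1 means the definition is treated as unfetchable) might look like this in Python. This is a sketch under stated assumptions: the function name `usable_definition` is hypothetical, the bytes of the namespace document are assumed to be already retrieved, and the sample document is a placeholder, not the real content behind the checksum quoted in the message.

```python
import hashlib


def usable_definition(content, declared_sha1):
    """Return the definition bytes if their SHA-1 matches the checksum the
    RDF document declared; otherwise return None, meaning "unfetchable"."""
    if declared_sha1 is None:
        return None  # no checksum given: reader may not use the content
    actual = hashlib.sha1(content).hexdigest()
    if actual != declared_sha1.strip().lower():
        return None  # content differs from what the author agreed to
    return content


# Placeholder bytes standing in for a fetched namespace document
sample = b"<rdf:RDF>...placeholder definition document...</rdf:RDF>"
declared = hashlib.sha1(sample).hexdigest()

accepted = usable_definition(sample, declared)
rejected = usable_definition(sample + b" ", declared)
```

Here `accepted` is the original bytes and `rejected` is `None`, since a single added byte changes the digest. A reader following the proposal would then reason with partial information, exactly as in the connectivity case in point 1.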
>***** 6. Older Conclusion
>
>I've tried hard to be clear and concise here, and I apologize for any
>failures. I understand you're working under a looming deadline, but
>this issue is crucial to address as soon as possible, in this version
>of RDF. I don't think this is a change in the basic intent of RDF,
>but if you Recommend the MT in its current form, you will have given RDF
>URIRefs only floating semantics.
>
>I doubt the change from floating semantics back to namespace-document
>semantics can be made compatibly. With floating semantics, people and
>machines reading RDF are required to use their own judgement in
>deciding which definitions to use. Once they start doing that,
>authors will become used to it, and will no longer be obligated to
>adhere to original definitions. Obligations cannot be imposed
>retroactively (in this kind of a free environment), so if
>namespace-document semantics are added later, they will have to be
>added in a language which is marked as having different semantics.
>But the difference is easy to miss; it's the difference that "now you
>have to use the terms as defined!" and if there's a reasonable doubt
>about authors understanding this change, then they really have no
>obligation (such as might stand up in court), and the change has not
>actually been made.
>
>Since floating semantics are not amenable to automated reasoning, if
>you pass on this issue now, you will have kept RDF (in its present
>form and probably all similar future forms) from being a viable
>Semantic Web language. That would be unfortunate.
>
>If there is any further way I can assist in this matter, please let me
>know.
>
> -- sandro http://www.w3.org/People/Sandro/
>
>[0] http://www.ninebynine.org/wip/RDF-concepts/2002-10-18/rdf-concepts.html
>[1] http://www.w3.org/TR/rdf-concepts/#section-Meaning
>[2] http://www.w3.org/TR/rdf-mt/#urisandlit
>[3] http://www.daml.org/meetings/2002/10/pi/
>[4] http://www.w3.org/2000/01/rdf-schema
>[5] http://www.w3.org/2002/09/sdh/
>[6] http://www.w3.org/2002/08/LX
-------------------
Graham Klyne
<GK@NineByNine.org>
Received on Friday, 25 October 2002 13:41:25 UTC