- From: Graham Klyne <GK@ninebynine.org>
- Date: Fri, 25 Oct 2002 19:06:22 +0100
- To: Sandro Hawke <sandro@w3.org>
- Cc: www-rdf-comments@w3.org
Sandro,

Wow! That's a heavy message!

In short, I think you touch on a number of issues that are beyond the scope of the current RDF specifications. I did note a couple of areas that might be improved in response to these comments.

I've summarized my take on these issues in the document issue description at:
http://www.ninebynine.org/wip/DocIssues/RDF-Concepts/021-MeaningOfURIRefs.html

Have I missed any vital ingredients here?

#g
--

At 01:05 PM 10/24/02 -0400, Sandro Hawke wrote:
>***** 1. New Introduction and Summary
>
>In the editor's draft of RDF-CONCEPTS [0], you've added a lot of text
>about the meaning of a URIRef coming from the web-content available at
>its URI-part. It's an excellent and much-needed addition.
>
>I want to underscore how important it is by pointing out that
>social meaning is self-reinforcing. If people start to doubt the
>importance of using URIRefs as they are defined (and begin to
>experiment with their own incompatible meanings), the RDF specs are
>likely to lose any authority in the matter. People need tremendous
>confidence in the language in which they write their contracts if
>they are to be held to those contracts. There must be very little
>window for people to argue about what the definition of "is" is.
>
>With that in mind, and with an eye towards prospects of automated
>reasoning, I'd like to propose this test case:
>
><?xml version="1.0"?>
><!DOCTYPE rdf:RDF [
><!ENTITY animals "http://www.w3.org/2002/10/meaning/animals">
><!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns">
>]>
><rdf:RDF xmlns:rdf="&rdf;#"
>         xmlns:animals="&animals;#">
>  <rdf:Description rdf:ID="spot">
>    <rdf:type rdf:resource="&animals;#Dog" />
>  </rdf:Description>
></rdf:RDF>
>
>(I moved the hash-mark out of the entity for reasons which will be
>clear later.)
>
>This parses as:
>
>_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
><http://www.w3.org/2002/10/meaning/animals#Dog> .
>
>and it should entail
>
>_:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
><http://www.w3.org/2002/10/meaning/animals#Mammal> .
>
>How? Because the document at "http://www.w3.org/2002/10/meaning/animals"
>says that #Dog is rdfs:subClassOf #Mammal.
>
>Let me back up a little and clarify: we have three kinds of
>entailment:
>
>   (1) RDF simple entailment, as in the MT [2], which says
>       things like every RDF graph entails its subgraphs.
>       This kind of entailment pays no attention to URIRefs.
>   (2) Entailment with the "rdf" and "rdfs" vocabulary terms
>       reserved, as in MT [2].
>   (3) Entailment where every URIRef is constrained in meaning
>       according to the web content available at its URI part.
>
>Of course DAML+OIL defines its own entailment, as does OWL, as do my
>various layered logic languages [6], but these should all be seen as
>special cases of (3). The terms used by Dublin Core, RSS, Creative
>Commons, and various other efforts may not define their meanings with
>model theories or first-order axioms, but their terms are also
>carefully defined, and in some cases their misuse would be
>intolerable (and in the case of CC, perhaps even illegal!).
>
>Type (2) entailment above should also be subsumed into type (3), by
>putting normative pointers at the rdf and rdfs namespace addresses to
>the appropriate Recs (when the Recs happen). In fact, the MT should
>be more clear in distinguishing between (1), (2), and (3). (2) should
>probably be in a separate document. Perhaps (1) and (3) should also
>be separated, but they remain to describe the meaning inherent in all
>RDF documents, regardless of any URIRefs which occur in them.
>
>The point here is that an RDF document must be taken to assert the
>truth of all the documents it names in the URI parts of its
>node-labeling URIRefs.
>If those documents are available to a reader,
>and the reader is capable of understanding them, the reader is fully
>entitled to infer facts from the conjunction of the author's documents
>and all the definitional documents. Moreover, the reader can
>attribute these conclusions to the author; the author is responsible
>for choosing terms (eg comic, clown) whose definitions he accepts.
>
>There are many more details, below. I first approached this topic
>without noticing the new text in the editor's draft, and spent more
>time arguing why using the URI for the semantics was important. I'm
>going to leave that text here, because some people are still probably
>not convinced. If you are convinced, feel free to skip sections 3
>and 4.
>
>***** 2. A Few Notes on RDF-CONCEPTS [0]
>
>I think you overplay the difference between formal and natural languages in
>2.3.3 in the example with
>
>   B:oneOfThem rdfs:comment "This means the same as rdfs:subClassOf".
>
>If we take rdfs:comment to provide normative natural language
>information about the subject (and if it doesn't we need some other
>property which does), then in fact C is still to blame for the insult
>to C:JohnSmith. The failure of RDFS class reasoning to reach the
>insult does not mean the insult is not style-3 entailed, in this case
>via B:oneOfThem.
>
>I think 2.3.4 is wrong: the predicate needs no special status. The
>situation you're trying to prevent here is prevented by accepting the
>namespace/URI owner as authoritative in defining the terms there.
>(see my definition of definition in section 5.y).
>
>Section 2.3.5 is also misleading: there is RDF-Simple-Entailment ("1"
>above) and RDF-URI-Based-Entailment ("3" above), and that pretty much
>covers it.
>At some URIs (eg OWL, RDF/RDFS, LX) you should find
>appeals to natural language and/or mathematical definitions which are
>not directly usable by machines, but the terms defined there can be
>used to define other terms in a way which *is* amenable to automated
>reasoning. One could try to distinguish between natural language
>definitions and formal language definitions, but I'm not sure how that
>would help, since automated reasoners vary so much in what kind of
>formal languages they can handle.
>
>***** 3. Older Introduction
>
>If I receive and believe an RDF document, D, saying that D:spot has
>rdf:type animals:Dog, and the animals schema says that animals:Dog is
>a subclass of animals:Mammal, would it be right of me to infer that
>D:spot has rdf:type animals:Mammal?
>
>Your answer might be "never", "sometimes", or "always." If you say
>"never," then I think you've missed the point of RDF and XML, with all
>these URIs and namespaces. If you say "sometimes," then we need to
>talk about the qualities of those times. If you say "always", we have
>some consequences which might be problematic. (I will argue that the
>correct answer is "always" and that the problems are manageable.)
>
>In any case, I don't think the current working drafts are clear on
>this issue. RDF-CONCEPTS section 2.3 [1] suggests to me the answer is
>probably "always" and RDF-MT section 1.2 [2] says "sometimes" and that
>it depends which vocabulary you are reserving. Such an answer from
>the MT, while true in a sense, is fairly useless. I need to know when
>I'm entitled to make the Dogs-are-Mammals inference, and I don't think
>out-of-band negotiation of the "reserved" vocabulary for each RDF
>document is practical.
>
>I'd like to apologize for raising this issue so late in the process,
>but my understanding of it has only become clear in the past week.
>Previously, I had some vague notion that we could "float" the meaning
>of RDF identifiers, but I no longer think that is practical.
>I am indebted to Pat Hayes, Jeff Heflin, David Booth, Larry Masinter,
>Dan Connolly, and especially Tim Berners-Lee for recent conversations
>helping me understand these issues (even when they disagreed with me).
>
>Last week at the DAML-PI meeting [3], TimBL said that we are not ready
>to "float the currency" of identifier meanings yet, and won't be for
>perhaps fifty years. For now, he argued, we need to stay on the gold
>standard, where namespace owners have the non-negotiable right to
>dictate the meanings of the terms in their namespace. This is like
>the US Government saying a US "dollar" is worth 1/35th of a Troy ounce
>of gold; it defines the US dollar in terms of other well-known
>concepts. This makes sense when introducing a term; it makes less
>sense when everyone has developed a strong sense of what the term
>means. Tim's point, I think, was that we're a long way from computers
>being able to navigate in a world of vague meaning.
>
>***** 4. Argument For Entailment
>
>Let's return to my Dog/Mammal example. Let's bind the namespace
>"animals" to "http://www.w3.org/2002/10/meaning/animals#". The
>document at that address (without the hash) is some RDF saying in RDFS
>that animals:Cat is, in fact, a subclass of animals:Mammal.
>
>Does this mean that the triple
>   _:x rdf:type animals:Cat.
>entails
>   _:x rdf:type animals:Mammal.
>?
>
>There are some issues here about connectivity, trust, and
>change-over-time, but let's defer them for the moment. Assume a
>static, always connected, always trustworthy web.
>
>Now, I claim that (following the "gold standard") the second triple
>follows logically from the first. The author of the first chose to
>use the "animals" namespace, and by doing so acknowledged the
>definitions therein. The author could have used some other namespace,
>or no namespace, but chose to use "animals" (by which I mean the
>longer URI above).
>The author almost certainly chose to use the
>"animals" namespace so that others, doing later queries or merges,
>would connect his expressions with other expressions about animals.
>He wanted us to be able to infer that _:x was a mammal.
>
>Did he want us to follow the gold standard, or did he want us to have
>to think carefully about which definition of animals to use? He
>probably wanted us to use the gold standard, to use the definitions at
>the namespace address, because otherwise there's a chance we'd believe
>some foolish claim about cats being fishes, and totally misunderstand
>him.
>
>So yes, granted the issues about connectivity, trust, and
>change-over-time, the above entailment should hold. Now, let's
>address those issues:
>
>***** 5. Answers to Problems
>
>1. Connectivity. Connectivity does not affect entailment. Whether
>   or not someone can get a copy of the "animals" definition document
>   does not change the fact that that document is the primary source
>   for the definitions of all the terms in the animals: namespace.
>   If you can't fetch the definitions, then your knowledge of the
>   terms is incomplete and your reasoning about them will be
>   incomplete. Incomplete reasoning can be a problem, but it's
>   hardly a new problem or one which only arises when we bring in
>   connectivity issues. If you can't fetch the document (and don't
>   have a current cached copy) then you know that you're missing some
>   information. The monotonicity guarantee of RDF, however, allows
>   you to proceed with your partial information, which might be good
>   enough.
>
>2. Trust (except for change-over-time). This gold standard means
>   that the claims of an RDF document (which [1] says should have
>   legal weight) depend on the contents of other documents. This is
>   more stable than saying such claims depend on social consensus,
>   but it still involves trust.
>   If I say my dog has rdf:type
>   animals:Dog and the animals document says that an animals:Dog was
>   once kicked by Ebenezer Scrooge, can I really be held to be saying
>   that Scrooge committed such an act? I think so; I haven't found a
>   solid line marking the parts of a definition which have bearing
>   solely on other things. Perhaps the animals document means the
>   Scrooge clause to be the necessary and sufficient condition for
>   doghood! So, a bit hesitantly, we have to say that all statements
>   in the definition document are asserted by any use of terms from
>   the document.
>
>   We can address the Scrooge issue by saying that using terms from a
>   document is a lot like signing it. Don't do it unless you have
>   read the document and agree with it. Of course you need to do
>   this recursively, following the definitions of any terms it uses.
>
>x. (x is for extra) This brings up the issue of URIRefs "grounding
>   out" in natural language text (which may well make use of
>   mathematical notation). Our "animals" document constrains the
>   meaning of animals:Dog (very slightly) by using the term
>   rdfs:subClassOf. That term needs to be constrained by the
>   document at the rdfs namespace [4], which it sort of is. To
>   follow the gold standard, that document must make normative
>   reference to "http://www.w3.org/TR/rdf-schema/" which it currently
>   does not. (We could exempt RDF and RDFS from this policy,
>   understanding that their meanings are acknowledged by the very use
>   of the RDF/XML data format. There is little reason for this
>   special dispensation.)
>
>   I don't see a proper way in the current spec to make this kind of
>   normative reference from an RDF/XML document to a human-readable
>   one. Perhaps it is sufficient for an rdfs:comment or
>   dc:description to claim, in its natural-language text, that it is
>   in fact normative. That's a little loopy, but natural language
>   can probably handle it.
>   Better would be to make sure the RDFS
>   namespace document said that rdfs:comment contained true
>   natural-language statements about the subject.
>
>3. Change-over-time is a special case of the "stewardship" issues. It
>   doesn't necessarily involve time; it's possible for a web server
>   to offer one definition document to people who seem to be in France
>   and another to people who seem to be in England.
>
>   Stewardship issues arise often: should one define one's input as
>   being Unicode 3.2 characters, or as being whatever character set
>   is the latest approved by the Unicode Consortium? Do you
>   advertize your program as running on "OS Version 9.1" or "OS
>   Version 9.1 or later"? It all depends on whether you trust the
>   stewardship of the organization which controls the underlying
>   components.
>
>   The solutions here are typical security solutions, because these
>   are fairly typical security problems. How do you know the
>   definitions are the same ones you agreed to? (Secure hash
>   functions are a good approach.) If you agreed to ongoing updates
>   by some steward, how do you know the updates are actually coming
>   from that entity? (Public keys are a good approach.)
>
>   There's some interesting engineering to do here. The simplest
>   solution would be for each RDF document to give the SHA1 checksums
>   of each of its namespace documents -- if a checksum is missing or
>   does not match, the definition is considered to be unfetchable.
>   Something like:
>
>   <?xml version="1.0"?>
>   <!DOCTYPE rdf:RDF [
>   <!ENTITY animals "http://www.w3.org/2002/10/meaning/animals">
>   <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns">
>   ]>
>   <rdf:RDF xmlns:rdf="&rdf;#"
>            xmlns="&animals;#">
>     <rdf:Description rdf:ID="spot">
>       <rdf:type rdf:resource="&animals;#Dog" />
>     </rdf:Description>
>     <rdf:Description rdf:about="&animals;">
>       <rdf:sha1>953365afbc5c24ecfe590c350ab1345bee2f7aee</rdf:sha1>
>     </rdf:Description>
>   </rdf:RDF>
>
>   It's not pretty; maybe someone has some better ideas.
>
>   The meaning of the SHA1 triple is a little tricky. It does not
>   mandate importing the URI's contents as one might imagine, because
>   (I propose) the RDF Specs already mandate it. Rather, it *allows*
>   it. Without the SHA1 triple, you would know there was some web
>   content which gave you further true information about the subject
>   at hand, but you would not be allowed to read it. With the SHA1
>   triple, if the content matches, you can go ahead and read and use
>   the additional content. Perhaps authors who don't want to bother
>   with SHA1 could add an alternate triple saying, in effect, I trust
>   any definitions you get from the URIs I use. (This might
>   be sufficient in any RDF document which is not cryptographically
>   signed.)
>
>   Eric Prud'hommeaux suggested the use of an HTTP header could allow
>   several documents to be served from the same URI, distinguished by
>   the SHA1 hash sent in the header. This would allow author and
>   namespace owner to negotiate (at read-time) on the exact
>   definition text to use, facilitating migration and
>   content-negotiation. This might be a nice feature, but it's not
>   necessary. This proposal, as is, works for entirely-static
>   definitions which is all we really need. (Since I calculated the
>   above checksum, I've twice resisted the urge to change the
>   definition in minor ways.)
>   If we want to allow continuing
>   stewardship, some additional mechanism (such as a public key & URI
>   in the hashed static document) will be needed.
>
>   Another approach is my sdh proposal [5], but that's a bigger
>   change for RDF, and is not necessary if the official definition of
>   RDF is updated to include these semantics that the definitions of
>   terms are considered to be asserted.
>
>y. (why not add an extra (rather philosophical) point?) I've been a
>   little vague about what a "definition" is. I mean a "definition"
>   to be some declarative statement which uses the term and is true
>   only for certain meanings of that term. An asserted (included,
>   imported) definition thus limits the possible valid
>   interpretations (models) of statements which use the term.
>
>   A "strong" definition is a work of art which constrains
>   interpretation to the point where no observable differences
>   emerge. For artificial terms, even stronger "perfect" definitions
>   can be written. These are definitions in the mathematical sense,
>   "Let us define f to be...". Compared to that, natural language
>   definitions and ontologies are usually mere descriptions. Still,
>   I call them definitional documents in accordance with their intent
>   and common usage.
>
>   Definitions do not have to be perfect, or even strong, of course.
>   They can be "thin" ontologies like my Dog/Mammal one, which merely
>   offer a little helpful description. The essence of the gold
>   standard is that, no matter whether a definition is thin, strong,
>   or perfect, you at least know which one everyone is supposed to
>   use.
>
>
>***** 6. Older Conclusion
>
>I've tried hard to be clear and concise here, and I apologize for any
>failures. I understand you're working under a looming deadline, but
>this issue is crucial to address as soon as possible, in this version
>of RDF.
>I don't think this is a change in the basic intent of RDF,
>but if you Recommend the MT in its current form, you will have given RDF
>URIRefs only floating semantics.
>
>I doubt the change from floating semantics back to namespace-document
>semantics can be made compatibly. With floating semantics, people and
>machines reading RDF are required to use their own judgement in
>deciding which definitions to use. Once they start doing that,
>authors will become used to it, and will no longer be obligated to
>adhere to original definitions. Obligations cannot be imposed
>retroactively (in this kind of a free environment), so if
>namespace-document semantics are added later, they will have to be
>added in a language which is marked as having different semantics.
>But the difference is easy to miss; it's the difference that "now you
>have to use the terms as defined!" and if there's a reasonable doubt
>about authors understanding this change, then they really have no
>obligation (such as might stand up in court), and the change has not
>actually been made.
>
>Since floating semantics are not amenable to automated reasoning, if
>you pass on this issue now, you will have kept RDF (in its present
>form and probably all similar future forms) from being a viable
>Semantic Web language. That would be unfortunate.
>
>If there is any further way I can assist in this matter, please let me
>know.
>
>    -- sandro     http://www.w3.org/People/Sandro/
>
>[0] http://www.ninebynine.org/wip/RDF-concepts/2002-10-18/rdf-concepts.html
>[1] http://www.w3.org/TR/rdf-concepts/#section-Meaning
>[2] http://www.w3.org/TR/rdf-mt/#urisandlit
>[3] http://www.daml.org/meetings/2002/10/pi/
>[4] http://www.w3.org/2000/01/rdf-schema
>[5] http://www.w3.org/2002/09/sdh/
>[6] http://www.w3.org/2002/08/LX

-------------------
Graham Klyne
<GK@NineByNine.org>
Received on Friday, 25 October 2002 13:41:25 UTC