- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Mon, 22 Oct 2001 14:22:30 -0500
- To: Dan Connolly <connolly@w3.org>
- Cc: w3c-rdfcore-wg@w3.org
>I'm having trouble following the literal model theory
>stuff in detail, and my attention is diverted by
>stuff like preparation for the upcoming AC meeting,
>and I'll be out next week.
>
>But I propose the following requirement as a constraint
>on solutions to this literal stuff:
>
>[[[
>Lack of ambiguity
>
>Some programming languages allow one to introduce identifiers
>from new name spaces in such a way that it is not possible to
>know which namespace a local identifier belongs to without
>accessing both the module interface specifications and checking
>which one has with the highest priority, or most recently in the
>document, redefined a given local identifier.
>
>This may have some uses in a programming language such as
>Java[Java], but it has a serious flaw in that when one module
>changes (without the knowledge of the designers of the other
>module), it can unwittingly redefine a local identifier used by the
>second module, completely changing the meaning of a previously
>written document. Clearly, in the Web world in which modules
>evolve but documents must have clearly defined meanings, this is
>unacceptable.
>]]]
>
>-- Web Architecture: Extensible languages
>http://www.w3.org/TR/NOTE-webarch-extlang#Ambiguity
>W3C Note 10 Feb 1998
>
>That is: it's essential that the interpretation of
>an RDF document is a function of the document alone,
>and doesn't vary according to the contents of other
>documents.

We can apply this criterion more or less strictly, however. Let me try applying it to several cases.

There are three distinct classes of proposals, it seems to me. I used to like the second, but I have become persuaded that the first is best, and maybe the only viable option in the long run, so I would like not to rule it out.

1. Contextual datatyping.

One (class of) proposals is to allow literals to be used in such a way that the exact meaning of a literal depends on the immediate context of the triple it is in, for example by using range information from another triple, so that in:

aaa isSmallerThan "567881962" .
isSmallerThan rdfs:range xsd:integer .

the literal should denote an integer, but in:

http://www.coginst.uwf.edu/~phayes#TheMan hasSSnumber "567881962" .
hasSSnumber rdfs:range govsd:SocialSecurity .

the very same literal would denote a social security number (which is, let us suppose, a different datatype). In other triples it might denote a string, a date, whatever. This is the class of proposals ('class' because there are several alternative ideas about exactly how to provide the 'contextual' information that is given here by the rdfs:range information) that Peter P-S and I have sorted out how to handle in an extension to the current MT, with some very small tweaks. (To emphasize: I am not urging that the WG adopt these extensions, or even consider them, but I would like to suggest that we take care not to lock in any restrictions that would make this future extension impossible.)

The consequence is that the meaning of a 'bare' literal triple is, in a sense, not determined until one has enough information about the 'context' to decide which literal-typing scheme to use. For example, if I read

aaa bbb "567881962" .

and I know nothing else about the range of bbb, and the literal occurrence itself has no associated typing information, then I don't know whether that third item is supposed to be a string, an integer, a date or whatever, so I really don't know what the triple is asserting. (The MT extension would actually assign a meaning to the literal, but it would be a function from an as-yet-unknown datatype to a value, and we can't determine the truth value of the triple itself until we have some way to apply that function to a datatype.
We could tweak the MT even further and regard such a bare triple as a kind of disjunction over all possible datatypes; that would essentially make it similar in meaning to the 'bnode' proposal, below.)

Notice that this does not suffer from the really drastic problem mentioned in the above quote. The suggestion is not that the meaning of a bare triple might change as a result of what is said about it in some other module. There is no doubt that it is only the value of this particular occurrence of the literal that is being determined, and its interpretation depends only on information about the terms in the triple in which it occurs. That information (range information, in this case) might indeed be provided by another document, but it cannot be overridden or altered by another document, except in the general sense in which any piece of RDF might be overridden or updated. (As with any other RDF, two documents might not agree about the correct range information, and in that case the meaning of the triple would also become doubtful. For example, if we had the two assertions

bbb rdfs:range xsd:integer .
bbb rdfs:range xsd:string .

then as far as RDFS is concerned the range is both string and integer, but I presume that this would cause some datatyping engine to barf.)

An extended treatment would be to invoke a 'default' datatyping scheme in which (in the absence of any other type information) the literal is interpreted as denoting itself, ie as a string. That has the merit of locking in current RDF M&S best practice, but it has the disadvantage of introducing a nonmonotonic construction in which adding information (eg from another document) does indeed change the interpretation of a triple (eg from being true of a string to being true of, say, an SS number). However, it is, in a sense, a very restricted and well-behaved form of nonmonotonicity.
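
A minimal Python sketch of contextual datatyping, assuming a toy graph represented as a set of (subject, predicate, object) tuples. The names Literal, SSN, parse_ssn, DATATYPES and interpret, and the nine-digit SS-number format, are hypothetical and only illustrate the idea; this is not the model-theoretic extension mentioned above, just a procedural picture of it. The literal is read through the rdfs:range of its predicate, a bare literal with no range information stays undetermined unless a default is supplied, conflicting ranges are rejected, and with the string default, adding a range triple changes the answer (the nonmonotonicity just described).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Literal:
    """A bare literal: a lexical form with no type information of its own."""
    lexical: str


@dataclass(frozen=True)
class SSN:
    """Hypothetical value space for the example datatype govsd:SocialSecurity."""
    area: str
    group: str
    serial: str


def parse_ssn(lexical: str) -> SSN:
    # Assumption: an SS number is written as nine digits, read as AAA-GG-SSSS.
    if len(lexical) != 9 or not lexical.isdigit():
        raise ValueError(f"not an SS number: {lexical!r}")
    return SSN(lexical[:3], lexical[3:5], lexical[5:])


# Hypothetical datatype table: datatype name -> lexical-to-value mapping.
DATATYPES: dict[str, Callable[[str], object]] = {
    "xsd:integer": int,
    "xsd:string": str,
    "govsd:SocialSecurity": parse_ssn,
}


def datatype_ranges(graph: set, predicate: str) -> set:
    """Known datatypes asserted as an rdfs:range of `predicate` anywhere in `graph`."""
    return {o for (s, p, o) in graph
            if s == predicate and p == "rdfs:range" and o in DATATYPES}


def interpret(graph: set, triple: tuple, default: Optional[str] = None):
    """Value of the literal in object position, read through the predicate's range.

    Returns None if no datatype is known and no default is given: the meaning
    of the triple is then still undetermined.
    """
    _, predicate, obj = triple
    if not isinstance(obj, Literal):
        raise TypeError("object of triple is not a literal")
    ranges = datatype_ranges(graph, predicate)
    if len(ranges) > 1:
        # Conflicting range information: this is where a datatyping engine would barf.
        raise ValueError(f"conflicting ranges for {predicate}: {sorted(ranges)}")
    if ranges:
        (datatype,) = ranges
    elif default is not None:
        datatype = default
    else:
        return None
    return DATATYPES[datatype](obj.lexical)


# The same literal, read in two different contexts.
lit = Literal("567881962")
t1 = ("aaa", "isSmallerThan", lit)
t2 = ("http://www.coginst.uwf.edu/~phayes#TheMan", "hasSSnumber", lit)
g = {t1, t2,
     ("isSmallerThan", "rdfs:range", "xsd:integer"),
     ("hasSSnumber", "rdfs:range", "govsd:SocialSecurity")}
print(interpret(g, t1))  # 567881962
print(interpret(g, t2))  # SSN(area='567', group='88', serial='1962')

# A bare triple: undetermined, unless the string default is applied...
bare = ("aaa", "bbb", lit)
g2 = {bare}
print(interpret(g2, bare))                              # None
print(repr(interpret(g2, bare, default="xsd:string")))  # '567881962'
# ...and then adding range information changes the answer (nonmonotonic).
g2.add(("bbb", "rdfs:range", "xsd:integer"))
print(repr(interpret(g2, bare, default="xsd:string")))  # 567881962
```

The range lookup here scans the whole graph, which is the point at issue: the information that fixes the literal's datatype may come from a triple that another document contributed, but it only ever fixes this occurrence, and it cannot silently redefine it.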
(One automatic consequence of the new extended MT technique is that the 'string' interpretation has a special status: it is the only 'fixed' interpretation which can be composed with a later datatyping scheme in a transparent way. This corresponds exactly to the intuition that treating RDF literals as strings is just a way of telling the parser to treat them as 'blobs' whose meaning will be determined by something else; the something else can be another RDF document, is all.)

2. Datatype-on-the-sleeve

Another class of proposals takes very seriously the idea that any symbol must have a fixed meaning in any interpretation, and insists that this goes for literals as well. It follows that if one literal means the integer 567881962 and another literal means the SS number 567-88-1962, then they must be *different* literals. So, on this account, the objects in the above example triples must in fact be distinct literals, and maybe they should be written differently, or something; in any case, this difference has to be somehow incorporated into the 'logical syntax', ie in our case into the RDF graph. Maybe they would look like this:

aaa isSmallerThan xsd:integer:"567881962" .
http://www.coginst.uwf.edu/~phayes#TheMan hasSSnumber govsd:SocialSecurity:"567881962" .

where the datatype is somehow incorporated into the very syntactic form of the literal itself. (The exact form of such inclusion I leave to others to decide; this prefixing isn't intended as a serious proposal for a suitable syntax.) This kind of solution trades syntactic complexity (and rigidity) for conceptual simplicity. It is completely trivial for the model theory, and reduces datatyping to an essentially lexical issue. Notice that the 'range' information could still be given, but its only purpose now would be to enable the inference engine to check that the lexically assigned datatypes were consistent, ie it would be a kind of integrity check rather than providing any new information.
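
A minimal Python sketch of the datatype-on-the-sleeve reading, assuming the illustrative prefix notation used in the examples (which, as said, is not a proposed syntax). The names TypedLiteral, parse_typed, VALUE_MAPS, value_of and range_violations are hypothetical. The value is computed from the literal alone, so two literals with the same digits but different datatypes are simply different symbols, and range triples serve only as an integrity check; the check assumes datatypes are disjoint (no subtyping).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TypedLiteral:
    """A literal whose datatype is part of its syntax ('datatype on the sleeve')."""
    datatype: str
    lexical: str


def parse_typed(token: str) -> TypedLiteral:
    """Parse the illustrative prefix notation, e.g. xsd:integer:"567881962"."""
    datatype, sep, rest = token.partition(':"')
    if not sep or not datatype or not rest.endswith('"'):
        # A bare literal such as "567881962" has no datatype: lexically ill-formed.
        raise ValueError(f"ill-formed typed literal: {token!r}")
    return TypedLiteral(datatype, rest[:-1])


# Hypothetical datatype table: datatype name -> lexical-to-value mapping.
VALUE_MAPS: dict[str, Callable[[str], object]] = {
    "xsd:integer": int,
    "xsd:string": str,
    # Assumption: an SS number's value is represented by its AAA-GG-SSSS form.
    "govsd:SocialSecurity": lambda s: f"{s[:3]}-{s[3:5]}-{s[5:]}",
}


def value_of(lit: TypedLiteral) -> object:
    # The meaning depends only on the literal itself, never on other triples.
    return VALUE_MAPS[lit.datatype](lit.lexical)


def range_violations(graph: set) -> set:
    """Integrity check: triples whose typed literal clashes with a declared range.

    Range triples add no information about what a literal means; they can only
    flag an inconsistency. Assumes datatypes are disjoint (no subtyping).
    """
    declared: dict[str, set] = {}
    for s, p, o in graph:
        if p == "rdfs:range":
            declared.setdefault(s, set()).add(o)
    return {(s, p, o) for (s, p, o) in graph
            if isinstance(o, TypedLiteral) and declared.get(p, set()) - {o.datatype}}


a = parse_typed('xsd:integer:"567881962"')
b = parse_typed('govsd:SocialSecurity:"567881962"')
print(a == b)       # False: same digits, but different literals
print(value_of(a))  # 567881962
print(value_of(b))  # 567-88-1962

g = {("aaa", "isSmallerThan", a), ("isSmallerThan", "rdfs:range", "xsd:integer")}
print(range_violations(g))       # set()
g.add(("aaa", "isSmallerThan", b))
print(len(range_violations(g)))  # 1: the SS-number literal fails the check
```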

Notice also that in this case a 'bare' literal would be lexically ill-formed, and one would expect the datatyping machinery (which lies outside RDF in this case) to have assigned some kind of default interpretation to any 'bare' literal, eg by prefixing it with something like xsd:string: .

3. Literals as blank nodes

(This may be regarded simply as a variant of the first class of proposals, but I think it is worth treating separately, as it requires no extension to the MT; it does, however, require rewriting all RDF graphs which use literals.) This treats a literal as an existential assertion together with an assertion about the syntactic 'literal' form, as Ron Daniel suggested in http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0213.html. So the above examples would be rewritten as follows:

aaa isSmallerThan _:thing1 .
_:thing1 rdf:value "567881962" .
_:thing1 rdf:type xsd:integer .

http://www.coginst.uwf.edu/~phayes#TheMan hasSSnumber _:thing2 .
_:thing2 rdf:value "567881962" .
_:thing2 rdf:type govsd:SocialSecurity .

This has the merit of keeping all the datatyping information in the same document, but it suffers from triples-bloat.

------

>Any solution where the truth/falsehood of a document
>(in some interpretation) depends on some range
>constraint in another document would be a violation
>of what I suggest is a core RDF requirement.

Well, you could say that the first class of proposals is ruled out by this, but I'm not sure. What if that range constraint were included in the scope of the interpretation?

Pat

PS. Just for clarification: the stuff about modifying N-Triples syntax to allow nodeIDs to be used on triples was to allow a slight modification to number 1 above, where one asserts the datatyping directly as a property of the literal (which therefore has to be a subject), in effect short-circuiting the blank nodes in Ron's proposal by labelling the literal node itself.
This maps the third proposal into the first:

aaa isSmallerThan _:thing1:"567881962" .
_:thing1 rdf:type xsd:integer .

thereby removing the triples-bloat and, incidentally, integrating this better into RDFS generally, since this rdf:type assertion now gives exactly the same information that one could also get from an rdfs:range assertion, or maybe by some other means. The general point is that if you know something is in an RDFS class *which is defined by a datatype*, and you also know it's a literal, then you know how to interpret it, independently of *how* you happen to know that it is in that class. So datatypes are just a special kind of class, which is what got Peter so excited :-)

-- 
---------------------------------------------------------------------
IHMC                        (850)434 8903   home
40 South Alcaniz St.        (850)202 4416   office
Pensacola, FL 32501         (850)202 4440   fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes

Received on Monday, 22 October 2001 15:22:47 UTC