- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Thu, 31 Jan 2002 11:00:22 -0000
- To: <w3c-rdfcore-wg@w3.org>
Thanks Pat for the detailed analysis.

This message expands significantly on what in the TDL document I glossed over as "technical reasons".

Short version:
=============

I am not surprised that you were not happy with the TDL model theory. The basic choice to use pairs rather than just (typed) values in the model theory was due to limits of my technical competence. I tried and failed to come up with an account that uses values. As far as I know, no one actually advocates using pairs.

If you can come up with an account that addresses your Q/Cs about pairs, using my hack (C16) for union datatypes, and otherwise resurrect the P++ model theory (I think), then please do. I will be relieved.

I provide some motivation for literal-value pairs in my response to your Q8 & Q9 below. This is of the sort "I didn't want to use pairs, but I had to in order to ..."

My response to C15, in a separate e-mail, gives some idea of why I don't see the use of literal-value pairs as disastrous, but merely an inconvenient technicality.

Aside:
======

If I have understood your comments correctly you have not identified any new fundamental mistakes with my MT (like the self-entailment bug). As such I achieved my objective: giving an existence proof for there being at least one formal framework for Patrick's pairing, PD framework. I see your comments as listing other desirable characteristics that I failed to achieve.

Detailed response:
==================

> Q1. Definition of TDL refers to a 'pairing'. Does that mean some kind
> of syntactic combination operation, or is this just a mathematical
> definition of some abstract entity? And what exactly is a 'datatype
> identity'?

For Patrick, as I understand it, the key idea is that datatyping is about pairs. Each pair consists of a string from the input document and the type with which to interpret the string. I find this pairing a little simplistic, in that in the syntactic idiom of range constraints, for example, many types can be applied.
Thus if we wanted to treat the pairing of a string and its type formally, we would have to introduce a new anonymous type being the intersection of all the given types. I considered this and rejected it. I also considered and rejected suggesting to Patrick that the emphasis on string-type pairs should be dropped. As a first approximation to what is going on, I believe the string-type pairing idea is helpful, particularly to the reader who does not wish to understand all the details. I also believe that the string-type pair is a very plausible implementation technique.

I think the document suffers stylistically because my key input was also a pairing, but a different one! (The literal-value pair).

> Q2. In the figure immediately below, what is meant by 'internal
> value' and 'application value space'?

I will leave this one to Patrick. Although see my (separately posted) response to C15, which I think touches on similar issues.

> C[3. Style comment. I suggest it would be better to not modify the
> definition of RDF interpretation, but to introduce a new notion of
> 'datatyped interpretation' or whatever. That would enable us to keep
> all the different notions of entailment straight. ]

Good point. If we conclude that the doc is worth a major revision I will put this on the to-do list.

> Q4. Terminology section refers to 'before' and 'after' datatyping. I
> have trouble understanding what this means. Do you have in mind that
> there is some kind of process which 'datatypes' an RDF graph? If so,
> what is the difference between the graph before and the graph after
> that operation? Or does this mean something else altogether?

Yes, the terminology is poor. Try:

For clarity, we use disjoint terminology for literals in the graph syntax, and their interpretations and meanings both in the model theory's universe.
[ bullet pointed defns]
The literal-value pairs occur in the model theory's universe.
The intent is that RDF applications may manipulate either or both of the Unicode string or the typed value.

> C5. The literal-value pairs are welldefined but seem odd, since they
> pair a unicode string with a semantic value, ie a denotation. Is that
> really what you mean? If so, then a set of these things would be a
> datatype mapping, right?

This is all true.

> C6. "....A datatype class corresponds to its map, ie a set of pairs..."
> Well, OK, but this seems a very odd decision. First, the natural RDF
> object corresponding to a set of pairs is a property (extension), not
> a class. Second, while a property can of course have a class
> extension as well as its property extension, there isnt any implied
> connection between them in RDF, so if you treat a set of pairs as a
> class then that amounts to saying that the fact that it is a class of
> *pairs* is irrelevant to its behavior as a class. Third, there is in
> fact no way to specify in RDF that any particular class is a class of
> pairs; whereas if you had characterized this as a property, then the
> RDF semantic conditions imply that it has a property extension (if it
> is ever used in a triple).

At no point does TDL use a datatype as a property as in S-A. Therefore it would be odd to include such properties in the model.

I am wondering if you wrote your comments as you were going through or on a second reading. It feels to me as if in this para above you haven't really appreciated how I am trying to consistently follow what I agree was an odd decision (C5). Your later comments seem to have accepted this more.

> Q/C7. Interpretation. "...the type information is checked by
> requiring this pair to be a member of each class associated with this
> node. "
> What does 'associated with this node' mean?? I think what you mean is
> 'each class which the denotation of the node is required to be a
> member of', right?
> (That is what a range constrai...sorry, assertion
> of a triple using rdfs:Range, would imply, for example.) If so, that
> is what the RDF MT says already. But notice that according to your
> convention about datatype classes, that says that the node labelled
> with the unicode string denotes a pair, not the value inside the
> pair. Is that really what you want it to say? That would mean that
> the for example the 'same' date written using different date formats
> are different dates, and so on. In fact, as far as identity is
> concerned, it means that any two values from any two distinct
> datatyping schemes are never the same value.

Yes, yes, yes. This was intended to be consistent with your MT document in its usage of rdfs:Range. And yes, your date example illustrates a limitation of TDL (not shared by S-A).

> Q8. That same paragraph refers to 'untyped Unicode nodes'. Does that
> imply that there are two kinds of Unicode node? If so, how are they
> distinguished in the syntax?

No. There are only the same old (untidy) literals that we have always had. The untyped ones are just those that are not subject to any type constraints. So a straight triple with no constraints like:

_:a <foo> "fred" .

The "fred" is an "untyped Unicode node" of this paragraph. Once we apply an rdfs:range constraint to <foo>, then "fred" is no longer untyped. Similarly the two triples:

_:a <foo> _:b .
_:b <rdf:value> "bar" .

leave "bar" as untyped, but an rdf:type triple on _:b, along with the semantics of <rdf:value>, can apply a type to "bar".

I think this is an appropriate place to expand on the "technical reasons".

The TDL MT is intended to systematically follow an open world assumption on both triples and types. So when seeing either the single "fred" triple or the two "bar" triples, without any other information, we want to have some set of interpretations.
When we also have a range constraint available, and we "know" the datatype of the range constraint, or we have a triple like

_:b <rdf:type> <xsd:string> .

(applying to the "bar" example), we wish to have a subset of interpretations.

Hence, as I saw it, the set of possible typed values corresponding to an 'untyped Unicode node' (e.g. "fred" in the single triple example) is unbounded. As intelligible type information is added we monotonically reduce the set of possible typed values. The cardinality of the set of possibilities is unbounded in the untyped case and 1 or 0 in the typed case (excepting union datatypes).

The literal-value pair was motivated partly because I wasn't prepared to stomach a wholly unrestricted interpretation of 'untyped Unicode nodes'. Thus the first component is always uniquely restricted and the second component is restricted or not depending on type information.

> Q9. In section 3.1 example 1, the figure has this new kind of (green
> hexagonal) node in it. What is this thing, exactly? (Is it an
> extension to the RDF graph syntax? Or some kind of external addition
> to the graph?? Or what? If it is just an annotation, then this is
> one of the old proposals (called DC in
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0295.html);
> but it reinterprets the xsd: classes in an odd way that makes the
> assertions wrong, for some reason best known to you guys :-)

It is just an annotation and is not in the syntax. This idiom is the DC idiom.

I understand the "makes the assertions wrong" part of the question as: "But why on earth are you interpreting the bnode as a pair rather than simply the typed value?"

So answering my (possibly incorrect) rephrasing of your question.

1] I wanted a monotonic semantics. What does

aaa eg:prop _:x .
_:x rdf:value "10" .

mean without an rdf:type? What does it mean with both an eg:decimalInteger and an eg:octalInteger type (assuming we have both available)?

aaa eg:prop _:x .
_:x rdf:value "10" .
_:x rdf:type eg:decimalInteger .
_:x rdf:type eg:octalInteger .

2] (variant of [1]) I wouldn't want to rule out multiple types, because sometimes it works.

3] Even in the daml idiom

aaa eg:prop _:x .
_:x rdf:value "10" .
_:x rdf:type xsd:integer .

How are we to know that the rdf:value needs to use the xsd:integer mapping function, and not the eg:octal mapping function? Both would associate an integer to _:x, so either would satisfy

_:x rdf:type xsd:integer .

if we understand that as merely operating in the value space.

4] All of issues [1], [2], [3] can be replicated for examples based around

aaa eg:prop "10" .

and using range constraints on eg:prop.

This is particularly important when we consider [2]. If we have a data document, a schema document, and a second alternative schema document all independently authored, all three documents may be usable together (consistent) despite a lack of direct collaboration. The ability to ignore minor unimportant variations in range constraints seems desirable. (And the ability to distinguish between important and unimportant variations in range constraints).

I believe that the model theory I produced is monotonic and does cover the whole range of syntactic possibilities, not just the recommended idiom.

I have not understood how Peter's work or your (unfinished) P++ work address [1], [2], [3], [4] above.

> Q/C10. Model Theoretic Interpretation of local idiom. "...Hence x is
> the integer 30." OK, but what this graph asserts is that the age of
> Bob is the pair <"30",30>, right? Not that the age of Bob is 30. (If
> that is wrong, how do the interpretation rules for the Bob ex:age ...
> triple manage to extract the second item in the denoted pair?)

Yes, you have understood. I always use the literal-value pair, everywhere. There are no bare values in the model, there are no bare literals in the model. The phrase quoted "Hence ..."
was intended less formally, and would perhaps better have mentioned something about an RDF application treating x as the value 30.

> Also, if what you say about rdf:value is correct, then since the
> unicode node has to have the same denotation as the blank node, and
> since that denotation has to be in the class xsd:integer, it has to
> be a pair; so the unicode node itself has to denote a pair. And the
> first item of that pair is the unicode string itself, right?

Yes.

> C.11 Relevant to the above: look, you don't need to have *pairs* in
> the class. They are just getting in the way. If the xsd:integer class
> were the value space of the datatype (and if rdf:value were identity)
> then this idiom would work just fine, and Bob's age would be 30. You
> do need some semantic constraint to interpret the unicode strings
> properly, but then you need that anyway. The pairs don't seem to help
> any.

I agree the pairs are inconvenient. In your view, how do we *not* get Bob's age to be 24 with an octal reading of the string "30"?

> Q12. In section 3.2, global idiom: "Per the following, the lexical
> form "30" is required to be a member of the lexical space of the
> datatype xsd:integer". HOW? I really don't see how this works. Since
> xsd:integer denotes a set of pairs, the range of ex:age must be a set
> of pairs, so whatever the unicode node denotes must be a pair. But
> you have it marked as 'representing a value'; and you also say that
> the lexical form is thereby required to be a member of a lexical
> space. As far as I can see, this is saying the following. The unicode
> string denotes a pair <a,x> consisting of a unicode string and a
> datatype value (eg <"13", 13>, which is a member of the extension of
> the xsd:integer datatype mapping), and it thereby 'represents' a
> value, and also simultaneously is 'required' to be a member of a
> lexical space. BUt it can't be all three of "13", 13 and <"13",13> at
> the same time, right?
On the graph there is the literal label "13". In the model theoretic interpretation there is the pair <"13",13>. The application may well decide to only use the second component (13) of the pair from the model theoretic interpretation. In this sense it is all three at the same time.

Now, I will try to explain HOW the constraint of the string to the lexical space happens. This supplements the subsection entitled "Model Theoretic Interpretation of Global Idiom".

The text '"30" is required to be a member of the lexical space' was informal, intended for RDF/XML document authors. A clarification of the 'requirement' might be as follows:

"When using this idiom you are required to use a string that is in the lexical space of the datatype, [ or else your document will be RDFS inconsistent ]"

I take you to be interested in the last bit about RDFS inconsistency.

I will use an example, in which instead of "30" we will have "foo" as the literal label. The interpretation of "foo" is required to be a pair < "foo", x > for some x. Using the schema closure rules from RDF model theory we effectively have a triple of the form:

"foo" rdf:type xsd:integer .

(The literal node "foo" is intended as the very same literal node that is the object of the ex:age triple. The schema rules as I understand them are all actually constraints concerning IEXT, but I find them much more difficult to state.)

This datatype class membership, as always in the TDL MT, is understood as being about the pair being in the map. There are no pairs < "foo", x > in the map of the datatype xsd:integer, and hence whatever x we choose the interpretation does not satisfy the schema constraints.

Thus:
For RDF (without schema), the two triples are consistent, with interpretations of the "foo" node being < "foo", x > for arbitrary x.
Whereas for RDFS, the schema processing makes the two triples inconsistent.

As I understand it, the lexical space of xsd:integer is defined as the domain of the map.
The example above shows that, at least when we have RDF schema processing, the two triples are inconsistent unless the label of the literal node is in the domain of the map of the datatype. Hence the label is constrained to be in the lexical space of the datatype.

> (Also, re. 'required': what if the unicode string is NOT a member
> of the lexical space of the asserted datatype? Is that just an
> inconsistency?)

Yes.

> C13. (Following on from C11). It seems clear from the diagram that
> what this RDF is supposed to mean is that the range of ex:age is
> integers (according to xsd:integer, ie the value space), and that the
> age of Bob is 30. What's wrong with just declaring that that is
> indeed what it does mean? See
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0011.html.
> The problem with the old P(++) proposals was the nasty flaw detected
> by Patrick , which is that a super-datatype-class of a datatype class
> might have a different lexical-to-value mapping. But your 'pairs'
> proposal has exactly the same problem. In fact it is worse, since if
> the lexical-to-value mapping is different, then the datatypes are not
> even subclasses of one another, with your convention; so there isn't
> even any way to *say* that one datatype is 'sub' another. (Of course,
> just refusing to say that, say, xxsd:octal is a subclass of
> xxsd:number, or whatever the example is that screws up datatype
> inheritance, was always an option in the old P(++)-style proposal as
> well. )

Ummm, no. I did look carefully at your referenced message, it's still in my browser cache! As indicated under Q9 I saw no way of keeping a model that interprets the strings solely in the value space while addressing the inconsistent lexical=>value mappings problem.

I don't think the inconsistent mappings problem is restricted to subclasses; it occurs whenever two datatypes have overlapping value spaces.
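
To make the inconsistent mappings problem and the "foo" case concrete, here is a rough sketch in Python. It is a toy, not part of TDL: a datatype is modelled as a small finite slice of its map (a set of literal-value pairs), and the names make_map, lexical_space and candidates are invented for this illustration only. decimal_integer and octal_integer stand in for eg:decimalInteger and eg:octalInteger.

```python
from typing import Iterable, Optional, Set, Tuple

Pair = Tuple[str, int]  # a literal-value pair <lexical form, value>


def make_map(base: int, lexicals: Iterable[str]) -> Set[Pair]:
    """Build a finite slice of a datatype map for integers written in `base`."""
    return {(s, int(s, base)) for s in lexicals}


# Hypothetical sample of lexical forms; real datatype maps are infinite.
SAMPLE = ["7", "10", "30"]
decimal_integer = make_map(10, SAMPLE)  # contains <"30", 30>
octal_integer = make_map(8, SAMPLE)     # contains <"30", 24>


def lexical_space(datatype: Set[Pair]) -> Set[str]:
    """The lexical space is the domain of the map."""
    return {lexical for lexical, _ in datatype}


def candidates(label: str, types: Iterable[Set[Pair]]) -> Optional[Set[Pair]]:
    """Pairs <label, x> that lie in the map of every applicable datatype.

    None means the node is untyped, so x is unconstrained.
    An empty set means no interpretation satisfies the type constraints.
    """
    result: Optional[Set[Pair]] = None
    for datatype in types:
        matching = {pair for pair in datatype if pair[0] == label}
        # Each extra type can only shrink the set of possibilities.
        result = matching if result is None else result & matching
    return result


print(candidates("10", []))                                # untyped: None
print(candidates("10", [decimal_integer]))                 # one pair, value 10
print(candidates("10", [decimal_integer, octal_integer]))  # empty: types clash
print(candidates("7", [decimal_integer, octal_integer]))   # one pair: both agree
print(candidates("foo", [decimal_integer]))                # empty: not in lexical space
```

Both toy maps put 8 and 10 in their value spaces, yet they disagree on the string "10", which is the overlap case described above. The string "7" is a case where multiple types happen to work.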
I believe my solution is consistent with the XML Schema Datatypes class hierarchy, which I understand as being essentially the subset hierarchy on the mappings, i.e. if A is an xsd subtype of B then A.map is a subset of B.map.

If I have misunderstood that aspect of XML Schema Datatypes, the TDL model theory will at least need a revision.

> C14. The paragraph "Whether the rdfs:range statement....property in
> question." isn't going to work in ANY model theory, unless we
> effectively redefine RDF syntax to provide some way to distinguish
> local from global. The MT has to be defined on triples, not on
> triples in some kind of undefined 'context'. (How far out do we have
> to look in the graph, or on the web, to see if there is a 'more
> local' assertion?)

This paragraph is again intended for RDF/XML document & RDF schema authors. If a schema author uses a range constraint there has been a long-standing discussion as to whether this is "a constraint" that documents must satisfy or a means for generating implicit triples. Of course, this long-standing discussion is groundless. However, there is a community that expects clarity on this issue.

For you, Pat, I will deconstruct the informal text, in model theoretic terms. However, I do not think it appropriate that our documents should be targeted at model theoreticians alone. The TDL document has clearly identified sections with model theory; the other sections are, and are intended to be, less formal. I have proof-read them to check that there is (IMO) a consistent reading with the model theory.

"Whether the rdfs:range statement constitutes a constraint on the allowed datatypes depends on whether there exists any local (explicit) type assignment."

Model theoretically, when both global range constraints and local type triples are present in a graph, both apply.

"If there is no local typing for the literal value whatsoever, then rdfs:range can only serve as a global (implicit) type assignment."
If there is no local typing, only global typing applies, and a contradiction based on two types clashing is not possible. (This doesn't exclude a contradiction because the unicode string is not in the lexical space of the type).

"However, if the literal has one or more types defined locally, and any locally specified datatype is not compatible with all datatypes globally implied by rdfs:range for the property, one can treat such a case as a contradition to a constraint on the expected or required datatype(s) for the property in question."

When both local and global types are given, a possible contradiction is that the intersection of the mappings of all applicable types is empty (or does not include the unicode string in its domain).

Consider when:
- all the triples apart from the local type triple are consistent
- all the triples including the local type triple are inconsistent.
This constrains the local type in the sense that a different choice of local type by the RDF/XML document author would result in a consistent document.

> C15. The list of Satisfactions looks good, but omits the one rather
> central one which I guess people didnt think to write out explicitly:
> that the idiom used actually means what it ought to mean.

I understand this as being a plea to use values rather than literal-value pairs in the model. See separate response.

> C16. Neat hack for almost handling union datatypes, ie ignore the
> part that gives all the trouble. If I can use that same hack, I can
> do them too :-)

Please do. Notice this is part of the systematic monotonicity of TDL model theory.

See ya

Jeremy
Received on Thursday, 31 January 2002 06:00:58 UTC