- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Thu, 26 Sep 2002 10:58:25 +0300
- To: "ext Sergey Melnik" <melnik@db.stanford.edu>, "RDF Core" <w3c-rdfcore-wg@w3.org>
[Patrick Stickler, Nokia/Finland, (+358 40) 801 9690, patrick.stickler@nokia.com]

----- Original Message -----
From: "ext Sergey Melnik" <melnik@db.stanford.edu>
To: "RDF Core" <w3c-rdfcore-wg@w3.org>
Sent: 25 September, 2002 20:00
Subject: Tidy/untidy: that's all about assumptions, folks

>
> In the heat of our argument about tidiness, we seem to be forgetting
> about a critical assumption that was suggested to justify untidy
> literals. Below, I'm questioning this assumption. If it holds, then
> untidy literals are a natural decision to make (and I voted for it last
> time), if it does not, there is no sufficient justification for
> introducing untidiness.

Fair enough.

> What I'm arguing for is that we simply have to
> remove the prism we've been looking through recently, and untidiness
> goes away.

Well, as my comments below will suggest, I think it is the tidy view that is looking at things through a prism, based on the unfounded presumption or expectation of canonical lexical forms.

> Recall the motivating example from the RDF 1.0 Spec:
>
>    foo dc:Creator "John Smith"
>
> Is "John Smith" supposed to represent a person or a string?

From a data markup perspective, we can say that the form of the expression is an ambiguous name, since it is not a URIref.

From a knowledge representation perspective, I think it's pretty intuitive that we're talking about a person in the real world, and not a string.

If RDF is for data markup, then fine, let's say the meaning of the object is a string. But then, why not just use XML...

But if, as I believe is the case, RDF is for knowledge representation, then it's rather odd (to use a polite word ;-) to consider that the object denotes anything other than some thing in the real world. We simply need to provide the machinery so that this can be made clear in the RDF itself.
> The key
> argument behind untidiness is that "John Smith" (or "10") cannot
> possibly be meant to be a string, so it has to be something else, whose
> meaning can be deduced using a right bit of logic and AI.

Right, but the basis for that argument is not really that the literal cannot possibly mean a string. It is that, given the role and purpose of RDF as a language for making statements about the world, it is rather bizarre for it to mean a string. It is a string in the RDF/XML because we can't write the integer value ten or the person John Smith in XML, but surely it is the things in the world that are meant, and not some aspect of the form of expression.

> Or, consider our beaten up
>
>    :x age "10"
>    :y shoeSize "10"
>
> Again, the claim of proponents of untidiness is that "10" cannot
> possibly be meant to denote a string, in both cases. Why? Because we can
> infer
>
>    :x age :z
>    :y shoeSize :z

Well, I think the better example in this case is

   :x title "10"     (string)
   :y age "10"       (integer)

and further

   :z payday "10"    (monthday)
   :q model "10"     (token)

etc., where the ultimate interpretations conflict with string equality tests.

> supposedly meaning that the age of :x is the shoeSize of :y. Ok, we have
> datatyping now, so let's do it right:
>
>    :x age int"10"
>    :y shoeSize int"10"
>
> Now we got it! int"10" is not a string now; it's what we want it to
> mean: an integer. Damn. The entailment
>
>    :x age :z
>    :y shoeSize :z
>
> still holds...

Er, why is it a problem that it holds? It should hold.

But given

   :x title <xsd:string>"10"
   :y age <xsd:integer>"10"
   :z payday <xsd:gMonthDay>"10"
   :q model <xsd:token>"10"

we can be very happy that the following entailment does *not* hold:

   :x title :a
   :y age :a
   :z payday :a
   :q model :a

Also, what will the impact be on applications expecting tidy semantics when

   :x age "10"
   :y shoeSize "010"

does *not* entail

   :x age :z
   :y shoeSize :z

???

> Is there something wrong with the above modeling practice?
> Should int"10" itself be considered untidy, like those untyped literals?
> Are all those folks who chose the above modeling style dumb? NO, they
> are not. Above, the properties age and shoeSize are merely used to
> restrict the valid interpretations of :x and :y. There is no claim that
> shoeSize is a property that holds between shoes and "shoe sizes". It's a
> property that holds between shoes and integers, thereby restricting the
> interpretation of :y.

Sure. I'm not sure anyone was assuming otherwise.

> Just as well, shoeSize could be defined as a
> property that holds between shoes and strings/reals/etc.

Well, one problem with saying that the interpretation is based on a property that holds between resources and lexical representations is that there is not, and IMO can never be, any restriction against non-canonical lexical forms. Therefore, even if you were to take a tidy approach where inline literals denote themselves, you would *still* have to evaluate cases such as

   :x :p "10"
   :x :p "10.0"
   :x :p "010"
   :x :p "010.0"

etc. in terms of a lexical-to-value mapping to determine actual equality of the objects. Thus, the supposed benefit of equality tests on tidy literals is an illusion with no guarantee in the real world. Folks will be dumping lexical representations as they exist in various auxiliary systems, and not normalizing them to any canonical representations. And to expect that one will always encounter the literal "10" when the integer ten is meant is a fantasy. The literal "10" will probably be the most common lexical representation for ten, but there is *no* guarantee that it will be the only one. If we are to support and respect XML Schema datatypes, which allow for such non-canonical representations, we must accept this reality.
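To make the lexical-to-value point concrete, here is a minimal Python sketch. It assumes regular expressions that approximate the XML Schema lexical grammars for xsd:integer and xsd:decimal, and it ignores whitespace facets. The function names (`integer_value`, `decimal_value`, `canonical_integer`) are hypothetical and not taken from any RDF toolkit. The sketch shows that "10", "010" and "+10" are three distinct strings but one integer. Both a non-canonical-to-canonical mapping and a lexical-to-value mapping recover that equality, whereas raw string comparison does not.

```python
import re
from decimal import Decimal

# Approximations of the XML Schema lexical grammars for xsd:integer and
# xsd:decimal. Whitespace collapsing is deliberately left out of this sketch.
INTEGER_LEX = re.compile(r"[+-]?[0-9]+")
DECIMAL_LEX = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def integer_value(lexical: str) -> int:
    """Hypothetical lexical-to-value mapping for xsd:integer."""
    if not INTEGER_LEX.fullmatch(lexical):
        raise ValueError(f"not an xsd:integer lexical form: {lexical!r}")
    return int(lexical)


def decimal_value(lexical: str) -> Decimal:
    """Hypothetical lexical-to-value mapping for xsd:decimal."""
    if not DECIMAL_LEX.fullmatch(lexical):
        raise ValueError(f"not an xsd:decimal lexical form: {lexical!r}")
    return Decimal(lexical)


def canonical_integer(lexical: str) -> str:
    """Hypothetical non-canonical-to-canonical mapping for xsd:integer:
    drops a leading '+' and leading zeros, and maps '-0' to '0'."""
    return str(integer_value(lexical))


forms = ["10", "010", "+10"]

# Tidy, string-level comparison: three different literals.
print(len(set(forms)))                        # 3
# Canonical mapping: all three collapse to one canonical form.
print({canonical_integer(f) for f in forms})  # {'10'}
# Value mapping: all three denote the integer ten.
print({integer_value(f) for f in forms})      # {10}
# Decimal forms such as "10.0" and "010.0" denote that same value.
print(decimal_value("10.0") == decimal_value("010.0") == integer_value("10"))  # True
```

Either mapping works for the equality test. The point is that some datatype-aware mapping is needed, whichever view of literals is taken.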
At the end of the day, if some application wants to be absolutely sure that two lexical representations denote the same thing, it must evaluate them in terms of either a non-canonical-to-canonical mapping or a lexical-to-value mapping of some datatype. Just saying the literals are tidy doesn't do it. You still have to deal with synonymous variants.

> An overwhelming majority of applications use exactly this metaphor. For
> example, look at the AdobeXMP documentation, where the range of
> xapDynA:Volume is defined to be a Real. Did those folks want to assert
> that the abstract concept of volume coincides with real numbers? No. Or,
> what about CC/PP's
>
>    :x displayWidth int"640" ?
>
> After all, display width is not measured in integers, but in inches or
> centimeters...
>
> My conclusion is that it is not necessary to claim that "John Smith"
> represents a person (and call for untidy literals), in order to achieve
> correct modeling. And, by no means have applications and APIs to be
> changed to reflect this "insight". The applications, and their
> developers, possess a consistent conceptual model of what dc:Creator or
> age or shoeSize mean. These apps run just fine. For the lack of
> conceptual necessity of "thinking untidy" I'm suggesting: don't touch
> running systems.

If all we cared about were individual closed systems, which just happened to use some common API as a convenience, fine, then who cares where the interpretation of literals happens.

However, we are concerned about the interchange of knowledge between disparate systems, which may or may not share the same implicit internal assumptions about the interpretation of literals. In that case, it's a very big deal whether or not the standard MT for interchange, the RDF MT, specifies what their interpretation is in a portable, consistent manner.
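The interchange point can be illustrated with a small Python sketch. It assumes a toy `Literal` type and a hypothetical `value_of` function, neither of which comes from any RDF library. It also assumes two hypothetical receivers that differ only in how they treat a plain literal on displayWidth: A takes it as a string, B as an xsd:integer. Here the datatype travels with the literal, as in the CC/PP example. The sketch shows that an interpretation carried in the data gives the same answer everywhere, while one left to each receiver does not.

```python
from dataclasses import dataclass
from typing import Optional, Union

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """Toy literal: a lexical form plus an optional datatype URI.
    datatype=None models a plain (untyped) literal."""
    lexical: str
    datatype: Optional[str] = None


def value_of(lit: Literal, receiver_default: Optional[str]) -> Union[int, str]:
    """Hypothetical interpretation function. A datatyped literal carries its
    own interpretation; a plain literal falls back on whatever the receiving
    system happens to assume (receiver_default)."""
    datatype = lit.datatype or receiver_default
    if datatype == XSD + "integer":
        return int(lit.lexical)  # no lexical-form validation in this sketch
    return lit.lexical  # otherwise the literal is taken as the string itself


plain = (Literal("640"), Literal("0640"))
typed = (Literal("640", XSD + "integer"), Literal("0640", XSD + "integer"))

# Two hypothetical receivers with different built-in assumptions
# about plain literals on displayWidth.
receivers = {"A": None, "B": XSD + "integer"}

for name, default in receivers.items():
    same_plain = value_of(plain[0], default) == value_of(plain[1], default)
    same_typed = value_of(typed[0], default) == value_of(typed[1], default)
    print(name, same_plain, same_typed)
# A False True
# B True True
```

The two receivers disagree about whether plain "640" and "0640" give the same display width. They agree once the interpretation is stated explicitly rather than left implicit in each system.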
I believe that the primary purpose of RDF is as a language for interchange of knowledge (not just structured markup), and as such, the more explicitly meaning can be expressed in that language, the better.

Thus, rather than saying that, in the case of

   :x displayWidth "640" .

the meaning of "640" is some string, I'd rather see a schema that explicitly states what the interpretation of that string is, e.g.

   displayWidth rdfs:range xsd:integer .
   displayWidth x:unitOfMeasure foo:inch .

etc. etc. That is, it should say that the object denotes some integer value which is a magnitude in a particular unit of measure, inches. Now *that* is knowledge that is useful for one system to tell another.

And surely we don't want to define complex labeled nodes that embody all that information regarding the interpretation of values in each occurrence of a value itself! E.g.

   ((rdf:type,xsd:integer)(x:unitOfMeasure,foo:inch))"640"
   ((rdf:type,xsd:integer)(x:unitOfMeasure,foo:inch))"100"
   ((rdf:type,xsd:integer)(x:unitOfMeasure,foo:inch))"30"
   ((rdf:type,xsd:integer)(x:unitOfMeasure,foo:inch))"4800"
   ((rdf:type,xsd:integer)(x:unitOfMeasure,foo:inch))"10"
   ...

How silly.

Rather, the longstanding and accepted best practice is to capture the general information at a higher level (the property) and express only that portion which must be expressed for each occurrence (the lexical form). This is just plain good design.

Choosing a model for datatyping which encourages ambiguity and leaves system-specific interpretations implicit seems to me to be contrary to the very purpose of RDF. We already have a standard for structured markup, XML. We don't need another. RDF is about saying things about the world. Having the RDF MT assign meaning to literals that reflects the form of expression rather than their intended denotation in the world weakens the language. It also hinders the explicit and unambiguous interchange of knowledge on the SW.

Patrick
Received on Thursday, 26 September 2002 03:59:18 UTC