- From: Drew McDermott <drew.mcdermott@yale.edu>
- Date: Wed, 16 May 2001 10:19:55 -0400 (EDT)
- To: www-rdf-logic@w3.org
When the latest DAML+OIL draft came out, there was some discussion of the separation between ObjectProperty's and DatatypeProperty's. I have reviewed it, but I am still puzzled by the distinction.

I agree that there is a need for concrete datatypes in DAML (as in RDF and XML). I am somewhat puzzled by exactly how to go about providing them. The problem is that DAML has inherited from the SGML/XML tradition a vagueness about exactly what the leaves of the tree are in a marked-up document. There are two sorts of leaves:

   Attributes:
      <tag name="Smith"> .... </tag>

   Elements with no markup inside:
      <name>Smith</name>

My CS instincts tell me that I should be looking for some notion of a "literal" at this point. I.e., in Java I can write

   name = "Smith";

and the compiler treats "Smith" as a literal string. Each language defines a syntax for literals that makes it unambiguous what the literal denotes. At least, I believe this to be the case. Does anyone know of any exceptions? For instance, in C-like languages 76 is the integer 76, and 0x76 is the integer 118 (because the "0x" makes the literal hexadecimal).

Unfortunately, this is not how XML works. First, there is the regrettable choice of quotes to surround all attribute values. It seems to imply that all attributes have string values, which they emphatically do not. Indeed, I've never seen an example of an attribute with a string value, which would presumably be written

   <tag name='"Smith"'> ... </tag>

(Yes, this is legitimate XML; you can use single quotes if the attribute value contains double quotes.)

Of course, many applications treat "Smith" as a string, which requires further information. I can't remember where I saw it, but there is an XML/RDF convention that allows you to write something like

   <tag name="Smith" datatype="String">

or

   <tag employer="Smith" datatype="Name">

so that the string "Smith" is meant in the first case, Smith himself in the second. (Please no digressions on names vs. URx's here; the example works just as well if we use "mdtp://universe.org/everyone#Smith,J.Q" instead of "Smith".)

Exactly the same remarks could be made about the other kind of XML tree leaf. We could treat any string of characters between tags

   <employer>Smith</employer>
   <name>"Smith"</name>
   <shoesize>9</shoesize>

as though it were a literal, and everything would be clear. Unfortunately, this case is even murkier than the other. How such things are interpreted usually depends entirely on the XML sub-dialect.

RDF is surprisingly vague about this. In examples such as this one from the RDF bible:

   <rdf:RDF>
     <rdf:Description about="http://www.w3.org/Home/Lassila">
       <s:Creator>Ora Lassila</s:Creator>
     </rdf:Description>
   </rdf:RDF>

the underlying analysis is

   Subject (Resource)     http://www.w3.org/Home/Lassila
   Predicate (Property)   Creator
   Object (literal)       "Ora Lassila"

This is practically the first example given, so perhaps it's deliberately oversimplified, but it appears to suggest that the string "Ora Lassila" created Ora Lassila's web page. I would have thought this was a place where we would be almost compelled to say

   <rdf:RDF>
     <rdf:Description about="http://www.w3.org/Home/Lassila">
       <s:Creator resource="mdtp://universe.org/everyone#Lassila,Ora"/>
     </rdf:Description>
   </rdf:RDF>

Okay, so now that we're completely confused, let me return to the topic I started with. Why is there a separation between ObjectProperty and DatatypeProperty? The only distinction between them is their range, but there are already mechanisms for specifying the ranges of properties. Jonas Liljegren said as much in a message of March 29:

   There is no need to split up the rdfs:Class or rdfs:Property. RDFS
   already has this distinction in the rdfs:Literal class. Datatype
   properties are recognised by having Datatype as range. Datatype is
   recognised by being subClassOf rdfs:Literal. The classes
   daml:ObjectProperty, daml:DatatypeProperty ... should go away.

but the followup soon wandered off into other topics, which is too bad, because I think he was absolutely right.

Anyway, can someone point me to the authoritative source on literal data in RDF/DAML? If there isn't one, I would be inclined to recommend:

a) That literals occur *only* as attribute values. Text in elements is just too unconstrained. The notion of "markup-free text" is rather wobbly (probably deprecated by the Authorities); it's not clear even how to handle whitespace.

b) That there be an unambiguous syntax for literal data, so that one would *not* have to declare the intended datatype of every attribute value. The convention that "Smith" sometimes refers to "Smith" and sometimes to Smith should be done away with. If a string is intended, there should be a syntax for specifying strings, either '"Smith"', "'Smith'", "\"Smith\"", or ""Smith"". (That last one is kind of cute.)

c) That if someone writes <shoesize value='"Smith"'/>, the RDF validity checker note that the provided literal violates the rdfs:range constraint on shoesize, and issue an error message.

-- Drew McDermott

(By the way, "mdtp:" is the "magical denotation transfer protocol," which allows us to reach out and refer to any entity anywhere without possibility of ambiguity. Unfortunately, the release of version 1.0 of the software has been significantly delayed by unexpected metaphysical snags.)
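
For concreteness, here is a rough sketch in Python of how recommendations (b) and (c) might fit together. It is only an illustration under assumptions of my own: the literal syntax (double quotes for strings, bare digits or a 0x prefix for integers, anything else a resource name), the function names parse_literal and check, and the RANGES table standing in for declared property ranges are all hypothetical, and none of it comes from the RDF or DAML specifications.

```python
import re

# Hypothetical literal syntax: "..." is a string, digits (optionally with a
# 0x prefix) are an integer, and anything else names a resource.
def parse_literal(text):
    """Return a (kind, value) pair for one attribute value."""
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return ("string", text[1:-1])
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", text):
        return ("integer", int(text, 16))
    if re.fullmatch(r"[0-9]+", text):
        return ("integer", int(text))
    return ("resource", text)

# Hypothetical declarations standing in for each property's declared range.
RANGES = {"shoesize": "integer", "name": "string", "employer": "resource"}

def check(prop, text):
    """Return None if the value fits the property's range, else an error message."""
    kind, _ = parse_literal(text)
    expected = RANGES[prop]
    if kind != expected:
        return f"{prop}: expected {expected}, got {kind} {text}"
    return None
```

With these tables, check("shoesize", '"Smith"') returns an error message, while check("shoesize", "9") and check("employer", "Smith") return None, because the syntax of each value alone settles what kind of thing it denotes.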
Received on Wednesday, 16 May 2001 10:19:56 UTC