DAML ObjectProp vs DatatypeProp

From: Drew McDermott <drew.mcdermott@yale.edu>
Date: Wed, 16 May 2001 10:19:55 -0400 (EDT)
Message-Id: <200105161419.KAA07501@pantheon-po01.its.yale.edu>
To: www-rdf-logic@w3.org

When the latest DAML+OIL draft came out, there was some discussion of
the separation between ObjectProperty's and DatatypeProperty's.  I
have reviewed it, but I am still puzzled by the distinction.

I agree that there is a need for concrete datatypes in DAML (as in RDF
and XML).  I am somewhat puzzled by exactly how to go about providing
them.  The problem is that DAML has inherited from the SGML/XML
tradition this vagueness about exactly what the leaves of the tree are
in a marked-up document.  There are two sorts of leaves:

   Attributes:   <tag name="Smith"> .... </tag>

   Elements with no markup inside:
                 <name>Smith</name>

My CS instincts tell me that I should be looking for some notion of a
"literal" at this point.  I.e., in Java I can write

     name = "Smith";

and the compiler treats "Smith" as a literal string.  Each language
defines a syntax for literals that makes it unambiguous what the
literal denotes.  At least, I believe this to be the case.  Does
anyone know of any exceptions?  For instance, in C-like languages 
76 is the integer 76, and 0x76 is the integer 118 (because the "0x"
makes the literal hexadecimal).
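
To make the point concrete, here is a minimal sketch in Python (picked
only for brevity; it is not part of the DAML discussion).  The helper
name `denotation` is hypothetical.  It hands the source text of a
literal to the standard library's ast.literal_eval and gets back the
value that the spelling alone determines.

```python
import ast


def denotation(source_text):
    """Return the value that a Python literal, given as source text, denotes."""
    # ast.literal_eval accepts only literal syntax, never arbitrary expressions.
    return ast.literal_eval(source_text)


decimal = denotation("76")         # plain digits: a decimal integer
hexadecimal = denotation("0x76")   # the 0x prefix marks a hexadecimal integer
string = denotation('"Smith"')     # the quotes mark a string
```

No outside declaration is consulted in any of the three cases; the
syntax of the literal settles its type and its value.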

Unfortunately, this is not how XML works.  First, there is the
regrettable choice of quotes to surround all attribute values.  It
seems to imply that all attributes have string values, which they
emphatically do not.  Indeed, I've never seen an example of an
attribute with a string value, which would presumably be written

      <tag name='"Smith"'>  ...  </tag>

(Yes, this is legitimate XML; you can use single quotes if the
attribute value contains double quotes.)
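
A quick way to check this is to hand both spellings to an XML parser.
The sketch below assumes Python's standard xml.etree.ElementTree
module; the variable names are mine.

```python
import xml.etree.ElementTree as ET

# The outer quotes are only delimiters and never reach the application.
plain = ET.fromstring('<tag name="Smith"/>')
# Single-quote delimiters allow double quotes inside the attribute value.
quoted = ET.fromstring("""<tag name='"Smith"'/>""")

plain_value = plain.get("name")     # the five characters S m i t h
quoted_value = quoted.get("name")   # seven characters, double quotes included
```

Either way the parser hands back a bare run of characters and says
nothing about whether a string, a name, or something else was intended.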

Of course, many applications treat "Smith" as a string, which requires
further information.  I can't remember where I saw it, but there is an
XML/RDF convention that allows you to write something like

      <tag name="Smith" datatype="String">
or    <tag employer="Smith" datatype="Name">

so that the string "Smith" is meant in the first case, Smith himself
in the second.  (Please no digressions on names vs. URx's here; the
example works just as well if we use
"mdtp://universe.org/everyone#Smith,J.Q" instead of "Smith".)
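
A rough sketch of what a consumer of that convention has to do,
assuming only the two datatype values used above ("String" and
"Name").  The Entity class and the interpret function are hypothetical
names introduced for illustration, not part of any RDF or DAML tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Stand-in for whatever a name refers to (hypothetical)."""
    name: str


def interpret(element, attribute):
    """Read one attribute according to the element's datatype attribute."""
    raw = element.get(attribute)
    datatype = element.get("datatype")
    if datatype == "String":
        return raw              # the characters themselves are meant
    if datatype == "Name":
        return Entity(raw)      # the thing the characters name is meant
    raise ValueError(f"no interpretation for datatype {datatype!r}")


as_string = interpret(
    ET.fromstring('<tag name="Smith" datatype="String"/>'), "name")
as_entity = interpret(
    ET.fromstring('<tag employer="Smith" datatype="Name"/>'), "employer")
```

The same five characters come out as two different values, and only
the side declaration tells them apart.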

Exactly the same remarks could be made about the other kind of XML
tree leaf.  We could treat any string of characters between tags

     <employer>Smith</employer>
     <name>"Smith"</name>
     <shoesize>9</shoesize>

as though it were a literal, and everything would be clear.
Unfortunately, this case is even murkier than the other.  It's usually
completely XML-sub-dialect-dependent what the interpretation of such
things is.  RDF is surprisingly vague about this.  In examples such as
this one from the RDF bible:

  <rdf:RDF>
    <rdf:Description about="http://www.w3.org/Home/Lassila">
      <s:Creator>Ora Lassila</s:Creator>
    </rdf:Description>
  </rdf:RDF>

the underlying analysis is

       Subject (Resource) 
                        http://www.w3.org/Home/Lassila 
       Predicate (Property) 
                        Creator
       Object (literal) 
                        "Ora Lassila"

This is practically the first example given, so perhaps it's
deliberately oversimplified, but it appears to suggest that the string
"Ora Lassila" created Ora Lassila's web page.  I would have thought
this was a place where we would be almost compelled to say

  <rdf:RDF>
    <rdf:Description about="http://www.w3.org/Home/Lassila">
      <s:Creator resource="mdtp://universe.org/everyone#Lassila,Ora"/>
    </rdf:Description>
  </rdf:RDF>
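
The difference between the two readings can be written down as a pair
of triples.  This is only a sketch: the Literal and Resource classes
and the object_is_literal helper are hypothetical names, not an RDF
API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A string of characters used as the object of a statement."""
    text: str


@dataclass(frozen=True)
class Resource:
    """Something identified by a URI."""
    uri: str


page = Resource("http://www.w3.org/Home/Lassila")

# Reading given in the RDF example: the object is a string.
literal_reading = (page, "Creator", Literal("Ora Lassila"))
# Reading suggested here: the object is the person.
resource_reading = (
    page, "Creator", Resource("mdtp://universe.org/everyone#Lassila,Ora"))


def object_is_literal(triple):
    """True when the object position of a triple holds a Literal."""
    return isinstance(triple[2], Literal)
```

Here object_is_literal(literal_reading) is true and
object_is_literal(resource_reading) is false, although subject and
predicate are the same in both.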

Okay, so now that we're completely confused, let me return to the
topic I started with.  Why is there a separation between
ObjectProperty and DatatypeProperty?  The only distinction between
them is their range, but there are already mechanisms for
specifying the ranges of properties.  

Jonas Liljegren said as much in a message of March 29:

   There is no need to split up the rdfs:Class or rdfs:Property.  RDFS
   already has this distinction in the rdfs:Literal class.

   Datatype properties are recognised by having Datatype as range.
   Datatype is recognised by being subClassOf rdfs:Literal.

   The classes daml:ObjectProperty, daml:DatatypeProperty ...
   should go away.

but the followup soon wandered off into other topics, which is too
bad, because I think he was absolutely right.
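
As I read his rule, it amounts to something like the following sketch.
The class and property names with the "ex:" prefix are made up for
illustration, and the two small tables stand in for whatever
rdfs:subClassOf and rdfs:range statements a real schema would contain.

```python
LITERAL = "rdfs:Literal"

# Hypothetical rdfs:subClassOf statements, child -> parent.
SUBCLASS_OF = {
    "ex:Integer": "rdfs:Literal",
    "ex:PositiveInteger": "ex:Integer",
    "ex:Person": "rdfs:Resource",
}

# Hypothetical rdfs:range statements, property -> class.
RANGE = {
    "ex:shoesize": "ex:PositiveInteger",
    "ex:employer": "ex:Person",
}


def is_subclass_of(cls, ancestor):
    """Follow subClassOf links upward from cls, looking for ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def is_datatype_property(prop):
    """A property counts as a datatype property if its range is a Literal class."""
    return is_subclass_of(RANGE[prop], LITERAL)
```

With these tables, is_datatype_property("ex:shoesize") is true and
is_datatype_property("ex:employer") is false, without either property
having been declared as one kind or the other.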

Anyway, can someone point me to the authoritative source on literal
data in RDF/DAML?  If there isn't one, I would be inclined to
recommend:

a) That literals occur *only* as attribute values.  Text in elements
is just too unconstrained.  The notion of "markup-free text" is rather
wobbly (probably deprecated by the Authorities); it's not clear even
how to handle whitespace.

b) That there be an unambiguous syntax for literal data, so that one
would *not* have to declare the intended datatype of every attribute
value.  The convention that "Smith" sometimes refers to "Smith" and
sometimes to Smith should be done away with.  If a string is intended,
there should be a syntax for specifying strings, either '"Smith"',
"'Smith'", "\"Smith\"", or ""Smith"".  (That last one is kind of
cute.)

c) If someone writes <shoesize value='"Smith"'/>, the RDF validity
checker notes that the provided literal violates the rdfs:range
constraint on shoesize, and issues an error message.  
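
To show how (b) and (c) might fit together, here is a sketch under
assumptions of my own: the '"Smith"' spelling is picked for strings,
bare digits are taken as integers, and anything else is taken as a
name.  Name, parse_attribute_value, check_range and the RANGE table are
all hypothetical; nothing here is an existing RDF tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A reference to the thing named, as opposed to the characters."""
    text: str


def parse_attribute_value(text):
    """Decode an attribute value from its spelling alone."""
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]            # quoted inside the attribute: a string
    if re.fullmatch(r"[+-]?\d+", text):
        return int(text)             # bare digits: an integer
    return Name(text)                # anything else: a name


# Hypothetical declared types for property values.
RANGE = {"shoesize": int}


def check_range(prop, text):
    """Return the decoded value, or raise if it is of the wrong type."""
    value = parse_attribute_value(text)
    expected = RANGE[prop]
    if not isinstance(value, expected):
        raise ValueError(
            f"{prop}: {text} is not in the declared range {expected.__name__}")
    return value
```

Under these assumptions check_range("shoesize", "9") returns the
integer 9, while check_range("shoesize", '"Smith"') raises a
ValueError, and no datatype declaration is needed on the attribute
itself.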

                                             -- Drew McDermott

(By the way, "mdtp:" is the "magical denotation transfer protocol,"
which allows us to reach out and refer to any entity anywhere without
possibility of ambiguity.  Unfortunately, the release of version 1.0
of the software has been significantly delayed by unexpected
metaphysical snags.)
Received on Wednesday, 16 May 2001 10:19:56 UTC