Re: Input sought on datatyping tradeoff from pat hayes on 2002-07-19 (www-rdf-logic@w3.org from July 2002)

From: pat hayes <phayes@ai.uwf.edu>
Date: Fri, 19 Jul 2002 16:43:14 -0500
To: "Geoff Chappell" <geoff@sover.net>
Cc: www-rdf-logic@w3.org
Message-Id: <p05111b0db95e36a68518@[65.217.30.45]>
>----- Original Message -----
>From: "Brian McBride" <bwm@hplb.hpl.hp.com>
>To: "Geoff Chappell" <geoff@sover.net>; <www-rdf-logic@w3.org>
>Sent: Monday, July 15, 2002 4:46 PM
>Subject: Re: Input sought on datatyping tradeoff
>
>
>>
>>  At 09:20 15/07/2002 -0400, Geoff Chappell wrote:
>>
>>  >I have a question about datatyping used with untidy literals. Given test
>>  >case D:
>>  >
>>  >Test D:
>>  >
>>  >    <Jenny>      <ageInYears> "10" .
>>  >    <ageInYears> rdfs:range xsd:decimal .
>>  >
>>  >    <John>  <ageInYears>   _:a .
>>  >    _:a     xsdr:decimal   "10" .
>>  >
>>  >
>>  >My understanding is that in a world of untidy literals, literals are
>>  >(potentially) ambiguous names. Not only can many literals refer to one
>>  >thing, but the same literal can refer to many things (as opposed to uris
>>  >which are supposedly unambiguous names - i.e. a uri can only identify one
>>  >thing though many uris could refer to the same thing). With this
>>  >understanding a datatype identifies by uri a black-box that performs name
>>  >resolution - i.e. the datatype is able to functionally identify a
>>  >thing/object/value based solely upon its
>>  >(potentially-ambiguous-wrt-the-world-at-large-but-not-wrt-the-datatype)
>>  >name. A datatype has a set of names that it is able to resolve and a
>>  >corresponding set of things/values.  The members of the datatype class
>>  >(when the datatype is used as a class) are simply the things/values it
>>  >is able to resolve names to.
>>
>>  That is a pretty good summary.  I think you have that right, though there
>>  was one place  where I wanted to wordsmith a bit, and there are others
>>  where the logicians might.  But those would be too picky for our purpose
>>  here, I think.
>>
>>
>>  >But what specifically is the meaning of the datatype when used as a
>>  >property?
>>
>>  Associated with the datatype is a property extension which consists of a
>>  set of pairs, e.g.
>>
>>     { (1, "1"), (2, "2"), ... }
>>
>>  This is the way the current model theory works, so there is nothing
>>  special in this aspect about datatype properties.
>>
>>  >Clearly in test D above the first "10" is meant to denote the
>>  >decimal value 10, as is node _:a. But what does the second "10" (the
>>  >object of xsdr:decimal) denote?
>>
>>  Ignoring complexities referred to by Peter for now, the second "10"
>>  denotes a string.
>>  We know it's a string because we know xsdr:decimal is a datatype property
>>  and all datatype properties take strings as their values.
>>
>>  I may be glossing over some technical details here, but this is the basic
>>  idea.
>>
>>  >  One possibility is that it is also the decimal 10.
>>  >Then a datatype used as a property states the equality under the datatype
>>  >of the subject and object (which would be enough in this instance for a
>>  >datatype-aware processor to figure out that _:a denotes the decimal 10).
>>  >Another possibility might be that it is referring to the name itself
>>  >(which I guess would make use of a datatype property some sort of a
>>  >quoting mechanism?). But if that is the case, how is the rdf processor to
>>  >know that?
>>
>>  Somewhere we have an assertion which I didn't show:
>>
>>     xsd:decimal rdf:type rdfd:datatype .
>>
>>  >What range constraint on the datatype property would indicate that? Just
>>  >rdfs:Literal? Does rdfs:Literal become a "built-in" datatype that maps
>>  >string values to themselves? (I often confuse myself here because in the
>>  >whole discussion of tidiness vs untidiness I understand the term
>>  >"literal" as used to talk about the name/label of the graph node while
>>  >"rdfs:Literal" obviously is referring to the type of the value - little
>>  >difference I guess in the world of tidy literals.)
>>
>>  Just so.
>>
>>  Have I done enough to convince you this is possible, or do I need to call
>>  in the cavalry?
>
>Thanks, you've answered most of my questions. I do have a remaining
>question - let me try to restate it.

Cavalry arriving late:

>In the untidy world, unlike the tidy world, a literal does not have a fixed
>meaning. Since literals cannot be subjects of statements, the only way to
>constrain the meaning of a literal is to attach range constraints to the
>property to which the literal is attached

Let's say yes, though there might be some subtle tricks to do it in other ways.

>(maybe not entirely true? I guess
>in some non-XML/RDF syntax you might also be able to terminate more than one
>arc on a literal node). A range constraint of an rdfs class is not
>sufficient to license an rdf processor to "know" the value to which the
>literal refers, only to constrain it to one of the members of the class. A
>datatype constraint does provide enough information (to a processor intimate
>with that datatype) to actually fix the value since the datatype provides a
>mapping between literals and the members of the datatype class.
>
>I guess my remaining question boils down to what licenses an rdf processor
>to conclude that the literal on the object side of a datatype property is
>referring to itself while all other literals are (potentially) referring to
>entities other than themselves?

In the untidy option, it can never make that determination. A 'lone'
literal in the untidy world is like a bnode with a meaningless label
attached to it: it just says that something exists which has this as
its lexical rendering; but until you know more about the
lexical-to-value rules (i.e. the datatype) that doesn't tell you
anything.
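To make the untidy reading concrete, here is a minimal Python sketch under stated assumptions. The registry of datatypes, the helper names (`untidy_candidates`, `untidy_value`) and the simplified lexical rules are hypothetical illustrations, not part of any RDF specification or toolkit. A lone literal only narrows its denotation to whatever some datatype maps that lexical form to. The value is fixed once a datatype is known, for instance one supplied by the rdfs:range constraint in Test D.

```python
import re
from decimal import Decimal

# Simplified lexical-to-value maps (assumptions, not the full XML Schema rules).
# Each raises ValueError when the string is outside the datatype's lexical space.
_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def decimal_l2v(lexical: str) -> Decimal:
    if _DECIMAL.fullmatch(lexical) is None:
        raise ValueError(f"not an xsd:decimal lexical form: {lexical!r}")
    return Decimal(lexical)

def boolean_l2v(lexical: str) -> bool:
    table = {"true": True, "1": True, "false": False, "0": False}
    if lexical not in table:
        raise ValueError(f"not an xsd:boolean lexical form: {lexical!r}")
    return table[lexical]

def string_l2v(lexical: str) -> str:
    return lexical  # strings map to themselves

# Hypothetical registry of datatypes a processor happens to know about.
REGISTRY = {
    "xsd:decimal": decimal_l2v,
    "xsd:boolean": boolean_l2v,
    "xsd:string": string_l2v,
}

def untidy_candidates(lexical, registry=REGISTRY):
    """Every value a lone literal might denote, keyed by the datatype yielding it."""
    candidates = {}
    for datatype, l2v in registry.items():
        try:
            candidates[datatype] = l2v(lexical)
        except ValueError:
            pass  # this datatype cannot read the lexical form at all
    return candidates

def untidy_value(lexical, range_datatype=None, registry=REGISTRY):
    """The literal's value, or None while no datatype is known for it."""
    if range_datatype is None:
        return None  # all we know: something with this lexical rendering exists
    return registry[range_datatype](lexical)
```

With this toy registry, `untidy_candidates("10")` yields both `Decimal('10')` (via xsd:decimal) and the string `'10'` (via xsd:string), so the literal alone settles nothing; `untidy_value("10", "xsd:decimal")` picks out the decimal 10, as Jenny's range constraint would.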

In the tidy option, literals *always* denote themselves, and any 
datatyping information doesn't change that: at best, it can be used to
fix the interpretations of some bnode suitably 'linked' to the 
literal.
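For contrast, a minimal Python sketch of the tidy reading applied to Test D. It assumes a toy triple representation (terms as plain strings, bnodes prefixed with `_:`, literals wrapped in a hypothetical `Literal` class) and a hypothetical table tying `xsdr:decimal` to a simplified xsd:decimal lexical-to-value map. Literals keep denoting themselves; only the datatype-property triple, read against the property extension of (value, string) pairs such as (1, "1"), fixes the value of `_:a`.

```python
import re
from decimal import Decimal

_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def decimal_l2v(lexical: str) -> Decimal:
    """Simplified xsd:decimal lexical-to-value map (an assumption)."""
    if _DECIMAL.fullmatch(lexical) is None:
        raise ValueError(f"not an xsd:decimal lexical form: {lexical!r}")
    return Decimal(lexical)

# Hypothetical table: datatype property -> lexical-to-value map of its datatype.
DATATYPE_PROPERTIES = {"xsdr:decimal": decimal_l2v}

class Literal(str):
    """A literal node; under the tidy reading it always denotes itself."""

def in_property_extension(prop, value, literal) -> bool:
    """Is (value, literal) one of the pairs (1, "1"), (2, "2"), ... of prop?"""
    try:
        return DATATYPE_PROPERTIES[prop](str(literal)) == value
    except ValueError:
        return False

def tidy_bnode_values(triples):
    """Values fixed for bnodes that are subjects of datatype-property triples."""
    values = {}
    for s, p, o in triples:
        if p in DATATYPE_PROPERTIES and s.startswith("_:") and isinstance(o, Literal):
            values[s] = DATATYPE_PROPERTIES[p](str(o))
    return values

def denotation(term, bnode_values):
    if isinstance(term, Literal):
        return str(term)  # datatyping never changes what a literal denotes
    # A bnode's fixed value if known; other terms stand for themselves in this toy.
    return bnode_values.get(term, term)

TEST_D = [
    ("<Jenny>", "<ageInYears>", Literal("10")),
    ("<ageInYears>", "rdfs:range", "xsd:decimal"),
    ("<John>", "<ageInYears>", "_:a"),
    ("_:a", "xsdr:decimal", Literal("10")),
]
```

Here `denotation("_:a", tidy_bnode_values(TEST_D))` is `Decimal('10')`, while Jenny's `"10"` still denotes the string `"10"`; the rdfs:range triple is simply not consulted when deciding what a literal denotes, which is the tidy behaviour described above.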

There are several 'hybrid' options on the table that try to get the 
best of both worlds. Brian's questions were partly designed to elicit 
intuitions which might have supported one of these. In spite of 
having invented a few of them, I now think they cause more smoke than 
light.

Pat Hayes


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Friday, 19 July 2002 17:42:40 UTC