snapshot of http://www.coginst.uwf.edu/users/phayes/simpledatatype23-02-2002.html from Dan Brickley on 2002-03-08 (www-archive@w3.org from March 2002)

From: Dan Brickley <danbri@w3.org>
Date: Fri, 8 Mar 2002 10:45:44 -0500 (EST)
To: <www-archive@w3.org>
Message-ID: <Pine.LNX.4.30.0203081045000.25655-100000@tux.w3.org>
for archival (no images)

---------- Forwarded message ----------
Date: Fri, 8 Mar 2002 16:00:06 GMT
From: danbri@fireball.danbri.org
To: danbri@w3.org

<html><head><title>simple RDF datatype</title>
<style type="text/css">
body { background-color: white; }
p.c3 {font-style: italic}
p.c2 {font-style: italic; text-align: center}
p.c1 {text-align: center}
.smallcode {font-family: monospace; font-size: small}
.pink {background-color: #FFCCCC}
.cream {background-color: #FFFF99 }
.lilac {background-color: #FF99FF }
.datatype {font-family: monospace; font-size: small; background-color: #99CC99 }
</style>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head>
<body bgcolor="#FFFFFF" text="#000000">
<p>This version of datatyping, a new variation on the one in http://www.coginst.uwf.edu/users/phayes/simpledatatype.html, has the following features:</p>
<p>Pro:<br>
<br>
a. Literals always denote themselves (so they can be tidy)<br>
b. It supports the S-B idiom (using rdfs:drange)<br>
c. It allows the use of S-B, local typing using datatype triples, and range datatyping, in any combination<br>
d. It avoids most datatype clashes and provides a technique for resolving the ones that do arise<br>
e. Datatype class names denote the value space of the datatype.</p>
<p>Con:<br>
<br>
f. It requires the use of rdfs:drange (or something other than rdfs:range, anyway)<br>
g. Like all these simplified proposals, it doesn't provide any way to 'declare' datatype urirefs, and relies on automagical recognition.</p>
<p>I've taken out the doublet idiom. This could be put back if the WG wants it, however.</p>
<p class="lilac">The new stuff is all contained in section 5 and the MT.</p>
<p>Pat Hayes 2/23/2002</p>
<h3>-------------------------------------------------------</h3>
<h3>1. Literals</h3>
<p>In RDF, urirefs and blank nodes are both considered to be referring expressions; they are used to denote resources. Literals, however, are best thought of simply as syntactic 'labels' which indicate a lexical form. These lexical forms can be used to restrict the references of other nodes by using datatype schemes, but this use is optional. If a literal is used as a referring expression, it always refers to itself - that is, to a character string - so that a triple of the form</p>
<p class="smallcode">Jenny ex:age &quot;35&quot; .</p>
<p>states that the value of the property called <span class="smallcode">ex:age</span> on the subject <span class="smallcode">Jenny</span> is the two-character string '<span class="smallcode">35</span>'. Note that it does <i>not</i> say that the value is the number thirty-five. There is no way to modify the meaning of a literal node.</p>
<p>An example of such 'in-line' use of a literal to denote a string is provided by <span class="smallcode">dc:title</span> in the Dublin Core.</p>
<h3>2. Datatypes</h3>
<p>If the intended meaning of literals is understood by a set of users or applications, then the simple use case illustrated by the above example could be sufficient. This 'untyped' kind of usage is always available in RDF. However, RDF also provides ways to use <i>datatypes</i> to assert that a literal should be interpreted in a particular way.</p>
<p>A datatype is defined abstractly by two domains, one of lexical forms and one of values, and a mapping from lexical forms to values. We assume that a datatype is indicated by a URI, and that some external mechanism is able to access and make use of appropriate representations of the domains and map when supplied with the URI. The model theory is stated in terms of a global function L2V from datatypes to the lexical-to-value mapping of that datatype. In the examples below, urirefs which are being interpreted as datatype names will be indicated by the use of the <span class="datatype">color green</span>.</p>
<h3>3. Datatype triples</h3>
<p>The simplest way to talk about the value of a literal under a datatype mapping is to provide a node to denote the value and link that node to the datatype, using the name of the datatype as the property. This is called a <i>datatype triple</i>. For example,</p>
<p class="smallcode">Jimmy ex:age _:x .<br>
_:x <span class="datatype">xsd:number</span> &quot;35&quot; .</p>
<p>says that Jimmy's age is the value of the literal under the datatype mapping <span class="datatype">xsd:number</span>, i.e. that Jimmy's age is the number 35. (Contrast this with the example in the previous section.) The datatype triple also, incidentally, asserts that the literal itself is in the lexical space of the datatype. For example,</p>
<p class="smallcode">_:x <span class="datatype">xsd:number</span> &quot;HumptyDumpty&quot; .</p>
<p>would always be false, no matter what value is assigned to the bnode. This is the only way in which an RDF triple can be contradictory.</p>
<p>A datatype triple is true when the literal is a well-formed lexical form of the datatype, and the subject denotes the value of the lexical form under that datatype's lexical-to-value mapping. The intuitive reading might be &quot;..<i>can be described, according to this datatype mapping, by the character string</i>..&quot;.</p>
<p>(This is 'backwards' from the usual way of thinking about a datatype mapping as applying to the lexical form and resulting in the value; the reason for this is simply the RDF syntactic convention that prohibits literals in subject position. Technically, the RDF datatype property is in fact the <i>inverse</i> of the datatype's lexical-to-value mapping; the lexical-to-value mapping goes 'from' the object of the triple 'to' the subject.)</p>
<h4>3.1 Datatype properties are a local constraint on literals.</h4>
<p>The datatype triple is the most 'local' style of literal datatyping in RDF; the interpretation imposed on the subject node by the datatype property is entirely 'inside' the triple. This means, for example, that the same literal can be used simultaneously in two different such triples, imposing different interpretations on two different nodes. For example, if <span class="datatype">ex:octalnumber</span> were a datatype property, then as well as using the literal as a decimal to indicate Jimmy's age, we could also write<br>
<span class="smallcode"><br>
Judy ex:age _:y .<br>
_:y </span><span class="datatype">ex:octalnumber</span><span class="smallcode"> &quot;35&quot; .</span></p>
<p>to assert that Judy's age is 29 (the octal numeral 35 read as a decimal number), and both uses of the literal could be in the same RDF graph. Although the two bnodes _:x and _:y denote distinct values, the literal <i>itself</i> has the same meaning in both cases - the lexical form.</p>
<p>Similarly, two different literal representations of the same value could be specified using two different datatype triples which include the same subject:</p>
<p class="smallcode">_:y <span class="datatype">ex:USdecimal</span> &quot;12.25&quot; .<br>
_:y <span class="datatype">ex:germandecimal</span> &quot;12,25&quot; .</p>
<p>Obviously, this only works when the literals do in fact map to the same value under the respective mappings.</p>
<h4>3.2 Datatype properties have exact domains and ranges.</h4>
<p>We make one additional assumption concerning the use of datatype properties: they have <i>exact</i> domains and ranges.</p>
<p>Normally in RDFS, an assertion about a range:</p>
<p class="smallcode">ppp rdfs:range ccc .</p>
<p>is understood to say that the precise range of ppp is a subset of the class ccc. This allows RDFS to combine multiple range assertions coherently and reflects the fact that the language has no way to express a 'lower bound' on the membership in a class. However, we will assume that for datatype properties, such an assertion is true only when ccc is the exact range of the property, no more and no less. This exact range is the lexical space of the datatype, so:</p>
<p class="smallcode"><span class="datatype">ppp</span> rdfs:range ccc .</p>
<p>asserts that the class <span class="smallcode">ccc</span> is precisely the set of lexical forms that are acceptable to the datatype <span class="datatype">ppp</span>.</p>
<h3>4. Missing datatype information:
rdfs:dlex</h3>
<p>Sometimes one wishes to associate a literal with a value without specifying a particular datatype. RDFS provides a special property for this kind of underdetermined association, called <span class="smallcode">rdfs:dlex</span> (read: Datatype LEXical form). The triple</p>
<p class="smallcode">_:x rdfs:dlex &quot;37&quot; .</p>
<p>asserts simply that _:x is a value which can be represented by the character string under some possible datatype mapping. This does not in itself 'fix' the value, of course, but it can be used as a way of making the association between the value and a lexical form explicit, for later use or amplification. We will call this a <i>lexical form</i> triple. A useful way to think of the meaning of <span class="smallcode">rdfs:dlex</span> is: &quot;..<i>can be described by the character string</i>..&quot;</p>
<p>Notice that since <span class="smallcode">rdfs:dlex</span> is not a datatype, it can be used to link several different literals to the same node:</p>
<p class="smallcode">_:x rdfs:dlex &quot;37&quot; .<br>
_:x rdfs:dlex &quot;29&quot; .</p>
<p>However, this should be done with caution, as this usage may conflict with the technique described next.</p>
<h3 class="lilac">5. Attaching datatype constraints to a property: rdfs:drange.</h3>
<p>It is often convenient to associate a datatype with the range of a property, so that every use of the property can be understood as asserting appropriate datatyping conditions about its object. RDFS provides the special property <span class="smallcode">rdfs:drange</span> for this purpose. (Read as <i>d</i>atatype <i>range</i>; but do not confuse this with <span class="smallcode">rdfs:range</span>, which has quite a different meaning.)</p>
<p>There are two kinds of datatype conditions that one might wish to attach to a property, depending on whether the object of the property is a literal, or a value linked to a literal in a lexical form triple.</p>
<p>In the first case, the usual purpose of linking the datatype to the property is to state that the literal in the object position <i>conforms to the lexical conditions</i> of the datatype. For example, we might wish to 'restrict' the property <span class="smallcode">ex:age</span> so that it is used only when applied to numerals, so that</p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:number</span> .<br>
Jenny ex:age &quot;35&quot; .</p>
<p>has the same meaning as in section 1, but</p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:number</span> .<br>
Jenny ex:age &quot;HumptyDumpty&quot; .</p>
<p>would be flagged as a datatype violation, by virtue of the association of the datatype with the property. (Note however that this does <i>not</i> assert that the <span class="smallcode">rdfs:range</span> of the property is the class <span class="datatype">xsd:number</span>; if it did, then <i>any</i> <span class="smallcode">ex:age</span> triple with a literal object would be false, even &quot;<span class="smallcode">35</span>&quot;, since a literal denotes a string and the class names the value space.)</p>
<p>The usual intention in the second case, however, is to impose a similar condition on the <i>lexical-to-value mapping</i> used to interpret any lexical form triples containing the object, so that</p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:number</span> .<br>
Jimmy ex:age _:x .<br>
_:x rdfs:dlex &quot;35&quot; .</p>
<p>means that Jimmy's age is the number 35. Here, the datatype is 'projected' across the bnode to impose an interpretation on <span class="smallcode">rdfs:dlex</span>, in effect making the lexical form triple have the same content as a datatype triple.</p>
<p><img src="dtypeimages/DatatypeFigC.png" alt="diagram of effect of rdfs:drange" width="396" height="378" /><br />
Figure 1: <i>Datatype conditions imposed by rdfs:drange. The 'blunt' end of the lexical-to-value map always attaches to the literal.</i></p>
<p>Both of these datatyping restrictions are considered to be part of the meaning of <span class="smallcode">rdfs:drange</span>, and they comprise its <i>total</i> meaning. All it does is to associate datatype restrictions with other property names in these two ways. If the object of an <span class="smallcode">rdfs:drange</span> triple is not a datatype, then the triple is vacuous, and makes no assertion at all.</p>
<p>In particular, an <span class="smallcode">rdfs:drange</span> assertion places no restrictions on the <span class="smallcode">rdfs:range</span> of the property. Although it would often be natural to consider the range of the property to be the lexical space of the datatype in the first case, and the value space of the datatype in the second, this should be asserted separately if the user wishes to make it explicit.</p>
<p>We note that this convention uses datatype urirefs both as properties and as class names. This is quite legal in RDF, and indeed there is a basic assumption which relates the two uses: <i>the datatype class names the value space of the datatype</i>, which is the domain of the datatype property (recall that properties are 'backwards' lexical-to-value maps); so the following is true for any datatype <span class="datatype">ddd</span>:</p>
<p class="smallcode"><span class="datatype">ddd</span> rdfs:domain <span class="datatype">ddd</span> .</p>
<p>To refer to the lexical domain, use <span class="smallcode">rdfs:range</span> applied to the datatype property. For example, the following two triples would restrict the <span class="smallcode">rdfs:range</span> of <span class="smallcode">ex:age</span> to be a subset of the lexical space of the datatype:</p>
<p class="smallcode"><span class="datatype">xsd:number</span> rdfs:range _:x .<br>
ex:age rdfs:range _:x .</p>
<p>and would therefore be suitable for use with the 'in-line' idiom used in section 1 above; while</p>
<p class="smallcode">ex:age rdfs:range <span class="datatype">xsd:number</span> .</p>
<p>asserts that the range of the property is restricted to the value space of the datatype, so would be suitable for use with the lexical form triple or datatype triple idioms. However, to reiterate, the same <span class="smallcode">rdfs:drange</span> assertions would be appropriate in either case.</p>
<h4>5.1 rdfs:drange is graph-wide in scope, so can produce clashes.</h4>
<p>These extra datatype interpretations imposed on a property by <span class="smallcode">rdfs:drange</span> apply to <i>any</i> such usage of the property <i>anywhere</i> in the RDF graph, so an <span class="smallcode">rdfs:drange</span> assertion has a much wider 'scope' than a datatype triple, and therefore needs to be used with care. For example, if several different literals are linked to a single node, then long-range datatyping can produce a conflict:</p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:number</span> .</p>
<p class="smallcode">Jenny ex:age <font color="#FF0000">_:x</font> .<br>
<font color="#FF0000">_:x</font> rdfs:dlex &quot;37&quot; .<br>
<font color="#FF0000">_:x</font> rdfs:dlex &quot;29&quot; .</p>
<p>The blank node here is required by the <span class="smallcode">rdfs:drange</span> assertion to have two distinct values (37 and 29) at the same time. This situation is called a <i>datatype clash</i>, and is best avoided.</p>
<p>Similarly, if two different <span class="smallcode">rdfs:drange</span> assertions are made about the same property, then they both apply to it. If the relevant datatypes have disjoint lexical spaces, or if their lexical-to-value maps fail to give the same values to a lexical form, then any use of the property with a literal is likely to produce a datatype clash. This requires particular care when merging information from different graphs which may have been written with different, and incompatible, conventions about literal datatyping.</p>
<h4 class="lilac">5.2 Avoiding datatype clashes</h4>
<p>Unless you are sure that the datatypes in use will not produce clashes, never use <span class="
smallcode">rdfs:dlex</span> with two different literals on the same node.</p>
<p>One technique to resolve larger-range clashes is to re-label the properties. Suppose for example that an RDF graph contains</p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:number</span> .</p>
<p>and we wish to add some information from another graph which uses a conflicting datatype convention:</p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:string</span> .</p>
<p>To do so, introduce two new property names, say <span class="smallcode">ex:age1</span> and <span class="smallcode">ex:age2</span>, transcribe all occurrences of <span class="smallcode">ex:age</span> from one graph into one of these and all occurrences from the other graph into the other, and then add:</p>
<p class="smallcode">ex:age1 rdfs:subPropertyOf ex:age .<br>
ex:age2 rdfs:subPropertyOf ex:age .</p>
<p>This gives</p>
<p><span class="smallcode">ex:age1 rdfs:drange </span><span class="datatype">xsd:number</span><span class="smallcode"> .<br>
ex:age2 rdfs:drange </span><span class="datatype">xsd:string</span><span class="smallcode"> .</span></p>
<p>which does not produce any datatype clashes, retains both particular ways of imposing meanings on literals - since these restrictions are associated with the particular property <i>name</i> - and still allows all RDFS conclusions using the original <span class="smallcode">ex:age</span> property to be drawn from the information in either of the graphs. This trick works because datatyping constraints are not inherited 'upwards' through subproperty relationships; similarly, a superclass of a datatype class need not itself be a datatype class.</p>
<h3 class="lilac">6. Model theory</h3>
<p>(We assume that the basic MT has tidy literal nodes and that I(&quot;<span class="smallcode">lll</span>&quot;) = <span class="smallcode">lll</span> for any literal under any interpretation I. We don't need to mention LV.)</p>
<p>Suppose I is an RDFS interpretation of a graph E. Then I is <i>datatyped</i> (with respect to a set D of datatypes) if the following is true for any datatype uriref <span class="datatype">ddd</span> (with I(<span class="datatype">ddd</span>) in D):</p>
<p>(1) IEXT(I(<span class="datatype">ddd</span>)) = {&lt;y,x&gt; : y = L2V(I(<span class="datatype">ddd</span>))(x) }, i.e. the inverse of the datatype lexical-to-value map.</p>
<p>(2) ICEXT(I(<span class="datatype">ddd</span>)) = {x : &lt;x,y&gt; in IEXT(I(<span class="datatype">ddd</span>)) for some y }, i.e. the value space of the datatype.</p>
<p>(3) For any literal <span class="smallcode">lll</span>, if E contains</p>
<p class="smallcode">aaa rdfs:drange <span class="datatype">ddd</span> .<br>
bbb aaa &quot;lll&quot; .</p>
<p>then L2V(I(<span class="datatype">ddd</span>))(<span class="smallcode">lll</span>) is defined, i.e. <span class="smallcode">lll</span> is in the lexical space of I(<span class="datatype">ddd</span>).</p>
<p>(4) For any literal <span class="smallcode">lll</span>, if E contains</p>
<p class="smallcode">aaa rdfs:drange <span class="datatype">ddd</span> .<br>
bbb aaa ccc .<br>
ccc rdfs:dlex &quot;lll&quot; .</p>
<p>then I(<span class="smallcode">ccc</span>) = L2V(I(<span class="datatype">ddd</span>))(<span class="smallcode">lll</span>), i.e. the 'dlex' is restricted to have the same meaning as the datatype property.</p>
<p>We can capture the content of the fourth condition by a special closure rule which inserts the appropriate datatype triple, as in the first row of the following table of closure rules:</p>
<table width="90%" border="1">
<tr><td width="49%">If the graph contains:</td><td width="51%">then add the triple:</td></tr>
<tr><td width="49%"><span class="smallcode">aaa rdfs:drange </span><span class="datatype">ddd</span><span class="smallcode"> .<br>
bbb aaa ccc .<br>
ccc rdfs:dlex &quot;lll&quot; .</span></td><td width="51%" class="smallcode">ccc <span class="datatype">ddd</span> &quot;lll&quot; .</td></tr>
<tr><td width="49%">(any datatype uriref <span class="datatype">ddd</span>)</td><td width="51%" class="smallcode"><span class="datatype">ddd</span> rdfs:domain <span class="datatype">ddd</span> .</td></tr>
</table>
<p>However, the meaning of the other semantic conditions cannot be fully captured by closure rules.</p>
</body></html>
Received on Friday, 8 March 2002 10:45:45 UTC