- From: Dan Brickley <danbri@w3.org>
- Date: Fri, 8 Mar 2002 10:45:44 -0500 (EST)
- To: <www-archive@w3.org>
for archival (no images)

---------- Forwarded message ----------
Date: Fri, 8 Mar 2002 16:00:06 GMT
From: danbri@fireball.danbri.org
To: danbri@w3.org

<html><head><title>simple RDF datatype</title><style type="text/css"> body { background-color: white; } p.c3 {font-style: italic} p.c2{font-style: italic; text-align: center} p.c1 {text-align: center} .smallcode {font-family: monospace; font-size: small} .pink {background-color: #FFCCCC} .cream {background-color: #FFFF99 } .lilac {background-color: #FF99FF } .datatype {font-family: monospace; font-size: small; background-color: #99CC99 } </style> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body bgcolor="#FFFFFF" text="#000000">
<p>This version of datatyping, a new variation on the one in </p>
<p>http://www.coginst.uwf.edu/users/phayes/simpledatatype.html</p>
<p>has the following features:</p>
<p>Pro:<br> <br> a. Literals always denote themselves (so can be tidy)<br> b. It supports the S-B idiom (using rdfs:drange)<br> c. It allows the use of S-B, local typing using datatype triples, and range datatyping, in any combination<br> d. It avoids most datatype clashes and provides a technique for resolving the ones that do arise<br> e. Datatype class names denote the value space of the datatype. </p>
<p>Con:<br> <br> f. It requires the use of rdfs:drange (or something other than rdfs:range, anyway)<br> g. Like all these simplified proposals, it doesn't provide any way to 'declare' datatype urirefs, and relies on automagical recognition.</p>
<p>I've taken out the doublet idiom. This could be put back if the WG wants it, however. </p>
<p class="lilac">The new stuff is all contained in section 5 and the MT. </p>
<p>Pat Hayes 2/23/2002</p>
<h3>-------------------------------------------------------</h3>
<h3><br> 1. Literals </h3>
<p>In RDF, urirefs and blank nodes are both considered to be referring expressions; they are used to denote resources. 
Literals however are best thought of simply as syntactic 'labels' which indicate a lexical form. These lexical forms can be used to restrict the references of other nodes by using datatype schemes, but this use is optional. If a literal is used as a referring expression, it always refers to itself - that is, to a character string - so that a triple of the form </p>
<p class="smallcode">Jenny ex:age "35" .</p>
<p>states that the value of the property called <span class="smallcode">ex:age</span> on the subject <span class="smallcode">Jenny</span> is the two-character string '<span class="smallcode">35</span>'. Note that it does <i>not</i> say that the value is the number thirty-five. There is no way to modify the meaning of a literal node.</p>
<p>An example of such 'in-line' use of a literal to denote a string is provided by <span class="smallcode">dc:title</span> in the Dublin Core.</p>
<h3>2. Datatypes</h3>
<p>If the intended meaning of literals is understood by a set of users or applications, then the simple use case illustrated by the above example could be sufficient. This 'untyped' kind of usage is always available in RDF. However, RDF also provides ways to use <i>datatypes</i> to assert that a literal should be interpreted in a particular way.</p>
<p>A datatype is defined abstractly by two domains, one of lexical forms and one of values, and a mapping from lexical forms to values. We assume that a datatype is indicated by a URI, and that some external mechanism is able to access and make use of appropriate representations of the domains and map when supplied with the URI. The model theory is stated in terms of a global function L2V from datatypes to the lexical-to-value mapping of that datatype. In the examples below, urirefs which are being interpreted as datatype names will be indicated by the use of the <span class="datatype">color green</span>.</p>
<h3>3. 
Datatype triples</h3>
<p>The simplest way to talk about the value of a literal under a datatype mapping is to provide a node to denote the value and link that node to the datatype, using the name of the datatype as the property. This is called a <i>datatype triple</i>. For example</p>
<p class="smallcode">Jimmy ex:age _:x .<br> _:x <span class="datatype">xsd:number</span> "35" .</p>
<p>says that Jimmy's age is the value of the literal under the datatype mapping <span class="datatype">xsd:number</span>, i.e. that Jimmy's age is the number 35. (Contrast this with the example in section 1.) The datatype triple also, incidentally, asserts that the literal itself is in the lexical space of the datatype. For example, </p>
<p>_:x <span class="datatype">xsd:number</span> "HumptyDumpty" .</p>
<p>would always be false, no matter what value is assigned to the bnode. This is the only way in which an RDF triple can be contradictory. </p>
<p>A datatype triple is true when the literal is a well-formed lexical form of the datatype, and the subject denotes the value of the lexical form under that datatype's lexical-to-value mapping. The intuitive reading might be "..<i>can be described, according to this datatype mapping, by the character string</i>..".</p>
<p>(This is 'backwards' from the usual way of thinking about a datatype mapping as applying to the lexical form and resulting in the value; the reason for this is simply the RDF syntactic convention that prohibits literals in subject position. Technically, the RDF datatype property is in fact the <i>inverse</i> of the datatype's lexical-to-value mapping; the lexical-to-value mapping goes 'from' the object of the triple 'to' the subject.)</p>
<h4> 3.1 Datatype properties are a local constraint on literals.</h4>
<p>The datatype triple is the most 'local' style of literal datatyping in RDF; the interpretation imposed on the subject node by the datatype property is entirely 'inside' the triple. 
This means for example that the same literal can be used simultaneously in two different such triples, imposing different interpretations on two different nodes. For example, if <span class="datatype">ex:octalnumber</span> were a datatype property, then as well as using the literal as a decimal to indicate Jimmy's age, we could also assert<br> <span class="smallcode"><br> Judy ex:age _:y .<br> _:y </span><span class="datatype">ex:octalnumber</span><span class="smallcode"> "35" .</span></p>
<p>to assert that Judy's age was 29, and both uses of the literal could be in the same RDF graph. Although the two bnodes _:x and _:y denote distinct values, the literal <i>itself</i> has the same meaning in both cases - the lexical form. </p>
<p>Similarly, two different literal representations of the same value could be specified using two different datatype triples which include the same subject:</p>
<p class="smallcode">_:y <span class="datatype">ex:USdecimal</span> "12.25" .<br> _:y <span class="datatype">ex:germandecimal</span> "12,25" .</p>
<p>Obviously, this only works when the literals do in fact map to the same value under the respective mappings. </p>
<h4>3.2 Datatype properties have exact domains and ranges.</h4>
<p>We make one additional assumption concerning the use of datatype properties: they have <i>exact</i> domains and ranges. </p>
<p>Normally in RDFS, an assertion about a range:</p>
<p class="smallcode">ppp rdfs:range ccc .</p>
<p>is understood to say that the precise range of ppp is a subset of the class ccc. This allows RDFS to combine multiple range assertions coherently and reflects the fact that the language has no way to express a 'lower bound' on the membership in a class. However, we will assume that for datatype properties, such an assertion is true only when ccc is the exact range of the property, no more and no less. This exact range is the lexical space of the datatype, so:</p>
<p class="smallcode"><span class="datatype">ppp</span> rdfs:range ccc .</p>
<p>asserts that the class<span class="smallcode"> ccc </span>is precisely the set of lexical forms that are acceptable to the datatype <span class="datatype">ppp</span>. </p>
<h3>4. Missing datatype information: rdfs:dlex</h3>
<p>Sometimes one wishes to associate a literal with a value without specifying a particular datatype. RDFS provides a special property for this kind of underdetermined association, called <span class="smallcode">rdfs:dlex</span> (read: Datatype LEXical form). The triple </p>
<p class="smallcode">_:x rdfs:dlex "37" .</p>
<p>asserts simply that _:x is a value which can be represented by the character string under some possible datatype mapping. This does not in itself 'fix' the value, of course, but it can be used as a way of making the association between the value and a lexical form explicit, for later use or amplification. We will call this a <i>lexical form</i> triple. A useful way to think of the meaning of <span class="smallcode">rdfs:dlex</span> is: "..<i>can be described by the character string</i>.." </p>
<p>Notice that since <span class="smallcode">rdfs:dlex</span> is not a datatype, it can be used to link several different literals to the same node:</p>
<p class="smallcode">_:x rdfs:dlex "37" .<br> _:x rdfs:dlex "29" .</p>
<p>However, this should be done with caution, as this usage may conflict with the technique described next.</p>
<h3 class="lilac">5. Attaching datatype constraints to a property: rdfs:drange.</h3>
<p>It is often convenient to associate a datatype with the range of a property, so that every use of the property can be understood as asserting appropriate datatyping conditions about its object. RDFS provides the special property <span class="smallcode">rdfs:drange</span> for this purpose. (Read as <i>d</i>atatype <i>range</i> ; but do not confuse this with <span class="smallcode">rdfs:range</span>, which has quite a different meaning.) 
</p>
<p>There are two kinds of datatype conditions that one might wish to attach to a property, depending on whether the object of the property is a literal, or a value linked to a literal in a lexical form triple. </p>
<p>In the first case, the usual purpose of linking the datatype to the property is to state that the literal in the object position <i>conforms to the lexical conditions</i> of the datatype. For example, we might wish to 'restrict' the property <span class="smallcode">ex:age</span> so that it is used only when applied to numerals, so that </p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:number</span> .<br> Jenny ex:age "35" . <br></p>
<p>has the same meaning as in section 1, but </p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:number</span> .<br> Jenny ex:age "HumptyDumpty" .</p>
<p>would be flagged as a datatype violation, by virtue of the association of the datatype with the property. (Note however that this does <i>not</i> assert that the <span class="smallcode">rdfs:range</span> of the property is the class <span class="datatype">xsd:number</span>; if it did, then <i>any</i> <span class="smallcode">ex:age</span> triple with a literal object would be false, even "<span class="smallcode">35</span>".)</p>
<p>The usual intention in the second case, however, is to impose a similar condition on the <i>lexical-to-value mapping</i> used to interpret any lexical form triples containing the object, so that</p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:number</span> .<br> Jimmy ex:age _:x .<br> _:x rdfs:dlex "35" .</p>
<p>means that Jimmy's age is the number 35. 
Here, the datatype is 'projected' across the bnode to impose an interpretation on <span class="smallcode">rdfs:dlex</span>, in effect making the lexical form triple have the same content as a datatype triple.</p>
<p><img src="dtypeimages/DatatypeFigC.png" alt="diagram of effect of rdfs:drange" width="396" height="378" /><br /> Figure 1: <i>Datatype conditions imposed by rdfs:drange. The 'blunt' end of the lexical-to-value map always attaches to the literal.<br /> </i> </p>
<p>Both of these datatyping restrictions are considered to be part of the meaning of <span class="smallcode">rdfs:drange,</span> and they comprise its <i>total</i> meaning. All it does is to associate datatype restrictions with other property names in these two ways. If the object of an <span class="smallcode">rdfs:drange</span> triple is not a datatype, then the triple is vacuous, and makes no assertion at all.</p>
<p> </p>
<p>In particular, an <span class="smallcode">rdfs:drange</span> assertion places no restrictions on the <span class="smallcode">rdfs:range</span> of the property. Although it would often be natural to consider the range of the property to be the lexical space of the datatype in the first case, and the value space of the datatype in the second, this should be asserted separately if the user wishes to make it explicit.</p>
<p> We note that this convention uses datatype urirefs both as properties and as class names. This is quite legal in RDF, and indeed there is a basic assumption which relates the two uses: <i>the datatype class names the value space of the datatype</i>, which is the domain of the datatype property (recall that properties are 'backwards' lexical-to-value maps) ; so the following is true for any datatype <span class="datatype">ddd</span>:</p>
<p class="smallcode"><span class="datatype">ddd</span> rdfs:domain <span class="datatype">ddd</span> .</p>
<p>To refer to the lexical domain, use <span class="smallcode">rdfs:range</span> applied to the datatype property. 
For example, the following two triples would restrict the <span class="smallcode">rdfs:range</span> of<span class="smallcode"> ex:age</span> to be a subset of the lexical space of the datatype:</p>
<p class="smallcode"><span class="datatype">xsd:number</span> rdfs:range _:x .<br> ex:age rdfs:range _:x .</p>
<p>and would therefore be suitable for use with the 'in-line' idiom used in section 1 above; while </p>
<p class="smallcode">ex:age rdfs:range <span class="datatype">xsd:number</span> .</p>
<p>asserts that the range of the property is restricted to the value space of the datatype, so would be suitable for use with the lexical triple or datatype triple idioms. However, to reiterate, the same <span class="smallcode">rdfs:drange</span> assertions would be appropriate in either case. </p>
<h3></h3>
<h4>5.1 rdfs:drange is graph-wide in scope, so can produce clashes.</h4>
<p>These extra datatype interpretations imposed on a property by <span class="smallcode">rdfs:drange</span> apply to <i>any</i> such usage of the property <i>anywhere</i> in the RDF graph, so an <span class="smallcode">rdfs:drange</span> assertion has a much wider 'scope' than a datatyping triple, and therefore needs to be used with care. For example, if several different literals are linked to a single node, then long-range datatyping can produce a conflict:</p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:number</span> .</p>
<p class="smallcode">Jenny ex:age <font color="#FF0000">_:x .<br> _:x</font> rdfs:dlex "37" .<br> <font color="#FF0000">_:x</font> rdfs:dlex "29" .</p>
<p></p>
<p></p>
<p>The blank node here is required by the datatype triple to have two distinct values at the same time. This situation is called a <i>datatype clash</i>, and is best avoided. </p>
<p>Similarly, if two different <span class="smallcode">rdfs:drange</span> assertions are made about the same property, then they both apply to it. 
If the relevant datatypes have disjoint lexical spaces, or if their lexical-to-value maps fail to give the same values to a lexical form, then any use of the property with a literal is likely to produce a datatype clash. This requires particular care when merging information from different graphs which may have been written with different, and incompatible, conventions about literal datatyping. </p>
<h4 class="lilac">5.2 Avoiding datatype clashes</h4>
<p>Unless you are sure that the datatypes in use will not produce clashes, never use <span class=" smallcode">rdfs:dlex</span> with two different literals on the same node. </p>
<p>One technique to resolve larger-range clashes is to re-label the properties. Suppose for example that an RDF graph contains </p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:number</span> .</p>
<p>and we wish to add some information from another graph which uses a conflicting datatype convention:</p>
<p class="smallcode">ex:age rdfs:drange <span class="datatype">xsd:string</span> .</p>
<p>To do so, introduce two new property names, say <span class="smallcode">ex:age1</span> and <span class="smallcode">ex:age2</span>, transcribe all occurrences of <span class="smallcode">ex:age</span> from one graph into one of these and all occurrences from the other graph into the other, and then add:</p>
<p class="smallcode">ex:age1 rdfs:subPropertyOf ex:age .<br> ex:age2 rdfs:subPropertyOf ex:age .</p>
<p>This gives</p>
<p><span class="smallcode">ex:age1 rdfs:drange <span class="datatype">xsd:number</span> .<br> ex:age2 rdfs:drange </span><span class="datatype">xsd:string</span><span class="smallcode"> .</span></p>
<p>which does not produce any datatype clashes, retains both particular ways of imposing meanings on literals - since these restrictions are associated with the particular property <i>name</i> - and still allows all RDFS conclusions using the original <span class="smallcode">ex:age</span> property to be drawn from the information in either of the graphs. This trick works because datatyping constraints are not inherited 'upwards' through subproperty relationships; similarly, a superclass of a datatype class need not itself be a datatype class. </p>
<h3 class="lilac">6. Model theory</h3>
<p>(We assume that the basic MT has tidy literal nodes and that I("<span class="smallcode">lll</span>") = <span class="smallcode">lll</span> for any literal under any interpretation I. We don't need to mention LV.)</p>
<p>Suppose I is an RDFS interpretation of a graph E. Then I is <i>datatyped</i> (with respect to a set D of datatypes) if the following is true for any datatype uriref <span class="datatype">ddd</span> (with I(<span class="datatype">ddd</span>) in D):</p>
<p>(1) IEXT(I(<span class="datatype">ddd</span>)) = {&lt;y,x&gt; : y = L2V(I(<span class="datatype">ddd</span>))(x) } i.e. the inverse of the datatype lexical-to-value map.</p>
<p>(2) ICEXT(I(<span class="datatype">ddd</span>)) = {x : &lt;x,y&gt; in IEXT(I(<span class="datatype">ddd</span>)) } i.e. the value space of the datatype.</p>
<p>(3) For any literal <span class="smallcode">lll</span>, if E contains </p>
<p class="smallcode">aaa rdfs:drange <span class="datatype">ddd</span> .<br> bbb aaa "lll" .</p>
<p>then L2V(I(<span class="datatype">ddd</span>))(<span class="smallcode">lll</span>) is defined, i.e. <span class="smallcode">lll</span> is in the lexical space of I(<span class="datatype">ddd</span>). </p>
<p>(4) For any literal <span class="smallcode">lll</span>, if E contains </p>
<p class="smallcode">aaa rdfs:drange <span class="datatype">ddd</span> .<br> bbb aaa ccc .<br> ccc rdfs:dlex "lll" .</p>
<p>then I(<span class="smallcode">ccc</span>) = L2V(I(<span class="datatype">ddd</span>))(<span class="smallcode">lll</span>) i.e. the 'dlex' is restricted to have the same meaning as the datatype property. </p>
<p>We can capture the content of the fourth condition by a special closure rule which inserts the appropriate datatyping triple, as in the first row of the following table of closure rules:</p>
<table width="90%" border="1">
 <tr> <td width="49%">If the graph contains:</td> <td width="51%">then add the triple:</td> </tr>
 <tr> <td width="49%" > <span class="smallcode"><br> aaa rdfs:drange </span><span class="datatype">ddd</span><span class="smallcode"> .<br> bbb aaa ccc .</span><br> <span class="smallcode">ccc rdfs:dlex "lll" .</span><br> </td> <td width="51%" class="smallcode" >ccc <span class="datatype">ddd</span> "lll" .</td> </tr>
 <tr> <td width="49%"> <p><span class="smallcode"><br> </span></p> </td> <td width="51%" class="smallcode"><span class="datatype">ddd</span> rdfs:domain <span class="datatype">ddd</span> .</td> </tr>
</table>
<p>However, the meaning of the other semantic conditions cannot be fully captured by closures. </p>
<p> </p>
<p> </p>
</body></html>
Received on Friday, 8 March 2002 10:45:45 UTC