WORKING DRAFT
Resource Description Framework (RDF)
Datatyping Specification
Last Modified: $Date: 2002/04/03 10:21:09 $
Editors
Pat Hayes, University of West Florida
Sergey Melnik, Stanford University
Patrick Stickler, Nokia Research Center
Status of this Document
This document is a working draft of the W3C RDF Core Working Group and is subject to change.
1. Introduction
1.1 Related Documents
1.2 Comments on the Examples
2. Desiderata for RDF Datatyping
3. XML Schema: A Foundation for RDF Datatyping
3.1 Datatype Mapping
3.2 Canonical Datatype Mapping
3.3 Datatyped Literal
4. Designation of Datatyped Literals in RDF
4.1 The Datatype Property Idiom
4.2 The Lexical Form Idiom
4.3 The Inline Idiom
4.4 Datatyping Constraints
4.4.1 Datatyped Properties
4.4.2 Datatype Clashes
4.5 RDF Datatyping and RDF Schema
4.5.1 Domain and Range of Datatype Properties
4.5.2 rdfd:range versus rdfs:range
4.5.3 Datatype Classes and rdfs:subClassOf
4.5.4 Datatype Properties and rdfs:subPropertyOf
4.5.5 The Inline Idiom and rdfs:range
5. RDF Datatyping Model Theory
6. Appendices
6.1 Levels of Interpretation
6.1.1 Literal Graph Representation
6.1.2 RDF Model Theory Interpretation
6.1.3 RDF Datatyping Interpretation
6.1.4 Extra-RDF Application Interpretation
6.2 Use Cases
6.2.1 DAML+OIL
6.2.2 CC/PP
6.2.3 Dublin Core
6.2.4 ???
7. References
What is datatyping? ...
How does datatyping relate to RDF? ...
This document is part of a set of specifications which together define RDF...
Reference to RDF Primer ...
Reference to RDF Syntax ...
Reference to RDF Schema ...
Reference to RDF Model Theory ...
For the sake of brevity and clarity, XML entities are used in the examples provided in this specification where URI References occur as attribute values. In addition, local and qualified names are used as node and arc labels in graph illustrations, even though the actual graph will contain complete URI References as labels.
The following RDF/XML 'wrapper' should be assumed for all RDF examples used in this specification:
    <?xml version="1.0"?>
    <!DOCTYPE rdf:RDF [
       <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
       <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
       <!ENTITY rdfd "http://www.w3.org/2002/rdf-datatyping#">
       <!ENTITY xsd  "http://www.w3.org/2001/XMLSchema#">
       <!ENTITY ex   "uuid:f82dad84-0a58-11d6-9542-0003931df47c#">
    ]>
    <rdf:RDF
       xmlns:rdf  ="&rdf;"
       xmlns:rdfs ="&rdfs;"
       xmlns:rdfd ="&rdfd;"
       xmlns:xsd  ="&xsd;"
       xmlns:ex   ="&ex;">

       <!-- example -->

    </rdf:RDF>
Verbiage about desiderata...
Outline motivations and issues shaping the final solution...
Why datatyping and what matters...
The following list summarizes the specific desiderata that were taken into account during the development of this specification:
Backward compatibility:
Forward compatibility with OWL
Ability to use predefined XML Schema simple datatypes
Ability to use non-XML-Schema defined datatypes
Ability to represent type information locally for each property value without relying on global RDF schema assertions
Ability to associate type information globally for all values of a given property via RDF schema assertions, which further act as validation constraints in the presence of local datatyping
Co-existence of "global" and "local" datatyping mechanisms without semantic conflict of any kind
Support for datatyping idioms currently in use
Minimal addition, if any, to vocabulary or syntactic machinery of RDF
A model theory for RDF datatyping
It is believed that the approach to datatyping described in this specification satisfies all of the above desiderata.
The conceptual framework for RDF datatyping presented in this specification is based on the type system defined by XML Schema for simple datatypes. This section explains how the relevant terms and concepts defined by XML Schema are expressed using the model-theoretic semantics for RDF defined in the RDF Model Theory. RDF Datatyping does not provide explicit support for XML Schema complex (structured) datatypes.
XML Schema defines a simple datatype as consisting of a) a set of distinct values, called its value space, b) a set of lexical representations, called its lexical space, c) a set of canonical lexical representations, called its canonical lexical space, and d) a set of facets that characterize properties of the value space, individual values or lexical forms. XML Schema implicitly assumes two additional components, which we call a datatype mapping and canonical datatype mapping, to be part of the datatype.
XML Schema datatyping facets are not employed by RDF Datatyping, and the specification and interpretation of XML Schema datatype facets are not addressed by this specification.
A datatype mapping is a set of pairs whose first element belongs to the value space of the datatype, and the second element belongs to the lexical space of the datatype.
A datatype mapping satisfies the following properties:
For example, the datatype mapping for the XML Schema simple datatype 'xsd:boolean', where each element of the value space has two lexical representations, is as follows:
Value Space | {T, F} |
---|---|
Lexical Space | {"0", "1", "true", "false"} |
Datatype Mapping | {<T, "true">, <T, "1">, <F, "0">, <F, "false">} |
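As a minimal sketch, the xsd:boolean mapping in the table above can be written as a set of (value, lexical form) pairs. The names `BOOLEAN_MAPPING`, `value_space`, `lexical_space` and `lexical_to_value` are illustrative assumptions and are not defined by this specification; Python's `True` and `False` stand in for T and F.

```python
# The xsd:boolean datatype mapping from the table above, written as a set of
# (value, lexical form) pairs. True/False stand in for T/F.
BOOLEAN_MAPPING = {(True, "true"), (True, "1"), (False, "0"), (False, "false")}


def value_space(mapping):
    """Collect the first element of every pair."""
    return {value for value, _ in mapping}


def lexical_space(mapping):
    """Collect the second element of every pair."""
    return {lexical for _, lexical in mapping}


def lexical_to_value(mapping, lexical):
    """Return the value paired with a lexical form; raise KeyError if none."""
    for value, candidate in mapping:
        if candidate == lexical:
            return value
    raise KeyError(lexical)
```

Both spaces are derived from the pairs here, which mirrors the view that the mapping ties the two spaces together.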
A canonical lexical space is a subset of members from the lexical space of a datatype such that there is a one-to-one mapping between members of the canonical lexical space and members of the value space.
A canonical datatype mapping is a subset of a datatype mapping that establishes this one-to-one correspondence between members of the canonical lexical space and members of the value space.
A canonical datatype mapping satisfies the following properties:
For example, the canonical datatype mapping for the XML Schema simple datatype 'xsd:boolean', where each member of the value space has a single (canonical) lexical representation, is as follows:
Value Space | {T, F} |
---|---|
Canonical Lexical Space | {"true", "false"} |
Canonical Datatype Mapping | {<T, "true">, <F, "false">} |
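The relationship between a canonical datatype mapping and the full datatype mapping can be checked mechanically. The following sketch assumes the same set-of-pairs representation as the tables; the function name `is_canonical` is hypothetical.

```python
# Full and canonical xsd:boolean mappings, as (value, lexical form) pairs.
BOOLEAN_MAPPING = {(True, "true"), (True, "1"), (False, "0"), (False, "false")}
CANONICAL_BOOLEAN_MAPPING = {(True, "true"), (False, "false")}


def is_canonical(canonical, full):
    """Check that `canonical` is a subset of `full` that pairs every value
    of the value space with exactly one lexical form, and vice versa."""
    if not canonical <= full:
        return False
    values = [value for value, _ in canonical]
    lexicals = [lexical for _, lexical in canonical]
    one_to_one = (len(set(values)) == len(values)
                  and len(set(lexicals)) == len(lexicals))
    covers_value_space = set(values) == {value for value, _ in full}
    return one_to_one and covers_value_space
```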
A datatyped literal is a pair where the first element is a URI Reference denoting a datatype and the second element is a lexical form (literal). Following from the nature of datatypes as defined above, this pairing of datatype and lexical form unambiguously identifies a specific member of a datatype mapping or canonical datatype mapping, and hence a specific member of the value space of the datatype.
A datatyped literal can be considered a "literal-in-context" where the datatype provides the context for interpretation of the lexical form (literal) to obtain an actual value.
For example, the datatyped literals which can be defined for the XML Schema simple datatype 'xsd:boolean' are as follows:
Datatyped Literal | Member of Datatype Mapping Denoted by Datatyped Literal | Member of Value Space Denoted by Datatyped Literal |
---|---|---|
<xsd:boolean, "true"> | <T, "true"> | T |
<xsd:boolean, "1"> | <T, "1"> | T |
<xsd:boolean, "false"> | <F, "false"> | F |
<xsd:boolean, "0"> | <F, "0"> | F |
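To illustrate how a datatyped literal picks out a value, the sketch below resolves a (datatype URI Reference, lexical form) pair against an application-side table. The registry `DATATYPES` and the function `denoted_value` are assumptions made for illustration; only xsd:boolean is registered.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical application-side registry:
# datatype URI Reference -> lexical-to-value map for that datatype.
DATATYPES = {
    XSD + "boolean": {"true": True, "1": True, "false": False, "0": False},
}


def denoted_value(datatyped_literal):
    """Return the value denoted by a (datatype URI, lexical form) pair.

    Raises KeyError if the datatype is unknown to this application or the
    lexical form is not in the datatype's lexical space.
    """
    datatype_uri, lexical = datatyped_literal
    return DATATYPES[datatype_uri][lexical]
```

The lexical form alone never determines the value; the datatype half of the pair selects which mapping is consulted.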
RDF datatyping is concerned with the implicit or explicit designation of datatyped literal pairings, and provides for nothing beyond that designation. The internal structure and semantics of all datatypes are opaque to RDF; i.e. membership of value and lexical spaces, datatype mappings, etc. have neither representation nor interpretation in RDF. Actual interpretation of datatyped literals (determination of the actual value denoted by the datatyped literal) is performed externally to RDF by applications which have sufficient knowledge of the particular datatypes in question. RDF datatyping only provides the datatype context within which such interpretation is to take place.
A datatyped literal may be designated in several ways in RDF, according to various idioms. Three such idioms are defined by this specification: one for local (explicit) datatyping and two for global (implicit) datatyping.
The simplest way to talk about the value of a literal under a datatype mapping is to provide a node to denote the value and link that node to the datatype, using the name of the datatype as the property. For example:
    <rdf:Description rdf:about="#John">
      <ex:age>
        <rdf:Description>
          <xsd:integer>25</xsd:integer>
        </rdf:Description>
      </ex:age>
    </rdf:Description>

or, the equivalent contracted form

    <rdf:Description rdf:about="#John">
      <ex:age xsd:integer="25"/>
    </rdf:Description>

says that John's age is the value paired with (represented by) the lexical form (literal) in the datatype mapping defined for the datatype xsd:integer; i.e. that John's age is the number twenty-five.
The datatype property idiom also asserts that the literal object is a member of the lexical space of the datatype. The intuitive reading of the datatype property might be "... can be represented, according to this datatype mapping, by the character string ...". A datatype property statement is valid when the literal is a well-formed lexical form of the datatype, and the subject denotes the value of the lexical form under that datatype's lexical-to-value mapping. E.g.:
    <rdf:Description rdf:about="#John">
      <ex:age>
        <rdf:Description>
          <xsd:integer>pumpkin</xsd:integer>
        </rdf:Description>
      </ex:age>
    </rdf:Description>

or, the equivalent contracted form

    <rdf:Description rdf:about="#John">
      <ex:age xsd:integer="pumpkin"/>
    </rdf:Description>

would always be invalid, no matter what value is assigned to the blank node, as "pumpkin" is not a member of the lexical space of xsd:integer. This is the only way in which an RDF datatyping statement can be contradictory.
It is important to note that RDF cannot itself determine datatyping validity; such validation can only be performed by an external application with sufficient knowledge about the particular datatype in question. RDF merely provides the means for designating the datatyped literal pairings upon which such validation would be performed.
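A minimal sketch of the kind of check such an external application might perform for xsd:integer follows. It assumes the XML Schema lexical form for integers (an optional sign followed by decimal digits); the function names are hypothetical.

```python
import re

# xsd:integer lexical forms: an optional sign followed by decimal digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def integer_value(lexical):
    """Return the integer denoted by a lexical form, or None if the string
    is not in the lexical space of xsd:integer."""
    if INTEGER_LEXICAL.fullmatch(lexical) is None:
        return None
    return int(lexical)


def is_valid_integer_statement(subject_value, lexical):
    """A datatype property statement is valid when the literal is a
    well-formed lexical form and the subject denotes its value."""
    value = integer_value(lexical)
    return value is not None and subject_value == value
```

With these definitions, `is_valid_integer_statement(25, "pumpkin")` is false for every choice of subject value, which corresponds to the always-invalid case described above.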
The datatype property idiom is the most 'local' style of literal datatyping in RDF; the interpretation imposed on the subject node by the datatype property is entirely 'inside' the triple. This means for example that the same literal can be used simultaneously in two different such triples, imposing different interpretations on two different nodes.
For example, in addition to the above statements about John's age expressed using the datatype xsd:integer, we could also say
    <rdf:Description rdf:about="#Judy">
      <ex:payday>
        <rdf:Description>
          <xsd:gDay>25</xsd:gDay>
        </rdf:Description>
      </ex:payday>
    </rdf:Description>

or, the equivalent contracted form

    <rdf:Description rdf:about="#Judy">
      <ex:payday xsd:gDay="25"/>
    </rdf:Description>

to assert that Judy receives her salary on the 25th day of each month. Both uses of the literal "25" can coexist in the same RDF graph without confusion because the datatype context within which each literal is interpreted is distinct.
Although the two property value nodes denote distinct values, the literal itself has the same meaning in both cases - which is simply the 'literal' string. It is the pairing of the lexical form and datatype together (the datatyped literal) which determines the particular value, not the literal itself. The literal itself only ever denotes the string.
Similarly, two different literal representations of the same value could be specified using either the same or even different but compatible datatype properties, all sharing the same subject:
    ...
    <rdf:Description>
      <xsd:integer>5</xsd:integer>
      <xsd:integer>00005</xsd:integer>
      <xsd:byte>05</xsd:byte>
    </rdf:Description>
    ...

Obviously, this only works when the literals do in fact map to the same value under the respective datatype mappings.
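The condition that all literals attached to one node map to the same value can be sketched as follows. The parsers are simplified stand-ins for xsd:integer and xsd:byte (the latter assumed to be an integer between -128 and 127); the table `L2V` and the function `shared_value` are hypothetical names.

```python
import re

_INTEGER = re.compile(r"[+-]?[0-9]+")


def parse_integer(lexical):
    """Lexical-to-value map for xsd:integer."""
    if _INTEGER.fullmatch(lexical) is None:
        raise ValueError(f"not an integer lexical form: {lexical!r}")
    return int(lexical)


def parse_byte(lexical):
    """Lexical-to-value map for xsd:byte: an integer from -128 to 127."""
    value = parse_integer(lexical)
    if not -128 <= value <= 127:
        raise ValueError(f"out of range for byte: {lexical!r}")
    return value


# Hypothetical table of lexical-to-value maps, keyed by qualified name.
L2V = {"xsd:integer": parse_integer, "xsd:byte": parse_byte}


def shared_value(datatype_triples):
    """Given (datatype, lexical form) pairs attached to one node, return
    the single value they all denote, or raise ValueError."""
    values = {L2V[datatype](lexical) for datatype, lexical in datatype_triples}
    if len(values) != 1:
        raise ValueError("literals do not map to the same value")
    return values.pop()
```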
Sometimes one wishes to associate a literal with a value without specifying a particular datatype. RDF Datatyping provides a special property for this kind of underdetermined association, named 'rdfd:lex' (datatype LEXical form). The following
    ...
    <rdf:Description>
      <rdfd:lex>42</rdfd:lex>
    </rdf:Description>
    ...

simply asserts that there is a value which can be represented by the lexical form "42" under some possible datatype mapping. This does not in itself 'fix' the value, of course, but it can be used as a way of making the association between the value and a lexical form explicit, for later use or amplification. We will call this a lexical form triple. A useful way to think of the meaning of rdfd:lex is: "can be described by the lexical form".
In RDF, URI References and blank nodes are both considered to be referring expressions; they are used to denote resources. Literals however are best thought of simply as syntactic 'labels' which indicate a lexical form. These lexical forms can be used to restrict the references of other nodes by using datatype schemes, but this use is optional. If a literal is used as a referring expression, it always refers to itself - that is, to a character string - so that
    <rdf:Description rdf:about="#Jane">
      <ex:age>25</ex:age>
    </rdf:Description>

states that the value of the property called ex:age for the subject Jane is the two-character string "25". Note that it does not say that the value is the number twenty-five. There is no way to modify the meaning of a literal node.
It is often convenient to associate a datatype with the range of a property, so that every use of the property can be understood as asserting appropriate datatyping conditions about its object. RDF Datatyping provides the special property rdfd:range for this purpose. (Read this as 'datatype range', but do not confuse it with rdfs:range, which has quite a different meaning.)
There are two kinds of datatype conditions that one might wish to attach to a property, depending on whether the object of the property is a literal, or a value linked to a literal in a lexical form triple.
In the first case, the usual purpose of linking the datatype to the property is to state that the literal in the object position conforms to the lexical conditions of the datatype. For example, we might wish to 'restrict' the property ex:age so that it is used only when applied to numerals, so that
    <rdf:Description rdf:about="&ex;age">
      <rdfd:range rdf:resource="&xsd;integer"/>
    </rdf:Description>

    <rdf:Description rdf:about="#Jane">
      <ex:age>25</ex:age>
    </rdf:Description>

has the same meaning as previous examples, but
    <rdf:Description rdf:about="&ex;age">
      <rdfd:range rdf:resource="&xsd;integer"/>
    </rdf:Description>

    <rdf:Description rdf:about="#Jane">
      <ex:age>Mid-Twenties</ex:age>
    </rdf:Description>

would be flagged as a datatype violation, by virtue of the association of the datatype with the property. (Note however that this does not assert that the rdfs:range of the property is the class xsd:integer; if it did, then any ex:age triple with a literal object would be false, even "25".)
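The first kind of rdfd:range condition can be sketched as a scan over a graph held as (subject, predicate, object) tuples. This is an illustration under stated assumptions: plain strings stand for URI References, the `Literal` class marks literal objects, and only xsd:integer is known to the hypothetical `LEXICAL_SPACES` table.

```python
import re
from typing import NamedTuple


class Literal(NamedTuple):
    """A literal node; plain strings stand for URI References."""
    lexical: str


# Hypothetical table of lexical spaces known to this application.
LEXICAL_SPACES = {"xsd:integer": re.compile(r"[+-]?[0-9]+")}


def range_violations(triples):
    """Return every triple whose literal object falls outside the lexical
    space of a datatype attached to its property through rdfd:range."""
    ranges = [(s, o) for s, p, o in triples
              if p == "rdfd:range" and o in LEXICAL_SPACES]
    violations = []
    for prop, datatype in ranges:
        for s, p, o in triples:
            if p == prop and isinstance(o, Literal):
                if LEXICAL_SPACES[datatype].fullmatch(o.lexical) is None:
                    violations.append((s, p, o))
    return violations
```

An rdfd:range triple whose object is not a known datatype is skipped by this sketch and contributes no violations.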
The usual intention in the second case, however, is to impose a similar condition on the lexical-to-value mapping used to interpret any lexical form triples containing the object, so that
    <rdf:Description rdf:about="&ex;age">
      <rdfd:range rdf:resource="&xsd;integer"/>
    </rdf:Description>

    <rdf:Description rdf:about="#Judy">
      <ex:age>
        <rdf:Description>
          <rdfd:lex>25</rdfd:lex>
        </rdf:Description>
      </ex:age>
    </rdf:Description>

means that Judy's age is the number twenty-five. Here, the datatype is 'projected' across the blank node to impose an interpretation on rdfd:lex, in effect making the lexical form idiom have the same interpretation as a datatype property idiom.
Both of these datatyping restrictions are considered to be part of the meaning of rdfd:range, and they comprise its total meaning. All it does is to associate datatype restrictions to other property names in these two ways. If the object of an rdfd:range triple is not a datatype, then the triple is vacuous, and makes no assertion at all.
In particular, an rdfd:range assertion places no restrictions on the rdfs:range of the property. Although it would often be natural to consider the range of the property to be the lexical space of the datatype in the first case, and the value space of the datatype in the second, this should be asserted separately if the user wishes to make it explicit.
These extra datatype interpretations imposed on a property by rdfd:range apply to any such usage of the property anywhere in the RDF graph, so an rdfd:range assertion has a much wider 'scope' than a datatyping triple, and therefore needs to be used with care. For example, if several different literals are linked to a single node, then long-range datatyping can produce a conflict:
    <rdf:Description rdf:about="&ex;age">
      <rdfd:range rdf:resource="&xsd;integer"/>
    </rdf:Description>

    <rdf:Description rdf:about="#Jane">
      <ex:age>
        <rdf:Description>
          <rdfd:lex>37</rdfd:lex>
          <rdfd:lex>29</rdfd:lex>
        </rdf:Description>
      </ex:age>
    </rdf:Description>

The property value node here is required by the datatype triple to have two distinct values at the same time. This situation is called a datatype clash, and is best avoided.
Similarly, if two different rdfd:range assertions are made about the same property, then they both apply to it. E.g.
    <rdf:Description rdf:about="&ex;age">
      <rdfd:range rdf:resource="&xsd;integer"/>
    </rdf:Description>

    <rdf:Description rdf:about="&ex;age">
      <rdfd:range rdf:resource="&xsd;duration"/>
    </rdf:Description>

If the relevant datatypes have disjoint lexical spaces, or if their lexical-to-value maps fail to give the same values to a lexical form, then any use of the property with a literal is likely to produce a datatype clash. This requires particular care when merging information from different graphs which may have been written with different, and incompatible, conventions about literal datatyping.
Unless you are sure that the datatypes in use will not produce clashes, never use rdfd:lex with two different literals on the same blank node.
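A datatype-aware application could detect such clashes before merging graphs. The sketch below is one possible approach, not a procedure defined by this specification: it assumes a graph of (subject, predicate, object) tuples, a `Literal` marker class, and a hypothetical `L2V` table that knows only xsd:integer.

```python
import re
from typing import NamedTuple


class Literal(NamedTuple):
    """A literal node; plain strings stand for URI References and blank nodes."""
    lexical: str


_INTEGER = re.compile(r"[+-]?[0-9]+")


def parse_integer(lexical):
    """Lexical-to-value map for xsd:integer."""
    if _INTEGER.fullmatch(lexical) is None:
        raise ValueError(f"not an integer lexical form: {lexical!r}")
    return int(lexical)


# Hypothetical table of lexical-to-value maps known to this application.
L2V = {"xsd:integer": parse_integer}


def find_clashes(triples):
    """Return the nodes that rdfd:range would force to denote two or more
    distinct values at once."""
    ranges = {}
    for s, p, o in triples:
        if p == "rdfd:range" and o in L2V:
            ranges.setdefault(s, set()).add(o)

    lexical_forms = {}
    for s, p, o in triples:
        if p == "rdfd:lex" and isinstance(o, Literal):
            lexical_forms.setdefault(s, set()).add(o.lexical)

    required = {}
    for s, p, o in triples:
        if p in ranges and o in lexical_forms:
            for datatype in ranges[p]:
                for lexical in lexical_forms[o]:
                    required.setdefault(o, set()).add(L2V[datatype](lexical))
    return {node for node, values in required.items() if len(values) > 1}
```

Two lexical forms on one node are harmless in this sketch when they denote the same value, for example "5" and "05" under xsd:integer.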
Overview of relationship between RDF Datatyping and RDF Schema...
Datatype properties have exact domains and ranges. The domain of a datatype property corresponds to the value space of the datatype and the range of a datatype property corresponds to the lexical space of the datatype.
Normally in RDF Schema, an assertion about a range:
    <rdf:Description rdf:about="#someProperty">
      <rdfs:range rdf:resource="#someClass"/>
    </rdf:Description>

is understood to say that the precise range of someProperty is a subset of the class someClass. This allows RDF Schema to combine multiple range assertions coherently and reflects the fact that the language has no way to express a 'lower bound' on the membership in a class. However, for datatype properties, such an assertion is true only when someClass is the exact range of the property, no more and no less. This exact range is the lexical space of the datatype. Thus, the above range statement asserts that the RDF class someClass is precisely the set of lexical forms that are acceptable to the datatype property someProperty.
Similarly, ... (verbiage about domain) ...
    <rdf:Description rdf:about="#someProperty">
      <rdfs:domain rdf:resource="#someClass"/>
    </rdf:Description>

Discuss similarities and differences between rdfd:range and rdfs:range -- particularly with regards to validation and genericity...
... We note that this convention uses datatype URI References both as properties and as class names. This is quite legal in RDF, and indeed there is a basic assumption which relates the two uses: the datatype class names the value space of the datatype, which is the domain of the datatype property (recall that properties are 'backwards' lexical-to-value maps); so the following is true for any datatype ddd:
    <rdf:Description rdf:about="#ddd">
      <rdfs:domain rdf:resource="#ddd"/>
    </rdf:Description>

To refer to the lexical space, use rdfs:range applied to the datatype property. For example, the following two triples would restrict the rdfs:range of ex:age to be a subset of the lexical space of the datatype:
    <rdf:Description rdf:about="&xsd;integer">
      <rdfs:range rdf:resource="#x"/>
    </rdf:Description>

    <rdf:Description rdf:about="&ex;age">
      <rdfs:range rdf:resource="#x"/>
    </rdf:Description>

and would therefore be suitable for use with the inline idiom (section 4.3); while
    <rdf:Description rdf:about="&ex;age">
      <rdfs:range rdf:resource="&xsd;integer"/>
    </rdf:Description>

asserts that the range of the property is restricted to the value space of the datatype, so would be suitable for use with the lexical triple or datatype triple idioms. However, to reiterate, the same rdfd:range assertions would be appropriate in either case.
Discuss subclassing of datatypes, that subclass relations relate only to value spaces, not lexical spaces, etc....
Discuss the special nature of datatyping properties and warn against creating subproperty relations with non-datatype properties...
Discuss the inherent incompatibility between the inline idiom and rdfs:range with suggestions of how to address it...
The RDF Model Theory explains the fundamental model-theoretic concepts like interpretation, universe, extension etc. used for interpreting the semantics of RDF graphs. This section assumes familiarity with these basic concepts.
Suppose I is an RDFS interpretation of a graph E. Then I is datatyped (with respect to a set D of datatypes) if the following is true for any datatype URI Reference ddd (with I(ddd) in D):
(1) IEXT(I(ddd)) = {<y,x> : y = L2V(I(ddd))(x) } i.e. the inverse of the datatype (lexical form to value) mapping.
(2) ICEXT(I(ddd)) = {x : <x,y> in IEXT(I(ddd)) } i.e. the value space of the datatype.
(3) For any literal "LLL", if E contains the triples

    <aaa, rdfd:range, ddd>
    <bbb, aaa, "LLL">

then L2V(I(ddd))("LLL") is defined; i.e. "LLL" is in the lexical space of I(ddd).
(4) For any literal "LLL", if E contains the triples

    <aaa, rdfd:range, ddd>
    <bbb, aaa, ccc>
    <ccc, rdfd:lex, "LLL">

then I(ccc) = L2V(I(ddd))("LLL"); i.e. 'rdfd:lex' is restricted to have the same meaning as the datatype property.
We can capture the content of the fourth condition by a special closure rule which inserts the appropriate datatyping assertion:
If the graph contains: | then add: |
---|---|
<aaa, rdfd:range, ddd> <bbb, aaa, ccc> <ccc, rdfd:lex, "LLL"> | <ccc, ddd, "LLL"> |
However, the meaning of the other semantic conditions cannot be fully captured by closures.
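The closure rule itself is straightforward to apply mechanically. The following sketch assumes a graph held as a set of (subject, predicate, object) tuples in which literal objects are wrapped in a `Literal` marker class; both that representation and the function name `datatype_closure` are illustrative assumptions.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A literal node; plain strings stand for URI References and blank nodes."""
    lexical: str


def datatype_closure(triples):
    """Apply the closure rule: whenever the graph contains
    <aaa, rdfd:range, ddd>, <bbb, aaa, ccc> and <ccc, rdfd:lex, "LLL">,
    add <ccc, ddd, "LLL">. Returns a new set; the input is not modified."""
    graph = set(triples)
    ranges = [(s, o) for s, p, o in graph if p == "rdfd:range"]
    lexical = [(s, o) for s, p, o in graph
               if p == "rdfd:lex" and isinstance(o, Literal)]
    added = set()
    for aaa, ddd in ranges:
        for _bbb, p, ccc in graph:
            if p != aaa:
                continue
            for node, literal in lexical:
                if node == ccc:
                    added.add((ccc, ddd, literal))
    return graph | added
```

Applied to a graph stating that ex:age has rdfd:range xsd:integer and that Judy's age node has the lexical form "25", the closure adds a datatype property triple linking that node to "25" through xsd:integer.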
The following appendices are non-normative...
Discuss the different levels of interpretation on the graph provided by the MT, the datatyping idioms, and datatype aware applications...
...
...
...
...
W3C RDF Core Working Group Charter, Mar 2001, http://www.w3.org/2001/sw/RDFCoreWGCharter
W3C RDF Primer, ??? 2002, http://www.w3.org/???
W3C RDF Syntax, ??? 2002, http://www.w3.org/???
W3C RDF Model Theory, ??? 2002, http://www.w3.org/???
W3C RDF Schema, ??? 2002, http://www.w3.org/???
XML Schema Part 2: Datatypes, ??? 2001, http://www.w3.org/TR/xmlschema-2/
DAML+OIL..., ??? 200?, http://???
CC/PP..., ??? 200?, http://???
OWL..., ??? 200?, http://???