W3C

RDF Datatyping

W3C Working Draft [Last Modified: $Date: 2002/04/16 13:24:06 $]

This version:
http://www-nrc.nokia.com/sw/rdf-datatyping.html
Latest version:
http://www-nrc.nokia.com/sw/rdf-datatyping.html
Previous version:
None.
Editors:
Pat Hayes, University of West Florida, phayes@ai.uwf.edu
Sergey Melnik, Stanford University, melnik@db.stanford.edu
Patrick Stickler, Nokia Research Center, patrick.stickler@nokia.com

Abstract

The Resource Description Framework (RDF) is a general-purpose language for representing information in the World Wide Web, with a particular goal of the reliable and meaningful exchange of data between applications. RDF uses URI-references and literal strings to denote the things described. The interpretation of URIs is addressed by the RDF Model Theory. This document describes a framework for the use and interpretation of literal strings in RDF, by reference to a concept of well known "datatypes" which map strings to values, and builds upon the semantic framework described in the RDF Model Theory.

Status of this Document

This is a W3C RDF Core Working Group Working Draft produced as part of the W3C Semantic Web Activity. This document incorporates decisions made by the Working Group designed to provide the reader the basic fundamentals required to effectively use datatyping with RDF in their particular applications.

This document is being released for review by W3C members and other interested parties to encourage feedback and comments. This is the current state of an ongoing work on the RDF Datatyping specification.

This is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use it as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.

Table of Contents

1 Introduction
  1.1 What is Datatyping?
  1.2 Desiderata for RDF Datatyping
  1.3 Related Documents
  1.4 Comments on the Examples
  1.5 Comments on the Structure of RDF Literals
2 RDF Datatypes
  2.1 XML Schema: A Foundation for RDF Datatypes
    2.1.1 rdfd:Datatype
  2.2 Datatype Mapping
  2.3 Canonical Datatype Mapping
  2.4 Datatyped Literal
3 Designation of Datatyped Literals in RDF
  3.1 The Datatype Property Idiom
  3.2 The Lexical Form Idiom
    3.2.1 rdfd:lex
  3.3 The Inline Idiom
  3.4 Datatyping Constraints and Datatyped Properties
    3.4.1 rdfd:datatype
    3.4.2 Datatype Clashes
4 RDF Datatyping and RDF Schema
  4.1 Class Extension of rdfd:Datatype
  4.2 Domain and Range of Datatype Properties
  4.3 rdfd:datatype versus rdfs:range
  4.4 Datatype Classes and rdfs:subClassOf
  4.5 Datatype Properties and rdfs:subPropertyOf
5 RDF Datatyping Model Theory
  5.1 Closure Rules
6 RDF Schema for Datatyping
7 Appendices
  7.1 Levels of Interpretation
    7.1.1 Literal Graph Representation
    7.1.2 RDF Model Theory Interpretation
    7.1.3 RDF Datatyping Interpretation
    7.1.4 Extra-RDF Application Interpretation
  7.2 Use Cases
    7.2.1 DAML+OIL
    7.2.2 CC/PP
    7.2.3 Dublin Core
    7.2.4 ???
  7.3 RDF Datatyping and Complex (Structured) XML Datatypes
8 References
9 Acknowledgments


1 Introduction

The Resource Description Framework (RDF) is a general-purpose language for representing information in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, the copyright and syndication information about a Web document, the availability schedule for some shared resource, or the description of a Web user's preferences for information delivery. However, by generalizing the concept of a "Web resource", RDF can be used to represent information about anything that can be identified on the Web, such as information about items available from online shopping facilities (e.g., information about prices, publishers, and availability of books or recordings).

RDF provides a common framework for expressing this information in such a way that it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. Exchanging information between different applications means that the information may be made available to applications other than those for which it was originally created.

The utility and reliability of information exchanged between applications typically requires that datatyping information be unambiguous and that the interpretation of datatyped values, which may have local representations that differ from system to system, be consistent between disparate applications. Achieving consistency in the exchange and interpretation of such datatyped information requires a well defined and standardized methodology for expressing and interpreting datatyping information.

This document defines a particular methodology for expressing datatyped information in RDF and aims to provide the reader the basic fundamentals required to effectively use datatypes and datatyped values with RDF in their particular applications.

1.1 What is Datatyping?

[informal definition of datatyping]

[common datatyping scenarios, where datatyping is needed]

Due to RDF's role as a means of interchange between disparate systems, and in order to achieve portability and platform independence, it is necessary to forgo any native representation of values or native datatypes in RDF itself. This means that RDF has no built-in knowledge about particular datatypes such as strings or integers, and the lexical representation of a given value, such as "25" for the number twenty-five, has no native interpretation in RDF. RDF is datatype neutral in the same manner as it is vocabulary neutral. The specific semantics for individual datatypes must reside in the application layers above RDF.

In RDF Datatyping, literals are taken to represent the lexical representations (lexical forms) of datatype values and their datatype interpretation is based on an association of the literal with a datatype context. In some cases, the datatype value represented by a given lexical form is also denoted explicitly by a blank node in the graph.

The nature of datatypes, the means by which literals are associated with datatype contexts, and the interpretation of datatyped literals are the focus of this document.

1.2 Desiderata for RDF Datatyping

[introductory verbiage about desiderata]

The following list summarizes the specific desiderata that were taken into account during the development of this specification:

It is believed that the methodology for datatyping described in this specification satisfies all of the above desiderata.

1.3 Related Documents

The complete specification of RDF consists of a number of documents:

This document is intended to augment the other parts of the RDF specification, to help information producers, system designers and application developers understand how datatypes and datatyping can be used with RDF.

1.4 Comments on the Examples

For the sake of brevity and clarity, XML entities (e.g. &rdf;) are used in the examples provided in this specification where URI References occur as attribute values. In addition, local and qualified names are used as node and arc labels in graph illustrations, even though the actual graph will contain complete URI-references as labels.

The following RDF/XML 'wrapper' should be assumed for all RDF examples used in this specification:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY xsd  "http://www.w3.org/2001/XMLSchema#">
  <!ENTITY rdfd "http://www.w3.org/2002/rdf-datatyping#">
  <!ENTITY ex   "http://www.w3.org/2002/rdf-datatyping/examples#">
]>

<rdf:RDF xmlns:rdf  ="&rdf;"
         xmlns:rdfs ="&rdfs;"
         xmlns:xsd  ="&xsd;"
         xmlns:rdfd ="&rdfd;"
         xmlns:ex   ="&ex;">

   <!-- example -->

</rdf:RDF>

1.5 Comments on the Structure of RDF Literals

[RDF literals are structured objects consisting of a triple; the first member corresponding to a single bit indicating whether the literal is structured XML content, the second member corresponding to a string (the content of the literal), and the third member corresponding to an xml:lang value (optional).]

[refs to syntax/primer/etc]

[the structure of the literal is transparent with regard to RDF Datatyping; all that is seen is the actual string portion -- the parseType bit and xml:lang (if present) are fully irrelevant to RDF Datatyping and the specification pretends that they don't exist]

[this treatment is in line with XML Schema's views on xml:lang as well, which actually forbids datatype values from being qualified by xml:lang. RDF Datatyping allows it, but ignores it]

2 RDF Datatypes

2.1 XML Schema: A Foundation for RDF Datatypes

The conceptual framework for RDF Datatyping presented in this specification is based on the type system defined by XML Schema for simple datatypes. A datatype defines a mapping from literal strings to corresponding values. XML schema defines a number of datatypes that are usefully employed with RDF (numbers, dates, etc.). The RDF datatyping framework supports the use of these datatypes, and any additional ones that may be defined.

Note that RDF Datatyping does not address the use of XML Schema complex (structured) datatypes in an RDF context, though see the appendices for some suggestions.

2.1.1 rdfd:Datatype

Adopting the core XML Schema definition of simple datatypes, RDF Datatyping defines an rdfd:Datatype as consisting of

  1. a set of distinct values, called its value space
  2. a set of lexical representations or forms, called its lexical space
  3. a set of canonical lexical representations which is a subset of its lexical space, called its canonical lexical space

We further include two additional components (assumed by XML Schema) as part of an rdfd:Datatype, which we call

  4. a datatype mapping
  5. a canonical datatype mapping

In addition to having the characteristics defined above, an rdfd:Datatype may also serve as a property which joins a literal node object which is a member of the lexical space of that datatype (a lexical form) to a non-literal node subject which denotes the single member of the value space of that datatype (a datatype value) which is represented by the lexical form.

2.2 Datatype Mapping

A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype.

A datatype mapping satisfies the following properties:

  1. Each member of the lexical space maps to exactly one member of the value space.
  2. Each member of the value space has at least one lexical representation.

For example, the datatype mapping for the XML Schema simple datatype 'xsd:boolean', where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:

Value Space:       {T, F}
Lexical Space:     {"0", "1", "true", "false"}
Datatype Mapping:  {<"true", T>, <"1", T>, <"0", F>, <"false", F>}
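The two properties above can be checked mechanically. The following Python sketch is illustrative only and is not part of RDF Datatyping: the names VALUE_SPACE, LEXICAL_SPACE, BOOLEAN_MAPPING and is_datatype_mapping are hypothetical, and the strings "T" and "F" merely stand in for the two members of the value space.

```python
# Illustrative sketch; all names are hypothetical, not defined by RDF Datatyping.
# "T" and "F" stand in for the two members of the xsd:boolean value space.
VALUE_SPACE = {"T", "F"}
LEXICAL_SPACE = {"0", "1", "true", "false"}
BOOLEAN_MAPPING = {("true", "T"), ("1", "T"), ("0", "F"), ("false", "F")}


def is_datatype_mapping(mapping, lexical_space, value_space):
    """Check the two properties required of a datatype mapping."""
    # Every pair must join a lexical form to a value of the datatype.
    if any(lex not in lexical_space or val not in value_space
           for lex, val in mapping):
        return False
    # 1. Each member of the lexical space maps to exactly one value.
    each_form_once = all(
        sum(1 for lex, _ in mapping if lex == form) == 1
        for form in lexical_space
    )
    # 2. Each member of the value space has at least one lexical form.
    each_value_covered = all(
        any(val == value for _, val in mapping)
        for value in value_space
    )
    return each_form_once and each_value_covered
```

Adding the pair <"1", F> to the mapping, or removing <"0", F> from it, would violate the first property, and the check would fail.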

2.3 Canonical Datatype Mapping

A canonical lexical space is a subset of members from the lexical space of a datatype such that there is a one-to-one mapping between members of the canonical lexical space and members of the value space.

A canonical datatype mapping is a subset of a datatype mapping that establishes this one-to-one correspondence between members of the canonical lexical space and members of the value space.

A canonical datatype mapping satisfies the following properties:

  1. Each member of the canonical lexical space maps to exactly one member of the value space.
  2. Each member of the value space has exactly one canonical lexical representation.

For example, the canonical datatype mapping for the XML Schema simple datatype 'xsd:boolean', where each member of the value space has a single (canonical) lexical representation, is as follows:

Value Space:                 {T, F}
Canonical Lexical Space:     {"true", "false"}
Canonical Datatype Mapping:  {<"true", T>, <"false", F>}
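Because a canonical datatype mapping is one-to-one, any lexical form can be reduced to the single canonical form that represents the same value. The following Python sketch illustrates this for xsd:boolean; the function canonical_form and the dictionary names are hypothetical, and "T" and "F" again stand in for the two values.

```python
# Illustrative sketch; all names are hypothetical.
BOOLEAN_MAPPING = {"true": "T", "1": "T", "false": "F", "0": "F"}
CANONICAL_BOOLEAN_MAPPING = {"true": "T", "false": "F"}


def canonical_form(lexical_form, mapping, canonical_mapping):
    """Return the canonical lexical form that represents the same value."""
    # A KeyError here means the form is outside the lexical space.
    value = mapping[lexical_form]
    # The canonical mapping is one-to-one, so exactly one form matches.
    matches = [form for form, val in canonical_mapping.items() if val == value]
    if len(matches) != 1:
        raise ValueError("canonical mapping is not one-to-one")
    return matches[0]
```

Under these assumptions, two lexical forms represent the same value exactly when their canonical forms are the same string.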

[add verbiage regarding utility of canonical mappings]

2.4 Datatyped Literal

A datatyped literal is a pair where the first element is a URI-reference denoting a datatype and the second element is a lexical form (literal). Following from the nature of datatypes as defined above, this pairing of datatype and lexical form unambiguously identifies a specific member of a datatype mapping or canonical datatype mapping, and hence a specific member of the value space of the datatype.

A datatyped literal can be considered a "literal-in-context" where the datatype provides the context for interpretation of the lexical form (literal) to obtain an actual value.

For example, the datatyped literals which can be defined for the XML Schema simple datatype 'xsd:boolean' are as follows:

Datatyped Literal         Member of Datatype Mapping    Member of Value Space
                          Identified by                 Identified by
                          Datatyped Literal             Datatyped Literal

<xsd:boolean, "true">     <"true", T>                   T
<xsd:boolean, "1">        <"1", T>                      T
<xsd:boolean, "false">    <"false", F>                  F
<xsd:boolean, "0">        <"0", F>                      F

RDF Datatyping is primarily concerned with the implicit or explicit designation of datatyped literal pairings.

RDF Datatyping only provides for the designation of datatyped literals. The internal structure and semantics of all datatypes are opaque to RDF; i.e. membership of value and lexical spaces, datatype mappings, etc. have neither representation nor interpretation in RDF. Actual interpretation of datatyped literals (determination of the actual value identified by the datatyped literal) is performed externally to RDF by applications which have sufficient knowledge of the particular datatypes in question. RDF Datatyping only provides the datatype context within which such interpretation is to take place.
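As an illustration of such external interpretation, the following Python sketch shows how an application might resolve a datatyped literal to a value. It is not part of RDF or RDF Datatyping: the registry DATATYPES and the functions parse_boolean, parse_integer and interpret are hypothetical, and only two XML Schema datatypes are covered.

```python
# Illustrative sketch of an application-level interpreter. RDF itself
# performs no such interpretation. All names here are hypothetical.
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def parse_boolean(lexical_form):
    mapping = {"true": True, "1": True, "false": False, "0": False}
    if lexical_form not in mapping:
        raise ValueError(f"{lexical_form!r} is not an xsd:boolean lexical form")
    return mapping[lexical_form]


def parse_integer(lexical_form):
    # xsd:integer: an optional sign followed by one or more decimal digits.
    if not re.fullmatch(r"[+-]?[0-9]+", lexical_form):
        raise ValueError(f"{lexical_form!r} is not an xsd:integer lexical form")
    return int(lexical_form)


# The application's knowledge of datatypes, keyed by datatype URI-reference.
DATATYPES = {
    XSD + "boolean": parse_boolean,
    XSD + "integer": parse_integer,
}


def interpret(datatyped_literal):
    """Map a <datatype, lexical form> pair to the value it identifies."""
    datatype_uri, lexical_form = datatyped_literal
    return DATATYPES[datatype_uri](lexical_form)
```

For example, interpret((XSD + "integer", "25")) yields the number twenty-five, while a lexical form such as "pumpkin" is rejected because it lies outside the lexical space of xsd:integer.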

3 Designation of Datatyped Literals in RDF

A datatyped literal may be designated in several ways in RDF, according to various idioms. Three such idioms are defined by this specification: one for local (explicit) datatyping and two for global (implicit) datatyping. Local datatyping associates a datatype with each individual property value explicitly. Global datatyping leaves the datatype of the property value implicit (in that the datatype of the property value itself is not individually specified) and relies on the datatype context to be defined for the property value elsewhere in the graph, by associating the datatype with the property itself rather than the property value.

3.1 The Datatype Property Idiom

In addition to denoting a datatype, an rdfd:Datatype may also serve as a property which associates a lexical form (literal) with the denotation of a datatype value. Thus

<rdf:Description rdf:about="#John">
   <ex:age>
      <rdf:Description>
         <xsd:integer>25</xsd:integer>
      </rdf:Description>
   </ex:age>
</rdf:Description>

or, the equivalent contracted form

<rdf:Description rdf:about="#John">
   <ex:age xsd:integer="25"/>
</rdf:Description>

says that John's age is the member of the value space of xsd:integer which is represented by the lexical form "25". And from what we know about the datatype xsd:integer, we then know that John's age is the value twenty-five.

The datatype property idiom defines the datatype context within which the lexical form is to be interpreted and by which the value of the subject node is determined. Given the defined characteristics of datatypes, this datatype context serves to restrict the literal object of the datatype property to be a member of the lexical space of the datatype. The intuitive reading of the datatype property might be "{subject} can be represented, according to this datatype's lexical to value mapping, by the lexical form {literal}". A datatype property statement is valid when the literal object is a member of the lexical space (a lexical form) of the datatype, and the subject denotes the member of the value space of the datatype represented by the lexical form. Thus

<rdf:Description rdf:about="#John">
   <ex:age>
      <rdf:Description>
         <xsd:integer>pumpkin</xsd:integer>
      </rdf:Description>
   </ex:age>
</rdf:Description>

or, the equivalent contracted form

<rdf:Description rdf:about="#John">
   <ex:age xsd:integer="pumpkin"/>
</rdf:Description>

would always be invalid, no matter what value is assigned to the subject node, as "pumpkin" is not a member of the lexical space of xsd:integer. This is the only way in which an RDF Datatyping statement can be contradictory.

It is important to note that RDF cannot itself determine datatyping validity; such validation can only be performed by an external application with sufficient knowledge about the particular datatype in question. RDF merely provides a means of designating the datatyped literal pairings upon which such validation would be performed.

Because the datatype property idiom is local and explicit, the interpretation imposed on the subject node by the datatype property is restricted entirely to the particular statement. This means that the same literal can be used simultaneously in numerous datatype property statements, imposing a different interpretation on each separate value node.

For example, in addition to the above statements about John's age expressed using the datatype xsd:integer, we could also say

<rdf:Description rdf:about="#Judy">
   <ex:payday>
      <rdf:Description>
         <xsd:gDay>25</xsd:gDay>
      </rdf:Description>
   </ex:payday>
</rdf:Description>

or, the equivalent contracted form

<rdf:Description rdf:about="#Judy">
   <ex:payday xsd:gDay="25"/>
</rdf:Description>

to assert that Judy receives her salary on the 25th day of each month, and both uses of the literal "25" can coexist in the same RDF graph without confusion because the datatype context within which the literal is interpreted is distinct for each case.

Although the two property value nodes denote distinct values, the literal itself has the same meaning in both cases, which is simply the 'literal' string "25". It is only within the context of a datatype, as defined by the datatyping idiom, that a complete datatyping interpretation is obtained. It is the pairing of the lexical form and datatype together (the datatyped literal) which determines the particular value, not the literal itself. The literal itself only ever denotes the string.

In more formal terms: URI-references and blank nodes are both considered to be referring expressions; they are used to denote resources. Literals however are best thought of simply as syntactic 'labels' which indicate a lexical form. These lexical forms can be used to restrict the references of other nodes by using datatype schemes, but this use is optional. If a literal is used as a referring expression, it always refers to itself - that is, to a character string.

Similarly, two different literal representations of the same value could be specified using either the same or even different but compatible datatype properties, all sharing the same subject:

...
   <rdf:Description>
      <xsd:integer>5</xsd:integer>
      <xsd:integer>00005</xsd:integer>
      <xsd:byte>05</xsd:byte>
   </rdf:Description>
...

Obviously, this is only valid when the literals do in fact map to the same value under the respective datatype mappings.
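An application with knowledge of the datatypes involved could verify this condition. The following Python sketch is one possible way to do so, under stated assumptions: the names parse_integer, parse_byte, PARSERS and common_value are hypothetical, and xsd:byte is treated as xsd:integer restricted to the range -128 to 127.

```python
# Illustrative sketch; all names are hypothetical.
import re


def parse_integer(lexical_form):
    if not re.fullmatch(r"[+-]?[0-9]+", lexical_form):
        raise ValueError(f"{lexical_form!r} is not an xsd:integer lexical form")
    return int(lexical_form)


def parse_byte(lexical_form):
    # Assumption: xsd:byte is xsd:integer restricted to -128..127.
    value = parse_integer(lexical_form)
    if not -128 <= value <= 127:
        raise ValueError(f"{lexical_form!r} is outside the range of xsd:byte")
    return value


PARSERS = {"xsd:integer": parse_integer, "xsd:byte": parse_byte}


def common_value(datatype_property_statements):
    """Return the single value shared by all <datatype, literal> pairs
    attached to one subject node, or raise ValueError if they disagree."""
    values = {PARSERS[dt](lex) for dt, lex in datatype_property_statements}
    if len(values) != 1:
        raise ValueError(f"lexical forms denote different values: {sorted(values)}")
    return values.pop()
```

Applied to the three statements above, that is <xsd:integer, "5">, <xsd:integer, "00005"> and <xsd:byte, "05">, the check succeeds because all three lexical forms map to the value five.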

3.2 The Lexical Form Idiom

One may wish to associate a literal (lexical form) with a value without specifying a particular datatype locally, leaving the datatype context implicit, either specified elsewhere in the RDF graph or left as understood by a particular application. RDF Datatyping provides a special property for this kind of association, named rdfd:lex (datatype LEXical form).

3.2.1 rdfd:lex

The rdfd:lex property associates a literal node object which is presumed to be a member of the lexical space of some (possibly unspecified) datatype (a lexical form) with a non-literal node subject denoting the single member of the value space of the same datatype as the lexical form and which is represented by that lexical form.

The following

...
   <rdf:Description>
      <rdfd:lex>42</rdfd:lex>
   </rdf:Description>
...

simply asserts that there is some datatype value which can be represented by the lexical form "42" under some possible datatype mapping. This does not in itself 'fix' the value, of course, but it can be used as a way of making the association between the value and a lexical form explicit, for later interpretation within an as yet unspecified datatype context. A useful way to think of the meaning of rdfd:lex is: "{subject} can be represented by the lexical form {literal}".

3.3 The Inline Idiom

If one does not require or wish to have any explicit denotation of a datatype value in the RDF graph, one may simply define a property value to be a literal node which is presumed to correspond to a member of the lexical space of some datatype. This is called the 'inline' idiom, and is similar to the lexical form idiom in that it leaves the datatype context implicit. It differs from the lexical form idiom in that it provides no explicit denotation of the value whereas in the lexical form idiom the blank node denotes the actual datatype value. Thus

<rdf:Description rdf:about="#Jane">
   <ex:age>25</ex:age>
</rdf:Description>

states that Jane's age is some value (which has no denotation in the graph) which is represented by the lexical form "25". Note that the literal node "25" does not represent the actual datatype value. A literal node always denotes itself, and there is no way to modify the meaning of a literal node.

3.4 Datatyping Constraints and Datatyped Properties

3.4.1 rdfd:datatype

It is often convenient to associate a datatype with a property, so that every use of the property can be understood as asserting particular datatyping characteristics about its value. Also, in the case of the implicit inline and lexical form idioms, one must have a mechanism for specifying the datatype context within which they are to be interpreted. RDF Datatyping defines the special property rdfd:datatype for this purpose.

The rdfd:datatype property associates a datatype with a particular property. This associated datatype serves to constrain (by only providing valid interpretations for) all values of the property to correspond to one of the three idioms defined above.

Note: The constraints imposed by a datatype context asserted by the rdfd:datatype property are not to be confused with those of rdfs:range, which have a different meaning (see below).

For example, we may wish to constrain the property ex:age so that its use and interpretation is bound to numerals as defined by the datatype xsd:integer:

<rdf:Description rdf:about="&ex;age">
   <rdfd:datatype rdf:resource="&xsd;integer"/>
</rdf:Description>

<rdf:Description rdf:about="#Jane">
   <ex:age>25</ex:age>
</rdf:Description>

Thus, the datatype context within which "25" is interpreted is xsd:integer, and "25" is required to be a valid member of the lexical space of xsd:integer. The rdfd:datatype assertion and the literal node together constitute the datatyped literal pairing <xsd:integer,"25"> which represents the number twenty-five. Note, however, that the actual value twenty-five has no explicit denotation in the graph when using the inline idiom, unlike the datatype property and lexical form idioms.

An rdfd:datatype assertion fixes the datatype context of a property to the specified datatype, and that datatype context governs the interpretation of all values of the property conforming to one of the three defined datatyping idioms. Property values which do not conform to one of the three defined idioms or do not satisfy the constraints of the datatype context are invalid and have no datatyping interpretation.

Thus, the rdfd:datatype assertion both provides information necessary for the proper interpretation of the implicit idiom as well as (indirectly) constrains the valid set of literals to the lexical space of the specified datatype.

This last point is illustrated by

<rdf:Description rdf:about="&ex;age">
   <rdfd:datatype rdf:resource="&xsd;integer"/>
</rdf:Description>

<rdf:Description rdf:about="#Jane">
   <ex:age>Mid-Twenties</ex:age>
</rdf:Description>

which constitutes a datatype violation, because the datatype context asserted by rdfd:datatype restricts the set of valid literal values to the lexical space of the particular datatype, and the literal "Mid-Twenties" is not a member of the lexical space of xsd:integer.

It is important to point out that only an extra-RDF application with complete knowledge about the datatype in question would be able to detect such a datatype violation. Datatypes are fully opaque to RDF and neither RDF nor RDF Schema provide generic means for datatype validation. RDF Datatyping provides mechanisms for the expression of datatyped literal pairings by specific idioms which have a well defined representation and interpretation, but cannot determine the validity of individual pairings directly.
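The kind of check such an extra-RDF application might perform for the inline idiom can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed algorithm: the graph is modeled as a list of (subject, predicate, object) tuples using qualified names, the Literal class and the function inline_violations are hypothetical, only xsd:integer is known to the application, and each property is assumed to have at most one rdfd:datatype assertion.

```python
# Illustrative sketch; all names are hypothetical.
import re
from typing import NamedTuple


class Literal(NamedTuple):
    """A literal node, reduced to its string content."""
    text: str


# The application's knowledge of lexical spaces (xsd:integer only).
LEXICAL_SPACES = {"xsd:integer": re.compile(r"[+-]?[0-9]+")}


def inline_violations(triples):
    """Return statements whose literal object lies outside the lexical
    space of the datatype globally asserted for the property."""
    # Assumption: at most one rdfd:datatype assertion per property.
    datatype_of = {s: o for s, p, o in triples if p == "rdfd:datatype"}
    violations = []
    for s, p, o in triples:
        pattern = LEXICAL_SPACES.get(datatype_of.get(p))
        if pattern is not None and isinstance(o, Literal):
            if not pattern.fullmatch(o.text):
                violations.append((s, p, o))
    return violations
```

With the rdfd:datatype assertion for ex:age shown above, a statement giving Jane's age as "25" passes this check, while one giving it as "Mid-Twenties" is reported as a violation.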

In a similar manner as for the inline idiom, an rdfd:datatype assertion also provides the datatyping context for the interpretation of the lexical form idiom:

<rdf:Description rdf:about="&ex;age">
   <rdfd:datatype rdf:resource="&xsd;integer"/>
</rdf:Description>

<rdf:Description rdf:about="#Judy">
   <ex:age>
      <rdf:Description>
         <rdfd:lex>25</rdfd:lex>
      </rdf:Description>
   </ex:age>
</rdf:Description>

As in the case of the inline idiom above, the datatype context within which "25" is interpreted is the datatype xsd:integer, and likewise "25" is required to be a valid member of the lexical space of xsd:integer. Again, the rdfd:datatype assertion and the literal node together constitute the datatyped literal pairing <xsd:integer,"25"> which represents the number twenty-five.

Furthermore, and just as in the case of the datatype property idiom, the blank node which is the object of the ex:age property in this case is interpreted as denoting the particular datatype value; i.e. twenty-five. Thus, in the presence of an rdfd:datatype assertion, the lexical form idiom has the same interpretation as the datatype property idiom.

3.4.2 Datatype Clashes

The datatype interpretations imposed on a property by rdfd:datatype apply to any such usage of the property anywhere in the RDF graph, so an rdfd:datatype assertion has a global scope, and therefore needs to be used with care. For example, if both implicit and explicit datatyping are employed for the same property, then a globally asserted datatype context can produce a conflict with a locally asserted datatype context:

<rdf:Description rdf:about="&ex;age">
   <rdfd:datatype rdf:resource="&xsd;integer"/>
</rdf:Description>

<rdf:Description rdf:about="#Judy">
   <ex:age>
      <rdf:Description>
         <rdfd:lex>25</rdfd:lex>
      </rdf:Description>
   </ex:age>
</rdf:Description>

<rdf:Description rdf:about="#Jane">
   <ex:age>
      <rdf:Description>
         <xsd:string>Mid-Twenties</xsd:string>
      </rdf:Description>
   </ex:age>
</rdf:Description>

Here, the global datatype xsd:integer is asserted for all uses of the property ex:age. While the value for Judy's age satisfies the constraints of the xsd:integer datatype, there is a conflict with the definition of Jane's age: although the local datatyping context of xsd:string is valid, the lexical form "Mid-Twenties" conflicts with the globally asserted datatype context of xsd:integer. Thus, care must be taken when asserting global datatype contexts to ensure that such clashes do not arise, or at least to be aware of the potential for such datatype clashes.
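An application could look for clashes of this kind by testing every lexical form attached to a value node against the globally asserted datatype. The following Python sketch shows one way this might be done. It is illustrative only: the tuple-based graph model, the blank node labels "_:b1" and "_:b2", the Literal class and the function datatype_clashes are all hypothetical, and xsd:integer is the only datatype the sketch knows about.

```python
# Illustrative sketch; all names are hypothetical.
import re
from typing import NamedTuple


class Literal(NamedTuple):
    """A literal node, reduced to its string content."""
    text: str


# The application's knowledge of lexical spaces (xsd:integer only).
LEXICAL_SPACES = {"xsd:integer": re.compile(r"[+-]?[0-9]+")}


def datatype_clashes(triples):
    """Report lexical forms, attached to a value node by rdfd:lex or by a
    datatype property, that fall outside the lexical space of the datatype
    globally asserted for the property leading to that node."""
    datatype_of = {s: o for s, p, o in triples if p == "rdfd:datatype"}
    clashes = []
    for s, p, value_node in triples:
        pattern = LEXICAL_SPACES.get(datatype_of.get(p))
        if pattern is None:
            continue  # no known global datatype context for this property
        for node, local_p, o in triples:
            if node == value_node and isinstance(o, Literal):
                if not pattern.fullmatch(o.text):
                    clashes.append((s, p, local_p, o.text))
    return clashes


GRAPH = [
    ("ex:age", "rdfd:datatype", "xsd:integer"),
    ("#Judy", "ex:age", "_:b1"),
    ("_:b1", "rdfd:lex", Literal("25")),
    ("#Jane", "ex:age", "_:b2"),
    ("_:b2", "xsd:string", Literal("Mid-Twenties")),
]
```

For the graph above, the only clash reported is the xsd:string statement giving Jane's age as "Mid-Twenties".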

Another source of datatype clash is when merging two graphs which have differing global assertions regarding the datatype contexts of a given property. Thus, given

From graph 1:

<rdf:Description rdf:about="&ex;age">
   <rdfd:datatype rdf:resource="&xsd;integer"/>
</rdf:Description>

From graph 2:

<rdf:Description rdf:about="&ex;age">
   <rdfd:datatype rdf:resource="&xsd;duration"/>
</rdf:Description>

if the lexical spaces of the datatypes are disjoint, or only partially intersect, then some or all of the possible lexical forms will fail to satisfy the constraints of at least one of the datatypes specified. Even if the different datatypes have identical lexical spaces, there is no guarantee that they will share the same lexical to value mappings, and thus erroneous interpretations could arise. Thus, care should be taken when merging graphs containing implicit idioms and having different, and possibly incompatible, global rdfd:datatype assertions.
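Before merging, an application might scan the graphs for properties that would end up with more than one rdfd:datatype assertion. The following Python sketch is a minimal illustration: graphs are modeled as lists of (subject, predicate, object) tuples using qualified names, and the function conflicting_datatype_assertions is hypothetical.

```python
# Illustrative sketch; all names are hypothetical.
from collections import defaultdict


def conflicting_datatype_assertions(*graphs):
    """Return the properties that receive more than one distinct
    rdfd:datatype assertion across the graphs to be merged."""
    asserted = defaultdict(set)
    for graph in graphs:
        for s, p, o in graph:
            if p == "rdfd:datatype":
                asserted[s].add(o)
    return {prop: sorted(dts) for prop, dts in asserted.items() if len(dts) > 1}


graph_1 = [("ex:age", "rdfd:datatype", "xsd:integer")]
graph_2 = [("ex:age", "rdfd:datatype", "xsd:duration")]
```

Here conflicting_datatype_assertions(graph_1, graph_2) flags ex:age as carrying both xsd:duration and xsd:integer. Whether the two datatypes are actually incompatible still has to be decided by an application that knows their lexical spaces and mappings.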

4 RDF Datatyping and RDF Schema

While both RDF Schema and RDF Datatyping are defined as layers atop the core RDF foundation, the semantics and purpose of RDF Schema and RDF Datatyping are fundamentally distinct and either may be used independently of the other. There are, however, some notable points of interaction between them. Certain aspects of the RDF Datatyping vocabulary may be conveniently captured by RDF Schema semantics, as reflected in this specification. From certain forms of RDF Datatyping knowledge, one may derive a certain amount of RDF Schema knowledge (though the inverse is not true). Likewise, certain RDF Schema range constraints can restrict the RDF Datatyping idioms that may be used. The relationships between RDF, RDF Schema and RDF Datatyping -- with the partial interaction between the latter two -- may be depicted as follows:

This section provides an overview of the potential points of interaction between RDF Datatyping and RDF Schema.

4.1 Class Extension of rdfd:Datatype

Although an rdfd:Datatype denotes the complete semantics of a given datatype insofar as RDF Datatyping is concerned, the RDF Schema class extension of an rdfd:Datatype is only the value space of the datatype. This means that the semantics of RDF Schema only applies to datatype values and not to lexical forms (literals) or to any lexical to value mapping defined for a given datatype. The implications of this distinction are significant, and are discussed in greater detail below.

4.2 Domain and Range of Datatype Properties

rdfd:Datatype properties have exact domains and ranges. The domain of a datatype property corresponds to the value space of the datatype and the range of a datatype property corresponds to the lexical space of the datatype. Given that the class extension of an rdfd:Datatype is its value space, and that an rdfd:Datatype property takes a literal node as its object, we can capture the domain and (partially) the range of the datatype xsd:integer as follows:

<rdf:Description rdf:about="&xsd;integer">
   <rdfs:domain rdf:resource="&xsd;integer"/>
   <rdfs:range rdf:resource="&rdfs;Literal"/>
</rdf:Description>

and in fact, such statements for every rdfd:Datatype are captured automatically in the closure rules provided by the RDF Datatyping model theory. Note that the closest we can get to defining the range of a given rdfd:Datatype, insofar as RDF Schema is concerned, is to constrain it to the set of literals. This is because the semantics of a given datatype is opaque to both RDF and RDF Schema and thus the set of valid lexical forms for a given datatype cannot be captured explicitly by RDF Schema mechanisms. Whether a given literal is a valid lexical form for a given datatype must be determined by an external application with full knowledge of the datatype in question. All an RDF Schema validator can determine is whether the object of an rdfd:Datatype property is a literal node.
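The effect described here can be pictured with a small Python sketch. It mirrors only the two statements shown above and is not a rendering of the closure rules themselves, which are given by the RDF Datatyping model theory. The tuple-based graph model and the function datatype_closure are hypothetical, and the sketch assumes that datatypes are identified by an explicit rdf:type rdfd:Datatype statement.

```python
# Illustrative sketch; not the closure rules of the model theory.
# Assumption: datatypes are marked by an explicit rdf:type statement.
def datatype_closure(triples):
    """Add, for every resource typed as rdfd:Datatype, the rdfs:domain
    and rdfs:range statements of the corresponding datatype property."""
    closed = set(triples)
    for s, p, o in triples:
        if p == "rdf:type" and o == "rdfd:Datatype":
            # Domain: the class extension (value space) of the datatype.
            closed.add((s, "rdfs:domain", s))
            # Range: the closest RDF Schema approximation of the lexical space.
            closed.add((s, "rdfs:range", "rdfs:Literal"))
    return closed
```

Given the single statement that xsd:integer is an rdfd:Datatype, the sketch adds the same rdfs:domain and rdfs:range statements as in the example above.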

4.3 rdfd:datatype versus rdfs:range

While both rdfd:datatype and rdfs:range serve to constrain the set of valid values a given property may have, they do so in different ways and at different levels or scopes of interpretation. An rdfd:datatype assertion does not entail any rdfs:range assertion. If it did, then, because the class extension of an rdfd:Datatype is the value space of the datatype, any statement with a literal object (i.e. any occurrence of the inline idiom) would be invalid, even if the literal were a member of the lexical space of the datatype in question: a literal node does not and cannot denote a datatype value and is not a member of the value space of the datatype.

It is, however, possible to use rdfs:range to restrict which datatyping idioms may be used: either the datatype property or lexical form idiom (which have a blank node denoting the datatype value) or the inline idiom (which has no blank node and no denotation of the datatype value). Thus, the following statements

<rdf:Description rdf:about="&ex;age">
   <rdfd:datatype rdf:resource="&xsd;integer"/>
   <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>

restrict the datatyping idioms used to be either the datatype property idiom or the lexical form idiom, where the blank node occurring as the property value of ex:age denotes a member of the value space of xsd:integer and its rdf:type is explicit by virtue of the rdfs:range assertion.

In contrast, the following

<rdf:Description rdf:about="&ex;age">
   <rdfd:datatype rdf:resource="&xsd;integer"/>
   <rdfs:range rdf:resource="&rdfs;Literal"/>
</rdf:Description>

restricts usage to the inline datatyping idiom, such that the property value of ex:age must be a literal, and the rdfd:datatype assertion constrains the literal to the lexical space of xsd:integer.

The manner in which rdfs:range can constrain the valid datatyping idioms used should be kept in mind when merging graphs from disparate sources which may have strict but conflicting policies regarding idiom usage expressed in terms of rdfs:range constraints.
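As a rough illustration of how such a conflict might be detected after a merge, the following Python sketch flags statements whose object kind disagrees with the rdfs:range asserted for a property. The `Literal` class, the tuple representation of triples, and the function `range_violations` are hypothetical names introduced only for this sketch and are not part of RDF Datatyping; prefixed names stand in for full URI-references.

```python
class Literal(str):
    """Hypothetical marker type: a literal node, as opposed to a URI or blank node."""


def range_violations(graph, prop, range_class):
    """Return the statements using prop whose object conflicts with range_class.

    A range of rdfs:Literal admits only the inline idiom (literal objects);
    a range that is a datatype class admits only non-literal objects.
    """
    want_literal = range_class == "rdfs:Literal"
    return [
        (s, p, o)
        for (s, p, o) in graph
        if p == prop and isinstance(o, Literal) != want_literal
    ]


# Two sources with different idiom policies for ex:age, merged into one graph.
merged = [
    ("ex:bob", "ex:age", Literal("25")),  # inline idiom
    ("ex:sue", "ex:age", "_:v"),          # blank node denoting the value
    ("_:v", "rdfd:lex", Literal("30")),   # lexical form idiom
]

# Statements rejected if ex:age has range rdfs:Literal.
literal_policy = range_violations(merged, "ex:age", "rdfs:Literal")
# Statements rejected if ex:age has range xsd:integer.
value_policy = range_violations(merged, "ex:age", "xsd:integer")
```

Whichever range constraint is kept, one of the two sources contributes a statement that no longer satisfies it, which is the situation the paragraph above warns about.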

4.4 Datatype Classes and rdfs:subClassOf

RDF Datatype classes may be arranged hierarchically by rdfs:subClassOf relations just as any other RDF classes. However, because the RDF class extension of an rdfd:Datatype is restricted to only its value space, rdfs:subClassOf relations relate only the value spaces of two datatypes, not their lexical spaces or lexical to value mappings.

This means that if we define xsd:integer to be an rdfs:subClassOf xsd:decimal, we assert that the value space of xsd:integer is a subset of the value space of xsd:decimal; but we do not thereby assert that the lexical space of xsd:integer is a subset of the lexical space of xsd:decimal or that the lexical mapping of xsd:integer is a subset of, and consistent with, the lexical mapping of xsd:decimal (though in fact, in this particular case, they are).

It is quite possible for two datatypes which are related by an rdfs:subClassOf relation to have partially or completely disjoint lexical spaces and/or incompatible lexical mappings. For example, if we were to define a datatype ex:octal which used octal notation to represent integer values, then it would be the case that (a) the value spaces of ex:octal and xsd:integer would be identical, (b) the lexical space of ex:octal would be a subset of the lexical space of xsd:integer, and (c) the lexical to value mappings of the datatypes would be quite different (the lexical form "25" would map to the value twenty-five for xsd:integer but to the value twenty-one for ex:octal). Yet we could still assert that xsd:integer was an rdfs:subClassOf ex:octal, without any difficulties arising from their incompatible lexical mappings, since the relation only concerns their value spaces, and is thus valid: every member of the value space of xsd:integer is also a member of the value space of ex:octal.
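The octal example can be made concrete with a minimal Python sketch. The functions `l2v_integer` and `l2v_octal` are hypothetical stand-ins for the lexical to value mappings of xsd:integer and ex:octal. The regular expressions only approximate the lexical spaces, and the optional sign on ex:octal is an assumption made here so that the two value spaces coincide.

```python
import re


def l2v_integer(lexical_form):
    """Approximation of xsd:integer: optional sign, then decimal digits."""
    if not re.fullmatch(r"[+-]?[0-9]+", lexical_form):
        raise ValueError("not in the lexical space: %r" % lexical_form)
    return int(lexical_form, 10)


def l2v_octal(lexical_form):
    """Assumed ex:octal: optional sign, then octal digits."""
    if not re.fullmatch(r"[+-]?[0-7]+", lexical_form):
        raise ValueError("not in the lexical space: %r" % lexical_form)
    return int(lexical_form, 8)


# The same lexical form denotes different values under the two datatypes.
as_integer = l2v_integer("25")
as_octal = l2v_octal("25")

# The same value has different lexical forms under the two datatypes.
same_value = l2v_integer("21") == l2v_octal("25")
```

Here `as_integer` is twenty-five and `as_octal` is twenty-one, while a form such as "29" is accepted by `l2v_integer` and rejected by `l2v_octal`. A lexical form is therefore only meaningful together with the datatype under which it was written.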

A datatype class may also be a subclass of a non-datatype class; again, relating the value space of the datatype class to the set of members of the non-datatype class. For example, we could define a non-datatype class ex:INT which has no fixed set of lexical representations, but only represents the set of integer values, and assert that xsd:integer is an rdfs:subClassOf ex:INT -- that the value space of xsd:integer is a subset of the set of members of ex:INT.

It is important to note that because rdfs:subClassOf relations do not include the lexical representations of a given datatype, and these lexical representations and lexical to value mappings may not be equivalent between the related datatypes, property values which are inferred based on rdfs:subClassOf relations should not be separated from the datatype context within which they were originally expressed, or else errors in interpretation could occur. Designers of query APIs will do well to keep this in mind.

[Is it an error to define a non-datatype class to be an rdfs:subClassOf an RDF datatype class? On the one hand, per the above specification, the subclass relation is only between value spaces -- but on the other hand, if the superordinate class is of rdf:type rdfd:Datatype, then isn't the subordinate class also of type rdfd:Datatype? And then, being an instance of rdfd:Datatype, is not the subordinate class expected to have all the characteristics thereof?

To what extent then, and by what mechanisms are the characteristics defined for rdfd:Datatype transferred/ascribed to datatype classes? Is the denotation of a datatype URI also only its value space with regards to rdf:type? Or when we assert that xsd:integer rdf:type rdfd:Datatype, does that apply to the "entire" datatype? What does the URI denote for rdf:type? Everything, or only value space?]

4.5 Datatype Properties and rdfs:subPropertyOf

[discuss the special nature of datatyping properties and warn against creating subproperty relations with non-datatype properties]

5 RDF Datatyping Model Theory

The RDF Model Theory explains the fundamental model-theoretic concepts like interpretation, universe, extension etc. used for interpreting the semantics of RDF graphs. This section assumes familiarity with these basic concepts.

Suppose I is an RDF interpretation of a graph E. Then I is datatyped (with respect to a set D of datatypes) if the following is true for any datatype URI-reference ddd (with I(ddd) in D):

(1) IEXT(I(ddd)) = {<y,x> : y = L2V(I(ddd))(x)} i.e. the inverse of the datatype (lexical form to value) mapping.

(2) ICEXT(I(ddd)) = {x : <x,y> in IEXT(I(ddd))} i.e. the value space of the datatype.

(3) For any literal "LLL", if E contains the triples

   <aaa, rdfd:datatype, ddd>
   <bbb, aaa, "LLL">

then L2V(I(ddd))("LLL") is defined; i.e. "LLL" is in the lexical space of I(ddd).

(4) For any literal "LLL", if E contains the triples

   <aaa, rdfd:datatype, ddd>
   <bbb, aaa, ccc>
   <ccc, rdfd:lex, "LLL">

then I(ccc) = L2V(I(ddd))("LLL") i.e. 'rdfd:lex' is restricted to have the same meaning as the datatype property.
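Conditions (1) to (3) can be illustrated with a small Python sketch. It assumes a finite, hypothetical sample of a lexical to value mapping (real datatypes such as xsd:integer have infinite lexical spaces), a hypothetical `Literal` marker class, and triples written as Python tuples using prefixed names; none of these are defined by this specification.

```python
class Literal(str):
    """Hypothetical marker type for literal nodes."""


# Hypothetical finite sample of L2V(I(xsd:integer)): lexical form -> value.
L2V_INTEGER = {"25": 25, "025": 25, "7": 7}

# Condition (1): IEXT is the set of <value, lexical form> pairs.
IEXT = {(value, lex) for lex, value in L2V_INTEGER.items()}

# Condition (2): ICEXT is the set of first components, i.e. the value space.
ICEXT = {value for (value, _lex) in IEXT}


def condition_3_failures(graph, l2v_by_datatype):
    """Return inline statements whose literal lies outside the lexical space
    of the datatype associated with the property via rdfd:datatype."""
    failures = []
    for (aaa, p, ddd) in graph:
        if p != "rdfd:datatype" or ddd not in l2v_by_datatype:
            continue
        for (bbb, q, obj) in graph:
            if q == aaa and isinstance(obj, Literal) and obj not in l2v_by_datatype[ddd]:
                failures.append((bbb, q, obj))
    return failures


graph = [
    ("ex:age", "rdfd:datatype", "xsd:integer"),
    ("ex:bob", "ex:age", Literal("25")),
    ("ex:al", "ex:age", Literal("abc")),
]
failures = condition_3_failures(graph, {"xsd:integer": L2V_INTEGER})
```

In this sample only the statement about ex:al fails, since "abc" has no value under the sampled mapping. Datatypes unknown to the checker are skipped rather than treated as errors, which is a design choice of the sketch.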

5.1 Closure Rules

Rule  If the graph contains:            then add:
0     (no condition)                    <rdfd:Datatype, rdf:type, rdfs:Class>
                                        <rdfd:Datatype, rdfs:subClassOf, rdf:Property>
                                        <rdfd:datatype, rdf:type, rdf:Property>
                                        <rdfd:datatype, rdfs:domain, rdf:Property>
                                        <rdfd:datatype, rdfs:range, rdfd:Datatype>
                                        <rdfd:lex, rdf:type, rdf:Property>
                                        <rdfd:lex, rdfs:domain, rdfs:Resource>
                                        <rdfd:lex, rdfs:range, rdfs:Literal>
1a    <ddd, rdf:type, rdfd:Datatype>    <ddd, rdfs:domain, ddd>
1b    <ddd, rdf:type, rdfd:Datatype>    <ddd, rdfs:subPropertyOf, rdfd:lex>
2     <aaa, rdfd:datatype, ddd>         <ccc, rdf:type, ddd>
      <bbb, aaa, ccc>
      <ccc, rdfd:lex, "LLL">

Note that not all of the semantic conditions defined herein can be fully captured by closures, most notably the limitation imposed by rdfd:datatype and datatype properties constraining literals to be members of the lexical space of the datatype in question.
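The closure rules can be sketched as a simple fixed-point computation. The following Python code is a minimal illustration under stated assumptions: triples are tuples of prefixed names, `Literal` is a hypothetical marker class for literal nodes, and only the datatyping rules are applied (no RDF or RDF Schema closure rules).

```python
class Literal(str):
    """Hypothetical marker type for literal nodes."""


# Rule 0: triples added unconditionally. rdfs:Resource is written with the
# RDF Schema prefix, the namespace in which Resource is defined.
AXIOMS = {
    ("rdfd:Datatype", "rdf:type", "rdfs:Class"),
    ("rdfd:Datatype", "rdfs:subClassOf", "rdf:Property"),
    ("rdfd:datatype", "rdf:type", "rdf:Property"),
    ("rdfd:datatype", "rdfs:domain", "rdf:Property"),
    ("rdfd:datatype", "rdfs:range", "rdfd:Datatype"),
    ("rdfd:lex", "rdf:type", "rdf:Property"),
    ("rdfd:lex", "rdfs:domain", "rdfs:Resource"),
    ("rdfd:lex", "rdfs:range", "rdfs:Literal"),
}


def datatyping_closure(graph):
    """Apply rules 0, 1a, 1b and 2 until no new triples are produced."""
    closed = set(graph) | AXIOMS
    while True:
        new = set()
        for (s, p, o) in closed:
            # Rules 1a and 1b: s is declared to be a datatype.
            if p == "rdf:type" and o == "rdfd:Datatype":
                new.add((s, "rdfs:domain", s))
                new.add((s, "rdfs:subPropertyOf", "rdfd:lex"))
            # Rule 2: s is a property with associated datatype o.
            if p == "rdfd:datatype":
                for (_bbb, q, ccc) in closed:
                    if q == s and any(
                        s2 == ccc and p2 == "rdfd:lex" and isinstance(o2, Literal)
                        for (s2, p2, o2) in closed
                    ):
                        new.add((ccc, "rdf:type", o))
        if new <= closed:
            return closed
        closed |= new


closed = datatyping_closure({
    ("xsd:integer", "rdf:type", "rdfd:Datatype"),
    ("ex:age", "rdfd:datatype", "xsd:integer"),
    ("ex:bob", "ex:age", "_:v"),
    ("_:v", "rdfd:lex", Literal("25")),
})
```

For this input the closure contains, among others, the triple typing the blank node `_:v` as an xsd:integer. As the note above says, membership of "25" in the lexical space of xsd:integer is not checked by any of these rules.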

6 RDF Schema for Datatyping

The following RDF Schema defines the ontology outlined above in its entirety.

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY rdfd "http://www.w3.org/2002/rdf-datatyping#">
]>

<rdf:RDF xmlns:rdf="&rdf;"
         xmlns:rdfs="&rdfs;"
         xmlns:rdfd="&rdfd;">

   <rdfs:Class rdf:about="&rdfd;Datatype">
      <rdfs:label xml:lang="en">RDF Datatype (Property)</rdfs:label>
      <rdfs:comment xml:lang="en">
         An RDF Datatype consists of a value space, a lexical space,
         an optional canonical lexical space which is a subset of
         its lexical space, and an N:1 mapping from the lexical
         space to the value space. An RDF Datatype may also serve
         as a property which joins a literal node object which is
         a member of the lexical space of that datatype (a lexical
         form) to a non-literal node subject which denotes the
         single member of the value space of that datatype (a
         datatype value) which is represented by the lexical form.
      </rdfs:comment>
      <rdfs:subClassOf rdf:resource="&rdf;Property"/>
   </rdfs:Class>

   <rdf:Property rdf:about="&rdfd;datatype">
      <rdfs:label xml:lang="en">RDF Datatype Range</rdfs:label>
      <rdfs:comment xml:lang="en">
         This property associates a datatype with a particular
         property. This associated datatype serves to constrain
         (by only providing valid interpretations for) all values
         of the property to correspond either to a literal node
         which is a member of the lexical space of the specified
         datatype (a lexical form), or to a non-literal node denoting
         a member of the value space of the specified datatype (a
         datatype value) to which is attached by means of either
         the rdfd:lex property or a datatype property a literal node
         which is a member of the lexical space of the specified
         datatype. The associated datatype also provides the
         datatype context within which the lexical form is to be
         interpreted to determine the single datatype value
         represented by the lexical form.
      </rdfs:comment>
      <rdfs:domain rdf:resource="&rdf;Property"/>
      <rdfs:range  rdf:resource="&rdfd;Datatype"/>
   </rdf:Property>
   
   <rdf:Property rdf:about="&rdfd;lex">
      <rdfs:label xml:lang="en">RDF Datatype Lexical Form</rdfs:label>
      <rdfs:comment xml:lang="en">
         This property associates a literal node object which is
         a member of the lexical space of some (possibly unknown)
         datatype (a lexical form) with a non-literal node subject
         denoting the single member of the value space of the same
         datatype as the lexical form and which is represented by
         that lexical form.
      </rdfs:comment>
      <rdfs:domain rdf:resource="&rdfs;Resource"/>
      <rdfs:range  rdf:resource="&rdfs;Literal"/>
   </rdf:Property>

</rdf:RDF>

7 Appendices

The following appendices are non-normative.

7.1 Levels of Interpretation

[discuss the different levels of interpretation on the graph provided by the MT, the datatyping idioms, and datatype aware applications]

[need to expand/refine discussion in each subsection]

7.1.1 Literal Graph Representation

The inline, datatype triple, and lexical form idioms; together with a datatype range constraint.

7.1.2 RDF Model Theory Interpretation

Under the RDF MT interpretation (with no datatyping semantics), the literal node shared as the value of the ex:age property in the inline idiom and of the xsd:integer and rdfd:lex properties denotes itself, while the blank node values of the ex:age property in the lexical form and datatype triple idioms each denote some non-literal resource.

7.1.3 RDF Datatyping Interpretation

The RDF Datatyping interpretation of the value of ex:age for all three idioms is the same, and is the datatyped literal pairing <xsd:integer, "25">. The value identified by the datatyped literal pairing (whatever that might be) is denoted by the blank nodes of the lexical form and datatype property idioms but has no explicit denotation in the inline idiom.

7.1.4 Extra-RDF Application Interpretation

The extra-RDF application interpretation, which has full knowledge of the datatype xsd:integer, of the value of ex:age for all three idioms is the same, and is the number twenty-five. The value twenty-five is denoted by the blank nodes of the lexical form and datatype property idioms but has no explicit denotation in the inline idiom.
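The three levels of interpretation can be summarized in a short Python sketch. The `L2V` registry is a hypothetical stand-in for an application's knowledge of datatypes, and Python's `int` is used only as an approximation of the xsd:integer mapping (it accepts some forms that xsd:integer does not).

```python
# Hypothetical registry of lexical to value mappings known to the application.
L2V = {"xsd:integer": int}

literal = "25"

# RDF Model Theory level: the literal denotes itself.
mt_denotation = literal

# RDF Datatyping level: the datatyped literal pairing <xsd:integer, "25">.
datatyped_literal = ("xsd:integer", literal)

# Extra-RDF application level: the pairing is resolved to an actual value.
datatype_uri, lexical_form = datatyped_literal
application_value = L2V[datatype_uri](lexical_form)
```

Only the last step requires knowledge of the datatype itself; the first two can be carried out by any RDF processor.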

7.2 Use Cases

[provide examples of how RDF Datatyping is expected to be applied in various application contexts]

7.2.1 DAML+OIL

[...TBD...]

7.2.2 CC/PP

[...TBD...]

7.2.3 Dublin Core

[

Examples in the "Encoding Schemes" section of the Dublin Core in
RDF Draft[1] converted to the new datatyping proposal (need to
be normalized, with expanded verbiage, etc):

[1] http://logicerror.com/dcrdfDraft

*** EXAMPLE 1 ***
_:page dc:subject  _:a .
_:a    rdf:type    dct:MESH .
_:a    rdf:value  "D08.586.682.075.400" .
_:a    rdfs:label "Formate Dehydrogenase" .

becomes

_:page dc:subject  _:a .
_:a    dct:MESH   "D08.586.682.075.400" .
_:a    rdfs:label "Formate Dehydrogenase" . # @@ a better datatype for this?

*** EXAMPLE 2 ***
_:page dc:language _:a .
_:a    rdf:type    dct:RFC1766 .
_:a    rdf:value  "EN" .
_:a    rdfs:label "English" .

becomes

_:page dc:language _:a .
_:a    dct:RFC1766 "EN" .
_:a    rdfs:label "English" .

*** EXAMPLE 3 ***
_:page dc:coverage _:a .
_:a    rdf:type    dct:Point .
_:a    rdf:value   _:b .
_:b    rdf:type    dct:DCSV .
_:b    rdf:value   "name=Perth, W.A.; east=115.85717; north=-31.95301" .

becomes

_:page  dc:coverage     _:a .
_:a     dct:DCSV       "name=Perth, W.A.; east=115.85717; north=-31.95301" .
dct:DCSV rdfs:subClassOf dct:Point . # @@ is this right?

]

7.2.4 ???

[...suggestions for other use cases welcome...]

7.3 RDF Datatyping and Complex (Structured) XML Datatypes

[outline methodology for associating datatypes with XML literals such that a complex datatype is viewed similarly to a simple datatype such that its lexical space is the set of possible serializations conforming to the content model defined for the complex datatype and the value space is the set of XML Infosets represented by those serializations. An XML literal (parseType="Literal") can thus be associated with the complex datatype in the same fashion as for simple datatypes, and with similar results (in fact, one might even argue that there is no real difference whatsoever ;-) ...]

8 References

W3C RDF Core Working Group Charter, Mar 2001, http://www.w3.org/2001/sw/RDFCoreWGCharter

W3C RDF Primer, ??? 2002, http://www.w3.org/TR/2002/WD-rdf-primer-20020319/

W3C RDF Syntax, ??? 2002, http://www.w3.org/TR/rdf-syntax-grammar/

W3C RDF Test Cases, ??? 2002, http://www.w3.org/TR/rdf-testcases/

W3C RDF Model Theory, ??? 2002, http://www.w3.org/TR/rdf-mt/

W3C RDF Schema, ??? 2002, http://www.w3.org/2001/sw/RDFCore/Schema/20010913/

XML Schema Part 2: Datatypes, ??? 2001, http://www.w3.org/TR/xmlschema-2/

DAML+OIL..., ??? 200?, http://???

OWL..., ??? 200?, http://???

CC/PP..., ??? 200?, http://???

9 Acknowledgments

This document has benefited from the input of many members of the RDF Core Working Group. Particular thanks to Jeremy Carroll, Dan Connolly, Martyn Horner, Graham Klyne, and Frank Manola for their contributions during the development of the RDF Datatyping specification. Special thanks to Graham Klyne for his contributions to the section on RDF desiderata and the CC/PP use case. Thanks to Aaron Swartz for his contribution of the Dublin Core use case.

