RDF Datatyping -- Current Working Proposal

Editors:
Patrick Stickler, Nokia, patrick.stickler@nokia.com
Pat Hayes, University of West Florida, phayes@ai.uwf.edu
Sergey Melnik, Stanford University, melnik@db.stanford.edu

[Latest revisions by Patrick. Yet to be reviewed fully by Pat and Sergey.]

Notes Regarding this Document

This document is not presumed to reflect the final consensus of the RDF Core WG but represents a proposal that is currently being discussed and revised by the WG.

Some or all of the contents of this document may change, or the entire document may be discarded at any time. Do not design or modify implementations based on any content contained herein.

Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.

There are three types of content in this document:

  1. Content defining the core datatyping proposal, presented in black text.
  2. Content defining possible extensions or refinements to the core proposal, presented in gray text.
  3. Comments, questions, or draft material by the editors, presented in red text.

Note that any or all of the optional extensions or refinements (in gray) may be removed without any impact whatsoever to the core proposal and therefore rejection of any particular proposed extension or refinement does not constitute grounds for rejection of the core proposal itself.

Finally, please note that the current edition of this document focuses primarily on the technical details of the datatyping solution, and has not yet been fully edited with regard to final presentation and wording. Some sections, therefore, may appear terse or loosely worded. These issues will be addressed prior to publication as a working draft. Comments and discussion of this document should therefore primarily concern technical aspects of the datatyping proposal (though all comments and input are, of course, welcome and will be taken into consideration).


Abstract

The Resource Description Framework (RDF) is a general-purpose language for representing information in the World Wide Web. RDF provides a common framework for expressing this information in such a way that it can be exchanged between applications without loss of meaning. The utility and reliability of information exchanged between applications typically requires that datatyping information be unambiguous and that the interpretation of datatyped values, which may have local representations that differ from system to system, be consistent between disparate applications. Achieving consistency in the exchange and interpretation of such datatyped information requires a well defined and standardized methodology for expressing and interpreting datatyping information. This document defines a particular methodology for expressing datatyped information in RDF and aims to provide the reader the basic fundamentals required to effectively use datatypes and datatyped values with RDF in their particular applications.

1. Introduction

The Resource Description Framework (RDF) is a general-purpose language for representing information in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, the copyright and syndication information about a Web document, the availability schedule for some shared resource, or the description of a Web user's preferences for information delivery. However, by generalizing the concept of a "Web resource", RDF can be used to represent information about anything that can be identified on the Web, such as information about items available from online shopping facilities (e.g., information about prices, publishers, and availability of books or recordings).

RDF provides a common framework for expressing this information in such a way that it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. Exchanging information between different applications means that the information may be made available to applications other than those for which it was originally created.

The utility and reliability of information exchanged between applications typically requires that datatyping information be unambiguous and that the interpretation of datatyped values, which may have local representations that differ from system to system, be consistent between disparate applications. Achieving consistency in the exchange and interpretation of such datatyped information requires a well defined and standardized methodology for expressing and interpreting datatyping information.

This document defines a particular methodology for expressing datatyped information in RDF and aims to provide the reader the basic fundamentals required to effectively use datatypes and datatyped values with RDF in their particular applications.

1.1 What is Datatyping?

[informal definition of datatyping]

[common datatyping scenarios, where datatyping is needed]

Because RDF serves as a means of interchange between disparate systems, and in order to achieve portability and platform independence, it is necessary to forgo any native representation of values or native datatypes in RDF itself. This means that RDF has no built-in knowledge about particular datatypes such as strings or integers, and the lexical representation of a given value, such as the number twenty-five "25", has no native interpretation in RDF. RDF is datatype neutral in the same manner as it is vocabulary neutral. The specific semantics for individual datatypes must reside in the application layers above RDF.

In RDF Datatyping, literals are taken to represent the lexical representations (lexical forms) of datatype values and their datatype interpretation is based on an association of the literal with a particular datatype. The literal node in the graph denotes the datatype value which it represents.

The nature of datatypes, the means by which literals are associated with datatypes, and the interpretation of typed literals are the focus of this document.

1.2 Desiderata for RDF Datatyping

The following list summarizes the specific desiderata that were taken into account during the development of this specification:

It is believed that the methodology for datatyping described in this specification satisfies all of the above desiderata.

1.3 Related Documents

The complete specification of RDF consists of a number of documents:

This document is intended to augment the other parts of the RDF specification, to help information producers, system designers and application developers understand how datatypes and datatyping can be used with RDF.

1.4 Comments on the Examples

Each example is represented in three forms:

  1. its RDF/XML representation
  2. its N-Triples representation
  3. a graph illustration

For the sake of brevity and clarity, XML entities (e.g. &rdf;) are used in the XML examples provided in this specification where URI References occur as attribute values. In addition, local and qualified names are used as node and arc labels in the graph illustrations, even though the actual graph nodes will contain complete URI References as labels.

The following XML 'wrapper' should be assumed for all RDF/XML examples used in this specification:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY xsd  "http://www.w3.org/2001/XMLSchema#">
  <!ENTITY base "http://www.w3.org/2002/rdf-datatyping/examples#">
]>

<rdf:RDF xmlns:rdf  ="&rdf;"
         xmlns:rdfs ="&rdfs;"
         xmlns:xsd  ="&xsd;"
         xmlns      ="&base;"
         xml:base   ="&base;">

   <!-- example -->

</rdf:RDF>

[Test cases will be derived from the examples.]

1.5 Comments on the Structure of RDF Literals

RDF literals are structured objects consisting of a typed unicode string which is optionally qualified as XML content (rdf:parseType equal to "Literal") and/or having an associated xml:lang value.

The structure of a literal can be represented by a 4-tuple, comprised of

  1. a datatype, denoted by a URIref or, if implicit, a system identifier
  2. a parseType bit, where 1 indicates an XML literal and 0 indicates a non-XML literal
  3. a unicode string, constituting a lexical form of the (possibly implicit) datatype
  4. an optional xml:lang language code

The syntax for representing typed literal nodes in N-Triples is proposed to be as follows:


   Add the following changes/productions to the N-Triples EBNF grammar:

   literal  ::= datatype '/' ( langstring | xmlString )
   datatype ::= ( uriref | systemID )
   systemID ::= '_:' name
   nodeID   ::= systemID

   Thus:

   implicitly typed non-XML literal                              _:w/"25"
   implicitly typed non-XML literal with lang                    _:x/"25"-en
   explicitly typed non-XML literal               <http://...#integer>/"25"
   explicitly typed non-XML literal with lang     <http://...#integer>/"25"-en

   implicitly typed XML literal                        _:y/xml"<xhtml:h1>Foo</xhtml:h1>"
   implicitly typed XML literal with lang              _:z/xml"<xhtml:h1>Foo</xhtml:h1>"-en
   explicitly typed XML literal              <http://...#h1>/xml"<xhtml:h1>Foo</xhtml:h1>"
   explicitly typed XML literal with lang    <http://...#h1>/xml"<xhtml:h1>Foo</xhtml:h1>"-en

It is important to maintain the partition between the datatype and the literal in the case of implicitly typed XML literals (as well as qname typed XML literals in N3, see below). If XML literals were denoted by something other than a legal name 'xml' then it is possible that the delimiting character '/' could be omitted. It is also possible to use some other delimiting character, if another would be considered better.

The character '/' was chosen because it is visually distinct and has a commonly perceived function of delimiting hierarchical scope; in the case of a typed literal, the literal string portion can be seen as residing within the scope of the datatype.
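The proposed syntax can be exercised with a small parser. The following Python sketch is illustrative only: the names TypedLiteral and parse_typed_literal are hypothetical, string escapes are not handled, and the patterns for names and language codes are assumptions modeled on the existing N-Triples grammar.

```python
import re
from typing import NamedTuple, Optional


class TypedLiteral(NamedTuple):
    """The 4-tuple structure of a literal (hypothetical representation)."""
    datatype: str          # "<uriref>" or a systemID such as "_:x"
    is_xml: bool           # the parseType bit
    lexical_form: str      # the Unicode string
    lang: Optional[str]    # optional xml:lang code


# literal ::= datatype '/' ( langstring | xmlString )
# Assumption: no escape sequences inside the quoted string.
_LITERAL = re.compile(
    r'^(?P<datatype><[^>]*>|_:[A-Za-z][A-Za-z0-9]*)/'
    r'(?P<xml>xml)?"(?P<lex>.*)"'
    r'(?:-(?P<lang>[a-z]+(?:-[a-z0-9]+)*))?$',
    re.DOTALL,
)


def parse_typed_literal(text: str) -> TypedLiteral:
    """Split a typed literal node label into its four components."""
    match = _LITERAL.match(text.strip())
    if match is None:
        raise ValueError(f"not a typed literal: {text!r}")
    return TypedLiteral(
        datatype=match["datatype"],
        is_xml=match["xml"] is not None,
        lexical_form=match["lex"],
        lang=match["lang"],
    )


if __name__ == "__main__":
    print(parse_typed_literal('_:x/"25"-en'))
    print(parse_typed_literal('_:y/xml"<xhtml:h1>Foo</xhtml:h1>"'))
```

Because the datatype always precedes the '/' delimiter, the parser can separate the datatype from the 'xml' marker without lookahead into the literal string.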

As an aside, although the WG is not technically concerned with N3 syntax, for those who care, the above representation for typed literal nodes also works with qnames and therefore would represent a fairly painless evolution path for N3:


   implicitly typed non-XML literal                                  "25"
   implicitly typed non-XML literal with lang                        "25"-en
   qname typed non-XML literal                           xsd:integer/"25"
   qname typed non-XML literal with lang                 xsd:integer/"25"-en

   implicitly typed XML literal                            xml"<xhtml:h1>Foo</xhtml:h1>"
   implicitly typed XML literal with lang                  xml"<xhtml:h1>Foo</xhtml:h1>"-en
   qname typed XML literal                     xhtml:h1/xml"<xhtml:h1>Foo</xhtml:h1>"
   qname typed XML literal with lang           xhtml:h1/xml"<xhtml:h1>Foo</xhtml:h1>"-en

Of course, the maintainers of N3 are free to do as they like regarding the representation of typed literals. This is only a suggestion.

The structure of a literal is transparent with regard to RDF Datatyping; all that is significant is the type and Unicode string portions. The parseType bit and xml:lang (if present) are irrelevant to RDF Datatyping and to the meaning of the literal.

This treatment is in line with XML Schema's view of xml:lang as well, which explicitly forbids datatyped values from being qualified by xml:lang. RDF Datatyping allows it, but ignores it.

[refs to syntax/primer/etc]

2. RDF Datatypes

The conceptual framework for RDF datatyping presented in this specification is compatible with the type system defined by XML Schema for both simple and complex datatypes. It can also be used with any datatyping framework which conforms to the characteristics of datatypes as defined below.

2.1 rdfs:Datatype

RDF Datatyping defines an rdfs:Datatype as consisting of

  1. a set of distinct values, called its value space
  2. a set of lexical representations or forms, called its lexical space
  3. an N:1 mapping from the lexical space to the value space, called its datatype mapping

2.2 Datatype Mapping

A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype.

A datatype mapping satisfies the following properties:

For example, the datatype mapping for the XML Schema datatype 'xsd:boolean', where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:

Value Space {T, F}
Lexical Space {"0", "1", "true", "false"}
Datatype Mapping {<"true", T>, <"1", T>, <"0", F>, <"false", F>}
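As a minimal sketch of this structure, the following Python fragment models a datatype as a dictionary from lexical forms to values, from which the lexical space and value space are derived. The class name Datatype and its methods are hypothetical, and 'T' and 'F' are the same stand-ins for the boolean values used above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class Datatype:
    """A datatype as a lexical-to-value mapping (hypothetical model)."""
    name: str
    mapping: Dict[str, str]  # lexical form -> value; a dict is N:1 by construction

    @property
    def lexical_space(self) -> FrozenSet[str]:
        return frozenset(self.mapping)

    @property
    def value_space(self) -> FrozenSet[str]:
        return frozenset(self.mapping.values())

    def to_value(self, lexical_form: str) -> str:
        """Apply the datatype mapping to one lexical form."""
        return self.mapping[lexical_form]

    def lexical_forms_of(self, value: str) -> List[str]:
        """All synonymous lexical forms that represent the given value."""
        return sorted(lex for lex, v in self.mapping.items() if v == value)


xsd_boolean = Datatype(
    "xsd:boolean",
    {"true": "T", "1": "T", "0": "F", "false": "F"},
)

if __name__ == "__main__":
    print(sorted(xsd_boolean.value_space))
    print(xsd_boolean.lexical_forms_of("T"))
```

Each lexical form maps to exactly one value, while a value may have several lexical forms, which is the N:1 property of the datatype mapping.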

For an XML Schema complex datatype, its value space is the set of all valid infosets licensed by its content model and its datatype mapping is the mapping from each XML serialization to its corresponding infoset. Two XML serializations which correspond to the same infoset are considered synonymous lexical forms, just as both "5" and "0005" are synonymous lexical forms representing the same xsd:integer value five.

2.3 Typed Literal

A typed literal is a pair where the first element is a URI Reference (or implicit systemID) denoting a datatype and the second element is a lexical form (literal). Following from the nature of datatypes as defined above, this pairing of datatype and lexical form unambiguously identifies a specific member of a datatype mapping and hence a specific member of the value space of the datatype.

A typed literal can be considered a "literal-in-context" where the datatype provides the context for interpretation of the lexical form (literal) to obtain an actual value.

For example, the typed literals which can be defined for the XML Schema datatype xsd:boolean are as follows:

Typed Literal            Member of Datatype Mapping    Member of Value Space
                         Denoted by Typed Literal      Denoted by Typed Literal
<xsd:boolean, "true">    <"true", T>                   T
<xsd:boolean, "1">       <"1", T>                      T
<xsd:boolean, "false">   <"false", F>                  F
<xsd:boolean, "0">       <"0", F>                      F

RDF datatyping is primarily concerned with the implicit or explicit designation of typed literal pairings. RDF datatyping only provides for the designation of typed literals. The internal structure and semantics of all datatypes are opaque to RDF; i.e. membership of value and lexical spaces, datatype mappings, etc. have neither representation nor interpretation in RDF. Actual interpretation of typed literals (determination of the actual value denoted by the typed literal) is performed externally to RDF by applications which have sufficient knowledge of the particular datatypes in question. RDF datatyping only provides the datatype context within which such interpretation is to take place.
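The division of labor described here can be sketched as an application-level registry that sits outside RDF. In the following Python fragment, the registry, the function names, and the choice of two datatypes are all assumptions made for illustration; RDF itself supplies only the (datatype, lexical form) pair.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def parse_boolean(lexical_form: str) -> bool:
    """Lexical-to-value mapping for xsd:boolean."""
    table = {"true": True, "1": True, "false": False, "0": False}
    if lexical_form not in table:
        raise ValueError(f"{lexical_form!r} is not in the lexical space of xsd:boolean")
    return table[lexical_form]


def parse_integer(lexical_form: str) -> int:
    """Lexical-to-value mapping for xsd:integer (optional sign, decimal digits)."""
    if re.fullmatch(r"[+-]?[0-9]+", lexical_form) is None:
        raise ValueError(f"{lexical_form!r} is not in the lexical space of xsd:integer")
    return int(lexical_form)


# Hypothetical application-side knowledge about datatypes; opaque to RDF.
DATATYPE_REGISTRY = {
    XSD + "boolean": parse_boolean,
    XSD + "integer": parse_integer,
}


def interpret(datatype_uri: str, lexical_form: str):
    """Return the value denoted by the typed literal <datatype_uri, lexical_form>."""
    try:
        lexical_to_value = DATATYPE_REGISTRY[datatype_uri]
    except KeyError:
        raise LookupError(f"no knowledge of datatype {datatype_uri}") from None
    return lexical_to_value(lexical_form)


if __name__ == "__main__":
    print(interpret(XSD + "integer", "0005"))
    print(interpret(XSD + "boolean", "1"))
```

An application that lacks an entry for a datatype can still carry the typed literal pairing unchanged; it simply cannot determine the value.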

3. Designation of Typed Literals in RDF

A typed literal may be designated in one of two ways in RDF, either locally (explicitly) or globally (implicitly).

3.1 Local Datatyping

Local datatyping associates a datatype with each individual property value explicitly by means of a typed literal node. Thus


<rdf:Description rdf:about="#John">
   <age rdf:type="&xsd;integer">25</age>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#John>
   <http://www.w3.org/2002/rdf-datatyping/examples#age>
      <http://www.w3.org/2001/XMLSchema#integer>/"25" .
[Figure: RDF graph illustration]

says that John's age is the member of the value space of xsd:integer which is represented by the lexical form "25". And from what we know about the datatype xsd:integer, we then know that John's age is the value twenty-five.

A typed literal node is valid when the literal is a member of the lexical space of the datatype, in which case the typed literal node is interpreted as denoting the member of the value space of the datatype represented by that lexical form. Thus


<rdf:Description rdf:about="#John">
   <age rdf:type="&xsd;integer">pumpkin</age>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#John>
   <http://www.w3.org/2002/rdf-datatyping/examples#age>
      <http://www.w3.org/2001/XMLSchema#integer>/"pumpkin" .
[Figure: RDF graph illustration]

would always be invalid, no matter what value is assigned to the typed literal node, as "pumpkin" is not a member of the lexical space of xsd:integer.

It is important to note that RDF cannot itself make such a determination of datatyping validity; such validation can only be performed by an external application with sufficient knowledge about the particular datatype in question. RDF merely provides the means for designating the typed literal pairings upon which such validation would be performed.
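A validating application might be sketched as follows. The table of lexical spaces and the function check_typed_literal are hypothetical; the three-way result (valid, invalid, or undeterminable) is one possible way to reflect that validity can only be decided when the datatype is known to the application.

```python
import re
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed application-side descriptions of lexical spaces, as regular expressions.
LEXICAL_SPACES = {
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
}


def check_typed_literal(datatype_uri: str, lexical_form: str) -> Optional[bool]:
    """Return True or False for a known datatype, None when it is unknown."""
    pattern = LEXICAL_SPACES.get(datatype_uri)
    if pattern is None:
        return None  # validity cannot be determined without datatype knowledge
    return pattern.fullmatch(lexical_form) is not None


if __name__ == "__main__":
    for lexical_form in ("25", "pumpkin"):
        print(lexical_form, check_typed_literal(XSD + "integer", lexical_form))
```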

Local datatyping works in the same way for XML literals. An XML literal which represents an instance of the vCard:n complex element type can be typed explicitly as follows:


<rdf:Description rdf:about="#John">
   <name rdf:parseType="Literal" rdf:type="&vCard;n">
      <n xmlns="&vCard;">
         <family>Doe</family>
         <given>John</given>
      </n>
   </name>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#John>
   <http://www.w3.org/2002/rdf-datatyping/examples#name>
      <http://...#n>/xml"<n xmlns="&vCard;"><family>Doe</family><given>John</given></n>" .
[Figure: RDF graph illustration]

3.2 Global Datatyping

It is often convenient to associate a datatype with a property, so that every use of the property can be understood as asserting a particular datatype for its value. Global datatyping leaves the datatype of the literal implicit, to be determined by the datatype associated with the property itself.

RDF Datatyping employs rdfs:range to associate a datatype with a particular property. The associated datatype serves to constrain (by only providing valid interpretations for) all values of the property to correspond to members of the value space of the designated datatype, and (according to the characteristics of RDF datatypes) also constrains all lexical forms to members of the lexical space of the datatype.

In cases where no datatype is asserted for an occurrence of a given literal, a datatype range defined for the property provides the datatype from which the typed literal pairing is derived.

For example, we may wish to constrain the property age so that its use and interpretation is bound to integer values as defined by the datatype xsd:integer, and given that fixed interpretation, the datatype need not be specified for each property value, but may be left implicit, defined globally for the property itself:


<rdf:Description rdf:about="#age">
   <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>

<rdf:Description rdf:about="#Jane">
   <age>25</age>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#age>
   <http://www.w3.org/2000/01/rdf-schema#range>
      <http://www.w3.org/2001/XMLSchema#integer> .

<http://www.w3.org/2002/rdf-datatyping/examples#Jane>
   <http://www.w3.org/2002/rdf-datatyping/examples#age>
      _:a/"25" .
[Figure: RDF graph illustration]

Thus, the datatype context within which "25" is interpreted is xsd:integer, and "25" is required to be a valid member of the lexical space of xsd:integer. The literal node is thus interpreted as denoting the integer value twenty-five. The rdfs:range assertion clarifies which datatype the systemID _:a denotes (see the Model Theory).

The systemID (_:a) portion of the implicitly typed literal node does in fact denote "some" datatype. All literals are typed, either explicitly by URIref or implicitly by systemID.

I.e., an implicitly typed literal denotes a datatype value that has that particular lexical representation, only we don't know from the literal node itself which datatype is meant.

Given global datatyping, via rdfs:range, the particular datatype denoted by the systemID may be determined, per rule 3 of the Model Theory.

Furthermore, since the systemID denoting an unspecified datatype will be unique for every implicitly typed literal, triple stores can safely presume that all nodes are tidy by label (which they are), and do node merging without having to be concerned about the type of node (uriref, bnode, literal).

Thus the semantic untidiness of implicitly typed literals is captured in the unique systemID denoting the unspecified datatype.
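The resolution of a systemID to a datatype can be sketched as a pass over the triples. In this Python fragment, resources are represented as URI strings and literals as (datatype-or-systemID, lexical form) tuples; that representation and the function name are assumptions. Following rule 3 of the Model Theory, a range is only used when its object is also declared to be an rdfs:Datatype.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
RDFS_DATATYPE = "http://www.w3.org/2000/01/rdf-schema#Datatype"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://www.w3.org/2002/rdf-datatyping/examples#"


def resolve_implicit_literals(triples):
    """Map each implicitly typed literal to an explicitly typed pairing."""
    datatypes = {s for s, p, o in triples if p == RDF_TYPE and o == RDFS_DATATYPE}
    ranges = {}
    for s, p, o in triples:
        if p == RDFS_RANGE and o in datatypes:
            ranges.setdefault(s, set()).add(o)

    resolved = {}
    for s, p, o in triples:
        is_implicit = isinstance(o, tuple) and o[0].startswith("_:")
        candidates = ranges.get(p, set())
        # Resolve only when exactly one datatype range is known for the property.
        if is_implicit and len(candidates) == 1:
            resolved[o] = (next(iter(candidates)), o[1])
    return resolved


if __name__ == "__main__":
    graph = [
        (XSD + "integer", RDF_TYPE, RDFS_DATATYPE),
        (EX + "age", RDFS_RANGE, XSD + "integer"),
        (EX + "Jane", EX + "age", ("_:a", "25")),
    ]
    print(resolve_implicit_literals(graph))
```

Literals whose property has no datatype range, or more than one, are left unresolved by this sketch rather than guessed at.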

The rdfs:range assertion both provides the information necessary for the proper interpretation of the implicitly typed literals and (indirectly) constrains the valid set of literals to the lexical space of the specified datatype.

This last point is illustrated by


<rdf:Description rdf:about="#age">
   <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>

<rdf:Description rdf:about="#Jane">
   <age>Mid-Twenties</age>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#age>
   <http://www.w3.org/2000/01/rdf-schema#range>
      <http://www.w3.org/2001/XMLSchema#integer> .

<http://www.w3.org/2002/rdf-datatyping/examples#Jane>
   <http://www.w3.org/2002/rdf-datatyping/examples#age>
      _:b/"Mid-Twenties" .
[Figure: RDF graph illustration]

which constitutes a datatype violation, because the datatype context asserted by rdfs:range restricts the set of valid property values to the value space of the particular datatype, and the literal "Mid-Twenties" is not a member of the lexical space of xsd:integer and thus does not represent any member of its value space.

It is important to point out that only an extra-RDF application with complete knowledge about the datatype in question would be able to detect such a datatype violation. Datatypes are fully opaque to RDF, and neither RDF nor RDF Schema provides generic means for datatype validation. RDF Datatyping provides mechanisms for expressing typed literal pairings by specific representations which have a well defined representation and interpretation, but it cannot determine the validity of individual pairings directly. This follows from RDF's role as a means of interchange between disparate systems: in order to achieve portability and platform independence, it is necessary to forgo any native representation of values or native datatypes in RDF itself. RDF is datatype neutral in the same manner as it is vocabulary neutral. The specific semantics for individual datatypes must reside in the application layers above RDF.

3.2.1 Under-Specified Datatyping

In the case of a non-typed literal, where no datatype range is specified for the property, the meaning of the literal node (what that literal node denotes) is under-specified. It denotes some datatype value which has a lexical representation corresponding to the literal string, but in the absence of any knowledge of which datatype context constrains its interpretation, we cannot know which datatype value it denotes. This is similar to the case of a blank node, where although one knows that it denotes "something", one does not know what that something is.
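To make the under-specification concrete, the following Python sketch lists every value that a bare lexical form could denote among the datatypes an application happens to know. The set of known datatypes and the function names are assumptions chosen for illustration.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def _boolean(lexical_form):
    table = {"true": True, "1": True, "false": False, "0": False}
    if lexical_form not in table:
        raise ValueError(lexical_form)
    return table[lexical_form]


def _integer(lexical_form):
    if re.fullmatch(r"[+-]?[0-9]+", lexical_form) is None:
        raise ValueError(lexical_form)
    return int(lexical_form)


def _string(lexical_form):
    return lexical_form  # every string is its own lexical form


# Hypothetical set of datatypes known to one particular application.
KNOWN_DATATYPES = {
    XSD + "boolean": _boolean,
    XSD + "integer": _integer,
    XSD + "string": _string,
}


def candidate_values(lexical_form):
    """Return {datatype URI: value} for each known datatype that accepts the form."""
    candidates = {}
    for datatype_uri, lexical_to_value in KNOWN_DATATYPES.items():
        try:
            candidates[datatype_uri] = lexical_to_value(lexical_form)
        except ValueError:
            continue  # not in this datatype's lexical space
    return candidates


if __name__ == "__main__":
    print(candidate_values("1"))
    print(candidate_values("pumpkin"))
```

A form such as "1" has several candidate values, so without a datatype context the node denotes "some" value but the application cannot say which.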

3.2.2 Datatype Clashes

The datatype interpretations imposed on a property by rdfs:range apply to any such usage of the property anywhere in the RDF graph, so an rdfs:range assertion has a global scope, and therefore needs to be used with care. For example, if both global and local datatyping are employed for the same property, then a globally asserted datatype can produce a conflict with an incompatible, locally asserted datatype:


<rdf:Description rdf:about="#age">
   <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>

<rdf:Description rdf:about="#Judy">
   <age>25</age>
</rdf:Description>

<rdf:Description rdf:about="#Jane">
   <age rdf:type="&xsd;string">Mid-Twenties</age>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#age>
   <http://www.w3.org/2000/01/rdf-schema#range>
      <http://www.w3.org/2001/XMLSchema#integer> .

<http://www.w3.org/2002/rdf-datatyping/examples#Judy>
   <http://www.w3.org/2002/rdf-datatyping/examples#age>
      _:c/"25" .

<http://www.w3.org/2002/rdf-datatyping/examples#Jane>
   <http://www.w3.org/2002/rdf-datatyping/examples#age>
      <http://www.w3.org/2001/XMLSchema#string>/"Mid-Twenties" .
[Figure: RDF graph illustration]

Here, the global datatype xsd:integer is asserted for all uses of the property age. The value for Judy's age satisfies the constraints of the xsd:integer datatype, but there is a conflict with the definition of Jane's age: while the local datatyping context of xsd:string is valid, the lexical form "Mid-Twenties" conflicts with the globally asserted datatype context of xsd:integer. Thus, care must be taken when asserting global datatype contexts to ensure that such clashes do not arise, or to at least be aware of the potential for such datatype clashes.
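A clash of this kind can be detected by checking every literal value of a property against the lexical space of the property's globally asserted datatype. The following Python sketch assumes a simplified representation (a dictionary of ranges, statements with (datatype, lexical form) objects, and regular expressions standing in for lexical spaces); all names are hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://www.w3.org/2002/rdf-datatyping/examples#"

# Assumed application-side knowledge of lexical spaces.
LEXICAL_SPACES = {
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "string": re.compile(r".*", re.DOTALL),
}


def find_clashes(ranges, statements):
    """Report literals whose lexical form violates the property's global datatype."""
    clashes = []
    for subject, prop, (datatype, lexical_form) in statements:
        global_datatype = ranges.get(prop)
        pattern = LEXICAL_SPACES.get(global_datatype)
        if pattern is None:
            continue  # no global datatype, or one this application does not know
        if pattern.fullmatch(lexical_form) is None:
            clashes.append((subject, prop, datatype, global_datatype))
    return clashes


if __name__ == "__main__":
    ranges = {EX + "age": XSD + "integer"}
    statements = [
        (EX + "Judy", EX + "age", ("_:c", "25")),
        (EX + "Jane", EX + "age", (XSD + "string", "Mid-Twenties")),
    ]
    for clash in find_clashes(ranges, statements):
        print(clash)
```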

Another source of datatype clash is when merging two graphs which have differing global assertions regarding the datatype contexts of a given property. Thus, given

From graph 1:

<rdf:Description rdf:about="#age">
   <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>

From graph 2:

<rdf:Description rdf:about="#age">
   <rdfs:range rdf:resource="&xsd;duration"/>
</rdf:Description>

From graph 1:

<http://www.w3.org/2002/rdf-datatyping/examples#age>
   <http://www.w3.org/2000/01/rdf-schema#range>
      <http://www.w3.org/2001/XMLSchema#integer> .

From graph 2:

<http://www.w3.org/2002/rdf-datatyping/examples#age>
   <http://www.w3.org/2000/01/rdf-schema#range>
      <http://www.w3.org/2001/XMLSchema#duration> .
[Figure: RDF graph illustration]

if the lexical spaces of the datatypes are disjoint, or only partially intersect, then some or all of the possible lexical forms will fail to satisfy the constraints of at least one of the datatypes specified. Even if the different datatypes have identical lexical spaces, there is no guarantee that they will share the same lexical-to-value mappings, and thus erroneous interpretations could arise. Thus, care should be taken when merging graphs containing implicit idioms and having different, and possibly incompatible, global rdfs:range assertions.
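One precaution before merging is to look for properties that end up with more than one rdfs:range assertion. The following Python sketch does only that; the function name and the triple representation are assumptions, and whether two reported datatypes are actually incompatible remains a question for an application that knows them.

```python
from collections import defaultdict

RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://www.w3.org/2002/rdf-datatyping/examples#"


def conflicting_ranges(*graphs):
    """Return {property: set of ranges} for properties with differing range assertions."""
    ranges = defaultdict(set)
    for graph in graphs:
        for subject, predicate, obj in graph:
            if predicate == RDFS_RANGE:
                ranges[subject].add(obj)
    return {prop: found for prop, found in ranges.items() if len(found) > 1}


if __name__ == "__main__":
    graph1 = [(EX + "age", RDFS_RANGE, XSD + "integer")]
    graph2 = [(EX + "age", RDFS_RANGE, XSD + "duration")]
    print(conflicting_ranges(graph1, graph2))
```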

4. Literal Subjects

All literal nodes, both explicitly and implicitly typed, have a globally consistent and unambiguous meaning, similar to URIref nodes or blank nodes respectively, and therefore may occur as the subject of RDF Statements.

Typed literal subjects are expressed in RDF/XML using the following idiom:


<rdf:Description rdf:type="&some;DatatypeClass" rdf:lexicalForm="LLL">
   <!-- statements -->
</rdf:Description>

or, more concisely


<some:DatatypeClass rdf:lexicalForm="LLL">
   <!-- statements -->
</some:DatatypeClass>

E.g.


<xsd:language rdf:lexicalForm="en">
   <rdfs:label xml:lang="en">English</rdfs:label>
   <rdfs:label xml:lang="es">Ingles</rdfs:label>
   <rdfs:label xml:lang="fi">Englanti</rdfs:label>
</xsd:language>

<http://www.w3.org/2001/XMLSchema#language>/"en"
   <http://www.w3.org/2000/01/rdf-schema#label>
      _:x/"English"-en .

<http://www.w3.org/2001/XMLSchema#language>/"en"
   <http://www.w3.org/2000/01/rdf-schema#label>
      _:y/"Ingles"-es .

<http://www.w3.org/2001/XMLSchema#language>/"en"
   <http://www.w3.org/2000/01/rdf-schema#label>
      _:z/"Englanti"-fi .

In the case of implicitly typed literals, the systemID denoting the unspecified datatype must of course be known in order to express any statements about the value denoted by the literal node. E.g.


<_:a rdf:lexicalForm="xyz">
   <rdfs:comment>This is the only node in the graph with this label.</rdfs:comment>
</_:a>

_:a/"xyz"
   <http://www.w3.org/2000/01/rdf-schema#comment>
      _:w/"This is the only node in the graph with this label." .

Although not likely to be of any practical use, it is technically possible to express a statement about a datatype value without specifying the datatype:

<rdf:Description rdf:lexicalForm="123">
   <rdfs:comment>Your guess is as good as mine what value this denotes...</rdfs:comment>
</rdf:Description>

_:g/"123"
   <http://www.w3.org/2000/01/rdf-schema#comment>
      _:i/"Your guess is as good as mine what value this denotes..." .

Note that the above rdfs:comment statement is not about the lexical form "123", but about the datatype value for which the string "123" is a lexical representation. If one wishes to make a statement about the string itself, then one should specify a datatype such as xsd:string in order to denote the actual string value. I.e.


<xsd:string rdf:lexicalForm="123">
   <rdfs:comment>This string value consists of three characters.</rdfs:comment>
</xsd:string>

<http://www.w3.org/2001/XMLSchema#string>/"123"
   <http://www.w3.org/2000/01/rdf-schema#comment>
      _:m/"This string value consists of three characters." .

[The attribute rdf:lexicalForm is a syntactic construct, similar to rdf:about and rdf:ID. It is only a mechanism of the RDF/XML serialization and does not occur as a term in the graph.]
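In a triple store, a literal subject can be keyed by its (datatype, lexical form) pair, so that statements attach to the datatype value rather than to the bare string. The following Python sketch assumes such a store; the names statements, add_statement, and about are hypothetical.

```python
from collections import defaultdict

XSD = "http://www.w3.org/2001/XMLSchema#"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

# Hypothetical store: subject node label -> list of (predicate, object) pairs.
statements = defaultdict(list)


def add_statement(subject, predicate, obj):
    """Record one statement; literal nodes are (datatype-or-systemID, lexical form)."""
    statements[subject].append((predicate, obj))


def about(subject):
    """Return all (predicate, object) pairs recorded for the given subject node."""
    return list(statements.get(subject, []))


# A statement about the string value "123".
add_statement(
    (XSD + "string", "123"),
    RDFS_COMMENT,
    ("_:m", "This string value consists of three characters."),
)
# A statement about some unspecified datatype value with the same lexical form.
add_statement(
    ("_:g", "123"),
    RDFS_COMMENT,
    ("_:i", "Your guess is as good as mine what value this denotes..."),
)

if __name__ == "__main__":
    print(about((XSD + "string", "123")))
    print(about((XSD + "integer", "123")))
```

Because the datatype is part of the key, the same lexical form under different datatypes yields distinct subjects, and an implicitly typed literal stays separate from every explicitly typed one.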

5. RDF Datatyping Model Theory

[This needs to be checked by Pat... probably best to ignore it completely until then...]

The RDF Model Theory explains the fundamental model-theoretic concepts like interpretation, universe, extension etc. used for interpreting the semantics of RDF graphs. This section assumes familiarity with these basic concepts.

Suppose I is an RDF interpretation of a graph E. Then I is datatyped (with respect to a set D of datatypes) if the following is true for any datatype URI Reference ddd (with I(ddd) in D):

(1) ICEXT(I(ddd)) = {x : <x,y> in IEXT(I(ddd))} i.e. the value space of the datatype.

(2) For any explicitly typed literal ddd/"LLL", I(ddd/"LLL") = L2V(I(ddd))("LLL")

(3) For any implicitly typed literal _:x/"LLL", if E contains the triples

   ddd rdf:type rdfs:Datatype .
   aaa bbb _:x/"LLL" .
   bbb rdfs:range ddd .

then I(_:x/"LLL") = I(ddd/"LLL")

6. RDF Schema for Datatyping

The following RDF Schema defines the class rdfs:Datatype.

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
]>

<rdf:RDF xmlns:rdf="&rdf;"
         xmlns:rdfs="&rdfs;">

   <rdfs:Class rdf:about="&rdfs;Datatype">
      <rdfs:label xml:lang="en">RDF Datatype</rdfs:label>
      <rdfs:comment xml:lang="en">
         An RDF Datatype consists of a value space, a lexical space,
         and an N:1 mapping from the lexical space to the value space. 
      </rdfs:comment>
      <rdfs:subClassOf rdf:resource="&rdf;Property"/>
   </rdfs:Class>

</rdf:RDF>

7. Appendices

The following appendices are non-normative.

7.1 Select Use Cases

7.1.1 Dublin Core

[original examples provided by Aaron, edited by Patrick]

[Note: qnames are used in the N-Triples for these examples...]

Examples in the "Encoding Schemes" section of the Dublin Core in RDF Draft[1] converted to the new datatyping proposal (need to be normalized, with expanded verbiage, etc):

[1] http://logicerror.com/dcrdfDraft

Example 1:


<rdf:Description rdf:about="#page">
   <dc:subject>
      <dcq:MESH>
         <rdf:value>D08.586.682.075.400</rdf:value>
         <rdfs:label>Formate Dehydrogenase</rdfs:label>
      </dcq:MESH>
   </dc:subject>
</rdf:Description>

<#page> dc:subject  _:a .
_:a     rdf:type    dcq:MESH .
_:a     rdf:value   "D08.586.682.075.400" .
_:a     rdfs:label  "Formate Dehydrogenase" .

becomes


<rdf:Description rdf:about="#page">
   <dc:subject rdf:type="&dcq;MESH">D08.586.682.075.400</dc:subject>
</rdf:Description>
<dcq:MESH rdf:lexicalForm="D08.586.682.075.400">
   <rdfs:label>Formate Dehydrogenase</rdfs:label>
</dcq:MESH>

<#page> dc:subject dcq:MESH/"D08.586.682.075.400" .
dcq:MESH/"D08.586.682.075.400" rdfs:label "Formate Dehydrogenase" .

Note that this revised representation is far more efficient since the label is defined for the datatype value only once, globally, rather than redundantly for every occurrence of the value.

Example 2:


<rdf:Description rdf:about="#page">
   <dc:language>
      <dcq:RFC1766>
         <rdf:value>EN</rdf:value>
         <rdfs:label>English</rdfs:label>
      </dcq:RFC1766>
   </dc:language>
</rdf:Description>

<#page> dc:language _:a .
_:a     rdf:type    dcq:RFC1766 .
_:a     rdf:value  "EN" .
_:a     rdfs:label "English" .

becomes


<rdf:Description rdf:about="#page">
   <dc:language rdf:type="&dcq;RFC1766">EN</dc:language>
</rdf:Description>
<dcq:RFC1766 rdf:lexicalForm="EN">
   <rdfs:label>English</rdfs:label>
</dcq:RFC1766>

<#page> dc:language dcq:RFC1766/"EN" .
dcq:RFC1766/"EN" rdfs:label "English" .

Example 3:


<rdf:Description rdf:about="#page">
   <dc:coverage>
      <dcq:Point>
         <rdf:value>
            <dcq:DCSV>
               <rdf:value>name=Perth, W.A.; east=115.85717; north=-31.95301</rdf:value>
            </dcq:DCSV>
         </rdf:value>
      </dcq:Point>
   </dc:coverage> 
</rdf:Description>

<#page> dc:coverage _:a .
_:a     rdf:type    dcq:Point .
_:a     rdf:value   _:b .
_:b     rdf:type    dcq:DCSV .
_:b     rdf:value   "name=Perth, W.A.; east=115.85717; north=-31.95301" .

becomes


<rdf:Description rdf:about="#page">
   <dc:coverage>
      <dcq:Point>
         <rdf:value rdf:type="&dcq;DCSV">name=Perth, W.A.; east=115.85717; north=-31.95301</rdf:value>
      </dcq:Point>
   </dc:coverage> 
</rdf:Description>

<#page> dc:coverage _:a .
_:a     rdf:type    dcq:Point .
_:a     rdf:value   dcq:DCSV/"name=Perth, W.A.; east=115.85717; north=-31.95301" .

or, even more concisely


<rdf:Description rdf:about="#page">
   <dc:coverage rdf:type="&dcq;DCSV">name=Perth, W.A.; east=115.85717; north=-31.95301</dc:coverage> 
</rdf:Description>

<rdf:Description rdf:about="&dcq;DCSV">
   <rdfs:subClassOf rdf:resource="&dcq;Point"/>
</rdf:Description>

<#page> dc:coverage dcq:DCSV/"name=Perth, W.A.; east=115.85717; north=-31.95301" .

dcq:DCSV rdfs:subClassOf dcq:Point .

6.1.2 CC/PP

[Example provided by Mark Butler, chair of CC/PP WG]

At present, the CC/PP schema does not explicitly define datatyping constraints for properties, since RDF has not so far provided a mechanism for doing so. It does, however, constrain each property to a particular datatype, which is specified in the comments. All property values are inlined, with no explicit local typing. Thus, at present, we have

<rdf:Description ID="BitsPerPixel">
   <rdf:type rdf:resource="http://www.w3.org/TR/PR-rdf-schema#Property" /> 
   <rdfs:domain rdf:resource="#HardwarePlatform" /> 
   <rdfs:comment>
Description: The number of bits of color or grayscale 
information per pixel, related to the number of colors or shades of 
gray the device can display. 
Type: Number  <!-- *** Datatyping implicit in comment *** -->
Resolution: Override 
Examples: "2", "8"
   </rdfs:comment> 
</rdf:Description>

and the implicitly typed instance value

<BitsPerPixel>15</BitsPerPixel>

With the datatyping proposal outlined in this document, one is now able to make those datatype assertions explicit in the CC/PP schema, and hence the application semantics transparent to the RDF layer:

<rdf:Description rdf:about="&ns-prf;BitsPerPixel">
   <rdf:type rdf:resource="&ns-rdfs;Property"/>
   <rdfs:domain rdf:resource="&ns-prf;HardwarePlatform"/>
   <rdfs:range rdf:resource='&ns-prf;Number'/>  <!-- *** NEW: Explicit Constraint *** -->
   <prf:resolutionRule rdf:resource='&ns-prf;Override'/>
   <rdfs:comment xml:lang="en">
Description:  The number of bits of color or grayscale information per
pixel, related to the number of colors or shades of gray
the device can display.
Type:         Number
Resolution:   Override
Examples:     "2", "8"
   </rdfs:comment>
</rdf:Description>
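One way an application might use such an explicit range constraint is sketched below in Python: the datatype of an inline literal is looked up from the property's rdfs:range, and the literal is checked against that datatype's lexical space. This is only a sketch under stated assumptions. The lexical space given for prf:Number (one or more decimal digits) is an assumption made for illustration, not the CC/PP definition, and the table and function names are hypothetical.

```python
import re

# rdfs:range assertions harvested from the schema (property -> datatype).
RANGES = {"prf:BitsPerPixel": "prf:Number"}

# ASSUMPTION: prf:Number is taken to be a string of decimal digits.
LEXICAL_SPACES = {"prf:Number": re.compile(r"[0-9]+")}


def typed_literal(prop, lexical_form):
    """Pair an inline literal with the datatype given by the property's range."""
    datatype = RANGES[prop]
    if not LEXICAL_SPACES[datatype].fullmatch(lexical_form):
        raise ValueError(f"{lexical_form!r} is not in the lexical space of {datatype}")
    return f'{datatype}/"{lexical_form}"'


bits = typed_literal("prf:BitsPerPixel", "15")
```

With these assumptions, the inline value 15 is paired with prf:Number, while a value such as "fifteen" is rejected.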

6.1.3 DAML+OIL

[...TBD...]

Before:


Jane age _:x .
_:x rdf:type xsd:integer .
_:x rdf:value "25" .

Now:


Jane age xsd:integer/"25" .
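The rewriting from the blank node idiom to the compact form can be sketched in Python as follows. This is an illustrative sketch rather than a normative algorithm: triples are modelled as tuples of strings, literals are recognised by a leading double quote, and the function name `compact` is hypothetical.

```python
def compact(triples):
    """Fold blank nodes carrying rdf:type and a literal rdf:value into datatype/"lexical" nodes."""
    types = {s: o for s, p, o in triples
             if p == "rdf:type" and s.startswith("_:")}
    values = {s: o for s, p, o in triples
              if p == "rdf:value" and s.startswith("_:") and o.startswith('"')}
    # Only blank nodes that have both a type and a literal value are folded.
    typed = {n: f"{types[n]}/{values[n]}" for n in types if n in values}

    result = []
    for s, p, o in triples:
        if s in typed and p in ("rdf:type", "rdf:value"):
            continue  # absorbed into the typed value node
        result.append((typed.get(s, s), p, typed.get(o, o)))
    return result


before = [
    ("Jane", "age", "_:x"),
    ("_:x", "rdf:type", "xsd:integer"),
    ("_:x", "rdf:value", '"25"'),
]
after = compact(before)
```

Applied to the three "Before" triples, this yields the single "Now" triple. Any other property of the blank node, such as an rdfs:label, is carried over to the typed value node.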

6.1.4 ???

[...suggestions for other use cases welcome...]

7. References

W3C RDF Core Working Group Charter, Mar 2001, http://www.w3.org/2001/sw/RDFCoreWGCharter

W3C RDF Primer, ??? 2002, http://www.w3.org/TR/2002/WD-rdf-primer-20020319/

W3C RDF Syntax, ??? 2002, http://www.w3.org/TR/rdf-syntax-grammar/

W3C RDF Test Cases, ??? 2002, http://www.w3.org/TR/rdf-testcases/

W3C RDF Model Theory, ??? 2002, http://www.w3.org/TR/rdf-mt/

W3C RDF Schema, ??? 2002, http://www.w3.org/2001/sw/RDFCore/Schema/20010913/

XML Schema Part 2: Datatypes, ??? 2001, http://www.w3.org/TR/xmlschema-2/

DAML+OIL..., ??? 200?, http://???

OWL..., ??? 200?, http://???

CC/PP..., ??? 200?, http://???

8. Acknowledgments

This document has benefited from the input of many members of the RDF Core Working Group. Particular thanks to Jeremy Carroll, Dan Connolly, Martyn Horner, Graham Klyne, and Frank Manola for their contributions during the development of the RDF Datatyping specification. Special thanks to Graham Klyne for his contributions to the section on RDF desiderata. Thanks to Aaron Swartz for his contribution of the Dublin Core use case. Thanks to Mark Butler for his contribution of the CC/PP use case.

