RDF Datatyping

not quite yet a
W3C Working Draft 19 August 2002

This version:
http://www-nrc.nokia.com/sw/rdf-datatyping.html [Last Modified: 19 August 2002]
Latest version:
http://www-nrc.nokia.com/sw/rdf-datatyping.html
Previous version:
None.
Editors:
Pat Hayes, University of West Florida, phayes@ai.uwf.edu
Sergey Melnik, Stanford University, melnik@db.stanford.edu
Patrick Stickler, Nokia, patrick.stickler@nokia.com

Abstract

The Resource Description Framework (RDF) is a general-purpose language for representing information in the World Wide Web. RDF provides a common framework for expressing this information in such a way that it can be exchanged between applications without loss of meaning. The utility and reliability of information exchanged between applications typically requires that datatyping information be unambiguous and that the interpretation of datatyped values, which may have local representations that differ from system to system, be consistent between disparate applications. Achieving consistency in the exchange and interpretation of such datatyped information requires a well defined and standardized methodology for expressing and interpreting datatyping information. This document defines a particular methodology for expressing datatyped information in RDF and aims to provide the reader the basic fundamentals required to effectively use datatypes and datatyped values with RDF in their particular applications.

Status of this Document

This is an [editors' edition, pre-publication] W3C RDF Core Working Group Working Draft produced as part of the W3C Semantic Web Activity. This document incorporates decisions made by the Working Group designed to provide the reader the basic fundamentals required to effectively use datatyping with RDF in their particular applications.

This document is being released for review by W3C members and other interested parties to encourage feedback and comments. This is the current state of an ongoing work on the RDF datatyping specification.

This is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use it as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.

Table of Contents

...


1. Introduction

The Resource Description Framework (RDF) is a general-purpose language for representing information in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, the copyright and syndication information about a Web document, the availability schedule for some shared resource, or the description of a Web user's preferences for information delivery. However, by generalizing the concept of a "Web resource", RDF can be used to represent information about anything that can be identified on the Web, such as information about items available from online shopping facilities (e.g., information about prices, publishers, and availability of books or recordings).

RDF provides a common framework for expressing this information in such a way that it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. Exchanging information between different applications means that the information may be made available to applications other than those for which it was originally created.

The utility and reliability of information exchanged between applications typically requires that datatyping information be unambiguous and that the interpretation of datatyped values, which may have local representations that differ from system to system, be consistent between disparate applications. Achieving consistency in the exchange and interpretation of such datatyped information requires a well defined and standardized methodology for expressing and interpreting datatyping information.

This document defines a particular methodology for expressing datatyped information in RDF and aims to provide the reader the basic fundamentals required to effectively use datatypes and datatyped values with RDF in their particular applications.

1.1 What is Datatyping?

[informal definition of datatyping]

[common datatyping scenarios, where datatyping is needed]

Due to RDF's role as a means of interchange between disparate systems, and in order to achieve portability and independence of platform, it is necessary to forgo any native representation of values or native datatypes in RDF itself. This means that RDF has no built-in knowledge about particular datatypes such as strings or integers, and the lexical representation of a given value, such as "25" for the number twenty-five, has no native interpretation in RDF. RDF is datatype neutral in the same manner as it is vocabulary neutral. The specific semantics for individual datatypes must reside in the application layers above RDF.

In RDF Datatyping, literals are taken to represent the lexical representations (lexical forms) of datatype values and their datatype interpretation is based on an association of the literal with a particular datatype. The literal node in the graph denotes the datatype value which it represents.

The nature of datatypes, the means by which literals are associated with datatypes, and the interpretation of typed literals are the focus of this document.

1.2 Desiderata for RDF Datatyping

[introductory verbiage about desiderata]

The following list summarizes the specific desiderata that were taken into account during the development of this specification:

It is believed that the methodology for datatyping described in this specification satisfies all of the above desiderata.

1.3 Related Documents

The complete specification of RDF consists of a number of documents:

This document is intended to augment the other parts of the RDF specification, to help information producers, system designers and application developers understand how datatypes and datatyping can be used with RDF.

1.4 Comments on the Examples

Each example is represented in three forms:

  1. its RDF/XML representation
  2. its NTriples representation
  3. a graph illustration

For the sake of brevity and clarity, XML entities (e.g. &rdf;) are used in the XML examples provided in this specification where URI References occur as attribute values. In addition, local and qualified names are used as node and arc labels in the NTriples and graph illustrations, even though the actual NTriples and graph nodes will contain complete URI References as labels.

The following RDF/XML 'wrapper' should be assumed for all RDF examples used in this specification:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY xsd  "http://www.w3.org/2001/XMLSchema#">
  <!ENTITY ex   "http://www.w3.org/2002/rdf-datatyping/examples#">
]>

<rdf:RDF xmlns:rdf  ="&rdf;"
         xmlns:rdfs ="&rdfs;"
         xmlns:xsd  ="&xsd;"
         xmlns:ex   ="&ex;">

   <!-- example -->

</rdf:RDF>

1.5 Comments on the Structure of RDF Literals

[RDF literals are structured objects consisting of a string, which is optionally qualified as XML content (rdf:parseType equal to "Literal") and/or having an associated xml:lang value.]

[refs to syntax/primer/etc]

[the structure of the literal is transparent with regard to RDF Datatyping: all that is seen is the actual string portion -- the parseType bit and xml:lang (if present) are irrelevant to RDF Datatyping and the specification pretends that they don't exist]

[this treatment is in line with XML Schema's views on xml:lang as well, which actually forbids datatype values from being qualified by xml:lang. RDF Datatyping allows it, but ignores it]

2. RDF Datatypes

The conceptual framework for RDF datatyping presented in this specification is compatible with the type system defined by XML Schema for both simple and complex datatypes. It can also be used with any datatyping framework which conforms to the characteristics of datatypes as defined below.

2.1 rdfs:Datatype

RDF Datatyping defines an rdfs:Datatype as consisting of

  1. a set of distinct values, called its value space
  2. a set of lexical representations or forms, called its lexical space
  3. an N:1 mapping from the lexical space to the value space, called its datatype mapping

2.2 Datatype Mapping

A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype.

A datatype mapping satisfies the following properties:

For example, the datatype mapping for the XML Schema datatype 'xsd:boolean', where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:

Value Space {T, F}
Lexical Space {"0", "1", "true", "false"}
Datatype Mapping {<"true", T>, <"1", T>, <"0", F>, <"false", F>}
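As a minimal sketch, the xsd:boolean mapping above can be written as a Python dictionary. The names BOOLEAN_MAPPING, lexical_space, value_space and forms_for_true are illustrative only and are not defined by this specification; Python's True and False stand in for the values T and F.

```python
# Datatype mapping for xsd:boolean as a set of <lexical form, value> pairs.
# Python's True/False stand in for the abstract values T and F.
BOOLEAN_MAPPING = {
    "true": True,
    "1": True,
    "false": False,
    "0": False,
}

# The lexical space is the set of first elements of the pairs;
# the value space is the set of second elements.
lexical_space = set(BOOLEAN_MAPPING.keys())
value_space = set(BOOLEAN_MAPPING.values())

# The mapping is N:1: each lexical form has exactly one value,
# but one value may have several lexical forms.
forms_for_true = sorted(
    form for form, value in BOOLEAN_MAPPING.items() if value is True
)
```

A dictionary fits here because it cannot give one key two values, which is the N:1 property required of a datatype mapping.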

For an XML Schema complex datatype, its value space is the set of all valid infosets licensed by its content model and its datatype mapping is the mapping from each XML serialization to its corresponding infoset. Two XML serializations which correspond to the same infoset are considered synonymous lexical forms, just as both "5" and "0005" are synonymous lexical forms representing the same xsd:integer value five.
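The notion of synonymous lexical forms can be illustrated for xsd:integer with the following sketch. The regular expression is an assumed approximation of the xsd:integer lexical space (an optional sign followed by decimal digits), and the function name integer_value is hypothetical.

```python
import re

# Assumed approximation of the xsd:integer lexical space:
# an optional sign followed by one or more decimal digits.
INTEGER_LEXICAL_FORM = re.compile(r"[+-]?[0-9]+")


def integer_value(lexical_form: str) -> int:
    """Map an xsd:integer lexical form to the value it represents."""
    if not INTEGER_LEXICAL_FORM.fullmatch(lexical_form):
        raise ValueError(f"{lexical_form!r} is not in the lexical space")
    return int(lexical_form)


# Distinct lexical forms, same member of the value space.
same_value = integer_value("5") == integer_value("0005")
```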

2.3 Typed Literal

A typed literal is a pair where the first element is a URI Reference denoting a datatype and the second element is a lexical form (literal). Following from the nature of datatypes as defined above, this pairing of datatype and lexical form unambiguously identifies a specific member of a datatype mapping and hence a specific member of the value space of the datatype.

A typed literal can be considered a "literal-in-context" where the datatype provides the context for interpretation of the lexical form (literal) to obtain an actual value.

For example, the typed literals which can be defined for the XML Schema datatype 'xsd:boolean' are as follows:

   Typed Literal             Member of Datatype Mapping     Member of Value Space
                             Denoted by Typed Literal       Denoted by Typed Literal

   <xsd:boolean, "true">     <"true", T>                    T
   <xsd:boolean, "1">        <"1", T>                       T
   <xsd:boolean, "false">    <"false", F>                   F
   <xsd:boolean, "0">        <"0", F>                       F

RDF datatyping is primarily concerned with the implicit or explicit designation of typed literal pairings. RDF datatyping only provides for the designation of typed literals. The internal structure and semantics of all datatypes are opaque to RDF; i.e. membership of value and lexical spaces, datatype mappings, etc. have neither representation nor interpretation in RDF. Actual interpretation of typed literals (determination of the actual value denoted by the typed literal) is performed externally to RDF by applications which have sufficient knowledge of the particular datatypes in question. RDF datatyping only provides the datatype context within which such interpretation is to take place.
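The division of labour described above might be sketched as follows: RDF supplies only the pairing, and an application supplies the lexical-to-value knowledge. The TypedLiteral class, the L2V registry and the value_of function are hypothetical names chosen for illustration; only xsd:boolean is registered here.

```python
from typing import Callable, Dict, NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class TypedLiteral(NamedTuple):
    datatype: str      # URI Reference denoting the datatype
    lexical_form: str  # the literal string


# Application-level knowledge: one lexical-to-value function per datatype.
# RDF itself sees only the pairing, never these functions.
BOOLEAN = {"true": True, "1": True, "false": False, "0": False}
L2V: Dict[str, Callable[[str], object]] = {
    XSD + "boolean": BOOLEAN.__getitem__,
}


def value_of(literal: TypedLiteral) -> object:
    """Return the value denoted by a typed literal, if the datatype is known."""
    if literal.datatype not in L2V:
        raise LookupError(f"no knowledge of datatype {literal.datatype}")
    return L2V[literal.datatype](literal.lexical_form)
```

An application without an entry for a given datatype can still store and exchange the pairing; it simply cannot determine the value that the pairing denotes.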

3. Designation of Typed Literals in RDF

A typed literal may be designated in one of two ways in RDF, either locally (explicitly) or globally (implicitly).

3.1 Local Datatyping

Local datatyping associates a datatype with each individual property value explicitly by means of a typed literal node. Thus


<rdf:Description rdf:about="#John">
   <ex:age rdf:type="&xsd;integer">25</ex:age>
</rdf:Description>

John ex:age xsd:integer"25" .
RDF Graph

says that John's age is the member of the value space of xsd:integer which is represented by the lexical form "25". And from what we know about the datatype xsd:integer, we then know that John's age is the value twenty-five.

A typed literal node is valid when the literal is a member of the lexical space of the datatype, in which case the typed literal node is interpreted as denoting the member of the value space of the datatype represented by that lexical form. Thus


<rdf:Description rdf:about="#John">
   <ex:age rdf:type="&xsd;integer">pumpkin</ex:age>
</rdf:Description>

John ex:age xsd:integer"pumpkin" .
RDF Graph

would always be invalid, no matter what value is assigned to the typed literal node, as "pumpkin" is not a member of the lexical space of xsd:integer.

It is important to note that RDF cannot itself determine datatyping validity. Such validation can only be performed by an external application with sufficient knowledge about the particular datatype in question. RDF merely provides the means for designating the typed literal pairings upon which such validation would be performed.
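A sketch of such an external validation step is given below. It assumes that the application approximates the xsd:integer lexical space with a regular expression; the LEXICAL_SPACES table and the is_valid function are illustrative names, not part of RDF or of this specification.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Assumed lexical-space checks, keyed by datatype URI Reference.
LEXICAL_SPACES = {
    XSD_INTEGER: re.compile(r"[+-]?[0-9]+"),
}


def is_valid(datatype: str, lexical_form: str) -> bool:
    """True if the lexical form belongs to the lexical space of the datatype."""
    pattern = LEXICAL_SPACES.get(datatype)
    if pattern is None:
        raise LookupError(f"no knowledge of datatype {datatype}")
    return pattern.fullmatch(lexical_form) is not None
```

With this table, is_valid(XSD_INTEGER, "25") holds while is_valid(XSD_INTEGER, "pumpkin") does not, mirroring the two examples above.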

Local datatyping works in the same way for XML literals. An XML literal which represents an instance of the vCard:n complex element type can be typed locally as follows:


<rdf:Description rdf:about="#John">
   <ex:name rdf:parseType="Literal" rdf:type="&vCard;n">
      <n xmlns="&vCard;">
         <family>Doe</family>
         <given>John</given>
      </n>
   </ex:name>
</rdf:Description>

John ex:name vCard:n'<n xmlns="&vCard;"><family>Doe</family><given>John</given></n>' .

[Note: I am using single quotes here in the NTriples to distinguish XML literals from non-XML literals, the latter being delimited by double quotes. This allows for the same concatenative methods for constructing typed literal nodes to be used without introducing ambiguity (e.g. vCard:nXML"<n>...</n>"). I.e.


   non-XML literal                                                "25"
   non-XML literal with lang                                      "25"en
   URIref typed non-XML literal               <http://...#integer>"25"
   URIref typed non-XML literal with lang     <http://...#integer>"25"en
   qname typed non-XML literal                         xsd:integer"25"
   qname typed non-XML literal with lang               xsd:integer"25"en

   XML literal                                          '<h1>Foo</h1>'
   XML literal with lang                                '<h1>Foo</h1>'en
   URIref typed XML literal              <http://...#h1>'<h1>Foo</h1>'
   URIref typed XML literal with lang    <http://...#h1>'<h1>Foo</h1>'en
   qname typed XML literal                      xhtml:h1'<h1>Foo</h1>'
   qname typed XML literal with lang            xhtml:h1'<h1>Foo</h1>'en

This is only a proposed syntax. There are of course other options...]

RDF Graph

3.2 Global Datatyping

Global datatyping leaves the datatype of the property value implicit (in that the datatype of the property value itself is not individually specified) and relies on the datatype context to be defined for the property value elsewhere in the graph, by associating the datatype with the property rather than the property value.

It is often convenient to associate a datatype with a property, so that every use of the property can be understood as asserting particular datatyping characteristics about its value.

RDF Datatyping employs rdfs:range to associate a datatype with a particular property. The associated datatype serves to constrain (by only providing valid interpretations for) all values of the property to correspond to members of the value space of the designated datatype, and (according to the characteristics of RDF datatypes) also constrains all lexical forms to members of the lexical space of the datatype.

In cases where no datatype is asserted for an occurrence of a given literal, a datatype range defined for the property provides the datatype from which the typed literal pairing is derived.

For example, we may wish to constrain the property ex:age so that its use and interpretation are bound to integer values as defined by the datatype xsd:integer. Given that fixed interpretation, the datatype need not be specified for each property value; it may be left implicit and defined globally for the property itself:


<rdf:Description rdf:about="&ex;age">
   <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>

<rdf:Description rdf:about="#Jane">
   <ex:age>25</ex:age>
</rdf:Description>

ex:age rdfs:range xsd:integer .
Jane ex:age "25" .
RDF Graph

Thus, the datatype context within which "25" is interpreted is xsd:integer, and "25" is required to be a valid member of the lexical space of xsd:integer and the literal node is interpreted as denoting the integer value twenty-five. The rdfs:range assertion and the literal node together constitute the typed literal pairing <xsd:integer,"25"> which unambiguously denotes the number twenty-five.
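The derivation of the typed literal pairing from the rdfs:range assertion can be sketched as follows. Triples are modelled as plain tuples of strings, qualified names are used for brevity, and untyped literals are written with their double quotes as in the NTriples examples. The function names range_of and typed_literal_pairing are hypothetical, and the sketch does not check that the range is actually an rdfs:Datatype.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

GRAPH: List[Triple] = [
    ("ex:age", "rdfs:range", "xsd:integer"),
    ("Jane", "ex:age", '"25"'),
]


def range_of(graph: List[Triple], prop: str) -> Optional[str]:
    """Return the datatype asserted via rdfs:range for a property, if any."""
    for subject, predicate, obj in graph:
        if subject == prop and predicate == "rdfs:range":
            return obj
    return None


def typed_literal_pairing(
    graph: List[Triple], triple: Triple
) -> Optional[Tuple[str, str]]:
    """Combine an untyped literal with the range of its property."""
    _, prop, obj = triple
    datatype = range_of(graph, prop)
    if datatype is None:
        return None  # no datatype context is available
    return (datatype, obj.strip('"'))


pairing = typed_literal_pairing(GRAPH, GRAPH[1])
```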

The rdfs:range assertion both provides the information necessary for the proper interpretation of the implicit idiom and (indirectly) constrains the valid set of literals to the lexical space of the specified datatype.

This last point is illustrated by


<rdf:Description rdf:about="&ex;age">
   <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>

<rdf:Description rdf:about="#Jane">
   <ex:age>Mid-Twenties</ex:age>
</rdf:Description>

ex:age rdfs:range xsd:integer .
Jane ex:age "Mid-Twenties" .
RDF Graph

which constitutes a datatype violation, because the datatype context asserted by rdfs:range restricts the set of valid property values to the value space of the particular datatype, and the literal "Mid-Twenties" is not a member of the lexical space of xsd:integer and thus does not represent any member of its value space.

It is important to point out that only an extra-RDF application with complete knowledge about the datatype in question would be able to detect such a datatype violation. Datatypes are fully opaque to RDF and neither RDF nor RDF Schema provides generic means for datatype validation. RDF Datatyping provides mechanisms for the expression of typed literal pairings by specific representations which have a well defined representation and interpretation, but cannot determine the validity of individual pairings directly. This is primarily due to RDF's role as a means of interchange between disparate systems: in order to achieve portability and independence of platform, it is necessary to forgo any native representation of values or native datatypes in RDF itself. RDF is datatype neutral in the same manner as it is vocabulary neutral. The specific semantics for individual datatypes must reside in the application layers above RDF.

3.2.1 Under-Specified Datatyping

In the case of a non-typed literal, where no datatype range is specified for the property, the meaning of the literal node (what that literal node denotes) is under-specified. It denotes some datatype value which has a lexical representation corresponding to the literal string, but in the absence of any knowledge of which datatype context constrains its interpretation, we cannot know which datatype value it denotes. This is similar to the case of a blank node, where although one knows that it denotes "something", one does not know what that something is.
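One way to picture this under-specification is to list every datatype known to an application whose lexical space contains the literal string. The following sketch assumes a small table of approximate lexical spaces; the table contents and the candidate_datatypes function are illustrative assumptions, not definitions from XML Schema or from this specification.

```python
import re

# Assumed, simplified lexical spaces for a few datatypes known to an application.
KNOWN_LEXICAL_SPACES = {
    "xsd:integer": re.compile(r"[+-]?[0-9]+"),
    "xsd:boolean": re.compile(r"true|false|0|1"),
    "xsd:string": re.compile(r".*", re.DOTALL),
}


def candidate_datatypes(lexical_form: str) -> list:
    """List the known datatypes whose lexical space contains the literal."""
    return sorted(
        name
        for name, pattern in KNOWN_LEXICAL_SPACES.items()
        if pattern.fullmatch(lexical_form)
    )


# The untyped literal "1" could denote an integer, a boolean, or a string.
candidates = candidate_datatypes("1")
```

Without a datatype context there is no basis for choosing among the candidates, so the value denoted by the literal node remains unknown.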

[Need example...]

3.2.2 Datatype Clashes

The datatype interpretations imposed on a property by rdfs:range apply to any such usage of the property anywhere in the RDF graph, so an rdfs:range assertion has a global scope, and therefore needs to be used with care. For example, if both global and local datatyping are employed for the same property, then a globally asserted datatype can produce a conflict with an incompatible, locally asserted datatype:


<rdf:Description rdf:about="&ex;age">
   <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>

<rdf:Description rdf:about="#Judy">
   <ex:age>25</ex:age>
</rdf:Description>

<rdf:Description rdf:about="#Jane">
   <ex:age rdf:type="&xsd;string">Mid-Twenties</ex:age>
</rdf:Description>

ex:age rdfs:range xsd:integer .
Judy ex:age "25" .
Jane ex:age xsd:string"Mid-Twenties" .
RDF Graph

Here, the global datatype xsd:integer is asserted for all uses of the property ex:age. While the value for Judy's age satisfies the constraints of the xsd:integer datatype, there is a conflict with the definition of Jane's age: although the typed literal is valid in its local datatyping context of xsd:string, the lexical form "Mid-Twenties" conflicts with the globally asserted datatype context of xsd:integer. Thus, care must be taken when asserting global datatype contexts to ensure that such clashes do not arise, or at least to be aware of the potential for such datatype clashes.
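A simple check for this kind of clash could look like the following sketch. It makes the simplifying assumption that any two different datatypes are incompatible; a real application would need knowledge of the datatypes involved to decide compatibility. The data structures and the datatype_clashes function are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Literal = Tuple[Optional[str], str]   # (local datatype or None, lexical form)
Statement = Tuple[str, str, Literal]  # (subject, property, literal)

# Global assertions made via rdfs:range.
RANGES: Dict[str, str] = {"ex:age": "xsd:integer"}

STATEMENTS: List[Statement] = [
    ("Judy", "ex:age", (None, "25")),
    ("Jane", "ex:age", ("xsd:string", "Mid-Twenties")),
]


def datatype_clashes(ranges: Dict[str, str], statements: List[Statement]):
    """Return statements whose local datatype differs from the global one."""
    clashes = []
    for subject, prop, (local_type, _lexical_form) in statements:
        global_type = ranges.get(prop)
        if local_type and global_type and local_type != global_type:
            clashes.append((subject, prop, local_type, global_type))
    return clashes


clashes = datatype_clashes(RANGES, STATEMENTS)
```

Only the statement about Jane is reported; Judy's untyped literal carries no local datatype and therefore cannot clash with the range.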

Another source of datatype clash is when merging two graphs which have differing global assertions regarding the datatype contexts of a given property. Thus, given

From graph 1:

<rdf:Description rdf:about="&ex;age">
   <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>

From graph 2:

<rdf:Description rdf:about="&ex;age">
   <rdfs:range rdf:resource="&xsd;duration"/>
</rdf:Description>

From graph 1:

ex:age rdfs:range xsd:integer .

From graph 2:

ex:age rdfs:range xsd:duration .
RDF Graph

if the lexical spaces of the datatypes are disjoint, or only partially intersect, then some or all of the possible lexical forms will fail to satisfy the constraints of at least one of the datatypes specified. Even if the different datatypes have identical lexical spaces, there is no guarantee that they will share the same lexical-to-value mappings, and thus erroneous interpretations could arise. Thus, care should be taken when merging graphs containing implicit idioms and having different, and possibly incompatible, global rdfs:range assertions.
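Before merging, an application might look for properties that would end up with more than one rdfs:range assertion. The sketch below treats a graph as a set of triples and a merge as a plain set union, which ignores blank node handling; the function name conflicting_ranges is hypothetical, and the check does not decide whether the ranges found are actually incompatible.

```python
from collections import defaultdict

GRAPH_1 = {("ex:age", "rdfs:range", "xsd:integer")}
GRAPH_2 = {("ex:age", "rdfs:range", "xsd:duration")}


def conflicting_ranges(*graphs):
    """Map each property to its ranges when the merged graphs assert several."""
    ranges = defaultdict(set)
    for graph in graphs:
        for subject, predicate, obj in graph:
            if predicate == "rdfs:range":
                ranges[subject].add(obj)
    return {
        prop: sorted(types) for prop, types in ranges.items() if len(types) > 1
    }


conflicts = conflicting_ranges(GRAPH_1, GRAPH_2)
```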

4. RDF Datatyping Model Theory

The RDF Model Theory explains the fundamental model-theoretic concepts like interpretation, universe, extension etc. used for interpreting the semantics of RDF graphs. This section assumes familiarity with these basic concepts.

Suppose I is an RDF interpretation of a graph E. Then I is datatyped (with respect to a set D of datatypes) if the following is true for any datatype URI Reference ddd (with I(ddd) in D):

(1) ICEXT(I(ddd)) = {x : <x,y> in IEXT(I(ddd))} i.e. the value space of the datatype.

(2) For any literal "LLL", if E contains either the triples

   ddd rdf:type rdfs:Datatype .
   bbb aaa ddd"LLL" .

or the triples

   aaa rdfs:range ddd .
   ddd rdf:type rdfs:Datatype .
   bbb aaa "LLL" .

then I(bbb) = L2V(I(ddd))("LLL")

5. RDF Schema for Datatyping

The following RDF Schema defines the class rdfs:Datatype.

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
]>

<rdf:RDF xmlns:rdf="&rdf;"
         xmlns:rdfs="&rdfs;">

   <rdfs:Class rdf:about="&rdfs;Datatype">
      <rdfs:label xml:lang="en">RDF Datatype (Property)</rdfs:label>
      <rdfs:comment xml:lang="en">
         An RDF Datatype consists of a value space, a lexical space,
         and an N:1 mapping from the lexical space to the value space. 
      </rdfs:comment>
      <rdfs:subClassOf rdf:resource="&rdf;Property"/>
   </rdfs:Class>

</rdf:RDF>

6. Appendices

The following appendices are non-normative...

6.1 Use Cases

[provide examples of how RDF Datatyping is expected to be applied in various application contexts]

6.1.1 Dublin Core

[original examples provided by Aaron, edited by Patrick]

Examples in the "Encoding Schemes" section of the Dublin Core in
RDF Draft[1] converted to the new datatyping proposal (need to
be normalized, with expanded verbiage, etc):

[1] http://logicerror.com/dcrdfDraft

*** EXAMPLE 1 ***
_:page dc:subject  _:a .
_:a    rdf:type    dct:MESH .
_:a    rdf:value  "D08.586.682.075.400" .
_:a    rdfs:label "Formate Dehydrogenase" .

becomes

_:page dc:subject  dct:MESH"D08.586.682.075.400" .
dct:MESH"D08.586.682.075.400" rdfs:label "Formate Dehydrogenase" .


[Question: should we allow typed literal nodes to act as subjects? They
are unambiguous, globally consistent, just like URIrefs. And the above
example shows the utility of being able to talk about these datatype
values, particularly members of enumerations. Another example is

   xsd:lang"en" rdfs:label "English" .

Eh?]


*** EXAMPLE 2 ***
_:page dc:language _:a .
_:a    rdf:type    dct:RFC1766 .
_:a    rdf:value  "EN" .
_:a    rdfs:label "English" .

becomes

_:page dc:language dct:RFC1766"EN" .
dct:RFC1766"EN" rdfs:label "English" .

*** EXAMPLE 3 ***
_:page dc:coverage _:a .
_:a    rdf:type    dct:Point .
_:a    rdf:value   _:b .
_:b    rdf:type    dct:DCSV .
_:b    rdf:value   "name=Perth, W.A.; east=115.85717; north=-31.95301" .

becomes

_:page  dc:coverage dct:DCSV"name=Perth, W.A.; east=115.85717; north=-31.95301" .
dct:DCSV rdfs:subClassOf dct:Point . 

6.1.2 CC/PP

[Example provided by Mark Butler, chair of CC/PP WG]

At present, the CC/PP schema does not explicitly define datatyping constraints for properties (since to date, RDF has not provided a mechanism for doing so) but does constrain each property to a particular datatype, which is specified in the comments. All property values are inlined, with no explicit local typing. Thus, at present, we have

<rdf:Description ID="BitsPerPixel">
   <rdf:type rdf:resource="http://www.w3.org/TR/PR-rdf-schema#Property" /> 
   <rdfs:domain rdf:resource="#HardwarePlatform" /> 
   <rdfs:comment>
Description: The number of bits of color or grayscale 
information per pixel, related to the number of colors or shades of 
gray the device can display. 
Type: Number  <!-- *** Datatyping implicit in comment *** -->
Resolution: Override 
Examples: "2", "8"
   </rdfs:comment> 
</rdf:Description>

and the implicitly defined instance value

<BitsPerPixel>15</BitsPerPixel>

With the datatyping proposal outlined in this document, one is now able to make those datatype assertions explicit in the CC/PP schema, and hence the application semantics transparent to the RDF layer:

<rdf:Description rdf:about="&ns-prf;BitsPerPixel">
   <rdf:type rdf:resource="&ns-rdfs;Property"/>
   <rdfs:domain rdf:resource="&ns-prf;HardwarePlatform"/>
   <rdfs:range rdf:resource='&ns-prf;Number'/>  <!-- *** NEW: Explicit Constraint *** -->
   <prf:resolutionRule rdf:resource='&ns-prf;Override'/>
   <rdfs:comment xml:lang="en">
Description:  The number of bits of color or grayscale information per
pixel, related to the number of colors or shades of gray
the device can display.
Type:         Number
Resolution:   Override
Examples:     "2", "8"
   </rdfs:comment>
</rdf:Description>

6.1.3 DAML+OIL

[...TBD...]

6.1.4 ???

[...suggestions for other use cases welcome...]

7. References

W3C RDF Core Working Group Charter, Mar 2001, http://www.w3.org/2001/sw/RDFCoreWGCharter

W3C RDF Primer, ??? 2002, http://www.w3.org/TR/2002/WD-rdf-primer-20020319/

W3C RDF Syntax, ??? 2002, http://www.w3.org/TR/rdf-syntax-grammar/

W3C RDF Test Cases, ??? 2002, http://www.w3.org/TR/rdf-testcases/

W3C RDF Model Theory, ??? 2002, http://www.w3.org/TR/rdf-mt/

W3C RDF Schema, ??? 2002, http://www.w3.org/2001/sw/RDFCore/Schema/20010913/

XML Schema Part 2: Datatypes, ??? 2001, http://www.w3.org/TR/xmlschema-2/

DAML+OIL..., ??? 200?, http://???

OWL..., ??? 200?, http://???

CC/PP..., ??? 200?, http://???

8. Acknowledgments

This document has benefited from the input of many members of the RDF Core Working Group. Particular thanks to Jeremy Carroll, Dan Connolly, Martyn Horner, Graham Klyne, and Frank Manola for their contributions during the development of the RDF Datatyping specification. Special thanks to Graham Klyne for his contributions to the section on RDF desiderata. Thanks to Aaron Swartz for his contribution of the Dublin Core use case. Thanks to Mark Butler for his contribution of the CC/PP use case.

