The "Typed Data Literal" (TDL) Datatyping Scheme

Abstract

This document describes the "Typed Data Literal" (TDL) datatyping scheme (also known as "PD"), which is one of the candidate proposals being discussed by the RDF Core Working Group.

Status of this Document

The document has no normative status and merely provides a reference for an ongoing discussion within the working group.

Authors

Many working group members and members of the RDF Interest Group have helped to shape this document.

1 Introduction

This document describes the "Typed Data Literal" (TDL) datatyping scheme, which is one of several proposals under consideration by the RDF Core Working Group (hereafter referred to simply as the WG) for achieving a complete solution for datatyping based on the foundational RDF Datatyping Model [RDF DT], which is itself defined in terms of the RDF Model Theory [RDF MT].

The TDL scheme, also known as "PDU" or "PD", is a fusion of the idioms from two earlier schemes, "P" and "D" (or "DAML"), along with the conceptual model from "U" (omitting the URV-based idiom). When type information is omitted, the Model Theory for TDL captures the ambiguous typing of the Perl programming idiom [PL].

C.f.

The formal treatment of TDL is presented as a modification to the RDF Model Theory [RDF MT]. Datatyping is achieved during interpretation. Each occurrence of a literal Unicode string may have its own node in the graph and is interpreted according to the map(s) associated with the datatype(s) associated by TDL with that node. The graph may be ill-formed because of datatyping problems (e.g. "three" is not an integer). The informal intent of TDL is to capture the normal programming paradigm that the input syntax uses the lexical space of datatypes, and the "meaning" is in the value space of the datatype. However, for technical reasons (mainly that the typing in RDF MT is part of the model rather than the interpretation), the interpretation of each Unicode string node in the graph is given as a lexical-value pair within the Universe, which most of the time is treated as being the value component. As always, the intent of the Model Theory is to capture concepts such as entailment, consistency, etc. but not to indicate an

2 Definition

2.1 Overview

As defined in section 2 of [RDF DT], for any given member of a lexical space there exists a mapping to one and only one member of the value space, referred to as the datatype mapping. Likewise, for any given member of a canonical lexical space there exists a mapping to one and only one member of the value space, referred to as the canonical mapping. Because the unique and unambiguous identities of the lexical, canonical, and value spaces are inherent in the identity of the datatype itself, by the very definition of a datatype, we may uniquely and unambiguously denote a specific datatype mapping or canonical mapping, and hence a specific value, simply by pairing a lexical form (a member of the lexical space) with the identity of the datatype (which in the case of RDF is a URI Reference).

[Definition:]   The pairing of a lexical form to a datatype identity is called a typed data literal (TDL).

If the lexical form is a member of a canonical lexical space, the TDL denotes both a datatype mapping and a canonical mapping. For the purpose of mapping a lexical form to a value, however, the canonical mapping is redundant, because the existence of a canonical mapping implies the existence of a datatype mapping with the same lexical form and value.

Example
A TDL uniquely denotes a member of the value space of the datatype because there is a one-to-one correspondence between TDL pairings and datatype mappings:
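
As an illustration only (a minimal sketch, not the example originally given here), a TDL can be modelled as a (lexical form, datatype URI) pair that is resolved to a value through the datatype mapping. The `integer_mapping` function is a simplified, hypothetical stand-in for xsd:integer, and `DATATYPE_MAPPINGS` and `denote` are names assumed for this sketch.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def integer_mapping(lexical_form):
    """Simplified stand-in for the xsd:integer datatype mapping (assumption)."""
    if re.fullmatch(r"[+-]?[0-9]+", lexical_form) is None:
        raise ValueError(f"{lexical_form!r} is not in the lexical space")
    return int(lexical_form)

# Hypothetical registry: datatype URI -> datatype mapping.
DATATYPE_MAPPINGS = {XSD_INTEGER: integer_mapping}

def denote(tdl):
    """Resolve a TDL, i.e. a (lexical form, datatype URI) pair, to its value."""
    lexical_form, datatype_uri = tdl
    return DATATYPE_MAPPINGS[datatype_uri](lexical_form)

print(denote(("30", XSD_INTEGER)))   # 30
print(denote(("030", XSD_INTEGER)))  # 30 (non-canonical form, same value)
```

Each TDL selects exactly one (lexical form, value) entry of the datatype mapping; a lexical form outside the lexical space, such as "three", selects none.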

2.2 An Introduction to the Model Theory for TDL

TDL is formalized as changes to the existing RDF Model Theory.
This section gives a lightweight overview; the interested reader should read appendix A for the full detail. XML Schema Union datatypes are omitted from this section; see appendix B for how they are addressed.
Datatypes are viewed as in Patel-Schneider's work [SWOL]. That is, each datatype has four components: a URI, a lexical space, a value space, and a mapping.
An RDF interpretation is with respect to some set of datatypes, which corresponds to the supported datatypes in an RDF implementation. xsd:string is the only obligatory datatype, and acts as the default type.
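
The four-component view of a datatype, and the role of xsd:string as the obligatory default, might be sketched as follows. This is an informal illustration under stated assumptions: the `Datatype` class, `supported_datatypes`, and `default_pair` are hypothetical names, and the lexical and value spaces are represented by membership predicates rather than explicit sets.

```python
from dataclasses import dataclass
from typing import Any, Callable

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class Datatype:
    uri: str                                  # identity of the datatype
    in_lexical_space: Callable[[str], bool]   # membership test, lexical space
    in_value_space: Callable[[Any], bool]     # membership test, value space
    mapping: Callable[[str], Any]             # lexical form -> value

# xsd:string: every Unicode string is a lexical form and maps to itself.
XSD_STRING = Datatype(
    uri=XSD + "string",
    in_lexical_space=lambda s: True,
    in_value_space=lambda v: isinstance(v, str),
    mapping=lambda s: s,
)

def supported_datatypes(*extra):
    """Build the set of supported datatypes; xsd:string is always present."""
    table = {XSD_STRING.uri: XSD_STRING}
    for datatype in extra:
        table[datatype.uri] = datatype
    return table

def default_pair(table, unicode_string):
    """Literal-value pair under the default type, xsd:string."""
    return (unicode_string, table[XSD_STRING.uri].mapping(unicode_string))

print(default_pair(supported_datatypes(), "30"))  # ('30', '30')
```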

Terminology

We modify the terminology of the Model Theory to differentiate between literals before datatyping and literals after datatyping. The modification is:

The Interpretation of Unicode Nodes

An interpretation maps each Unicode node to some literal-value pair, of some datatype. We know there is always at least one such pair because xsd:string is supported. The type information is checked by requiring this pair to be a member of each class associated with this node (e.g. by a range constraint) and by understanding class membership of datatype classes to refer to the mapping of the datatype.
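
A rough sketch of this check, assuming only two supported datatypes with simplified mappings: `interpretations` (a hypothetical helper) lists the literal-value pairs available to a Unicode node and keeps those that are members of every supported class associated with the node. Short names such as "xsd:integer" stand in for full URI References.

```python
import re

def string_map(s):
    return s

def integer_map(s):
    # Returns None when s is outside the (simplified) lexical space.
    return int(s) if re.fullmatch(r"[+-]?[0-9]+", s) else None

# Supported datatypes for this sketch: name -> datatype mapping.
SUPPORTED = {"xsd:string": string_map, "xsd:integer": integer_map}

def in_class(pair, datatype_name):
    """A pair is a member of a datatype class iff it is in that datatype's map."""
    lexical, value = pair
    mapped = SUPPORTED[datatype_name](lexical)
    return mapped is not None and mapped == value

def interpretations(unicode_string, node_classes):
    """Literal-value pairs for a node that satisfy all supported node classes."""
    candidates = []
    for mapping in SUPPORTED.values():
        value = mapping(unicode_string)
        if value is not None:
            candidates.append((unicode_string, value))
    known = [c for c in node_classes if c in SUPPORTED]
    return [p for p in candidates if all(in_class(p, c) for c in known)]

print(interpretations("30", []))                 # [('30', '30'), ('30', 30)]
print(interpretations("30", ["xsd:integer"]))    # [('30', 30)]
print(interpretations("three", ["xsd:integer"])) # []
```

With no type information the node remains ambiguous between the string and the integer reading; an empty result corresponds to an ill-formed graph.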

The Interpretation of rdf:value

Following Graham Klyne's suggestion, rdf:value is simply equality.

The Interpretation of Asserted Triples

Asserted triples are interpreted with respect to the function IEXT. However, the range of IEXT is extended to permit any pair of objects from the Universe.
IEXT is then restricted to respect rdf:value as equality and to encode the supported datatypes.

That is, IEXT(rdf:value) is the identity on the universe.
Further, if d is a datatype, then
    IEXT(rdf:type) contains the pair ( (unicode-string, value), d )
    if and only if (unicode-string, value) is in the map associated with d.

IEXT is also required to be neutral with respect to the lexical space on all other properties.
i.e.
   if (u1,v) and (u2,v) are two literal-value pairs in the universe, r is a resource in IR, p is a property in IP-{rdf:type,rdf:value}, and both literal-value pairs satisfy the range constraints on p, then:

   ( r, (u1,v) ) is in IEXT(p) iff ( r, (u2,v) ) is in IEXT(p)

So while this differs from previous versions of the model theory in that triples with literals as object are interpreted with a literal-value pair as object, such literal-value pairs are to be understood as typed data values.
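
The neutrality requirement on IEXT can be checked mechanically for a finite extension. In the sketch below, the extension of one property is a set of (resource, literal-value pair) tuples; `is_lexically_neutral` is a hypothetical helper, and all listed pairs are assumed to satisfy the range constraints on the property.

```python
from itertools import product

def is_lexically_neutral(iext_p, resources, pairs):
    """True iff pairs with the same value are interchangeable in iext_p."""
    for r, (u1, v1), (u2, v2) in product(resources, pairs, pairs):
        if v1 == v2 and ((r, (u1, v1)) in iext_p) != ((r, (u2, v2)) in iext_p):
            return False
    return True

resources = ["ex:Jenny"]                         # hypothetical resource
pairs = [("30", 30), ("030", 30), ("31", 31)]    # literal-value pairs

# Both lexical forms of the value 30 are present: neutral.
neutral = {("ex:Jenny", ("30", 30)), ("ex:Jenny", ("030", 30))}
# Only one lexical form of the value 30 is present: not neutral.
biased = {("ex:Jenny", ("30", 30))}

print(is_lexically_neutral(neutral, resources, pairs))  # True
print(is_lexically_neutral(biased, resources, pairs))   # False
```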

Multiple types

A literal-value pair may belong to multiple types, in which case a legal RDF graph may show multiple type information for that literal-value pair, using the local idiom, the global idiom, or both. Sometimes the intersection of multiple types may be surprisingly small but not empty. For example, a binary integer type and a positive decimal integer type may have intersection { ("0",0), ("1",1) }; either of these two literal-value pairs would be legal, but a Unicode string "10" cannot be interpreted in the presence of such conflicting type information, despite being in both lexical spaces and despite the two value spaces being the same. (Contrast with S-B, which permits "10" in such a case.)
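
The binary/decimal case can be worked through in a few lines. The two mappings below are hypothetical datatypes assumed for this sketch: neither allows leading zeros, and the decimal type is taken to include zero so that the intersection matches the one stated above.

```python
import re

def binary_integer(s):
    """Hypothetical binary integer type: '10' denotes two."""
    return int(s, 2) if re.fullmatch(r"0|1[01]*", s) else None

def decimal_integer(s):
    """Hypothetical decimal integer type: '10' denotes ten."""
    return int(s, 10) if re.fullmatch(r"0|[1-9][0-9]*", s) else None

def interpret(unicode_string, type_mappings):
    """Return the single literal-value pair shared by all types, or None."""
    values = {m(unicode_string) for m in type_mappings}
    if None in values or len(values) != 1:
        return None
    return (unicode_string, values.pop())

both = [binary_integer, decimal_integer]
print(interpret("1", both))   # ('1', 1)
print(interpret("10", both))  # None: two under one type, ten under the other

# Intersection of the two types over a finite sample of lexical forms.
sample = [str(n) for n in range(200)]
intersection = {p for p in (interpret(s, both) for s in sample) if p is not None}
print(sorted(intersection))   # [('0', 0), ('1', 1)]
```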

Unsupported Datatypes

An RDF implementation only knows some datatypes, and in particular may not be aware of a datatype used in a particular RDF document. The Model Theory reflects this by having an interpretation with respect to some set of datatypes (the supported datatypes). The only obligatory datatype is xsd:string. In practice, documents with an unsupported datatype constrain the datatype (in that the lexical occurrences in the document must be in the lexical space of the datatype), whereas supported datatypes constrain the document (in that the document may be ill-formed if Unicode nodes are labelled with strings that are not in the domain of the relevant datatypes). The model theory is monotone with respect to the set of supported datatypes, meaning that implementations supporting fewer datatypes will make correct inferences but not all inferences (e.g. they will not infer a contradiction when datatyping is invalid).
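
Monotonicity with respect to the supported datatypes can be illustrated with two hypothetical implementations, one supporting only xsd:string and one also supporting a simplified xsd:integer. The `contradictions` helper and the sample document are assumptions of this sketch.

```python
import re

def integer_lexical(s):
    return re.fullmatch(r"[+-]?[0-9]+", s) is not None

# Supported datatypes: name -> lexical-space membership test.
MINIMAL = {"xsd:string": lambda s: True}
FULL = {"xsd:string": lambda s: True, "xsd:integer": integer_lexical}

def contradictions(typed_literals, supported):
    """Typed literals an implementation can recognise as invalid."""
    return {
        (lexical, datatype)
        for lexical, datatype in typed_literals
        if datatype in supported and not supported[datatype](lexical)
    }

document = [("30", "xsd:integer"), ("three", "xsd:integer"), ("hi", "xsd:string")]

found_minimal = contradictions(document, MINIMAL)
found_full = contradictions(document, FULL)

print(found_minimal)  # set()
print(found_full)     # {('three', 'xsd:integer')}
# Everything found with fewer datatypes is also found with more.
print(found_minimal <= found_full)  # True
```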

3 Representation of Typed Data Literals in RDF

A TDL may be defined in several ways in RDF, according to the particular idiom used. This proposal outlines two such idioms for defining TDL pairings, one for global (implicit) definitions and one for local (explicit) definitions. Each idiom is defined separately below.

Note: For the sake of brevity and clarity, qualified names are used in the examples provided in this section where normally URI References are required. The following namespace declarations are assumed in the examples:

   xmlns:rdf  ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs ="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:xsd  ="http://www.w3.org/2001/XMLSchema#"
   xmlns:ex   ="uuid:f82dad84-0a58-11d6-9542-0003931df47c/"

3.1 The rdf:value+rdf:type Local Idiom

The rdf:value+rdf:type idiom provides a means to explicitly associate a datatype with a literal value by the use of an anonymous node for which the properties rdf:value and rdf:type are defined. The property rdf:value takes the literal (lexical form) as its object and the property rdf:type takes the URI Reference of the datatype.

Example
Per the statements below, the lexical form "30" is explicitly declared to be a member of the lexical space of the datatype 'xsd:integer':
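
As a stand-in sketch (not the statements originally shown here), the local idiom can be modelled with triples written as Python tuples. The subject `ex:Jenny`, the property `ex:age`, and the anonymous node label `_:a` are hypothetical, as is the `local_tdls` helper that collects the TDL pairings.

```python
# Triples as (subject, predicate, object) tuples, using qualified names.
graph = [
    ("ex:Jenny", "ex:age", "_:a"),
    ("_:a", "rdf:value", "30"),
    ("_:a", "rdf:type", "xsd:integer"),
]

def local_tdls(triples):
    """Map each anonymous node to its (lexical form, datatype) pairings."""
    values, types = {}, {}
    for subject, predicate, obj in triples:
        if predicate == "rdf:value":
            values[subject] = obj
        elif predicate == "rdf:type":
            types.setdefault(subject, []).append(obj)
    return {
        node: [(lexical, datatype) for datatype in types.get(node, [])]
        for node, lexical in values.items()
    }

print(local_tdls(graph))  # {'_:a': [('30', 'xsd:integer')]}
```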

3.2 The rdfs:range Global Idiom

The rdfs:range idiom utilizes the RDF Schema [RDF Schema] rdfs:range property to define an implicit intersection of one or more lexical data types, which may be used to imply or constrain the datatype(s) of a literal.

Example
Per the following RDF statements, the lexical form "30" is implied (or required) to be a member of the lexical space of the datatype 'xsd:integer':

Whether the rdfs:range statement constitutes a constraint on the allowed datatypes depends on whether there exists any local (explicit) type assignment. If there is no local typing for the literal value whatsoever, then rdfs:range can only serve as a global (implicit) type assignment. However, if the literal has one or more types defined locally, and any locally specified datatype is not compatible with all datatypes globally implied by rdfs:range for the property, one can treat such a case as a contradiction of a constraint on the expected or required datatype(s) for the property in question.
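
The two roles of rdfs:range (implicit assignment versus constraint) might be sketched as follows. Triples are Python tuples; `ex:Jenny`, `ex:age`, and `_:a` are hypothetical names, the datatype mappings are simplified, and the convention that an object with no rdf:value is itself the literal is an assumption of this sketch.

```python
import re

MAPPINGS = {
    "xsd:string": lambda s: s,
    "xsd:integer": lambda s: int(s) if re.fullmatch(r"[+-]?[0-9]+", s) else None,
}

def check(triples, subject, prop):
    """Classify the typing of the value of prop on subject."""
    obj = next(o for s, p, o in triples if s == subject and p == prop)
    # Local idiom: obj is an anonymous node carrying rdf:value and rdf:type.
    lexical = next((o for s, p, o in triples if s == obj and p == "rdf:value"), obj)
    local_types = [o for s, p, o in triples if s == obj and p == "rdf:type"]
    # Global idiom: types implied by rdfs:range on the property.
    global_types = [o for s, p, o in triples if s == prop and p == "rdfs:range"]
    values = [MAPPINGS[t](lexical) for t in local_types + global_types if t in MAPPINGS]
    if any(v is None for v in values):
        return "ill-formed"
    if any(v != values[0] for v in values):
        return "contradiction"
    return "consistent"

schema = [("ex:age", "rdfs:range", "xsd:integer")]
global_only = schema + [("ex:Jenny", "ex:age", "30")]
agreeing = schema + [("ex:Jenny", "ex:age", "_:a"),
                     ("_:a", "rdf:value", "30"),
                     ("_:a", "rdf:type", "xsd:integer")]
conflicting = schema + [("ex:Jenny", "ex:age", "_:a"),
                        ("_:a", "rdf:value", "30"),
                        ("_:a", "rdf:type", "xsd:string")]

print(check(global_only, "ex:Jenny", "ex:age"))  # consistent
print(check(agreeing, "ex:Jenny", "ex:age"))     # consistent
print(check(conflicting, "ex:Jenny", "ex:age"))  # contradiction
```

In the first graph rdfs:range acts purely as an implicit type assignment; in the last it acts as a constraint that the local xsd:string typing fails to meet.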

3.3 Compatibility Between Idioms

It is essential that both global (implicit) and local (explicit) idioms be able to coexist within the same knowledge base without undesired interactions; in fact, this is a prerequisite if a global idiom is to be used as a constraint on locally defined datatypes. The rdfs:range and rdf:value+rdf:type idioms are fully compatible and can cohabit the same knowledge base freely.

Example
Cohabitation of global and local idioms:

4 Satisfaction of Desiderata

(@@@ will expand this into a proper discussion of how the TDL addresses the specific items listed in the desiderata...)

The TDL datatyping scheme:

References

[SWOL]
Peter Patel-Schneider. The Semantic Web Ontology Language (SWOL). Available at: http://lists.w3.org/Archives/Public/www-webont-wg/2001Dec/att-0156/01-swol2.text
[PL]
Dan Connolly. PL: how a PERL programmer might do datatypes in RDF. Available at: http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Dec/0003.html
[RDF Core WG Charter]
W3C RDF Core Working Group Charter. Mar 2001. Available at: http://www.w3.org/2001/sw/RDFCoreWGCharter
[RDF MT]
W3C RDF Model Theory Working Draft. Jan 2002. Available at: http://lists.w3.org/Archives/Public/www-archive/2002Jan/att-0007/01-RDF_Model_Theory.htm
[RDF DT]
W3C RDF Datatyping Working Draft. Sep 2001. Available at: http://www-nrc.nokia.com/sw/RDF_DT_Foundation.html
[RDF Schema]
W3C RDF Schema Recommendation. ? 200?. Available at: http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
[XSD]
World Wide Web Consortium. XML Schema Part 2: Datatypes. Available at: http://www.w3.org/TR/xmlschema-2/

Last Modified: $Date: 2002/01/16 10:12:30 $