RDF Datatyping

This document describes the "Typed Data Literal" (TDL) datatyping scheme (also known as "PD") which is one of the candidate proposals being discussed by the RDF Core Working Group.

Status of this Document

The document has no normative status and merely provides a reference for an ongoing discussion within the working group.

Authors

Many members of the working group and of the RDF Interest Group have helped to shape this document.

The TDL scheme, also known as "PDU" or "PD", is a fusion of the idioms from two earlier schemes, "P" and "D" (or "DAML"), along with the conceptual model from "U" (omitting the URV-based idiom). When type information is omitted, the Model Theory for TDL captures the ambiguous typing of the Perl programming idiom [PL].

The formal treatment of TDL is presented as a modification to the RDF Model Theory [RDF MT]. Datatyping is achieved during interpretation. Each occurrence of a literal Unicode string may have its own node in the graph, and that node is interpreted according to the map(s) of the datatype(s) that TDL associates with it. A graph may be ill-formed because of datatyping problems (e.g. "three" is not an integer). The informal intent of TDL is to capture the normal programming paradigm in which the input syntax uses the lexical space of a datatype and the "meaning" lies in its value space. However, for technical reasons (mainly that the typing in RDF MT is part of the model rather than the interpretation), the interpretation of each Unicode string node in the graph is given as a lexical-value pair within the Universe, which most of the time is treated as if it were just its value component. As always, the intent of the Model Theory is to capture concepts such as entailment, consistency etc. but not to indicate an

2 Definition

As defined in section 2 of [RDF DT], for any given member of a lexical space there exists a mapping to one and only one member of the value space, referred to as the datatype mapping. Likewise, for any given member of a canonical lexical space there exists a mapping to one and only one member of the value space, referred to as the canonical mapping. Because the unique and unambiguous identities of the lexical, canonical, and value spaces are inherent in the identity of the datatype itself, by the very definition of a datatype, we may uniquely and unambiguously denote a specific datatype mapping or canonical mapping, and hence a specific value, simply by pairing a lexical form (a member of the lexical space) with the identity of the datatype (which in the case of RDF is a URI Reference).

[Definition:] The pairing of a lexical form to a datatype identity is called a typed data literal (TDL).

If the lexical form is a member of a canonical lexical space, the TDL denotes both a datatype mapping and a canonical mapping. For the purpose of mapping a lexical form to a value, however, the canonical mapping is redundant, as the existence of a given canonical mapping implies the existence of a datatype mapping having the same pair of lexical form and value members.

Example

A TDL uniquely denotes a member of the value space of the datatype because there is a one-to-one correspondence between TDL pairings and datatype mappings:
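As an informal illustration, which is not part of the TDL proposal itself, the pairing can be sketched in Python. The names TDL, DATATYPE_MAPPINGS and denoted_value are hypothetical, and the integer lexical space is deliberately simplified to an optional sign followed by decimal digits; only the datatype URI reference is taken from XML Schema.

```python
import re
from dataclasses import dataclass

# URI reference identifying the datatype (XML Schema integer).
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class TDL:
    """Hypothetical model of a typed data literal: lexical form + datatype identity."""
    lexical_form: str
    datatype: str


# Assumption: a simplified lexical space for integers (optional sign, ASCII digits).
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def integer_mapping(lexical_form):
    """Datatype mapping: each member of the lexical space maps to exactly one value."""
    if _INTEGER_LEXICAL.fullmatch(lexical_form) is None:
        raise ValueError(f"{lexical_form!r} is not in the lexical space of integer")
    return int(lexical_form)


# Hypothetical registry from datatype identity to its datatype mapping.
DATATYPE_MAPPINGS = {XSD_INTEGER: integer_mapping}


def denoted_value(tdl):
    """Return the single member of the value space that the TDL denotes."""
    return DATATYPE_MAPPINGS[tdl.datatype](tdl.lexical_form)
```

Under these assumptions, TDL("10", XSD_INTEGER) and TDL("010", XSD_INTEGER) are distinct pairings that each select exactly one datatype mapping, and both happen to denote the same value; if "10" is taken to be the canonical form, only the first also corresponds to a canonical mapping. A pairing such as TDL("three", XSD_INTEGER) selects no mapping at all, so denoted_value raises ValueError.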

Terminology

The Interpretation of Unicode Nodes

The Interpretation of rdf:value

The Interpretation of Asserted Triples

i.e. IEXT(rdf:value) is the identity on the universe.
Furthermore, if d is a datatype, then
IEXT(rdf:type) contains the pair ( (unicode-string, value), d )
if and only if (unicode-string, value) is in the map associated with d.
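The two conditions above can be sketched informally in Python. This is only an illustration under stated assumptions: the function names and the DATATYPE_MAPS registry are hypothetical, the map of a datatype is represented as a membership test (since the map itself is infinite), and the integer lexical space is simplified to an optional sign followed by decimal digits.

```python
import re

# URI reference identifying the datatype (XML Schema integer).
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Assumption: simplified lexical space for integers.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def integer_map_contains(lexical, value):
    """True if (lexical, value) is in the map associated with the integer datatype."""
    return (
        isinstance(lexical, str)
        and type(value) is int
        and _INTEGER_LEXICAL.fullmatch(lexical) is not None
        and int(lexical) == value
    )


# Hypothetical registry: datatype identity -> membership test for its map.
DATATYPE_MAPS = {XSD_INTEGER: integer_map_contains}


def in_rdf_type_extension(pair, datatype_uri):
    """( (unicode-string, value), d ) is in IEXT(rdf:type) iff the pair is in d's map."""
    lexical, value = pair
    # Datatypes missing from the registry raise KeyError; this sketch does not model them.
    return DATATYPE_MAPS[datatype_uri](lexical, value)


def in_rdf_value_extension(x, y):
    """IEXT(rdf:value) is the identity: it relates each member of the universe to itself only."""
    return x == y
```

For instance, in_rdf_type_extension(("010", 10), XSD_INTEGER) holds, whereas ("10", 11) and ("three", 3) are rejected because neither pair is in the integer map.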

IEXT is also required to be neutral with respect to the lexical space on all other properties.
i.e. if (u1,v) and (u2,v) are two literal-value pairs in the universe, r is a resource in IR, p is a property in IP-{rdf:type,rdf:value}, and both literal-value pairs satisfy the range constraints on p, then:

So while this differs from previous versions of the model theory in that triples with a literal as object are interpreted with a literal-value pair as object, such literal-value pairs are to be understood as typed data values.

Multiple types

Unsupported Datatypes