This document describes the "Typed Data Literal" (TDL) datatyping scheme (also known as "PD"), which is one of the candidate proposals being discussed by the RDF Core Working Group.
The document has no normative status and merely provides a reference for an ongoing discussion within the working group.
Many working group members and members of the RDF Interest Group
have helped to shape this document.
This document describes the "Typed Data Literal" (TDL) datatyping
scheme, which is one of several proposals under consideration by
the RDF Core Working Group (hereafter referred to simply as WG)
for achieving a total solution for datatyping based on the foundational
RDF Datatyping Model [RDF DT] which is itself
defined in terms of the RDF Model Theory
[RDF MT].
The TDL scheme, also known as "PDU" or "PD", is a fusion of the
idioms from two earlier schemes "P" and "D" (or "DAML") along with
the conceptual model from "U" (omitting the URV based idiom).
When type information is omitted the Model Theory for TDL captures
the ambiguous typing of the Perl programming idiom
[PL].
C.f.
The formal treatment of TDL is presented as a modification to the
RDF Model Theory [RDF MT]. Datatyping is
achieved during interpretation. Each occurrence of a literal
Unicode string may have its own node in the graph and is interpreted
according to the map(s) associated with the datatype(s) associated
by TDL with that node. The graph may be ill-formed because of
datatyping problems (e.g. "three" is not an integer). The informal
intent of TDL is to capture the normal programming paradigm that
the input syntax uses the lexical space of datatypes, and the
"meaning" is in the value space of the datatype. However, for
technical reasons (mainly that the typing in RDF MT is part of the
model rather than the interpretation), the interpretation of each
Unicode string node in the graph is given as a lexical-value pair
within the Universe, which most of the time is treated as being
the value component. As always, the intent of the Model Theory is
to capture concepts such as entailment, consistency, etc., but not
to indicate an
As defined in section 2 of [RDF DT], for any
given member of a lexical space there exists a mapping to one and
only one member of the value space, referred to as the datatype
mapping. Likewise, for any given member of a canonical lexical
space there exists a mapping to one and only one member of the value
space, referred to as the canonical mapping. Because the
unique and unambiguous identities of the lexical, canonical, and value
spaces are inherent in the identity of the datatype itself, by the
very definition of a datatype, we may uniquely and unambiguously
denote a specific datatype mapping or canonical mapping, and hence
a specific value, simply by the pairing of a lexical form (member
of the lexical space) with the identity of the datatype (which in
the case of RDF is a URI Reference).
[Definition:]
The pairing of a lexical form to a datatype identity is called a
typed data literal (TDL).
If the lexical form is a member of a canonical lexical space, the
TDL denotes both a lexical mapping and a canonical mapping.
However, for the purpose of mapping a lexical form to a value, any
canonical mapping is redundant, since the existence of a given
canonical mapping implies the existence of a datatype mapping
having the same pair of lexical form and value members.
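The pairing described above can be illustrated with a minimal Python
sketch. It is not part of the proposal: the class names Datatype and
TDL, the helper parse_integer, and the simplified lexical space used
for xsd:integer are all assumptions made for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Datatype:
    """Hypothetical model of a datatype: a URI plus a lexical-to-value mapping."""
    uri: str
    # Returns the value for a lexical form, or raises ValueError when the
    # form lies outside the lexical space of the datatype.
    to_value: Callable[[str], Any]


@dataclass(frozen=True)
class TDL:
    """A typed data literal: a lexical form paired with a datatype identity."""
    lexical_form: str
    datatype: Datatype

    def value(self) -> Any:
        # The pairing denotes exactly one value, via the datatype mapping.
        return self.datatype.to_value(self.lexical_form)


def parse_integer(lexical_form: str) -> int:
    # Simplified stand-in for the xsd:integer lexical space (assumption).
    if re.fullmatch(r"[+-]?[0-9]+", lexical_form) is None:
        raise ValueError(f"{lexical_form!r} is not in the lexical space")
    return int(lexical_form)


xsd_integer = Datatype(XSD + "integer", parse_integer)
xsd_string = Datatype(XSD + "string", str)

# The same lexical form denotes different values under different datatypes.
as_integer = TDL("25", xsd_integer).value()
as_string = TDL("25", xsd_string).value()
```

Under these assumptions, a pairing such as ("three", xsd:integer)
denotes no value at all, which corresponds to the datatyping problems
mentioned earlier.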
TDL is formalized as changes to the existing RDF Model Theory.
i.e. IEXT(rdf:value) is the identity on the universe.
IEXT is also required to be neutral with respect to the lexical space
on all other properties.
( r1, (u1,v) ) is in IEXT(r2) iff (r1, (u2, v) )
is in IEXT(r2)
So while this differs from previous versions of the model theory in that triples
with literals as objects are interpreted with a literal-value pair as object,
such literal-value pairs are to be understood as typed data values.
A TDL may be defined in several ways in RDF, according to the
particular idiom used. This proposal outlines two such idioms for
defining TDL pairings, one for global (implicit) definitions
and one for local (explicit) definitions. Each idiom is defined
separately below.
Note: For the sake of brevity and clarity, qualified names are used
in the examples provided in this section where normally URI References
are required. The following namespace declarations are assumed in
the examples:
The rdf:value+rdf:type idiom provides a means to explicitly associate
a datatype with a literal value by the use of an anonymous node
for which the properties rdf:value and rdf:type are defined. The
property rdf:value takes the literal (lexical form) as its object
and the property rdf:type takes the URI Reference of the datatype.
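As an informal illustration, the local idiom can be written as three
triples and read back as a TDL pairing. The sketch below uses Python
tuples rather than RDF/XML; the resource ex:Jenny, the property ex:age,
the blank node label "_:a", and the helper local_tdls are hypothetical
names chosen for this example only.

```python
from collections import defaultdict
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "uuid:f82dad84-0a58-11d6-9542-0003931df47c/"


class Literal(NamedTuple):
    """Marks an object as a literal (lexical form) rather than a URI Reference."""
    lexical_form: str


# ex:Jenny ex:age _:a .  _:a rdf:value "30" .  _:a rdf:type xsd:integer .
triples = [
    (EX + "Jenny", EX + "age", "_:a"),
    ("_:a", RDF + "value", Literal("30")),
    ("_:a", RDF + "type", XSD + "integer"),
]


def local_tdls(graph):
    """Collect (lexical form, datatype URI) pairs stated via rdf:value + rdf:type."""
    values = {}
    types = defaultdict(set)
    for subject, predicate, obj in graph:
        if predicate == RDF + "value" and isinstance(obj, Literal):
            values[subject] = obj.lexical_form
        elif predicate == RDF + "type":
            types[subject].add(obj)
    return {(values[node], t) for node in values for t in types.get(node, ())}


pairs = local_tdls(triples)
```

Here the anonymous node carries both the lexical form and the datatype,
so the pairing is recoverable from the node alone.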
The rdfs:range idiom utilizes the RDF Schema [RDF Schema] rdfs:range
property to define an implicit intersection of one or more lexical
data types, which may be used to imply or constrain the datatype(s)
of a literal.
Whether the rdfs:range statement constitutes a constraint on the
allowed datatypes depends on whether there exists any local (explicit)
type assignment. If there is no local typing for the literal value
whatsoever, then rdfs:range can only serve as a global (implicit)
type assignment. However, if the literal has one or more types
defined locally, and any locally specified datatype is not compatible
with all datatypes globally implied by rdfs:range for the property,
one can treat such a case as a contradiction of a constraint on the
expected or required datatype(s) for the property in question.
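The two roles of rdfs:range described above can be sketched as follows.
This is only one possible reading: the proposal does not define
"compatible" at this point, so the COMPATIBLE table and the functions
compatible and resolve_types are assumptions introduced for
illustration.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical compatibility table: unordered pairs of datatype URIs that
# this sketch treats as compatible. A real system would derive this from
# the datatype definitions themselves.
COMPATIBLE = {frozenset({XSD + "int", XSD + "integer"})}


def compatible(a, b):
    """Two datatypes are compatible if identical or listed in the table."""
    return a == b or frozenset({a, b}) in COMPATIBLE


def resolve_types(local_types, range_types):
    """Return the datatypes that apply to a literal used with a property.

    With no local typing, the range datatypes act as an implicit type
    assignment. With local typing, they act as a constraint, and any
    incompatible local datatype is reported as a contradiction.
    """
    local_types, range_types = set(local_types), set(range_types)
    if not local_types:
        return range_types
    for local in sorted(local_types):
        for implied in sorted(range_types):
            if not compatible(local, implied):
                raise ValueError(f"{local} contradicts range datatype {implied}")
    return local_types | range_types


implicit = resolve_types(set(), {XSD + "integer"})
constrained = resolve_types({XSD + "int"}, {XSD + "integer"})
```

In this sketch a locally typed xsd:date literal used with a property
whose range is xsd:integer raises an error, which models the
contradiction case.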
It is essential that both global (implicit) and local (explicit)
idioms be able to coexist within the same knowledge base without
undesired interactions -- and in fact, this is essential if
a global idiom is to be used as a constraint on locally defined
datatypes.
The rdfs:range and rdf:value+rdf:type idioms are fully compatible
and can cohabit the same knowledge base freely.
(@@@ will expand this into a proper discussion of how the
TDL addresses the specific items listed in the desiderata...)
The TDL datatyping scheme:
Table of Contents
1 Introduction
2 Definition
2.1 Overview
2.2 An Introduction to the Model Theory for TDL
This section gives a light-weight overview; the interested reader should
read appendix A for the full detail. XML Schema Union datatypes are omitted
from this section; see appendix B for how they are addressed.
Datatypes are viewed as in Patel-Schneider's work
[SWOL]. That
is, each datatype has four components: a URI, a lexical space, a value space,
and a mapping.
An RDF interpretation is with respect to some set of datatypes, which
corresponds to the supported datatypes in an RDF implementation. xsd:string
is the only obligatory datatype, and acts as the default type.
Terminology
We modify the terminology of the Model Theory to differentiate between
literals before datatyping and literals after datatyping. The modification
is:
The Interpretation of Unicode Nodes
An interpretation maps each Unicode node to some literal-value pair, of
some datatype. We know there is always at least one such pair because xsd:string
is supported. The type information is checked by requiring this pair to
be a member of each class associated with this node (e.g. by a range constraint)
and by understanding class membership of datatype classes to refer to the
mapping of the datatype.
The Interpretation of rdf:value
Following Graham Klyne's suggestion, rdf:value is simply equality.
The Interpretation of Asserted Triples
Asserted triples are interpreted with respect to the function IEXT. However,
the range of IEXT is extended to permit any pair of objects from the Universe.
IEXT is then restricted to respect rdf:value as equality and encodes
the supported datatypes.
For if d is a datatype then,
IEXT(rdf:type) contains the pair ( (unicode-string,
value), d )
if and only if (unicode-string, value) is in the
map associated with d.
i.e.
if (u1,v) and (u2,v) are two literal-value pairs in the
universe and r a resource in IR and p a property in IP-{rdf:type,rdf:value}
and both literal-pairs satisfy the range constraints on p then:
Multiple types
A literal-value pair may belong to multiple types, in which case a legal
RDF graph may show multiple type information for that literal-value pair,
using the local idiom, the global idiom, or both. Sometimes the intersection of
multiple types may be surprisingly small but not empty; for example, a binary
integer type and a positive decimal integer type may have intersection
{ ("0",0), ("1",1) }; either of these two literal-values would be legal,
but a Unicode string "10" cannot be interpreted in the presence of such
conflicting type information, despite being in both lexical spaces and
despite the two value spaces being the same. (Contrast with S-B, which
permits "10" in such a case).
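The example above can be checked with a small Python sketch over a
finite slice of each datatype map. The functions binary_map and
decimal_map are hypothetical, and the sketch assumes that both lexical
spaces exclude leading zeros and that the decimal type includes zero,
which is what the stated intersection requires.

```python
from itertools import product


def binary_map(max_len):
    """Finite slice of a hypothetical binary integer datatype map."""
    forms = ["0"] + [
        "1" + "".join(bits)
        for n in range(max_len)
        for bits in product("01", repeat=n)
    ]
    return {(form, int(form, 2)) for form in forms}


def decimal_map(max_len):
    """Finite slice of a hypothetical decimal integer datatype map."""
    forms = ["0"] + [
        first + "".join(rest)
        for n in range(max_len)
        for first in "123456789"
        for rest in product("0123456789", repeat=n)
    ]
    return {(form, int(form)) for form in forms}


binary = binary_map(3)
decimal = decimal_map(3)

# A literal-value pair with both types must be in both maps.
common = binary & decimal

# "10" is a legal lexical form of each datatype taken separately ...
in_both_lexical_spaces = (
    any(form == "10" for form, _ in binary)
    and any(form == "10" for form, _ in decimal)
)
# ... but no single literal-value pair for "10" belongs to both maps.
interpretable = any(form == "10" for form, _ in common)
```

The string "10" fails because the binary map pairs it with 2 while the
decimal map pairs it with 10, so no common pair exists.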
Unsupported Datatypes
An RDF implementation only knows some datatypes, and in particular may
not be aware of a datatype used in a particular RDF document. The Model
Theory reflects this by having an interpretation with respect to some set
of datatypes (the supported datatypes). The only obligatory datatype is
xsd:string. In practice, documents with an unsupported datatype constrain
the datatype (in that the lexical occurrences in the document must be in
the lexical space of the datatype), whereas supported datatypes constrain
the document (in that the document may be ill-formed in that the unicode
nodes are labelled with strings that are not in the domain of the relevant
datatypes). The model theory is monotone with respect to the set of supported
datatypes, meaning that implementations supporting fewer datatypes will
make correct inferences but not all inferences (e.g. they will not infer
a contradiction when datatyping is invalid).
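The monotonicity property can be illustrated with a short Python
sketch. The function datatyping_errors, the checker functions, and the
sample document are assumptions made for this example; the lexical
space used for xsd:integer is a simplification.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def is_string(lexical_form):
    # Every Unicode string is in the lexical space of xsd:string.
    return True


def is_integer(lexical_form):
    # Simplified stand-in for the xsd:integer lexical space (assumption).
    return re.fullmatch(r"[+-]?[0-9]+", lexical_form) is not None


def datatyping_errors(typed_literals, supported):
    """Return the pairs that an implementation can show to be ill-typed.

    Literals whose datatype is not supported are skipped: the
    implementation cannot check them, so it infers nothing about them.
    """
    errors = []
    for lexical_form, datatype_uri in typed_literals:
        checker = supported.get(datatype_uri)
        if checker is not None and not checker(lexical_form):
            errors.append((lexical_form, datatype_uri))
    return errors


document = [
    ("three", XSD + "integer"),
    ("42", XSD + "integer"),
    ("hello", XSD + "string"),
]

# An implementation supporting only the obligatory datatype.
minimal = {XSD + "string": is_string}
# An implementation that additionally supports xsd:integer.
fuller = {XSD + "string": is_string, XSD + "integer": is_integer}

minimal_errors = datatyping_errors(document, minimal)
fuller_errors = datatyping_errors(document, fuller)
```

The implementation with fewer supported datatypes reports a subset of
the errors found by the fuller one, never a different set.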
3 Representation of Typed Data Literals in RDF
xmlns:rdf ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs ="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd ="http://www.w3.org/2001/XMLSchema#"
xmlns:ex ="uuid:f82dad84-0a58-11d6-9542-0003931df47c/"
3.1 The rdf:value+rdf:type Local Idiom
3.2 The rdfs:range Global Idiom
3.3 Compatibility Between Idioms
4 Satisfaction of Desiderata
References
Last Modified: $Date: 2002/01/16 10:12:30 $