RDF Datatyping - TDL "Convergence" Proposal

Version

Version 4, February 9, 2002.

Abstract

This document attempts to summarize the common understanding of the RDF Core Working Group (hereafter the WG) with regard to the theoretical foundation for datatyping of literal values. It serves as a basis for the definition, discussion, and comparison of all proposed schemes for achieving a complete datatyping solution that are to be considered by the WG.

It describes a revised definition of the "Typed Data Literal" (TDL) datatyping scheme, adopting and incorporating certain aspects of the S datatyping proposal, including the re-annexation of Graham's Desiderata document and the foundational material from Sergey's S proposal document. The revised definition aims to reflect a convergence of views and opinions of the RDF Core WG with regard to datatyping, and may replace all other proposals, providing a single point of focus and discussion which ultimately should lead to the final datatyping solution.

Status of this Document

The document has no normative status and merely provides a reference for an ongoing discussion within the working group.

Editor

Patrick Stickler

Contributors

This document includes contributions of almost all members of the WG, in particular those provided by


1 Desiderata for RDF Datatyping

  1. Backward compatibility

  2. Ability to use built-in primitive XML Schema datatypes

  3. Ability to use non-XML-Schema datatypes

  4. Ability to define datatypes using schema languages rather than relying on "built-in" data types.

  5. Ability to represent type information without an associated RDF schema

  6. Ability to reference type information in an associated RDF schema

  7. Co-existence of "global" and "local" typing mechanisms

  8. Provide account of datatyping scheme semantics

  9. Support for existing data typing idioms

  10. Tidy literal nodes

  11. Minimal addition, if any, to vocabulary or syntactic machinery

  12. Single URIs for denoting datatypes

  13. Single vocabulary for both global and local idioms concurrently

Notes:

1. Backward compatibility

The goal here is that existing use of RDF, RDF-handling software and RDF-based specifications will continue to be valid, and (as far as possible) produce results as intended by their authors.

2. Use of XML-schema datatypes

The datatyping proposal should provide an account of how the XML Schema Built-in Primitive Datatypes [3] can be used with RDF.

The datatyping proposal should also be able to account for the use of XML Schema datatypes derived from the built-in primitive types (i.e. all instances of anySimpleType).

No goal is currently expressed with respect to use of composite XML Schema datatypes.

(XML Schema is not intended to be used for defining/constraining RDF/XML syntax or RDF graph structures for the purposes of datatyping.)

3. Use of non-XML-Schema datatypes

The datatyping proposal should not preclude the use of non-XML-Schema datatypes, such as custom or user-defined datatypes, or those from major components external to RDF, such as SQL or UML datatypes.

4. Use of schema-defined datatypes

The datatyping proposal should not preclude using schema languages to define data types, rather than relying on "built-in" predefined data types. The proposal is not expected to give an account of any such schema language.

(This goal probably follows from 3.)

5. Represent type without associated RDF schema

It should be possible to include typing information into an RDF graph without depending on a (separately defined) RDF schema.

6. Reference type information in associated RDF schema

It should be possible to indirectly incorporate typing information into an RDF graph by referencing an associated RDF schema.

7. Co-existence of global and local typing mechanisms

One of the dimensions by which one can categorize datatyping proposals is by whether individual values are explicitly or implicitly typed, e.g. whether each occurrence needs to specify xsd:integer (explicit) or whether xsd:integer is specified as the rdfs:range of the property (implicit). RDF should allow users to choose either approach, and this approach is adopted in DAML+OIL. The use of implicit typing allows for compatibility with existing RDF data and much XML data. The use of both implicit and explicit typing allows for an extra check on the appropriateness of input. The use of explicit typing allows for direct control of the typing of data.

It should be possible for both forms of datatyping to coexist in the same RDF graph.

Adapted from:

(This looks rather like a restatement of goals 5, 6 above.)

8. Provide account of datatyping scheme semantics

The datatyping proposal should include a full account of data typing semantics, and how data typing interacts semantically with the other elements of RDF. This would preferably be expressed in terms of how the data typing proposal uses and/or extends the defined RDF model theory [2].

9. Support for existing data typing idioms

A number of idioms have been suggested for representing datatype information in an RDF graph. It is claimed or suggested that these are currently used in RDF.

These idioms are enumerated below using Notation-3 [4]. The descriptions below are intended to convey the graph form used, while being agnostic about issues of semantic denotation. The data typing proposals would need to provide an account of denotations used for any supported idiom.

A datatyping proposal may support any combination of these idioms, and the ex*: qualified names may be the same or different across those idioms supported. (The desirability or otherwise of different names is a matter for group consensus, not prejudged by this note.)

In presenting these idioms, it is useful to distinguish between "direct statements" and "schema statements", in recognition that there are different ways of handling schema statements:

(a) schema statements included explicitly in the same RDF document ("internal schema"),

(b) schema statements referenced in a separate RDF document ("external schema"), and

(c) schema statements implied and "understood" by the processing application ("implicit schema").

Presuming that:

We have three usage patterns that are equivalent, modulo the physical location or otherwise of the schema statements. To accommodate this, the idioms described below are presented in two parts:

  1. "direct statements" from which some meaning is directly derived, and
  2. where applicable, "schema statements" that can be separated from the direct statements to define an environment in which they can be evaluated.

In each case below, the intent is to express the idea that Jenny was born on 15 July 2001. The idioms simply illustrate a form of RDF graph that has this intended meaning, and do not attempt to say anything about the mechanisms for arriving at that meaning.

Idiom A:
person:Jenny exA:birthDate _:1 .
_:1 exA:date "2001-07-15" .

(Adapted from [1])

Idiom B:
person:Jenny exB:birthDate "2001-07-15" .
exB:birthDate rdfs:range exB:date .

(Adapted from [1])

Idiom C:
person:Jenny exC:birthDate "2001-07-15" .
exC:birthDate rdfs:range exC:date .

(Adapted from http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0045.html)

Idiom D:
person:Jenny exD:birthDate _:1 .
_:1 rdf:value "2001-07-15" .
_:1 rdf:type exD:date .

(Adapted from http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0045.html)

(Also, a similar usage can be seen in RDFM&S [5], section 7.3.)

Idiom E:
person:Jenny exE:birthDate _:1 .
_:1 rdf:value "2001-07-15" .

I.e. the new bNode global idiom

(Adapted from ... link to Jeremy's posting ...)

10. Tidy Literal Nodes

... (TBD) ...

11. Minimal addition, if any, to vocabulary or syntactic machinery

... (TBD) ...

12. Single URIs for denoting datatypes

... (TBD) ...

13. Single vocabulary for both global and local idioms concurrently

... (TBD) ...

2 Foundation for RDF Datatyping

The conceptual framework for datatyping presented in this document is based on the type system defined in the "XML Schema Part 2: Datatypes" [XSD]. This section explains how the relevant terms and concepts defined in [XSD] are expressed using the model-theoretic semantics for RDF defined in the "RDF Model Theory Working Draft" [RDF MT].

2.1 Datatype mapping

[XSD] defines a datatype as a 3-tuple, consisting of a) a set of distinct values, called its value space, b) a set of lexical representations, called its lexical space, and c) a set of facets that characterize properties of the value space, individual values or lexical terms. [XSD] implicitly assumes a fourth component, which we call datatype mapping, to be part of the datatype.

[Definition:]  A datatype mapping is a set of pairs whose first element belongs to the value space of the datatype, and the second element belongs to the lexical space of the datatype. A datatype mapping satisfies the following properties:

  1. Each element of the lexical space maps to exactly one element of the value space.
  2. Each element of the value space has at least one lexical representation.

(@@@ is the second condition necessary? Should we distinguish between partial and complete datatype mappings?)

Example
Datatype mapping for a datatype "boolean". Each element of the value space has two lexical representations.
Value space: {T, F}
Lexical space: {"0", "1", "true", "false"}
Datatype mapping: {<T, "true">, <T, "1">, <F, "0">, <F, "false">}
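
The two properties in the definition can be checked mechanically. The following is a minimal Python sketch, not part of [XSD] or of this proposal: the set names and the helper functions is_datatype_mapping and value_of are hypothetical, and the value space members T and F are modelled as plain strings.

```python
# Hypothetical model of the "boolean" datatype from the example above.
# Pairs are written <value, lexical form>, in the order used by the definition.
VALUE_SPACE = {"T", "F"}
LEXICAL_SPACE = {"0", "1", "true", "false"}
DATATYPE_MAPPING = {("T", "true"), ("T", "1"), ("F", "0"), ("F", "false")}


def is_datatype_mapping(mapping, value_space, lexical_space):
    """Check the two properties listed in the definition."""
    # Every pair must draw its members from the declared spaces.
    if any(v not in value_space or lex not in lexical_space for v, lex in mapping):
        return False
    # Property 1: each lexical form maps to exactly one value.
    for lexical_form in lexical_space:
        if len({v for v, lex in mapping if lex == lexical_form}) != 1:
            return False
    # Property 2: each value has at least one lexical representation.
    return all(any(v == value for v, _ in mapping) for value in value_space)


def value_of(lexical_form, mapping):
    """Return the single value that a lexical form maps to."""
    (value,) = {v for v, lex in mapping if lex == lexical_form}
    return value
```

Adding a pair such as <F, "true"> would make "true" map to two values, so the check for property 1 would fail.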

2.2 Canonical datatype mapping

As specified in [XSD], a canonical lexical representation is a set of elements from the lexical space of a datatype such that there is a one-to-one mapping between elements in the canonical lexical representation and elements in the value space. This mapping is referred to as canonical datatype mapping.

[Definition:]   A canonical datatype mapping is a subset of a datatype mapping that establishes a one-to-one correspondence between elements in the canonical lexical representation and elements in the value space.

Example
A canonical datatype mapping for the datatype "boolean" of the previous example.
Canonical datatype mapping: {<T, "true">, <F, "false">}
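
As an illustration only, the subset and one-to-one conditions can be expressed in a few lines of Python. The helper names is_canonical_mapping and canonicalize are hypothetical, and the sketch assumes that every member of the value space appears in the datatype mapping (property 2 of section 2.1), so that the value space can be read off the mapping itself.

```python
# Hypothetical model of the "boolean" mappings; pairs are <value, lexical form>.
DATATYPE_MAPPING = {("T", "true"), ("T", "1"), ("F", "0"), ("F", "false")}
CANONICAL_MAPPING = {("T", "true"), ("F", "false")}


def is_canonical_mapping(canonical, mapping):
    """Check that `canonical` is a one-to-one subset covering every value."""
    # A canonical mapping is a subset of the datatype mapping.
    if not canonical <= mapping:
        return False
    values = [v for v, _ in canonical]
    lexical_forms = [lex for _, lex in canonical]
    # One-to-one: no value and no lexical form may occur twice.
    if len(set(values)) != len(values) or len(set(lexical_forms)) != len(lexical_forms):
        return False
    # Every value of the value space needs a canonical lexical form.
    return set(values) == {v for v, _ in mapping}


def canonicalize(lexical_form, canonical, mapping):
    """Map any lexical form to the canonical form of the same value."""
    (value,) = {v for v, lex in mapping if lex == lexical_form}
    (canonical_form,) = {lex for v, lex in canonical if v == value}
    return canonical_form
```

Under these assumptions, canonicalize("1", CANONICAL_MAPPING, DATATYPE_MAPPING) yields "true", since "1" and "true" denote the same value T.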

2.3 Datatyping Schemes

[Definition:]  A datatyping scheme is a convention for representing and using datatypes in RDF.

A datatyping scheme describes how

are represented, either explicitly or implicitly, in RDF graphs (using one or several nodes, resources, literals, and statements), and interpreted using model-theoretic semantics.

[RDF MT] explains the fundamental model-theoretic concepts, such as interpretation, universe, and extension, used for interpreting the semantics of RDF graphs. This document assumes familiarity with these basic concepts.

Note on Facets:

Specification and interpretation of datatype facets is out of scope of this document.

3 Introduction to TDL

This document describes a revised definition of the "Typed Data Literal" (TDL) datatyping scheme, adopting and incorporating certain aspects of the S datatyping proposal. It aims to reflect a convergence of views and opinions of the RDF Core WG (hereafter WG) with regard to datatyping, and may replace all other proposals, providing a single point of focus and discussion which ultimately should lead to the final datatyping solution.

Specific changes from the previous definition of TDL include:

For the purposes of distinguishing this proposal from previous versions of the TDL proposal, one may refer to this particular version as either the "convergence proposal" or "TDL version 4".

This datatyping proposal is based on the foundational RDF Datatyping Model [RDF DT] which is itself defined in terms of the RDF Model Theory [RDF MT].

The TDL scheme, formerly also known as "PDU" or "PD", is a fusion of the local/explicit typing idiom from the earlier scheme "D" (or "DAML"), plus a derivative of that idiom for global/implicit typing, along with the conceptual model from "U" (omitting the URV based local idiom) which introduces the TDL pairing of literal and datatype as fully denoting a data value. In addition, several properties of the S proposal have been adopted and integrated into this reformulation of the TDL datatyping scheme.

When type information is omitted, the Model Theory for TDL captures the ambiguous typing of the Perl programming idiom [PL]. See the discussion in section 5.3 regarding untyped literals.

C.f.

4 Informal Definition of TDL

This section provides an informal definition of the TDL datatyping proposal. A formal, model-theoretic definition is provided in section 6.

As defined in section 2 of [RDF DT], for any given member of a lexical space there exists a mapping to one and only one member of the value space, referred to as the datatype mapping. Likewise, for any given member of a canonical lexical space there exists a mapping to one and only one member of the value space, referred to as the canonical mapping. Because the unique and unambiguous identities of the lexical, canonical, and value spaces are inherent in the identity of the datatype itself, by the very definition of a datatype, we may uniquely and unambiguously denote a specific datatype mapping or canonical mapping, and hence a specific value, simply by pairing a lexical form (a member of the lexical space) with the identity of the datatype (which in the case of RDF is a URI Reference).

[Definition:]   The pairing of a lexical form to a datatype identity is called a typed data literal (TDL).

A TDL is a "literal-in-context" which identifies a single value in the value space of the datatype.

If the lexical form is a member of a canonical lexical space, the TDL denotes both a datatype mapping and a canonical mapping. For the purpose of mapping a lexical form to a value, however, the canonical mapping is redundant, because the existence of a given canonical mapping implies the existence of a datatype mapping having the same pair of lexical form and value members.

Example
A TDL uniquely denotes a member of the value space of the datatype because there is a one-to-one correspondence between TDL pairings and datatype mappings:
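
As a rough illustration of this correspondence, a TDL can be modelled as a pair of lexical form and datatype URI, and the denoted value obtained by looking up the datatype's mapping. The Python sketch below is an assumption-laden simplification: the DATATYPES registry and the denote function are hypothetical, and Python's int() accepts somewhat more than the xsd:integer lexical space.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical registry: datatype URI -> function from lexical form to value.
# Each function stands in for the datatype mapping of that datatype.
DATATYPES = {
    XSD + "integer": int,  # simplification: int() is more lenient than xsd:integer
    XSD + "boolean": lambda lex: {"true": True, "1": True,
                                  "false": False, "0": False}[lex],
}


def denote(tdl):
    """Return the value denoted by a TDL, i.e. a (lexical form, datatype URI) pair."""
    lexical_form, datatype_uri = tdl
    return DATATYPES[datatype_uri](lexical_form)
```

The same lexical form paired with different datatypes denotes different values: ("1", xsd:integer) denotes the integer one, while ("1", xsd:boolean) denotes the boolean value true.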

5 Representation of Typed Data Literals in RDF

A TDL may be expressed in several ways in RDF, according to the particular idiom used. This proposal outlines two such idioms for defining TDL pairings, one for global (implicit) definitions and one for local (explicit) definitions. Each idiom is defined separately below.

Note: For the sake of brevity and clarity, qualified names are used in the examples provided in this section where normally URI References are required. The following namespace declarations are assumed in the examples:


   xmlns:rdf  ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs ="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:xsd  ="http://www.w3.org/2001/XMLSchema#"
   xmlns:ex   ="uuid:f82dad84-0a58-11d6-9542-0003931df47c/"

5.1 The Local/Explicit Idiom

The local idiom provides a means to explicitly associate a datatype with a lexical form by the use of an anonymous node for which the properties rdf:value and rdf:type are defined. The property rdf:value takes the literal (lexical form) as its object and the property rdf:type takes the URI Reference of the datatype.

Example
Per the statements below, the literal "30" is explicitly declared to be a member of the lexical space of the datatype 'xsd:integer', and the TDL pairing of that literal and datatype denotes the integer value 'thirty' (30):

Note that the anonymous node can be considered to represent the data value identified by the TDL pairing, in this case, the integer value 'thirty' (30).
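
A minimal Python sketch of how an application might read TDL pairings from the local idiom is given below. It is illustrative only: triples are modelled as plain tuples, the subject ex:Jenny and the property ex:age are assumed for the example, and the function local_tdls is hypothetical.

```python
RDF_VALUE = "rdf:value"
RDF_TYPE = "rdf:type"

# Assumed example graph: an anonymous node _:1 carrying rdf:value and rdf:type.
GRAPH = {
    ("ex:Jenny", "ex:age", "_:1"),
    ("_:1", RDF_VALUE, "30"),          # the literal (lexical form)
    ("_:1", RDF_TYPE, "xsd:integer"),  # the datatype
}


def local_tdls(graph):
    """Collect the TDL pairings expressed with the local idiom.

    Returns a dict from anonymous node to a set of
    (lexical form, datatype) pairs.
    """
    pairings = {}
    for node, predicate, lexical_form in graph:
        if predicate != RDF_VALUE:
            continue
        for subject, other_predicate, datatype in graph:
            if subject == node and other_predicate == RDF_TYPE:
                pairings.setdefault(node, set()).add((lexical_form, datatype))
    return pairings
```

A node with rdf:value but no rdf:type yields no pairing here; that case belongs to the global idiom.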

5.2 The Global/Implicit Idiom

The global idiom is a derivative of the local idiom where the rdf:type property of the anonymous node is simply unspecified, to be inferred from global assertions made using the RDF Schema [RDF Schema] rdfs:range property. Multiple assertions of rdfs:range for a given property define an implicit intersection of one or more lexical data types, which may be used to imply or constrain the datatype(s) of the typed data literal.

Example
Per the following RDF statements, the literal "30" is implied (or required) to be a member of the lexical space of the datatype 'xsd:integer', and the TDL pairing of that literal and datatype denotes the integer value 'thirty' (30):

As with the local idiom, the anonymous node can be considered to represent the data value identified by the TDL pairing, namely the integer value 'thirty' (30).

Whether the rdfs:range statement constitutes a constraint on the allowed datatypes depends on whether there exists any local (explicit) type assignment. If there is no local typing for the literal value whatsoever, then rdfs:range can only serve as a global (implicit) type assignment. However, if the literal has one or more types defined locally, and any locally specified datatype is not compatible with all datatypes globally implied by rdfs:range for the property, one can treat such a case as a contradiction of a constraint on the expected or required datatype(s) for the property in question.
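
The distinction between type assignment and constraint can be sketched as follows. This Python fragment is illustrative only: the notion of compatibility is not defined by this proposal, so the sketch assumes a hypothetical rule (identical datatype, or a declared subtype), and the names SUBTYPE_OF, compatible, and check_node are inventions of the example.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"

# Assumed subtype facts; xsd:int is derived (via xsd:long) from xsd:integer.
SUBTYPE_OF = {("xsd:int", "xsd:integer")}


def compatible(local, implied):
    """Hypothetical compatibility rule: same datatype or declared subtype."""
    return local == implied or (local, implied) in SUBTYPE_OF


def implied_types(graph, node):
    """Datatypes implied by rdfs:range for every property pointing at node."""
    properties = {p for _, p, o in graph if o == node}
    return {dt for s, p, dt in graph if p == RDFS_RANGE and s in properties}


def local_types(graph, node):
    """Datatypes assigned explicitly with rdf:type on the anonymous node."""
    return {dt for s, p, dt in graph if s == node and p == RDF_TYPE}


def check_node(graph, node):
    """Classify the role rdfs:range plays for an anonymous value node."""
    local = local_types(graph, node)
    implied = implied_types(graph, node)
    if not local:
        # No local typing: rdfs:range acts as a global type assignment.
        return ("assigned", implied)
    # Local typing present: every local type must fit all implied types.
    ok = all(compatible(l, g) for l in local for g in implied)
    return ("consistent", local) if ok else ("contradiction", local)
```

With ex:age rdfs:range xsd:integer, a node with no rdf:type is assigned xsd:integer, a node typed xsd:int is consistent, and a node typed xsd:string is reported as a contradiction.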

5.3 Untyped Literal Idiom

... (TBD) ... Old-style global idiom is untyped literal idiom ...

Example
Per the following RDF statements, the literal "30" is interpreted only as the literal "30" and does not correspond to any other value.

Note that the presence of any rdfs:range assertion for the property ex:age does not attribute any datatype to a literal. Datatyping is only expressed by the local and global datatyping idioms described above.
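
The difference between the untyped idiom and the bNode-based idioms can be sketched in Python as shown below. This is an illustration under stated assumptions, not a normative algorithm: the Literal marker class, the subjects ex:Jenny and ex:Bob, and the function datatypes_for are all hypothetical.

```python
class Literal(str):
    """Marks a string as an RDF literal rather than a node name."""


GRAPH = {
    ("ex:Jenny", "ex:age", Literal("30")),    # untyped literal idiom
    ("ex:age", "rdfs:range", "xsd:integer"),  # global range assertion
    ("ex:Bob", "ex:age", "_:1"),              # global idiom via anonymous node
    ("_:1", "rdf:value", Literal("30")),
}


def datatypes_for(graph, subject, prop):
    """Return the datatypes attributed to the object of (subject, prop)."""
    types = set()
    for s, p, o in graph:
        if s != subject or p != prop:
            continue
        if isinstance(o, Literal):
            # Untyped literal: rdfs:range attributes no datatype to it.
            continue
        if not any(s2 == o and p2 == "rdf:value" for s2, p2, _ in graph):
            continue  # not a value node of the local or global idiom
        # Local idiom: explicit rdf:type on the anonymous node.
        types |= {dt for s2, p2, dt in graph if s2 == o and p2 == "rdf:type"}
        # Global idiom: datatype implied by rdfs:range of the property.
        types |= {dt for s2, p2, dt in graph if s2 == prop and p2 == "rdfs:range"}
    return types
```

Both subjects use ex:age in the same graph, yet only the value reached through the anonymous node is attributed the datatype xsd:integer.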

5.4 Compatibility Between Idioms

It is essential that the global (implicit) and local (explicit) idioms be able to coexist within the same knowledge base, both with each other and with untyped literals, without undesired interactions. This is in fact a precondition if a global idiom is to be used as a constraint on locally defined datatypes. The following diagram shows how the local, global, and untyped idioms can cohabit the same knowledge base freely.

Example
Cohabitation of global, local, and untyped idioms:

5.5 Qualified Literals versus Typed Data Literals

... (TBD) ... discussion and examples based on xml:lang and xsd:lang ... a literal may be qualified in various ways, but a qualified literal is not necessarily a typed data literal ...

Example
Qualified Literal with Typed Data Literal:

5.6 Untyped Idiom as Contracted Form of Global Idiom

... (TBD) ... Old-style untyped idiom can be treated as contracted form of bNode global idiom ... either as one time legacy conversion or automatically by parser ... discuss pros/cons of both ...

5.7 XML Fragments as Typed Data Literals

... (TBD) ...

RDF does not support validation of XML content models for any XML Fragments expressed as literals. However, it is possible and useful to view XML Fragments in similar terms to typed data literals, where the XML Fragment is a lexical form (lexical expression) of an Infoset value according to a particular (complex) datatype. This allows both local and global typing idioms to be applied to XML Fragments to express and/or constrain the types according to particular complex datatypes.

Example
XML Fragment as Typed Data Literal:

In the above example, the property ex:contactInfo has an rdfs:range of ex:vCard, meaning that the value of this property should be an XML Fragment that conforms to the content model defined for the ex:vCard complex type.

6 Model Theoretic Definition

[INSERT MT HERE]

7 Satisfaction of Desiderata

The official desiderata for all proposed datatyping solutions are defined in section 1.

[@@@ This section needs to be checked and probably edited a bit -- though it should still be pretty much correct]

This section clarifies how each desideratum is satisfied by this proposal. The list of desiderata is taken verbatim from section 1. Clarifications are in italics.

The TDL proposal meets all of the defined desiderata.

  1. Backward compatibility

    TDL is fully backwards compatible with all known systems and idioms insofar as it does not require modification to the present RDF graph model, does not require modification to the present XML serialization, adopts the idioms presently used by DAML+OIL, and (insofar as can be determined from the official materials) is compatible with the typing idioms employed by CC/PP.

    The model theory explicitly covers the old case of supporting no datatypes, and behaves monotonically as new datatypes are added.

    Inasmuch as existing practice allows user typing of untyped literals (as in the PL proposal [PL] and the Jena (v1.3) system), the model theory respects that, in that untyped literals can be understood as having any typed value.

  2. Ability to use built-in primitive XML Schema datatypes

    TDL allows the use of any descendant of the XML Schema type "anySimpleType", both the predefined types as well as all custom types. This does not mean that every application will support the interpretation or validation of values associated with those types, but that all values of such types can be denoted in RDF by a TDL pairing.

  3. Ability to use non-XML-Schema datatypes

    TDL allows the use of any lexical datatype, conforming to the definition given here and in reference documents to that end, and which has URI denotation. This does not mean that every application will support the interpretation or validation of values associated with those types, but that all values of such types can be denoted in RDF by a TDL pairing.

  4. Ability to define datatypes using schema languages rather than relying on "built-in" data types.

    This is considered to be addressed in #3 above as well as by the default interpretation of non-typed literals.

  5. Ability to represent type information without an associated RDF schema

    The TDL local/explicit idiom provides for the representation of TDL pairings, and thus the typing of literal values, without any need to reference an external schema to determine typing of literals.

  6. Ability to reference type information in an associated RDF schema

    The TDL global/implicit idiom provides for the representation of TDL pairings, and thus the typing of literal values, to be encoded in one or more external schemas to imply typing of literals and/or constraints on the typing of locally typed literals.

  7. Co-existence of "global" and "local" typing mechanisms

    The TDL idioms for global and local typing are fully compatible and may coexist freely in the same knowledge base without undesirable interaction.

  8. Provide account of datatyping scheme semantics

    The TDL proposal provides a full account of datatyping semantics.

  9. Support for existing data typing idioms

    This is considered to be addressed in #1 above.


References

[SWOL]
Peter Patel-Schneider, The Semantic Web Ontology Language (SWOL), http://lists.w3.org/Archives/Public/www-webont-wg/2001Dec/att-0156/01-swol2.text
[PL]
Dan Connolly, PL: how a PERL programmer might do datatypes in RDF, http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Dec/0003.html
[RDF Core WG Charter]
W3C RDF Core Working Group Charter, Mar 2001, http://www.w3.org/2001/sw/RDFCoreWGCharter
[RDF Desiderada]
Graham Klyne, RDF datatyping desiderada, Jan 2002, http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0137.html
[RDF MT]
W3C RDF Model Theory Working Draft, Jan 2002, http://lists.w3.org/Archives/Public/www-archive/2002Jan/att-0007/01-RDF_Model_Theory.htm
[RDF DT]
W3C RDF Datatyping Working Draft, Sep 2001, http://www-nrc.nokia.com/sw/RDF_DT_Foundation.html
[RDF Schema]
W3C RDF Schema Candidate Recommendation, Mar 2000, http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
[XSD]
World Wide Web Consortium, XML Schema Part 2: Datatypes, http://www.w3.org/TR/xmlschema-2/

Desiderada refs:

References

[1] Sergey Melnik, RDF Datatyping

[2] Pat Hayes, RDF model theory

[3] XML Schema Datatypes, Built-in Primitive Datatypes

[4] Notation-3

[5] RDF Model and Syntax Specification, 22-Feb-1999


Foundation Refs...

[CWM]
Object Management Group. Common Warehouse Metamodel 1.0. Feb 2001. Available at: ftp://ftp.omg.org/pub/docs/ad/01-02-01.pdf
[UML]
Object Management Group. Unified Modeling Language 1.4. Sep 2001. Available at: ftp://ftp.omg.org/pub/docs/formal/01-09-67.pdf
[PL]
Dan Connolly. PL: how a PERL programmer might do datatypes in RDF. Available at: http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Dec/0003.html
[RDF Core WG Charter]
W3C RDF Core Working Group Charter. Mar 2001. Available at: http://www.w3.org/2001/sw/RDFCoreWGCharter
[RDF Desiderada]
Graham Klyne, RDF datatyping desiderada, Jan 2002, http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0137.html
[RDF MT]
W3C RDF Model Theory Working Draft. Sep 2001. Available at: http://www.w3.org/TR/2001/WD-rdf-mt-20010925/
[RDF Schema]
W3C RDF Schema Recommendation. ? 200?. Available at: http://www.w3.org/?
[XSD]
World Wide Web Consortium. XML Schema Part 2: Datatypes. Available at: http://www.w3.org/TR/xmlschema-2/

Last Modified: $Date: 2002/02/07 08:58:13 $