This document attempts to summarize the common understanding of the RDF Core Working Group (hereafter WG) with regard to the theoretical foundation for datatyping of literal values. It serves as a basis for the definition, discussion, and comparison of all proposed schemes for achieving a complete datatyping solution that are to be considered by the WG.
It describes a revised definition of the "Typed Data Literal" (TDL) datatyping scheme, which adopts and incorporates certain aspects of the S datatyping proposal, including the re-annexation of Graham's Desiderata document and the Foundational material from Sergey's S proposal document. The revision aims to reflect a convergence of views and opinions of the RDF Core WG with regard to datatyping, and may replace all other proposals, providing a single point of focus and discussion that ultimately should lead to the final datatyping solution.
The document has no normative status and merely provides a reference for an ongoing discussion within the working group.
Patrick Stickler
Backward compatibility
with existing RDF data
with existing RDF code
with existing RDF-based specifications like DAML+OIL or CC/PP
Ability to use built-in primitive XML Schema datatypes
Ability to use non-XML-Schema datatypes
Ability to define datatypes using schema languages rather than relying on "built-in" data types.
Ability to represent type information without an associated RDF schema
Ability to reference type information in an associated RDF schema
Co-existence of "global" and "local" typing mechanisms
Provide account of datatyping scheme semantics
Support for existing data typing idioms
Tidy literal nodes
Minimal addition, if any, to vocabulary or syntactic machinery
Single URIs for denoting datatypes
Single vocabulary for both global and local idioms concurrently
The goal here is that existing use of RDF, RDF-handling software and RDF-based specifications will continue to be valid, and (as far as possible) produce results as intended by their authors.
The datatyping proposal should provide an account of how the XML Schema Built-in Primitive Datatypes [3] can be used with RDF.
The datatyping proposal should also be able to account for the use of XML Schema data types derived from the built-in primitive types (i.e. all instances of anySimpleType).
No goal is currently expressed with respect to use of composite XML Schema datatypes.
(XML Schema is not intended to be used for defining/constraining RDF/XML syntax or RDF graph structures for the purposes of datatyping.)
The datatyping proposal should not preclude the use of non-XML-Schema datatypes, such as custom or user-defined datatypes, or those from major components external to RDF, like SQL or UML datatypes.
The datatyping proposal should not preclude using schema languages to define data types, rather than relying on "built-in" predefined data types. The proposal is not expected to give an account of any such schema language.
(This goal probably follows from 3.)
It should be possible to include typing information into an RDF graph without depending on a (separately defined) RDF schema.
It should be possible to indirectly incorporate typing information into an RDF graph by referencing an associated RDF schema.
One of the dimensions by which one can categorize datatyping proposals is by whether individual values are explicitly or implicitly typed, e.g. whether each occurrence needs to specify xsd:integer (explicit) or whether xsd:integer is specified as the rdfs:range of the property (implicit). RDF should allow users to choose either approach, and this approach is adopted in DAML+OIL. The use of implicit typing allows for compatibility with existing RDF data and much XML data. The use of both implicit and explicit typing allows for an extra check on the appropriateness of input. The use of explicit typing allows for direct control of the typing of data.
It should be possible for both forms of datatyping to coexist in the same RDF graph.
Adapted from:
(This looks rather like a restatement of goals 5, 6 above.)
The datatyping proposal should include a full account of data typing semantics, and how data typing interacts semantically with the other elements of RDF. This would preferably be expressed in terms of how the data typing proposal uses and/or extends the defined RDF model theory [2].
A number of idioms have been suggested for representing datatype information in an RDF graph. It is claimed or suggested that these are currently used in RDF.
These idioms are enumerated below using Notation-3 [4]. The descriptions below are intended to convey the graph form used, while being agnostic about issues of semantic denotation. The data typing proposals would need to provide an account of denotations used for any supported idiom.
A datatyping proposal may support any combination of these idioms, and the ex*: qualified names may be the same or different across those idioms supported. (The desirability or otherwise of different names is a matter for group consensus, not prejudged by this note.)
In presenting these idioms, it is useful to distinguish between "direct statements" and "schema statements", in recognition that there are different ways of handling schema statements:
(a) schema statements included explicitly in the same RDF document ("internal schema"),
(b) schema statements referenced in a separate RDF document ("external schema"), and
(c) schema statements implied and "understood" by the processing application ("implicit schema").
Presuming that:
We have three usage patterns that are equivalent, modulo the physical location or otherwise of the schema statements. To accommodate this, the idioms described below are presented in two parts:
In each case below, the intent is to express the idea that Jenny was born on 15 July 2001. The idioms simply illustrate a form of RDF graph that has this intended meaning, and do not attempt to say anything about the mechanisms for arriving at that meaning.
person:Jenny exA:birthDate _:1 .
_:1 exA:date "2001-07-15" .
(Adapted from [1])
person:Jenny exB:birthDate "2001-07-15" .
exB:birthDate rdfs:range exB:date .
(Adapted from [1])
person:Jenny exC:birthDate "2001-07-15" .
exC:birthDate rdfs:range exC:date .
(Adapted from http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0045.html)
Jenny exD:birthDate _:1 .
_:1 rdf:value "2001-07-15" .
_:1 rdf:type exD:date .
(Adapted from http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0045.html)
(Also, a similar usage can be seen in RDFM&S [5], section 7.3.)
Jenny exE:birthDate _:1 .
_:1 rdf:value "2001-07-15" .
I.e. the new bNode global idiom
(Adapted from ... link to Jeremy's posting ...)
... (TBD) ...
... (TBD) ...
... (TBD) ...
... (TBD) ...
[Definition:] A datatype mapping is a set of pairs whose first element belongs to the value space of the datatype, and whose second element belongs to the lexical space of the datatype.
A datatype mapping satisfies the following properties:
(@@@ is the second condition necessary? Should we distinguish between partial and complete datatype mappings?)
[Definition:] A canonical datatype mapping is a subset of a datatype mapping that establishes a one-to-one correspondence between elements in the canonical lexical representation and elements in the value space.
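The two definitions above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the toy integer fragment, the names INTEGER_MAPPING and CANONICAL_MAPPING, and both helper functions are hypothetical and are not part of any proposal. Pairs are written as (value, lexical form), following the order given in the definition.

```python
# Hypothetical toy datatype: a tiny fragment of an integer type.
# Each pair is (value, lexical form), as in the definition above.
INTEGER_MAPPING = {
    (30, "30"),
    (30, "030"),
    (30, "+30"),
    (7, "7"),
    (7, "07"),
}

# Assumed canonical subset: exactly one lexical form per value.
CANONICAL_MAPPING = {(30, "30"), (7, "7")}


def lexical_to_value(mapping, lexical_form):
    """Return the single value paired with lexical_form, or raise KeyError."""
    values = {value for value, lexical in mapping if lexical == lexical_form}
    if len(values) != 1:
        raise KeyError(lexical_form)
    return next(iter(values))


def is_canonical(canonical, mapping):
    """Check that canonical is a one-to-one subset of mapping covering every value."""
    values = [value for value, _ in canonical]
    lexicals = [lexical for _, lexical in canonical]
    return (
        canonical <= mapping
        and len(set(values)) == len(values)
        and len(set(lexicals)) == len(lexicals)
        and set(values) == {value for value, _ in mapping}
    )
```

In this sketch several lexical forms ("30", "030", "+30") map to the same value, while the canonical subset keeps one form per value.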
[Definition:] A datatyping scheme is a convention for representing and using datatypes in RDF.
A datatyping scheme describes how
[RDF MT] explains the fundamental model-theoretic concepts like interpretation, universe, extension etc. used for interpreting the semantics of RDF graphs. This document assumes familiarity with these basic concepts.
Specific changes from the previous definition of TDL include:
For the purposes of distinguishing this proposal from previous versions of the TDL proposal, one may refer to this particular version as either the "convergence proposal" or "TDL version 4".
This datatyping proposal is based on the foundational RDF Datatyping Model [RDF DT] which is itself defined in terms of the RDF Model Theory [RDF MT].
The TDL scheme, formerly also known as "PDU" or "PD", is a fusion of the local/explicit typing idiom from the earlier scheme "D" (or "DAML"), plus a derivative of that idiom for global/implicit typing, along with the conceptual model from "U" (omitting the URV based local idiom) which introduces the TDL pairing of literal and datatype as fully denoting a data value. In addition, several properties of the S proposal have been adopted and integrated into this reformulation of the TDL datatyping scheme.
When type information is omitted the Model Theory for TDL captures the ambiguous typing of the Perl programming idiom [PL]. See the discussion in section 3.3 regarding untyped literals.
C.f.
This section provides an informal definition of the TDL datatyping proposal. A formal, model theoretic definition is provided in section 4.
As defined in section 2 of [RDF DT], for any given member of a lexical space there exists a mapping to one and only one member of the value space, referred to as the datatype mapping. Likewise, for any given member of a canonical lexical space there exists a mapping to one and only one member of the value space, referred to as the canonical mapping. Because the unique and unambiguous identities of the lexical, canonical, and value spaces are inherent in the identity of the datatype itself, by the very definition of a datatype, we may uniquely and unambiguously denote a specific datatype mapping or canonical mapping, and hence a specific value, simply by pairing a lexical form (a member of the lexical space) with the identity of the datatype (which in the case of RDF is a URI Reference).
[Definition:] The pairing of a lexical form to a datatype identity is called a typed data literal (TDL).
A TDL is a "literal-in-context" which identifies a single value in the value space of the datatype.
If the lexical form is a member of a canonical lexical space, the TDL denotes both a lexical mapping and a canonical mapping. For the purpose of mapping a lexical form to a value, however, any canonical mapping is redundant, as the existence of a given canonical mapping implies the existence of a datatype mapping having the same pair of lexical form and value members.
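The idea of a TDL as a pairing of lexical form and datatype identity can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the TDL class, the LEXICAL_TO_VALUE registry and the value_of function are hypothetical names, and Python's int and str are used only as rough stand-ins for the lexical-to-value mappings of xsd:integer and xsd:string.

```python
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class TDL(NamedTuple):
    """A typed data literal: a lexical form paired with a datatype URI Reference."""
    lexical_form: str
    datatype: str


# Hypothetical registry: datatype URI -> function from lexical form to value.
# int and str only approximate the real XML Schema mappings.
LEXICAL_TO_VALUE = {
    XSD + "integer": int,
    XSD + "string": str,
}


def value_of(tdl):
    """Return the single value identified by the TDL pairing."""
    try:
        parse = LEXICAL_TO_VALUE[tdl.datatype]
    except KeyError:
        raise LookupError("unsupported datatype: " + tdl.datatype) from None
    return parse(tdl.lexical_form)
```

The same lexical form "30" identifies the integer thirty when paired with xsd:integer and a two-character string when paired with xsd:string, which is why the literal alone does not denote a value.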
A TDL may be expressed in several ways in RDF, according to the particular idiom used. This proposal outlines two such idioms for defining TDL pairings, one for global (implicit) definitions and one for local (explicit) definitions. Each idiom is defined separately below.
Note: For the sake of brevity and clarity, qualified names are used in the examples provided in this section where normally URI References are required. The following namespace declarations are assumed in the examples:
xmlns:rdf ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs ="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd ="http://www.w3.org/2001/XMLSchema#"
xmlns:ex ="uuid:f82dad84-0a58-11d6-9542-0003931df47c/"
The local idiom provides a means to explicitly associate a datatype with a lexical form by the use of an anonymous node for which the properties rdf:value and rdf:type are defined. The property rdf:value takes the literal (lexical form) as its object and the property rdf:type takes the URI Reference of the datatype.
Note that the anonymous node can be considered to represent the data
value identified by the TDL pairing, in this case, the
integer value 'thirty' (30).
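As a rough illustration of the local idiom, the Python sketch below models a graph as a set of (subject, predicate, object) tuples and collects the TDL pairings stated on an anonymous node. The subject ex:Jenny, the blank node label "_:1", the Literal class and the local_tdls helper are all assumptions made for this example; only the use of rdf:value and rdf:type on an anonymous node comes from the text above.

```python
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "uuid:f82dad84-0a58-11d6-9542-0003931df47c/"


class Literal(NamedTuple):
    """Hypothetical wrapper that distinguishes literals from URI References."""
    lexical_form: str


# Local (explicit) idiom: the anonymous node carries both the lexical
# form (rdf:value) and the datatype (rdf:type).
graph = {
    (EX + "Jenny", EX + "age", "_:1"),
    ("_:1", RDF + "value", Literal("30")),
    ("_:1", RDF + "type", XSD + "integer"),
}


def local_tdls(graph, node):
    """Return the (lexical form, datatype URI) pairs stated locally for node."""
    lexical_forms = [
        o.lexical_form
        for s, p, o in graph
        if s == node and p == RDF + "value" and isinstance(o, Literal)
    ]
    datatypes = [o for s, p, o in graph if s == node and p == RDF + "type"]
    return {(lex, dt) for lex in lexical_forms for dt in datatypes}
```

Calling local_tdls(graph, "_:1") yields the single pairing of "30" with xsd:integer, with no schema consulted.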
The global idiom is a derivative of the local idiom where the
rdf:type property of the anonymous node is simply unspecified, to
be inferred from global assertions made using the RDF Schema
[RDF Schema] rdfs:range property. Multiple assertions of rdfs:range
for a given property define an implicit intersection of one or more
lexical data types, which may be used to imply or constrain the
datatype(s) of the typed data literal.
As with the local idiom, the anonymous node can be considered to represent
the data value identified by the TDL pairing, namely the integer
value 'thirty' (30).
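The global idiom can be sketched in the same style. In the Python below, the datatype is not stated on the anonymous node but is taken from rdfs:range statements about the property that points at the node. The subject ex:Jenny, the Literal class, the separate schema set and the global_tdls helper are hypothetical choices for this sketch, not part of the proposal.

```python
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "uuid:f82dad84-0a58-11d6-9542-0003931df47c/"


class Literal(NamedTuple):
    """Hypothetical wrapper that distinguishes literals from URI References."""
    lexical_form: str


# Global (implicit) idiom: the anonymous node has rdf:value but no rdf:type.
graph = {
    (EX + "Jenny", EX + "age", "_:1"),
    ("_:1", RDF + "value", Literal("30")),
}

# Schema statements, kept in a separate set for this sketch.
schema = {
    (EX + "age", RDFS + "range", XSD + "integer"),
}


def global_tdls(graph, schema):
    """Pair each rdf:value literal with every rdfs:range datatype of the
    property whose object is the literal's anonymous node."""
    pairs = set()
    for _, prop, node in graph:
        ranges = {o for s, p, o in schema if s == prop and p == RDFS + "range"}
        for n, p, lit in graph:
            if n == node and p == RDF + "value" and isinstance(lit, Literal):
                pairs |= {(node, lit.lexical_form, dt) for dt in ranges}
    return pairs
```

If several rdfs:range statements exist for the property, this sketch returns one pairing per range datatype, reflecting the implicit intersection described above.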
Whether the rdfs:range statement constitutes a constraint on the allowed
datatypes depends on whether there exists any local (explicit) type assignment.
If there is no local typing for the literal value whatsoever, then rdfs:range
can only serve as a global (implicit) type assignment. However, if the
literal has one or more types defined locally, and any locally specified
datatype is not compatible with all datatypes globally implied by rdfs:range
for the property, one can treat such a case as a contradiction of a constraint
on the expected or required datatype(s) for the property in question.
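The dual role of rdfs:range described above (type assignment when no local type exists, constraint otherwise) might be sketched as below. The notion of "compatible" is not defined in this section, so the sketch assumes one: a local datatype is compatible with an implied datatype if they are identical or if the local one is listed as derived from the implied one in a hypothetical DERIVED_FROM table. The function names and result labels are likewise assumptions.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical derivation table: datatype -> datatypes it is derived from.
DERIVED_FROM = {
    XSD + "integer": {XSD + "decimal"},
}


def compatible(local_type, implied_type):
    """Assumed compatibility test: identical, or local derived from implied."""
    return local_type == implied_type or implied_type in DERIVED_FROM.get(
        local_type, set()
    )


def check_range(local_types, range_types):
    """Classify how rdfs:range applies to a literal.

    Returns ("assigned", range_types) when there is no local typing,
    ("contradiction", offending_local_types) when some local type is not
    compatible with all range types, and ("consistent", local_types) otherwise.
    """
    if not local_types:
        return ("assigned", set(range_types))
    offending = {
        lt for lt in local_types
        if not all(compatible(lt, rt) for rt in range_types)
    }
    if offending:
        return ("contradiction", offending)
    return ("consistent", set(local_types))
```

With no local types the range acts purely as an implicit assignment; with local types present it acts as a check, and any local type that fails against some range datatype is reported.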
... (TBD) ... Old-style global idiom is untyped literal idiom ...
Note that the presence of any rdfs:range assertion for the property
ex:age does not attribute any datatype to a literal. Datatyping is
only expressed by the local and global datatyping idioms described
above.
It is essential that the global (implicit) and local (explicit) idioms
be able to coexist within the same knowledge base, both with each other
and with untyped literals, without undesired interactions; indeed, this
is a prerequisite if a global idiom is to be used as a constraint on
locally defined datatypes. The following diagram shows how the local,
global, and untyped idioms can coexist freely in the same knowledge base.
... (TBD) ... discussion and examples based on xml:lang
and xsd:lang ... a literal may be qualified in various
ways, but a qualified literal is not necessarily a
typed data literal ...
... (TBD) ... Old-style untyped idiom can be treated as contracted
form of bNode global idiom ... either as one time legacy conversion
or automatically by parser ... discuss pros/cons of both ...
... (TBD) ...
RDF does not support validation of XML content models for
any XML Fragments expressed as literals. However, it is possible
and useful to view XML Fragments in similar terms to typed
data literals, where the XML Fragment is a lexical form (lexical
expression) of an Infoset value according to a particular
(complex) datatype. This allows both local and global
typing idioms to be applied to XML Fragments to express and/or
constrain the types according to particular complex datatypes.
In the above example, the property ex:contactInfo has an
rdfs:range of ex:vCard, meaning that the value of this
property should be an XML Fragment that conforms to the
content model defined for the ex:vCard complex type.
[INSERT MT HERE]
The official desiderata for all proposed datatyping solutions are defined
in section 1.
[@@@ This section needs to be checked and probably
edited a bit -- though it should still be pretty much correct]
This section clarifies how each desideratum is satisfied by this proposal.
The list of desiderata is taken verbatim from the aforementioned document.
Clarifications are in italics.
The TDL proposal meets all of the defined desiderata.
TDL is fully backwards compatible with all known systems and idioms
insofar as it does not require modification to the present RDF graph model,
does not require modification to the present XML serialization, adopts
the idioms presently used by DAML+OIL, and (insofar as can be determined
from the official materials) is compatible with the typing idioms employed
by CC/PP.
The model theory explicitly covers the old case of supporting no
datatypes, and behaves monotonically as new datatypes are added.
In as much as existing practice allows user typing of untyped literals
(as in the PL proposal [PL] and the Jena (v1.3) system), the model theory
respects that, in that untyped literals can be understood as having any
typed value.
TDL allows the use of any descendant of the XML Schema type "anySimpleType",
both the predefined types as well as all custom types. This does not mean
that every application will support the interpretation or validation of
values associated with those types, but that all values of such types can
be denoted in RDF by a TDL pairing.
TDL allows the use of any lexical datatype, conforming to the definition
given here and in reference documents to that end, and which has URI denotation.
This does not mean that every application will support the interpretation
or validation of values associated with those types, but that all values
of such types can be denoted in RDF by a TDL pairing.
This is considered to be addressed in #3 above as well as by the
default interpretation of non-typed literals.
The TDL local/explicit idiom provides for the representation of TDL
pairings, and thus the typing of literal values, without any need to reference
an external schema to determine typing of literals.
The TDL global/implicit idiom provides for the representation of
TDL pairings, and thus the typing of literal values, to be encoded in one
or more external schemas to imply typing of literals and/or constraints
on the typing of locally typed literals.
The TDL idioms for global and local typing are fully compatible and
may coexist freely in the same knowledge base without undesirable interaction.
The TDL proposal provides a full account of datatyping semantics.
This is considered to be addressed in #1 above.
[1] Sergey Melnik, RDF Datatyping
[2] Pat Hayes, RDF model theory
[3] XML Schema Datatypes, Built-in Primitive Datatypes
[4] Notation-3
[5] RDF Model and Syntax Specification, 22-Feb-1999
Foundation Refs...
Last Modified: $Date: 2002/02/07 08:58:13 $