Patrick,
I am using Netscape Composer to create some HTML, but I will leave
it to you to merge with your document. (I am not sure what editor you are
using; I believe Composer creates fairly simple HTML that should be
straightforward to merge.)
I suggest the following edits to your document. My text is not yet ready,
but an early version is below.
-
In the introduction
The TDL scheme, also known as "PDU" or "PD", is a fusion of the idioms
from two earlier schemes "P" and "D" (or "DAML") along with the conceptual
model from "U" (omitting the URV based idiom). When
type information is omitted the Model Theory for TDL captures the ambiguous
typing of the Perl programming idiom (PL).
-
Delete text (the same content will be addressed with new section). I find
this is too aggressive because it is so early in the document.
It should be noted up front that the TDL datatyping scheme
  does not require modification to the present RDF graph model
  does not require modification to the present RDF/XML serialization
  does not require modification to the present N3 notation
  does not require modification to the present NTriples notation
  reflects common RDF usage, adopting popular idioms already in use
  reflects common XML Schema usage for the typing of literals
  provides for both global (implicit) and local (explicit) typing concurrently
-
Modify text on TDL & Model Theory
OLD:
In accordance with [RDF MT], the primary RDF
syntax used in the TDL scheme is based on tidy graphs (a tidy graph is
the one in which no two nodes carry the same label). The interpretation
of each literal is assumed fixed and determined by its content. (For example,
the interpretation of literals could be defined as an identity mapping.)
NEW:
The formal treatment of TDL is presented as a modification to the RDF
Model Theory. Datatyping is achieved during interpretation. Each occurrence
of a literal Unicode string may have its own node in the graph and is interpreted
according to the map(s) associated with the datatype(s) associated by TDL
with that node. The graph may be ill-formed because of datatyping problems
(e.g. "three" is not an integer). The informal intent of TDL is to capture
the normal programming paradigm that the input syntax uses the lexical
space of datatypes, and the "meaning" is in the value space of the datatype.
However, for technical reasons (mainly that the typing in RDF MT is part
of the model rather than the interpretation), the interpretation of each
Unicode string node in the graph is given as a lexical-value pair within
the Universe, which most of the time is treated as being the value component.
As always, the intent of the Model Theory is to capture concepts such as
entailment, consistency etc., but not to indicate an approach to implementation.
In particular, the existence of lexical-value pairs within the Universe
of Interpretation is not intended to indicate a deep metaphysical belief
in such things!
-
Delete appendices A&B. This material is not needed for TDL and may be an
obstacle to other members of the group supporting TDL. I
suggest that if TDL is accepted then these appendices could form a Technical
Note or something like that. (Also I have appendices to add.)
-
Add my section "Introduction to the Model-Theoretic Interpretation" at
section 2.2
-
Add new section comparing TDL with the requirements (this replaces the
deleted text from the intro).
-
Add my appendix "The Model Theoretic Interpretation" at end.
-
Add my appendix "XML Schema Union Datatypes in RDF" at end.
-
I wonder whether in 3.3 we should have more text explaining why TDL has
full compatibility and S does not. (e.g. "In TDL the local and global
typing mechanisms are the same: in the model theory the representation is
identical, the lexical-value pair. This can be contrasted with S where
the global idiom (idiom B) operates entirely within the lexical space and
cannot freely interoperate with the local idiom (idiom A) which operates
principally in the value space. Because S idiom A operates in the value space,
different lexicalizations of the same value space (e.g. octal and decimal
integers) can interoperate, whereas in TDL such interoperability is not
possible. S idiom B does not prohibit such interoperability, but it is
highly problematic.")
-
Can we change the Model Theory reference to be a more recent version e.g.
http://lists.w3.org/Archives/Public/www-archive/2002Jan/att-0007/01-RDF_Model_Theory.htm
An Introduction to the Model Theory for TDL
TDL is formalized as changes to the existing RDF Model Theory.
This section gives a lightweight overview; the interested reader should
read appendix A for the full detail. XML Schema Union datatypes are omitted
from this section; see appendix B for how they are addressed.
Datatypes are viewed as in Patel-Schneider's work [OWL: URL:???]. That
is, each datatype has four components: a URI, a lexical space, a value space,
and a mapping.
An RDF interpretation is with respect to some set of datatypes, which
corresponds to the supported datatypes in an RDF implementation. xsd:string
is the only obligatory datatype, and acts as the default type.
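The four components can be pictured with a rough sketch like the one below. It is only an illustration, not part of the proposal: the class name `Datatype`, the helper `lexical_value_pair`, and the choice to treat the mapping as a plain function (which sets union datatypes aside, as this section does) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Datatype:
    """Hypothetical record of the four components of a datatype."""
    uri: str
    in_lexical_space: Callable[[str], bool]   # membership test for the lexical space
    in_value_space: Callable[[Any], bool]     # membership test for the value space
    mapping: Callable[[str], Any]             # lexical form -> value (assumed functional)

XSD = "http://www.w3.org/2001/XMLSchema#"

xsd_string = Datatype(
    uri=XSD + "string",
    in_lexical_space=lambda s: True,
    in_value_space=lambda v: isinstance(v, str),
    mapping=lambda s: s,
)

xsd_integer = Datatype(
    uri=XSD + "integer",
    in_lexical_space=lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    in_value_space=lambda v: isinstance(v, int) and not isinstance(v, bool),
    mapping=int,
)

# The set of datatypes an interpretation is taken with respect to.
SUPPORTED = {dt.uri: dt for dt in (xsd_string, xsd_integer)}

def lexical_value_pair(dt, lexical):
    """Return (lexical, value) if the string is in the lexical space, else None."""
    if not dt.in_lexical_space(lexical):
        return None
    return (lexical, dt.mapping(lexical))
```

With these definitions, "three" yields a pair under xsd:string but none under xsd:integer, which is the kind of datatyping problem mentioned earlier.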
Terminology
We modify the terminology of the Model Theory to differentiate between
literals before datatyping and literals after datatyping. The modification
is:
-
We use the term "Unicode node" to refer to a node in the graph labelled
with a Unicode string.
-
We use the term literal-value pair to refer to a pair consisting of a Unicode
string and a value from the value space of some datatype. The only interesting
literal-value pairs are ones that belong to the mapping of some datatype.
-
We do not use terminology such as "literal node" or "literal value".
-
We refer to the set of datatypes used in an RDF interpretation as the "supported
datatypes".
The Interpretation of Unicode Nodes
An interpretation maps each Unicode node to some literal-value pair, of
some datatype. We know there is always at least one such pair because xsd:string
is supported. The type information is checked by requiring this pair to
be a member of each class associated with this node (e.g. by a range constraint)
and by understanding class membership of datatype classes to refer to the
mapping of the datatype.
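As an informal illustration of this checking step, the sketch below enumerates the pairs a Unicode node could be interpreted as, and then filters them by the datatype classes the node is required to belong to. The names `SUPPORTED` and `candidate_pairs` are hypothetical, and each datatype is reduced to a function from a string to the set of values it maps to (empty when the string is outside the lexical space).

```python
import re

def string_map(s):
    # xsd:string maps every string to itself.
    return {s}

def integer_map(s):
    # Empty set when the string is not in the integer lexical space.
    return {int(s)} if re.fullmatch(r"[+-]?[0-9]+", s) else set()

# Hypothetical table of supported datatypes for this example.
SUPPORTED = {"xsd:string": string_map, "xsd:integer": integer_map}

def candidate_pairs(label, required=()):
    """Pairs (label, value) a Unicode node may be interpreted as.

    `required` lists the datatype classes the node must belong to,
    e.g. because of a range constraint.
    """
    pairs = set()
    for mapping in SUPPORTED.values():
        pairs |= {(label, v) for v in mapping(label)}
    for name in required:
        pairs &= {(label, v) for v in SUPPORTED[name](label)}
    return pairs
```

An empty result corresponds to a node that has no interpretation under the given type information.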
The Interpretation of rdf:value
Following Graham Klyne's suggestion, rdf:value is simply equality.
The Interpretation of Asserted Triples
Asserted triples are interpreted with respect to the function IEXT. However,
the range of IEXT is extended to permit any pair of objects from the Universe.
IEXT is then restricted so that it respects rdf:value as equality and encodes
the supported datatypes.
i.e. IEXT(rdf:value) is the identity on the Universe.
If d is a datatype, then
IEXT(rdf:type) contains the pair ( (unicode-string, value), d )
if and only if (unicode-string, value) is in the map associated with d.
IEXT is also required to be neutral with respect to the lexical space
on all other properties.
i.e.
if (u1,v) and (u2,v) are two literal-value pairs in the Universe,
r is a resource in IR, p is a property in IP-{rdf:type,rdf:value},
and both literal-value pairs satisfy the range constraints on p, then:
( r, (u1,v) ) is in IEXT(p) iff ( r, (u2,v) ) is in IEXT(p)
So while this differs from previous versions of the model theory in that triples
with literals as object are interpreted with a literal-value pair as object,
such literal-value pairs are to be understood as typed data values.
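The neutrality condition can be checked mechanically on a finite example. The sketch below is an assumption-laden illustration: the extension of one property is a set of (subject, object) pairs, `literal_pairs` is the set of literal-value pairs in the Universe, and range constraints are ignored (every pair is assumed to satisfy them). The function name is hypothetical.

```python
def is_lexically_neutral(extension, literal_pairs):
    """True if the extension does not distinguish pairs sharing a value.

    extension     : set of (subject, object) pairs for one property
    literal_pairs : set of (unicode_string, value) pairs in the Universe
    """
    for subject, obj in extension:
        if obj not in literal_pairs:
            continue  # the object is a resource, not a literal-value pair
        _, value = obj
        for other in literal_pairs:
            # Any pair with the same value must also be related to the subject.
            if other[1] == value and (subject, other) not in extension:
                return False
    return True

pairs = {("10", 10), ("010", 10), ("11", 11)}
neutral = {("ex:item", ("10", 10)), ("ex:item", ("010", 10))}
not_neutral = {("ex:item", ("10", 10))}
```

Here `neutral` passes the check, while `not_neutral` fails because it relates the subject to ("10", 10) but not to ("010", 10), which has the same value.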
Multiple types
A literal-value pair may belong to multiple types, in which case a legal
RDF graph may show multiple type information for that literal-value pair,
using the local idiom, the global idiom, or both. Sometimes the intersection of
multiple types may be surprisingly small but not empty. For example, a binary
integer type and a non-negative decimal integer type may have intersection
{ ("0",0), ("1",1) }; either of these two literal-value pairs would be legal,
but a Unicode string "10" cannot be interpreted in the presence of such
conflicting type information, despite being in both lexical spaces and
despite the two value spaces being the same. (Contrast with S-B, which
permits "10" in such a case.)
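The binary/decimal example can be worked through with a small sketch. The two datatypes here are made up for illustration, and the lexical spaces are assumed to exclude leading zeros (otherwise strings such as "01" would also fall in the intersection).

```python
import re

def binary_map(s):
    # Hypothetical binary integer type: "0", "1", "10", "11", ...
    return {(s, int(s, 2))} if re.fullmatch(r"0|1[01]*", s) else set()

def decimal_map(s):
    # Hypothetical non-negative decimal integer type: "0", "1", ..., "10", ...
    return {(s, int(s, 10))} if re.fullmatch(r"0|[1-9][0-9]*", s) else set()

def pairs_in_all(label, mappings):
    """Literal-value pairs for `label` that belong to every given mapping."""
    result = None
    for mapping in mappings:
        pairs = mapping(label)
        result = pairs if result is None else result & pairs
    return result or set()

both = [binary_map, decimal_map]
```

The string "10" is in both lexical spaces, but it denotes 2 under one type and 10 under the other, so no single pair satisfies both and `pairs_in_all("10", both)` is empty.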
Unsupported Datatypes
An RDF implementation only knows some datatypes, and in particular may
not be aware of a datatype used in a particular RDF document. The Model
Theory reflects this by having an interpretation with respect to some set
of datatypes (the supported datatypes). The only obligatory datatype is
xsd:string. In practice, documents with an unsupported datatype constrain
the datatype (in that the lexical occurrences in the document must be in
the lexical space of the datatype), whereas supported datatypes constrain
the document (in that the document may be ill-formed if its Unicode
nodes are labelled with strings that are not in the lexical spaces of the relevant
datatypes). The model theory is monotone with respect to the set of supported
datatypes, meaning that implementations supporting fewer datatypes will
make correct inferences but not all inferences (e.g. they will not infer
a contradiction when datatyping is invalid).
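The monotonicity remark can be illustrated informally as follows. This sketch only looks at one kind of inference, the detection of datatyping errors, and all names in it (`LEXICAL_CHECKS`, `datatyping_errors`, the sample nodes) are assumptions for the example.

```python
import re

# Hypothetical lexical-space checks for two datatypes.
LEXICAL_CHECKS = {
    "xsd:string": lambda s: True,
    "xsd:integer": lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
}

def datatyping_errors(typed_nodes, supported):
    """Nodes whose label is outside the lexical space of a supported datatype.

    Nodes typed with an unsupported datatype are never reported: the
    implementation cannot check them, so it draws no conclusion.
    """
    return [
        (label, datatype)
        for label, datatype in typed_nodes
        if datatype in supported and not LEXICAL_CHECKS[datatype](label)
    ]

nodes = [("three", "xsd:integer"), ("3", "xsd:integer")]
full = {"xsd:string", "xsd:integer"}
minimal = {"xsd:string"}
```

An implementation supporting only xsd:string reports nothing for `nodes`, while one that also supports xsd:integer reports the "three" node; the smaller set of errors is always contained in the larger one.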
An Introduction to the Model Theory for TDL
TDL is formalized as changes to the existing RDF Model Theory.
Datatypes are viewed as in Patel-Schneider's work [OWL: URL:???]. That
is, each datatype has four components: a URI, a lexical space, a value space,
and a mapping. Unlike previous work, the mapping is a relation rather
than a function. This is specifically to accommodate XML Schema Union datatypes.
For all other datatypes the mapping is a function. Each datatype is a resource
and is found in the Universe of interpretation.
An RDF interpretation is with respect to some set of datatypes, minimally
containing xsd:string, which can be viewed as the default datatype.
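To show why a relation is needed, here is a rough sketch in which each mapping is modelled as a function from a string to the set of values it is related to. The union type below is a made-up union of integer and string, and the sketch does not attempt to model how XML Schema itself resolves union members; the helper names are hypothetical.

```python
import re

def string_map(s):
    return {s}

def integer_map(s):
    return {int(s)} if re.fullmatch(r"[+-]?[0-9]+", s) else set()

def union_map(s):
    # A string is related to every value any member type gives it.
    return integer_map(s) | string_map(s)

def is_functional(mapping, samples):
    """True if no sample string is related to more than one value."""
    return all(len(mapping(s)) <= 1 for s in samples)

samples = ["3", "three", "-7"]
```

On these samples the member mappings are functional, but the union relates "3" to both the integer 3 and the string "3", so it is a relation and not a function.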
Terminology
We modify the terminology of the Model Theory to differentiate between
literals before datatyping and literals after datatyping. The modification
is:
-
We use the term "Unicode node" to refer to a node in the graph labelled
with a Unicode string.
-
We use the term literal-value pair to refer to a pair consisting of a Unicode
string and a value from the value space of some datatype. The only interesting
literal-value pairs are ones that belong to the mapping of some datatype.
-
We do not use terminology such as "literal node" or "literal value".
-
We refer to the set of datatypes used in an RDF interpretation as the "supported
datatypes".
The Interpretation of Unicode Nodes
Each Unicode node is interpreted as a literal-value pair. The literal-value
pair must occur in the map of some datatype. (Hence the requirement that
xsd:string is in the set of datatypes; this ensures that there is at least
one possible interpretation of every Unicode node.) The Unicode string
component of the literal-value pair is the label of the Unicode node. If
there is no type information available for a Unicode node, it can hence
be interpreted according to any of the supported datatypes, as long as
the Unicode string is in the lexical space of the datatype. In this way,
TDL formalises the PL proposal.
The Universe of an Interpretation
The Universe is formed by the union of:
-
IR, the set of resources, which is a superset of the set of datatypes.
-
the value space of each datatype
-
the mapping of each datatype.
i.e. The Universe contains resources, typed data values, and literal-value
pairs.
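For a datatype with a finite mapping the construction can be written out in full. The sketch below uses xsd:boolean, whose lexical forms are "true", "false", "1" and "0"; the resource name `ex:doc` and the use of plain strings to stand in for resources are assumptions for the example.

```python
# The datatype itself is a resource, represented here by its name.
BOOLEAN = "xsd:boolean"

# The mapping of xsd:boolean as a set of literal-value pairs.
boolean_mapping = frozenset({
    ("true", True),
    ("false", False),
    ("1", True),
    ("0", False),
})

# The value space is the set of values that occur in the mapping.
boolean_values = frozenset(value for _, value in boolean_mapping)

# IR: the resources, which include the datatypes.
IR = frozenset({"ex:doc", BOOLEAN})

# The Universe is the union of the three kinds of thing.
universe = IR | boolean_values | boolean_mapping
```

The resulting Universe has eight members: two resources, two typed data values, and four literal-value pairs.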
The Interpretation of Datatype URIs
The interpretation mapping IS is restricted to mapping any datatype URI
in V to the corresponding datatype in IR. That is, a datatype URI does
identify a datatype.
The Interpretation of Asserted Triples
Asserted triples are interpreted with respect to the function IEXT. However,
the range of IEXT is extended to permit any pair of objects from the Universe.
IEXT is then restricted so that it respects rdf:value as equality and encodes
the supported datatypes.
i.e. IEXT(rdf:value) is the identity on the Universe.
If d is a datatype, then
IEXT(rdf:type) contains the pair ( (unicode-string, value), d )
if and only if (unicode-string, value) is in the map associated with d.
IEXT is also required to be neutral with respect to the lexical space
on all other properties.
i.e.
if (u1,v) and (u2,v) are two literal-value pairs in the Universe,
r is a resource in IR, p is a property in IP-{rdf:type,rdf:value},
and both literal-value pairs satisfy the range constraints on p, then:
( r, (u1,v) ) is in IEXT(p) iff ( r, (u2,v) ) is in IEXT(p)
So while this differs from previous versions of the model theory in that triples
with literals as object are interpreted with a literal-value pair as object,
such literal-value pairs are to be understood as typed data values.
Multiple types
A literal-value pair may belong to multiple types, in which case a legal
RDF graph may show multiple type information for that literal-value pair,
using the local idiom, the global idiom, or both. Sometimes the intersection of
multiple types may be surprisingly small but not empty. For example, a binary
integer type and a non-negative decimal integer type may have intersection
{ ("0",0), ("1",1) }; either of these two literal-value pairs would be legal,
but a Unicode string "10" cannot be interpreted in the presence of such
conflicting type information, despite being in both lexical spaces and
despite the two value spaces being the same. (Contrast with S-B, which
permits "10" in such a case.)
Unsupported Datatypes
An RDF implementation only knows some datatypes, and in particular may
not be aware of a datatype used in a particular RDF document. The Model
Theory reflects this by having an interpretation with respect to some set
of datatypes (the supported datatypes). The only obligatory datatype is
xsd:string. In practice, documents with an unsupported datatype constrain
the datatype (in that the lexical occurrences in the document must be in
the lexical space of the datatype), whereas supported datatypes constrain
the document (in that the document may be ill-formed if its Unicode
nodes are labelled with strings that are not in the lexical spaces of the relevant
datatypes). The model theory is monotone with respect to the set of supported
datatypes, meaning that implementations supporting fewer datatypes will
make correct inferences but not all inferences (e.g. they will not infer
a contradiction when datatyping is invalid).