new datatyping proposal

Here is a summary of the new proposal. It does not yet address test
cases or several important issues that we discussed offline or that
were raised on the list. We hit the deadline, so I'm sending off a
snapshot of the current state to continue the discussion on the list.

Based on feedback from Guha, Jos, Mike, Pat, and Sergey.


1. Graph model
--------------

a) The set of literals is not limited to string tokens, but may also
include tokens for integers, floats, binary data, etc. These can be used
in the graph directly. In the model theory, a literal is a constant
with a fixed interpretation.

Example of using an integer in the abstract syntax:

Jenny --ageYears--> int_5

[[We have to say more about the graph syntax of those constants]]

b) The data type URIs correspond to RDF resources that are subclasses
of rdfs:Literal, so a data type can be the range of a property.

Example:

ageYears --rdfs:range--> xsd:integer
xsd:integer --rdfs:subClassOf--> rdfs:Literal

In the model theory, we have

I(xsd:integer) = {I(int_0), I(int_1), ... }

It is the intent that at least some of these data types (like integers, 
floats and strings) correspond to data types provided by programming 
languages and storage systems, so as to allow for efficient storage and 
retrieval of RDF.
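
As a rough illustration of points a) and b), here is a minimal Python
sketch, not part of the proposal itself. It assumes that literals are
stored as native Python values and that each data type URI maps to one
native type; the NATIVE_TYPES table, the helper names, and the use of
"#"-style XSD URIs are all assumptions made for this example.

```python
# Hypothetical sketch: triples whose objects may be native Python values.
XSD = "http://www.w3.org/2001/XMLSchema#"  # assumed URI form for XSD types

# Assumed mapping from data type URI to the native type carrying its values.
NATIVE_TYPES = {
    XSD + "integer": int,
    XSD + "float": float,
    XSD + "string": str,
}


def in_class_extension(value, datatype_uri):
    """Return True if the value belongs to I(datatype_uri) under this mapping."""
    # bool is a subclass of int in Python; exclude it so True is not
    # mistaken for an integer literal.
    if isinstance(value, bool):
        return False
    return isinstance(value, NATIVE_TYPES[datatype_uri])


def range_violations(triples, ranges):
    """List triples whose object falls outside the declared rdfs:range."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if p in ranges and not in_class_extension(o, ranges[p])
    ]


triples = [
    ("Jenny", "ageYears", 5),     # the int_5 literal, stored as a native int
    ("Bob", "ageYears", "five"),  # a string where an integer is expected
]
ranges = {"ageYears": XSD + "integer"}  # ageYears --rdfs:range--> xsd:integer

print(range_violations(triples, ranges))  # [('Bob', 'ageYears', 'five')]
```

Keeping the value native means the range check reduces to a type test,
which is the kind of efficiency the paragraph above is after.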

[[We can either leave the literal as an opaque thing in the graph syntax 
and model theory or we can try to capture the type information, e.g., by 
means of 4-tuples]]


2. Concrete syntaxes
--------------------

In the RDF/XML syntax, non-string literals are encoded in accordance
with the XML Schema spec as

<propName xsi:type="URI">XML content</propName>

[[URI vs. QName; xsi:type vs. parseType]]

RDF/XML parsers provide callbacks that allow generating a compact
internal representation of literals that correspond to data types
provided by programming languages and storage systems (e.g., integers,
floats and strings). Similarly, the serializers provide callbacks for
encoding such literals in RDF/XML.
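
The callback idea could look roughly like the following Python sketch.
It is only an illustration under stated assumptions: the DECODERS and
ENCODERS registries and both function names are hypothetical, and the
xsi:type value is taken to be a full URI, which the note above leaves
open. Only the standard xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes in {namespace}local form.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
XSD = "http://www.w3.org/2001/XMLSchema#"  # assumed URI form for XSD types

# Hypothetical parser callbacks: data type URI -> function(text) -> value.
DECODERS = {XSD + "integer": int, XSD + "float": float, XSD + "string": str}

# Hypothetical serializer callbacks: native type -> (data type URI, to-text).
ENCODERS = {
    int: (XSD + "integer", str),
    float: (XSD + "float", repr),
    str: (XSD + "string", str),
}


def decode_property(element):
    """Turn a property element into a native value where a callback exists."""
    type_uri = element.get(XSI_TYPE)
    text = element.text or ""
    if type_uri is None:
        return text  # untyped content stays a plain string
    decoder = DECODERS.get(type_uri)
    if decoder is None:
        return (type_uri, text)  # unknown type: keep URI and lexical form
    return decoder(text)


def encode_property(name, value):
    """Build a property element carrying xsi:type for a native value."""
    type_uri, to_text = ENCODERS[type(value)]
    element = ET.Element(name)
    element.set(XSI_TYPE, type_uri)
    element.text = to_text(value)
    return element


source = (
    '<ageYears xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="http://www.w3.org/2001/XMLSchema#integer">5</ageYears>'
)
age = decode_property(ET.fromstring(source))
print(age, type(age).__name__)  # 5 int

# Serialize and parse again to check that the value survives the trip.
again = decode_property(ET.fromstring(ET.tostring(encode_property("ageYears", age))))
```

Literals of a type with no registered callback are kept as a (type URI,
lexical form) pair in this sketch, so nothing is lost when a parser
meets a type it does not know.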

Other concrete RDF syntaxes (esp. non-XML-based, like NTriples) need
to provide their own mechanisms for encoding literals. These
mechanisms may or may not use type URIs.


3. Extensibility path
---------------------

The set of literals used in the RDF graph model is
open-ended. However, the type of literals affects their representation
in concrete syntaxes. To facilitate roundtripping between different
concrete syntaxes, we have to enumerate a minimal set of
(XSD) types that are required to be supported by RDF applications.
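
To make the roundtripping requirement concrete, here is a small Python
sketch. The REQUIRED_TYPES table is a placeholder only, since the
proposal has not yet enumerated the minimal set, and the function names
are hypothetical. The sketch assumes that every concrete syntax can
carry at least a (type URI, lexical form) pair.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"  # assumed URI form for XSD types

# Placeholder only: the required minimal set is not yet enumerated.
REQUIRED_TYPES = {XSD + "integer": int, XSD + "float": float, XSD + "string": str}


def to_lexical(value):
    """Return (type URI, lexical form) for a value of a required type."""
    for uri, native in REQUIRED_TYPES.items():
        if type(value) is native:
            return uri, value if native is str else repr(value)
    raise TypeError("no required type covers %r" % (value,))


def from_lexical(uri, lexical):
    """Rebuild the native value from its type URI and lexical form."""
    return REQUIRED_TYPES[uri](lexical)


def roundtrips(value):
    """Check that a value survives conversion to a lexical form and back."""
    restored = from_lexical(*to_lexical(value))
    return type(restored) is type(value) and restored == value


print(roundtrips(5), roundtrips(1.5), roundtrips("five"))  # True True True
```

A literal outside the placeholder set raises TypeError here, which marks
the point where two applications could no longer rely on each other's
syntax.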

Subsequent standardization efforts are expected to extend the
currently provided typing mechanism in a way that allows defining
primitive and derived types without affecting concrete
syntaxes. Lexical forms, bNodes, URI schemes, or other mechanisms that
have been the subject of discussion in RDF Core can be used for this
purpose.

Received on Thursday, 8 August 2002 07:05:26 UTC