Towards closure on the data type issue

Hey folks,

It has occurred to me (and likely to others as well) that we
may be getting a little ahead of ourselves with the various
data typing proposals which either require changes to the
current graph model or which contain elements or aspects which
cannot be properly or easily expressed in the current RDF/XML
serialization. 

I agree that it would be great if we could revamp RDF to address 
these sorts of issues in a more elegant manner (and my own earlier
proposals suggest such a revamping) but I've come to repent my own 
radicalism within the scope of the WG's charter (not in general 
though ;-) and I think that we all may need to do so a bit.

Therefore, in the interest of the WG getting past this particular
data typing issue (in a satisfactory manner) within the constraints 
of the charter and moving on, I very humbly and with great 
trepidation make the recommendation outlined below.


Note that this recommendation:

A. Does not require modification to the present graph model.
B. Does not require modification to the present XML serialization.
C. Does not require modification to the present N3 notation.
D. Does not require modification to the present NTriples notation.
E. Reflects common RDF usage, making popular idioms "recommended".
F. Reflects common XML Schema usage for the typing of literals.


[And, please, if I happen to use words such as 'define' or 'imply' 
or 'denote' in a fashion that doesn't fit precisely into a particular 
strict interpretation used by a given discipline, please presume that I 
did not intend them to; and if my specific meaning is not clear, I will
be very happy to try to clarify it for you.]


--

Here's the recommendation:

The issue of data typing of literals, with particular focus
on the relation between RDF interpretation and XML Schema
simple data types, could/should be addressed as follows:


1. Adopt, summarize, and interpret the definition of "data type"
according to the XML Schema spec such that:

  * an RDF "data type" (DT) corresponds to a value space

  * an RDF "lexical data type" (LDT) is a subclass of RDF data type
    which in addition to a value space defines a lexical space
    and/or a canonical lexical space

  * both DT and LDT are identified by URI Ref

  * for a LDT which defines a lexical space, every member of the 
    lexical space maps to one and only one member of the value space

  * for a LDT which defines a canonical lexical space, every member 
    of the canonical lexical space maps to one and only one member 
    of the value space

  * for a LDT which defines both a lexical space and a canonical 
    lexical space, every member of the lexical space maps to one 
    and only one member of the canonical lexical space

  * XML Schema simple data types are LDTs

  * XML Schema LDTs define both a lexical space and a canonical 
    lexical space
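
As a rough illustration of the definitions in point 1, here is a
minimal Python sketch of an LDT that has a value space, a lexical
space and a canonical lexical space. The class name, its fields and
the regular expression are assumptions made for this example only;
they are not taken from the RDF or XML Schema specs.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LexicalDataType:
    uri: str                                # URI Ref identifying the data type
    pattern: str                            # regex describing the lexical space
    to_value: Callable[[str], object]       # lexical form -> member of the value space
    to_canonical: Callable[[object], str]   # value -> canonical lexical form

    def in_lexical_space(self, form: str) -> bool:
        return re.fullmatch(self.pattern, form) is not None

    def value(self, form: str) -> object:
        # Every member of the lexical space maps to exactly one value.
        if not self.in_lexical_space(form):
            raise ValueError(f"{form!r} is not in the lexical space of {self.uri}")
        return self.to_value(form)

    def canonical(self, form: str) -> str:
        # Every member of the lexical space maps to exactly one canonical form.
        return self.to_canonical(self.value(form))


# xsd:integer, approximated: optional sign followed by decimal digits.
XSD_INTEGER = LexicalDataType(
    uri="http://www.w3.org/2001/XMLSchema#integer",
    pattern=r"[+-]?[0-9]+",
    to_value=int,
    to_canonical=str,
)

print(XSD_INTEGER.value("+010"))      # 10
print(XSD_INTEGER.canonical("+010"))  # 10
```

Here "+010", "010" and "10" are three members of the lexical space
that all map to the same value and the same canonical form.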

--

2. Define the concept of 'data value' as follows:

A data value is a member of a value space of a particular
data type.

A specific data value is denoted by a pairing of a 
lexical form denoted by a literal and a data type denoted 
by a URI Ref.

A "typed data literal" (TDL) is an RDF construct 
corresponding to the pairing, by RDF mechanisms, of lexical 
form (literal) and data type (URI Ref) which denotes a 
data value. I.e. TDL(Literal,URIRef)

A TDL may be defined in several ways in RDF, as described
below.

--

3. Specify that relations between DTs or LDTs defined by 
the RDF Schema rdfs:subClassOf property only concern the 
intersection of value spaces and not of lexical spaces.

Thus, a member of the value space of a subclass data type
must be a member of the value space of a superclass data
type, but the member of the lexical space of a subclass
data type need not be a member of the lexical space of
the superclass data type.

This is critical, to enable the definition of upper-level
DT classes which serve the same purpose as upper-level
ontologies of property classes -- whereby the value spaces
of two LDTs can be declared as compatible even if their
lexical spaces are not.
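
To make point 3 concrete, here is a small Python sketch. It assumes a
hypothetical data type foo:hexByte (two hexadecimal digits) declared
as an rdfs:subClassOf xsd:integer; neither the type nor the function
names come from any spec.

```python
import re

INTEGER_FORMS = re.compile(r"[+-]?[0-9]+")      # lexical space of xsd:integer (approximated)
HEX_BYTE_FORMS = re.compile(r"[0-9A-Fa-f]{2}")  # lexical space of the hypothetical foo:hexByte


def integer_value(form: str) -> int:
    """Map a lexical form of xsd:integer to its value."""
    if not INTEGER_FORMS.fullmatch(form):
        raise ValueError(f"{form!r} is not a lexical form of xsd:integer")
    return int(form)


def hex_byte_value(form: str) -> int:
    """Map a lexical form of foo:hexByte to its value."""
    if not HEX_BYTE_FORMS.fullmatch(form):
        raise ValueError(f"{form!r} is not a lexical form of foo:hexByte")
    return int(form, 16)


# The value denoted by "ff" is an integer, so it lies in the superclass value space...
print(hex_byte_value("ff"))                       # 255
# ...but "ff" itself is not in the superclass lexical space.
print(INTEGER_FORMS.fullmatch("ff") is None)      # True
# The same lexical form can denote different values under different types.
print(hex_byte_value("10"), integer_value("10"))  # 16 10
```

The last line is also why a literal alone is not enough: only the
pairing of lexical form and data type denotes a data value.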

--

4. Specify that the object of a triple denotes a data value
(a value in the value space of a particular data type), whether 
the object of the triple is a literal, an anonymous node, or 
a resource node with uriref label, as defined below. 

I.e., it is the object slot or position of the triple that 
denotes the data value, not the graph construct that fills that 
slot.

I believe that this is compatible with the general view
of at least the P++, S, DC, U, and X proposals, and
possibly P.

--

5. An RDF typed data literal can be defined by one of the
following three methods, where

   * all of these three methods are allowed

   * all of these three methods are deemed to have
     identical interpretation with regard to
     the above definitions for RDF data typing

   * no system or content is required to use any of 
     these three methods; they are only recommendations
     which are intended to provide a clearly defined, 
     consistent interpretation

-

METHOD I: Anonymous Node Construct

The following anonymous node based construct (idiom) 
is used:

Typed Data Literals defined in examples:

   TDL("10",xsd:integer)

In graph notation:

   xyz --ex:someProp--> [] --rdf:value--> "10"
                         \
                          ---rdf:type---> xsd:integer

In NTriples:

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .

In N3:

   xyz ex:someProp [ rdf:value "10"; rdf:type xsd:integer ] .

In RDF/XML:

   <rdf:Description rdf:ID="xyz">
      <ex:someProp>
         <xsd:integer>
            <rdf:value>10</rdf:value>
         </xsd:integer>
      </ex:someProp>
   </rdf:Description>

Note: The following constraints/requirements apply:

   * the anonymous node has one and only one rdf:value
   * the anonymous node has one and only one rdf:type
   * the property value of rdf:value is a literal
   * the property value of rdf:type is a URI Ref

Otherwise, the anonymous node is free to have any other
properties whatsoever without affecting the interpretation
of this construct/idiom.
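
A minimal Python sketch of how an application might recognise the
Method I idiom and check its constraints. The representation is an
assumption for this example: triples are plain tuples, URI Refs and
anonymous node labels are strings, and literals are wrapped in a
small Literal type.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    text: str


RDF_VALUE = "rdf:value"
RDF_TYPE = "rdf:type"


def tdl_from_node(triples, node):
    """Return (lexical form, data type) for node, or None if the
    constraints of the idiom are not met."""
    values = [o for s, p, o in triples if s == node and p == RDF_VALUE]
    types = [o for s, p, o in triples if s == node and p == RDF_TYPE]
    # One and only one rdf:value, one and only one rdf:type.
    if len(values) != 1 or len(types) != 1:
        return None
    value, dtype = values[0], types[0]
    # rdf:value must be a literal, rdf:type must be a URI Ref.
    if not isinstance(value, Literal) or isinstance(dtype, Literal):
        return None
    return (value.text, dtype)


graph = [
    ("xyz", "ex:someProp", "_:1"),
    ("_:1", RDF_VALUE, Literal("10")),
    ("_:1", RDF_TYPE, "xsd:integer"),
    ("_:1", "ex:note", Literal("other properties are ignored")),
]

print(tdl_from_node(graph, "_:1"))  # ('10', 'xsd:integer')
```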

-

METHOD II: RDF Schema rdfs:range definition

The rdfs:range of a property is paired with a literal
object (property value):

Typed Data Literals defined in examples:

   TDL("10",xsd:integer)
   TDL("10",foo:int)

In NTriples:

   xyz ex:someProp "10" .
   ex:someProp rdfs:range xsd:integer .

   implies

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .

and

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .
   ex:someProp rdfs:range foo:int .

   implies

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .
   _:1 rdf:type foo:int .

Note: locally defined types do not supersede range types
nor do range types supersede locally defined types.
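
The two "implies" steps of Method II can be sketched in Python as
follows. This is only an illustration under assumptions: triples are
tuples, literals are wrapped in a hypothetical Literal type, and the
generated node labels ("_:g1", ...) are assumed not to clash with
labels already in the graph.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    text: str


RDF_VALUE, RDF_TYPE, RDFS_RANGE = "rdf:value", "rdf:type", "rdfs:range"


def expand_ranges(triples):
    """Rewrite range-typed literals into the anonymous node idiom."""
    ranges = {}
    for s, p, o in triples:
        if p == RDFS_RANGE:
            ranges.setdefault(s, []).append(o)
    # Nodes that already carry an rdf:value literal (locally typed values).
    has_value = {s for s, p, o in triples if p == RDF_VALUE and isinstance(o, Literal)}

    out, counter = [], 0
    for s, p, o in triples:
        if isinstance(o, Literal) and p in ranges:
            # Literal object plus range: introduce an anonymous node.
            counter += 1
            node = f"_:g{counter}"
            out.append((s, p, node))
            out.append((node, RDF_VALUE, o))
            out.extend((node, RDF_TYPE, t) for t in ranges[p])
        elif p in ranges and o in has_value:
            # Locally typed value plus range: the range type is added,
            # and the local type is kept alongside it.
            out.append((s, p, o))
            for t in ranges[p]:
                typed = (o, RDF_TYPE, t)
                if typed not in triples and typed not in out:
                    out.append(typed)
        else:
            out.append((s, p, o))
    return out


first = expand_ranges([
    ("xyz", "ex:someProp", Literal("10")),
    ("ex:someProp", RDFS_RANGE, "xsd:integer"),
])
second = expand_ranges([
    ("xyz", "ex:someProp", "_:1"),
    ("_:1", RDF_VALUE, Literal("10")),
    ("_:1", RDF_TYPE, "xsd:integer"),
    ("ex:someProp", RDFS_RANGE, "foo:int"),
])
```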
 
-

METHOD III: URV Encoding

The lexical form and data type URI Ref can be encoded as
a URV.

Typed Data Literals defined in examples:

   TDL("10",xsd:integer)
   TDL("10",foo:int)

In NTriples:

   xyz ex:someProp <xsd:integer:10> .
   <xsd:integer> lit:mapsTo xsd:integer .

   implies

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .

and 

   xyz ex:someProp <xsd:integer:10> .
   ex:someProp rdfs:range foo:int .

   implies

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .
   _:1 rdf:type foo:int .

Note: The benefit of URV encoding is that typed data literals may
then participate in tidying operations, resulting in a significant
reduction of graph real-estate without loss of information.

Note: The 'lit:' ontology provides the means for defining a lexical
space for and mapping to DTs which do not themselves define a lexical
space.
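
A hedged Python sketch of decoding a URV label such as
<xsd:integer:10> back into a TDL. The URV syntax is not defined in
this message, so the sketch simply assumes that the lexical form
follows the last colon and itself contains no colon, and that the
lit:mapsTo statements are available as a lookup table. The function
and table names are hypothetical.

```python
def decode_urv(urv, maps_to):
    """Return (lexical form, data type) for a URV label, or None."""
    # Assumption: everything after the last ':' is the lexical form.
    prefix, sep, lexical = urv.rpartition(":")
    if not sep or prefix not in maps_to:
        return None
    return (lexical, maps_to[prefix])


# From the statement: <xsd:integer> lit:mapsTo xsd:integer .
MAPS_TO = {"xsd:integer": "xsd:integer"}

print(decode_urv("xsd:integer:10", MAPS_TO))  # ('10', 'xsd:integer')
```

Because the whole pairing lives in the node label, two occurrences of
<xsd:integer:10> are the same node and can be merged by tidying.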


--

6. The "execution" of a mapping from lexical form to internal
representation of the corresponding value in the value space
by a specific application requires that said application have
knowledge about both the lexical space and value space of the
data type.

Comparison of values normally requires an execution of that 
mapping and is not intended to be based on the lexical form 
embodied in the RDF literal.

If, by coincidence or design, all lexical forms constitute
canonical lexical forms, such that the string order of
the lexical space corresponds to the value order of the
value space, then an application is free to treat lexical
forms as values for comparisons of equality or order without
executing the mapping from lexical form to value; but this
is a special case and not a requirement for data types or
the definition or interpretation of typed data literals in 
general.
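
A short Python illustration of why comparison should normally go
through the mapping. It uses Python's built-in int() as a stand-in
for the lexical-to-value mapping of xsd:integer, which is an
assumption of this example.

```python
forms = ["9", "10", "010"]

# Comparing lexical forms directly: string equality and string order.
print("010" == "10")   # False
print(sorted(forms))   # ['010', '10', '9']

# Comparing after executing the mapping from lexical form to value.
print(int("010") == int("10"))  # True
print(sorted(forms, key=int))   # ['9', '10', '010']
```

Note that restricting xsd:integer to canonical forms would make
string equality safe, but not string order: "10" still sorts before
"9" as a string.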

---

All of the above methods and definitions are, I believe, 100%
compatible with, and expressible in terms of, the present RDF
and RDFS Recommendations, and together provide a clear, consistent,
and useful description of how literals are to be defined and
interpreted in terms of data types, either those defined by 
XML Schema, or by any other data type scheme.

If the above recommendation is totally off track and offensive
to anyone, feel free to rip it to shreds and slap me silly...

Regards,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Monday, 19 November 2001 04:35:52 UTC