RE: Cutting the Patrician datatype knot

From: Patrick.Stickler@nokia.com
Subject: RE: Cutting the Patrician datatype knot
Date: Thu, 29 Nov 2001 14:09:09 +0200

> > For example, if you allow union XML Schema datatypes there is 
> > a model of 
> > 
> > 	<rdfs:range foo xsd:[integer union string]>
> > 	<John foo 7>
> 
> As I think I've said earlier, I don't consider 
> [integer union string] to be a "valid" data type.

And why not?  

> The definition of a data type that I subscribe to is
> that a data type defines a value space and (optionally)
> a lexical space, and a member of the lexical space maps
> to one and only one member of the value space.

[integer union string] satisfies this definition.  In [integer union
string] the lexical item "7" maps to the integer 7.

> In the above union "data type", the literal "7" maps to
> two members of the value space. Therefore, it is not a
> valid data type.

Not correct.  Please read the XML Schema recommendation to see how union
datatypes work.
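
To make the ordering rule concrete: in the recommendation, a literal in a
union is validated against the member types in the order they are listed,
and the first member that accepts it determines the value.  Here is a
minimal Python sketch of that rule.  The function names are my own
illustration, not part of any XML Schema API, and parse_integer only
approximates the xsd:integer lexical space.

```python
def parse_integer(lexical):
    """Simplified stand-in for xsd:integer: optional sign, then ASCII digits."""
    body = lexical[1:] if lexical[:1] in "+-" else lexical
    if not (body.isascii() and body.isdigit()):
        raise ValueError(f"not an integer literal: {lexical!r}")
    return int(lexical)


def parse_string(lexical):
    """Stand-in for xsd:string: every literal maps to itself."""
    return lexical


def parse_union(lexical, members):
    """Map a literal to a value using the first member type that accepts it."""
    for parse in members:
        try:
            return parse(lexical)
        except ValueError:
            continue
    raise ValueError(f"no member type accepts {lexical!r}")


# [integer union string]: integer is listed first, so it is tried first.
INTEGER_UNION_STRING = (parse_integer, parse_string)

seven = parse_union("7", INTEGER_UNION_STRING)        # the integer 7
word = parse_union("seven", INTEGER_UNION_STRING)     # the string "seven"
```

Each literal yields exactly one value, so the map is still N:1 even though
"7" is in the lexical space of both member types.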

> What you seem to be defining is just a union of lexical 
> space. I.e., the union of the lexical space of integers with 
> the lexical space of strings; which, however possible to do,
> is not particularly useful if you want to deal with the
> values themselves.

No, XML Schema has a method for creating union datatypes that satisfies
your requirements.  If you want to exclude such datatypes you have to
provide a criterion other than ``usefulness''.

> XML Schema is not concerned with values the same way that
> an application would be. XML Schema only has to ensure
> the integrity of the lexical and structural space. Thus,
> a union such as above is reasonable, as XML Schema does
> not itself worry about the ambiguity that arises in the
> lexical to value mapping.  

Again, XML Schema does *not* have ambiguous lexical-to-value mappings.
Although this is not explicitly stated in the XML Schema datatype document,
it can be inferred from many places in section 2.  [Note to XML Schema
people:  This property of datatypes should be explicitly stated.  Also,
datatypes really should be four-tuples, one element being the
lexical-to-value map!]

> You do, though, raise an important question -- whether it
> is possible to define XML Schema simple data types which
> do not have an N:1 mapping from lexical space to value space.
> If we can have 1:N or N:N mappings, then we are going to
> have problems, and that might mean that perhaps XML Schema
> may need to be more constrained with regard to some
> simple type derivations.

No XML Schema datatype has a 1:N or N:N lexical-to-value map.  It is not
the presence of such datatypes that causes problems.

Instead, again, it is the presence of two (different) datatypes that have
overlapping value spaces but different lexical-to-value maps within this
overlap!
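
As a rough illustration of that situation, take two hypothetical datatypes
(my own invention for this sketch, not XML Schema built-ins) whose value
spaces are both the integers but which read their literals in different
bases:

```python
def decimal_integer(lexical):
    """Hypothetical datatype: integers written in base 10."""
    return int(lexical, 10)


def octal_integer(lexical):
    """Hypothetical datatype: integers written in base 8."""
    return int(lexical, 8)


literal = "10"

# Each datatype on its own has an unambiguous lexical-to-value map ...
as_decimal = decimal_integer(literal)   # ten
as_octal = octal_integer(literal)       # eight

# ... but the same literal denotes different values depending on which
# datatype is in force, so the literal alone does not determine a value.
possible_values = {as_decimal, as_octal}
```

Both values lie in the shared value space, yet the literal "10" cannot be
interpreted until something says which datatype applies.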

> I'm presuming, of course, that RDF is only concerned with
> simple data types, not all XML Schema definable types in
> general.

This is true even if you include all XML Schema datatypes, even the
composite ones.

> > For example, what is the theory of rdf:type on datatype classes?
> 
> Good question. I'm not the best person to offer an answer,
> insofar as the formal MT is concerned, but I would expect
> that the theory of rdf:type is the same for all classes, datatype
> or otherwise, and it is the knowledge about a particular class
> that tells us it is a data type class, and data type classes
> have distinct characteristics, such as defining a value space
> and (optionally) lexical space. If we declare that literals
> may only be bound to data type classes, then we know that a
> given class is a data type class if it is bound to a literal,
> and thus know how to interpret the pairing of literal (lexical
> form) to data type.

But if you don't provide a theory of rdf:type on datatype classes, then
others cannot evaluate your mechanism, as it uses rdf:type to determine the
lexical-to-value mapping.

> Cheers,
> 
> Patrick

We now have several fully-specified schemes for RDF datatypes.  They may
have problems, but just about any fully-specified scheme is better than an
underspecified one!  

Peter F. Patel-Schneider
Bell Labs Research

Received on Thursday, 29 November 2001 08:50:35 UTC