Lexical vs Value spaces and class hierarchies

Below is a brief outline of an ontology for "Typed Data Literals"
(TDLs), which is meant to provide the basis for defining URV
schemes, followed by an (incomplete) RDF schema which defines
an XML Schema URV scheme in terms of that ontology.
 
It addresses the distinction between value space and lexical
space, as well as lexical space versus canonical lexical space,
and allows for defining mappings between data type classes
in terms of either value space, lexical space, or both.

---

Here's the ontology "schema" (only has comments so far ;-)

<?xml version="1.0"?>

<!DOCTYPE uridef [
  <!ENTITY rdf    "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs   "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY lit    "voc://nokia.com/lit-1.0/">
]>

<rdf:RDF
   xmlns:rdf  ="&rdf;"
   xmlns:rdfs ="&rdfs;"
   xmlns:lit  ="&lit;"
>

<!-- Ontology for Typed Data Literals:

lit:mapsTo

   An RDF Class (other than a TDL type) to which every TDL
   of this TDL type corresponds, in both value space and
   lexical space.
   Is subPropertyOf lit:correspondsTo.

lit:correspondsTo

   An RDF Class representing a value space of which every TDL
   of this TDL type is a member, but to whose lexical space the
   TDL may not conform.
   Is subPropertyOf lit:approximateTo.

lit:approximateTo

   An RDF Class representing a value space of which every TDL
   of this TDL type is a member, but to whose lexical space the
   TDL may not conform and whose precision may differ, such that
   conversion may result in a loss of information.

lit:conformsTo

   An informative string identifying a standard to which the
   TDL type conforms, in both value space and lexical space,
   if and as specified.

lit:subTypeOf

   A TDL type which is superordinate to this TDL type such
   that every TDL for this type is a valid TDL of the
   superordinate type, both in value space and lexical space.

lit:pattern

   A pattern which matches a lexical form for a TDL of this type.

lit:xpattern

   A pattern which matches a lexical form for a TDL not of this type.

- - -

The set of patterns defined for a TDL type constitutes an OR'd set
of options: a lexical form is accepted if any single pattern matches.
The set of xpatterns defined for a TDL type constitutes an AND'd set
of exclusions: none of them may match. Patterns are applied prior to
xpatterns for a given TDL type.

If a given TDL type is a subTypeOf one or more other TDL types, the
patterns and xpatterns it defines are complementary to those defined
for its supertypes. Validation of a lexical form must therefore proceed
from the furthest ancestor down to the locally defined TDL type, and if
failure occurs at any stage, the value is invalid. This permits subtypes
to define their lexical spaces in terms of supertypes by restriction
(adding only xpatterns) and also ensures that all lexical forms of a TDL
type conform to the lexical space defined for all superordinate TDL
types. If there is multiple inheritance, validation must be done for
every path from each furthest ancestor to the local TDL type.

All lit:mapsTo, lit:correspondsTo, lit:approximateTo, and lit:conformsTo
relations defined for a superordinate TDL type are inherited by a 
subordinate TDL type.

-->

</rdf:RDF>
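
The validation rules in the ontology comments above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the TYPES table and the function names are hypothetical, patterns are assumed to match the whole lexical form (the ontology does not say whether they are anchored), and a type with no patterns is assumed to impose no positive constraint of its own. The xsd: entries copy patterns from the xsd: scheme definitions in this message; "ex:nonZeroInteger" is an invented subtype showing restriction by xpattern only.

```python
import re

# Hypothetical in-memory form of a few TDL type definitions.
# "ex:nonZeroInteger" is invented for illustration only.
TYPES = {
    "xsd:anySimpleType": {"super": [], "patterns": [], "xpatterns": []},
    "xsd:integer": {
        "super": ["xsd:anySimpleType"],
        "patterns": ["0", "-?[1-9][0-9]*"],
        "xpatterns": [],
    },
    "xsd:nonNegativeInteger": {
        "super": ["xsd:integer"],
        "patterns": ["0", "[1-9][0-9]*"],
        "xpatterns": [],
    },
    "xsd:gYear": {
        "super": ["xsd:anySimpleType"],
        "patterns": ["-?[0-9]{4,}"],
        "xpatterns": ["0000"],
    },
    "ex:nonZeroInteger": {
        "super": ["xsd:integer"],
        "patterns": [],
        "xpatterns": ["0"],
    },
}


def matches_local(type_name, lexical):
    """Apply only the patterns and xpatterns defined on the type itself."""
    t = TYPES[type_name]
    # Patterns are OR'd: any single pattern may match.
    # Assumption: an empty pattern list means "no positive constraint".
    if t["patterns"] and not any(re.fullmatch(p, lexical) for p in t["patterns"]):
        return False
    # Xpatterns are AND'd exclusions: none of them may match.
    return not any(re.fullmatch(x, lexical) for x in t["xpatterns"])


def is_valid(type_name, lexical):
    """Validate against every supertype first, then the local type."""
    # Recursing into each supertype covers every inheritance path.
    for parent in TYPES[type_name]["super"]:
        if not is_valid(parent, lexical):
            return False
    return matches_local(type_name, lexical)
```

For instance, is_valid("xsd:nonNegativeInteger", "-5") passes the xsd:integer stage but fails the local patterns, so the lexical form is rejected.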

=================

And here's the definition for the XML Schema simple types:

<?xml version="1.0"?>

<!--

RDF Schema defining Typed Data Literal (TDL) Encodings and Mappings
to XML Schema Simple Types using Canonical Representations

Author: Patrick Stickler
        Nokia Research Center
        patrick.stickler@nokia.com

-->

<!DOCTYPE uridef [
  <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY xsd  "http://www.w3.org/2001/XMLSchema#">
  <!ENTITY lit  "voc://nokia.com/lit-1.0/">
]>

<rdf:RDF
   xmlns:rdf   ="&rdf;"
   xmlns:rdfs  ="&rdfs;"
   xmlns:xsd   ="&xsd;"
   xmlns:lit   ="&lit;"
>

<!-- need to distill patterns to utilize superType definitions -->


<!-- Primitive Data Types -->

<rdf:Description rdf:about="xsd:anySimpleType">
   <lit:mapsTo rdf:resource="&xsd;anySimpleType"/>
</rdf:Description>

<rdf:Description rdf:about="xsd:string">
   <lit:mapsTo rdf:resource="&xsd;string"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
</rdf:Description>

<rdf:Description rdf:about="xsd:boolean">
   <lit:mapsTo rdf:resource="&xsd;boolean"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>0</lit:pattern>
   <lit:pattern>1</lit:pattern>
   <lit:pattern>true</lit:pattern>
   <lit:pattern>false</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:decimal">
   <lit:mapsTo rdf:resource="&xsd;decimal"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>0\.0</lit:pattern>
   <lit:pattern>-?0\.[0-9]*[1-9]</lit:pattern>
   <lit:pattern>-?[1-9][0-9]*\.[0-9]*[1-9]</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:float">
   <lit:mapsTo rdf:resource="&xsd;float"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <!-- check for completeness -->
   <!-- canonical form should use fixed point notation! -->
   <lit:pattern>-?0\.[0-9]*[1-9]E-?[1-9][0-9]*</lit:pattern>
   <lit:pattern>-?[1-9][0-9]*\.[0-9]*[1-9]E-?[1-9][0-9]*</lit:pattern>
   <lit:pattern>INF</lit:pattern>
   <lit:pattern>-INF</lit:pattern>
   <lit:pattern>NaN</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:double">
   <lit:mapsTo rdf:resource="&xsd;double"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <!-- check for completeness -->
   <!-- canonical form should use fixed point notation! -->
   <lit:pattern>-?0\.[0-9]*[1-9]E-?[1-9][0-9]*</lit:pattern>
   <lit:pattern>-?[1-9][0-9]*\.[0-9]*[1-9]E-?[1-9][0-9]*</lit:pattern>
   <lit:pattern>INF</lit:pattern>
   <lit:pattern>-INF</lit:pattern>
   <lit:pattern>NaN</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:duration">
   <lit:mapsTo rdf:resource="&xsd;duration"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <!-- need to constrain digits? -->
 
   <lit:pattern>-?P([0-9]+Y)?([0-9]+M)?([0-9]+D)?(T[0-9]+H)?([0-9]+M)?([0-9]+S)?</lit:pattern>
   <lit:xpattern>-?P</lit:xpattern>
</rdf:Description>

<!-- Add lit:xpattern's to trap bogus date elements? -->

<rdf:Description rdf:about="xsd:dateTime">
   <lit:mapsTo rdf:resource="&xsd;dateTime"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
 
   <lit:pattern>-?[0-9]{4,}-[0-9]{2}-[0-9]{2}T(([01][0-9])|(2[0-3])):[0-5][0-9]:[0-5][0-9]Z?</lit:pattern>
   <lit:xpattern>^0000-.*</lit:xpattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:time">
   <lit:mapsTo rdf:resource="&xsd;time"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>(([01][0-9])|(2[0-3])):[0-5][0-9]:[0-5][0-9]Z?</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:date">
   <lit:mapsTo rdf:resource="&xsd;date"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>-?[0-9]{4,}-[0-9]{2}-[0-9]{2}</lit:pattern>
   <lit:xpattern>^0000-.*</lit:xpattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:gYearMonth">
   <lit:mapsTo rdf:resource="&xsd;gYearMonth"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>-?[0-9]{4,}-[0-9]{2}</lit:pattern>
   <lit:xpattern>^0000-.*</lit:xpattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:gYear">
   <lit:mapsTo rdf:resource="&xsd;gYear"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>-?[0-9]{4,}</lit:pattern>
   <lit:xpattern>0000</lit:xpattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:gMonthDay">
   <lit:mapsTo rdf:resource="&xsd;gMonthDay"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>--[0-9]{2}-[0-9]{2}</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:gDay">
   <lit:mapsTo rdf:resource="&xsd;gDay"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>---[0-9]{2}</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:gMonth">
   <lit:mapsTo rdf:resource="&xsd;gMonth"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>--[0-9]{2}--</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:hexBinary">
   <lit:mapsTo rdf:resource="&xsd;hexBinary"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>([0-9A-F]{2})+</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:base64Binary">
   <lit:mapsTo rdf:resource="&xsd;base64Binary"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <lit:pattern>[\+/=0-9A-Za-z]+</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:anyURI">
   <lit:mapsTo rdf:resource="&xsd;anyURI"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:QName">
   <lit:mapsTo rdf:resource="&xsd;QName"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:NOTATION">
   <lit:mapsTo rdf:resource="&xsd;NOTATION"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <!-- lit:pattern TBD -->
</rdf:Description>


<!-- Derived Data Types -->

<rdf:Description rdf:about="xsd:normalizedString">
   <lit:mapsTo rdf:resource="&xsd;normalizedString"/>
   <lit:subTypeOf rdf:resource="xsd:string"/>
   <lit:xpattern>.*#xD.*</lit:xpattern>
   <lit:xpattern>.*#xA.*</lit:xpattern>
   <lit:xpattern>.*#x9.*</lit:xpattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:token">
   <lit:mapsTo rdf:resource="&xsd;token"/>
   <lit:subTypeOf rdf:resource="xsd:normalizedString"/>
   <!-- should be type xsd:tokenizedString with subtype xsd:token -->
   <lit:xpattern>.*#xD.*</lit:xpattern>
   <lit:xpattern>.*#x9.*</lit:xpattern>
   <lit:xpattern>^#x20.*</lit:xpattern>
   <lit:xpattern>.*#x20$</lit:xpattern>
   <lit:xpattern>.*(#x20){2,}.*</lit:xpattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:language">
   <lit:mapsTo rdf:resource="&xsd;language"/>
   <lit:subTypeOf rdf:resource="xsd:token"/>
   <!-- should be subTypeOf xsd:name? -->
   <!-- this needs to be more constrained -->
   <lit:pattern>[a-z]{2}</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:NMTOKEN">
   <lit:mapsTo rdf:resource="&xsd;NMTOKEN"/>
   <lit:subTypeOf rdf:resource="xsd:token"/>
   <lit:subTypeOf rdf:resource="xsd:NMTOKENS"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:NMTOKENS">
   <lit:mapsTo rdf:resource="&xsd;NMTOKENS"/>
   <lit:subTypeOf rdf:resource="xsd:token"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:Name">
   <lit:mapsTo rdf:resource="&xsd;Name"/>
   <lit:subTypeOf rdf:resource="xsd:token"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:NCName">
   <lit:mapsTo rdf:resource="&xsd;NCName"/>
   <lit:subTypeOf rdf:resource="xsd:Name"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:ID">
   <lit:mapsTo rdf:resource="&xsd;ID"/>
   <lit:subTypeOf rdf:resource="xsd:NCName"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:IDREF">
   <lit:mapsTo rdf:resource="&xsd;IDREF"/>
   <lit:subTypeOf rdf:resource="xsd:NCName"/>
   <lit:subTypeOf rdf:resource="xsd:IDREFS"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:IDREFS">
   <lit:mapsTo rdf:resource="&xsd;IDREFS"/>
   <lit:subTypeOf rdf:resource="xsd:token"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:ENTITY">
   <lit:mapsTo rdf:resource="&xsd;ENTITY"/>
   <lit:subTypeOf rdf:resource="xsd:NCName"/>
   <lit:subTypeOf rdf:resource="xsd:ENTITIES"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:ENTITIES">
   <lit:mapsTo rdf:resource="&xsd;ENTITIES"/>
   <lit:subTypeOf rdf:resource="xsd:token"/>
   <!-- lit:pattern TBD -->
</rdf:Description>

<rdf:Description rdf:about="xsd:integer">
   <lit:mapsTo rdf:resource="&xsd;integer"/>
   <lit:subTypeOf rdf:resource="xsd:anySimpleType"/>
   <!-- canonical representation not valid for xsd:decimal! -->
   <lit:correspondsTo rdf:resource="&xsd;decimal"/>
   <lit:pattern>0</lit:pattern>
   <lit:pattern>-?[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:nonPositiveInteger">
   <lit:mapsTo rdf:resource="&xsd;nonPositiveInteger"/>
   <lit:subTypeOf rdf:resource="xsd:integer"/>
   <lit:pattern>-0</lit:pattern>
   <lit:pattern>-[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:negativeInteger">
   <lit:mapsTo rdf:resource="&xsd;negativeInteger"/>
   <lit:subTypeOf rdf:resource="xsd:nonPositiveInteger"/>
   <lit:pattern>-[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:long">
   <lit:mapsTo rdf:resource="&xsd;long"/>
   <lit:subTypeOf rdf:resource="xsd:integer"/>
   <lit:pattern>0</lit:pattern>
   <lit:pattern>-?[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:int">
   <lit:mapsTo rdf:resource="&xsd;int"/>
   <lit:subTypeOf rdf:resource="xsd:long"/>
   <!-- need to constrain between 2147483647 and -2147483648 -->
   <lit:pattern>0</lit:pattern>
   <lit:pattern>-?[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:short">
   <lit:mapsTo rdf:resource="&xsd;short"/>
   <lit:subTypeOf rdf:resource="xsd:int"/>
   <!-- need to constrain between 32767 and -32768 -->
   <lit:pattern>0</lit:pattern>
   <lit:pattern>-?[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:byte">
   <lit:mapsTo rdf:resource="&xsd;byte"/>
   <lit:subTypeOf rdf:resource="xsd:short"/>
   <!-- need to constrain between 127 and -128 -->
   <lit:pattern>0</lit:pattern>
   <lit:pattern>-?[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:nonNegativeInteger">
   <lit:mapsTo rdf:resource="&xsd;nonNegativeInteger"/>
   <lit:subTypeOf rdf:resource="xsd:integer"/>
   <lit:pattern>0</lit:pattern>
   <lit:pattern>[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:unsignedLong">
   <lit:mapsTo rdf:resource="&xsd;unsignedLong"/>
   <lit:subTypeOf rdf:resource="xsd:nonNegativeInteger"/>
   <!-- need to constrain below 18446744073709551615 -->
   <lit:pattern>0</lit:pattern>
   <lit:pattern>[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:unsignedInt">
   <lit:mapsTo rdf:resource="&xsd;unsignedInt"/>
   <lit:subTypeOf rdf:resource="xsd:unsignedLong"/>
   <!-- need to constrain below 4294967295 -->
   <lit:pattern>0</lit:pattern>
   <lit:pattern>[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:unsignedShort">
   <lit:mapsTo rdf:resource="&xsd;unsignedShort"/>
   <lit:subTypeOf rdf:resource="xsd:unsignedInt"/>
   <!-- need to constrain below 65535 -->
   <lit:pattern>0</lit:pattern>
   <lit:pattern>[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:unsignedByte">
   <lit:mapsTo rdf:resource="&xsd;unsignedByte"/>
   <lit:subTypeOf rdf:resource="xsd:unsignedShort"/>
   <!-- need to constrain below 255 -->
   <lit:pattern>0</lit:pattern>
   <lit:pattern>[1-9][0-9]*</lit:pattern>
</rdf:Description>

<rdf:Description rdf:about="xsd:positiveInteger">
   <lit:mapsTo rdf:resource="&xsd;positiveInteger"/>
   <lit:subTypeOf rdf:resource="xsd:nonNegativeInteger"/>
   <lit:pattern>[1-9][0-9]*</lit:pattern>
</rdf:Description>

</rdf:RDF>

=====

A few misc. comments...

I've noted that the XML Schema simple type hierarchy is not
quite perfect. There are a few comments to that end in the
above schema. In particular, the list versions of the token
subtypes seem "upside down" insofar as lexical space is
concerned, and integer shouldn't be a subtype of decimal,
etc. So we may end up with two distinct hierarchies, one
for value space, defined via rdfs:subClassOf and one for
lexical space, defined via lit:subTypeOf. Note that I have
not defined the rdfs:subClassOf relations between the
XML Schema simple type classes above, only their realization
as types of the 'xsd:' URV scheme.
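
The value-space side is carried by lit:mapsTo, lit:correspondsTo, and lit:approximateTo. Their subPropertyOf chain, together with the rule that these relations are inherited along lit:subTypeOf, can be sketched in Python as follows. The table layout and the function name are hypothetical, and only three types from the schema above are included.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Sub-property chain from the ontology comments:
# lit:mapsTo is a subPropertyOf lit:correspondsTo, which is a
# subPropertyOf lit:approximateTo.
SUPER_PROPERTY = {"mapsTo": "correspondsTo", "correspondsTo": "approximateTo"}

# Hypothetical in-memory excerpt of the lit:subTypeOf relations above.
SUBTYPE_OF = {
    "xsd:anySimpleType": [],
    "xsd:integer": ["xsd:anySimpleType"],
    "xsd:long": ["xsd:integer"],
}

# Relations asserted directly on each TDL type in the schema above.
ASSERTED = {
    "xsd:anySimpleType": {("mapsTo", XSD + "anySimpleType")},
    "xsd:integer": {("mapsTo", XSD + "integer"),
                    ("correspondsTo", XSD + "decimal")},
    "xsd:long": {("mapsTo", XSD + "long")},
}


def relations(type_name):
    """Return all (property, class) pairs that hold for a TDL type."""
    found = set()
    for prop, cls in ASSERTED.get(type_name, ()):
        # Each asserted property also implies its super-properties.
        while prop is not None:
            found.add((prop, cls))
            prop = SUPER_PROPERTY.get(prop)
    # Relations of superordinate TDL types are inherited.
    for parent in SUBTYPE_OF.get(type_name, ()):
        found |= relations(parent)
    return found
```

Under these assumptions, xsd:long inherits the correspondsTo relation to the XML Schema decimal class from xsd:integer, even though no lexical subtype relation to an 'xsd:decimal' TDL type is asserted.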

Secondly, note that with a URV definition such as that for
the 'xsd:' scheme above, one need not use an XML Schema
engine to validate lexical forms; it suffices to test
whether the value conforms to the specified patterns and
xpatterns. Thus, a single function that provides regular
expression matching does the trick. This is also a good
thing because I am presuming that we will allow statements
to be asserted via interfaces other than XML serialization,
so this provides an XML-independent means of testing
lexical forms by type.
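
As a rough illustration of that idea, the following Python sketch reads lit:pattern and lit:xpattern values out of an RDF/XML fragment with the standard xml.etree.ElementTree module and tests lexical forms with the re module. The function names and the SAMPLE fragment are hypothetical. The sample spells out the namespace URIs instead of using the DOCTYPE entities, ignores lit:subTypeOf, and assumes whole-string matching.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LIT = "voc://nokia.com/lit-1.0/"

# Hypothetical excerpt of the schema, with namespace URIs written out.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:lit="voc://nokia.com/lit-1.0/">
<rdf:Description rdf:about="xsd:boolean">
   <lit:pattern>0</lit:pattern>
   <lit:pattern>1</lit:pattern>
   <lit:pattern>true</lit:pattern>
   <lit:pattern>false</lit:pattern>
</rdf:Description>
<rdf:Description rdf:about="xsd:date">
   <lit:pattern>-?[0-9]{4,}-[0-9]{2}-[0-9]{2}</lit:pattern>
   <lit:xpattern>^0000-.*</lit:xpattern>
</rdf:Description>
</rdf:RDF>"""


def load_patterns(xml_text):
    """Map each rdf:about value to its (patterns, xpatterns) lists."""
    root = ET.fromstring(xml_text)
    table = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        name = desc.get(f"{{{RDF}}}about")
        patterns = [e.text for e in desc.findall(f"{{{LIT}}}pattern")]
        xpatterns = [e.text for e in desc.findall(f"{{{LIT}}}xpattern")]
        table[name] = (patterns, xpatterns)
    return table


def conforms(table, type_name, lexical):
    """Test one lexical form against a type's own patterns and xpatterns."""
    patterns, xpatterns = table[type_name]
    # Any single pattern may match; no patterns means no constraint here.
    accepted = not patterns or any(re.fullmatch(p, lexical) for p in patterns)
    # None of the xpatterns may match.
    return accepted and not any(re.fullmatch(x, lexical) for x in xpatterns)
```

A fuller version would also walk the lit:subTypeOf relations so that supertypes are checked before the local type.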

Finally, I am presuming that a URI scheme prefix is itself
a valid URI, which may be wrong (or undefined). Thus,
<xsd:integer> is the URI representing the URV scheme for
XML Schema integers, and is not the qname for the actual XML 
Schema class. Hopefully this distinction is clear in the 
schema above.

Cheers,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Tuesday, 13 November 2001 16:01:34 UTC