Appendix A: RDFL - An Ontology for Lexical Datatypes

This appendix outlines an ontology for defining and relating datatypes based on the TDL datatyping scheme described above. Note that this ontology is not necessary for the application of the TDL scheme, but serves as a convenient mechanism for validation and/or interpretation of literals. It can also be used solely to define the relations between datatypes, leaving the definition and interpretation of their lexical spaces to other applications.

The goals of the RDFL ontology are to enable one to:

  1. define the lexical space of a datatype by regular expression
  2. define the canonical lexical space of a datatype by regular expression
  3. test if a given lexical form is a member of a lexical space
  4. test if a given lexical form is a member of a canonical lexical space
  5. define the relations between lexical datatypes in terms of lexical space and canonical lexical space

The RDFL ontology borrows the basic datatyping vocabulary and semantics from XML Schema but differs from it in that:

  1. it provides for explicit pattern exclusions
  2. it provides for patterns which differentiate the canonical lexical space from the lexical space of a datatype
  3. it allows the definition of relations between datatypes either in terms of value space alone, both lexical space and value space, or all three of canonical, lexical and value space
  4. it only supports the definition of datatypes by lexical characteristics (regular expression) -- not by any other facets
  5. the facets supported for list datatypes are restricted to those concerned with length, and length is tested according to the number of distinct sequences of whitespace in a lexical form
  6. an implicit initial '^' and final '$' is presumed for every pattern such that a pattern must match the entire lexical form

Regular expression values must conform to the encoding defined by XML Schema.
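
The implicit anchoring in item 6 of the list above can be illustrated with a minimal Python sketch. It assumes Python's re module as a stand-in for an XML Schema regular expression engine (the two dialects are not identical), and the helper name matches_whole_form is hypothetical, not part of RDFL.

```python
import re


def matches_whole_form(pattern: str, lexical_form: str) -> bool:
    """Apply a pattern as if it carried an implicit leading '^' and trailing '$'."""
    # re.fullmatch succeeds only when the pattern consumes the entire string,
    # which mirrors the implicit anchoring presumed for every RDFL pattern.
    return re.fullmatch(pattern, lexical_form) is not None


print(matches_whole_form(r"[0-9]+", "123"))     # the whole form is digits
print(matches_whole_form(r"[0-9]+", "123abc"))  # only a prefix is digits
```

With unanchored search semantics the second call would succeed on the prefix "123"; under the whole-form rule it does not.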

The complete machine readable version of the RDFL schema is available here.

See Appendix B for an RDFL Definition of the pre-defined XML Schema simple datatypes.

A.1 Lexical Datatypes


<rdfs:Class rdf:about="&rdfl;LexicalDatatype">
   <rdfs:subClassOf rdf:resource="&rdfs;Literal"/>
</rdfs:Class>

<rdf:Property rdf:about="&rdfl;lexicalSubClassOf">
   <rdfs:subPropertyOf rdf:resource="&rdfs;subClassOf"/>
   <rdfs:domain        rdf:resource="&rdfl;LexicalDatatype"/>
   <rdfs:range         rdf:resource="&rdfl;LexicalDatatype"/>
</rdf:Property>

<rdf:Property rdf:about="&rdfl;pattern">
   <rdfs:domain rdf:resource="&rdfl;LexicalDatatype"/>
   <rdfs:range  rdf:resource="&rdfl;RegularExpression"/>
</rdf:Property>

<rdf:Property rdf:about="&rdfl;patternExclusion">
   <rdfs:domain rdf:resource="&rdfl;LexicalDatatype"/>
   <rdfs:range  rdf:resource="&rdfl;RegularExpression"/>
</rdf:Property>

<rdf:Property rdf:about="&rdfl;patternDependencyOn">
   <rdfs:domain rdf:resource="&rdfl;LexicalDatatype"/>
   <rdfs:range  rdf:resource="&rdfl;LexicalDatatype"/>
</rdf:Property>

An rdfl:LexicalDatatype has both a value space and a lexical space.

Lexical space is defined lexically, by a set of inclusive and exclusive regular expressions, and in the context of all superordinate rdfl:LexicalDatatypes related by rdfl:lexicalSubClassOf.
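
As a rough illustration of how inclusive and exclusive patterns combine for a single datatype, consider the following Python sketch. The LexicalDatatype record, the in_local_lexical_space function and the ex:noLeadingZero datatype are all hypothetical, and the sketch covers only the local patterns, not the superordinate datatypes.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class LexicalDatatype:
    """Hypothetical in-memory stand-in for an rdfl:LexicalDatatype."""
    name: str
    patterns: List[str] = field(default_factory=list)            # rdfl:pattern values
    pattern_exclusions: List[str] = field(default_factory=list)  # rdfl:patternExclusion values


def in_local_lexical_space(datatype: LexicalDatatype, lexical_form: str) -> bool:
    # If any inclusive patterns are defined, at least one must match the whole form.
    if datatype.patterns and not any(
        re.fullmatch(p, lexical_form) for p in datatype.patterns
    ):
        return False
    # No exclusion pattern may match the whole form.
    return not any(
        re.fullmatch(p, lexical_form) for p in datatype.pattern_exclusions
    )


# Hypothetical datatype: digit strings, excluding multi-digit forms with a leading zero.
no_leading_zero = LexicalDatatype(
    name="ex:noLeadingZero",
    patterns=[r"[0-9]+"],
    pattern_exclusions=[r"0[0-9]+"],
)

for form in ("42", "042", "0", "4x"):
    print(form, in_local_lexical_space(no_leading_zero, form))
```

Expressing the leading-zero rule as an exclusion keeps the inclusive pattern simple, which is the kind of use the explicit pattern exclusions appear intended for.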

Lexical and canonical lexical spaces have no explicit URI identity; only the datatype itself does.

rdfs:subClassOf relates value spaces only.

rdfl:lexicalSubClassOf relates both lexical & value spaces.

For any rdfl:LexicalDatatype X which is an rdfl:lexicalSubClassOf some rdfl:LexicalDatatype Y, all members of the lexical space of X are also members of the lexical space of Y.

The patterns defined for an rdfl:LexicalDatatype may be dependent on patterns defined for another type, as indicated by the value of the rdfl:patternDependencyOn property. Prior to matching the local patterns for a datatype, all types that it is dependent on must first successfully match. This constraint is enforced by the validation algorithm specified below.

Subordinate rdfl:LexicalDatatypes related by rdfl:lexicalSubClassOf may only restrict, not extend, the lexical spaces of superordinate rdfl:LexicalDatatypes. This constraint is enforced by the lexical subset validation algorithm defined below, in that a lexical form that fails to match the lexical definition of all superordinate rdfl:LexicalDatatypes will fail to match the local rdfl:LexicalDatatype.

A.2 Canonical (Lexical) Datatypes


<rdfs:Class rdf:about="&rdfl;CanonicalDatatype">
   <rdfs:subClassOf rdf:resource="&rdfl;LexicalDatatype"/>
</rdfs:Class>

<rdf:Property rdf:about="&rdfl;canonicalSubClassOf">
   <rdfs:subPropertyOf rdf:resource="&rdfl;lexicalSubClassOf"/>
   <rdfs:domain        rdf:resource="&rdfl;CanonicalDatatype"/>
   <rdfs:range         rdf:resource="&rdfl;CanonicalDatatype"/>
</rdf:Property>

<rdf:Property rdf:about="&rdfl;canonicalPattern">
   <rdfs:domain rdf:resource="&rdfl;CanonicalDatatype"/>
   <rdfs:range  rdf:resource="&rdfl;RegularExpression"/>
</rdf:Property>

<rdf:Property rdf:about="&rdfl;canonicalPatternExclusion">
   <rdfs:domain rdf:resource="&rdfl;CanonicalDatatype"/>
   <rdfs:range  rdf:resource="&rdfl;RegularExpression"/>
</rdf:Property>

An rdfl:CanonicalDatatype has a value space, a lexical space, and a canonical lexical space. The lexical space and canonical lexical space may be identical.

An rdfl:CanonicalDatatype is an rdfs:subClassOf rdfl:LexicalDatatype.

rdfl:canonicalSubClassOf relates lexical, canonical, & value spaces.

Canonical lexical space is defined lexically, by a set of inclusive and exclusive regular expressions, and in the context of all superordinate rdfl:LexicalDatatypes related by either rdfl:lexicalSubClassOf or rdfl:canonicalSubClassOf.

For any rdfl:CanonicalDatatype X which is an rdfl:lexicalSubClassOf some rdfl:LexicalDatatype Y, all members of the lexical space of X are also members of the lexical space of Y.

Likewise, for any rdfl:CanonicalDatatype X which is an rdfl:canonicalSubClassOf some rdfl:CanonicalDatatype Z, all members of the canonical lexical space of X are also members of the canonical lexical space of Z.

Note that not all rdfl:CanonicalDatatypes having an rdfl:lexicalSubClassOf relation also have an rdfl:canonicalSubClassOf relation. Consider xsd:integer and xsd:decimal: xsd:integer is an rdfl:lexicalSubClassOf xsd:decimal but not an rdfl:canonicalSubClassOf xsd:decimal, because the canonical lexical space of xsd:integer is not a subset of the canonical lexical space of xsd:decimal (e.g. "5" is a member of the canonical lexical space of xsd:integer but is not a member of the canonical lexical space of xsd:decimal).
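
The xsd:integer and xsd:decimal case can be checked mechanically. The Python sketch below uses assumed patterns that approximate the XML Schema 1.0 lexical and canonical representations; they were written for this illustration and are not taken from the RDFL schema. The names CANONICAL_INTEGER, CANONICAL_DECIMAL, LEXICAL_DECIMAL and is_member are hypothetical.

```python
import re

# Assumed, simplified patterns (illustration only, not the RDFL schema's own).
CANONICAL_INTEGER = r"0|-?[1-9][0-9]*"                     # no '+' sign, no leading zeros
CANONICAL_DECIMAL = r"-?(0|[1-9][0-9]*)\.(0|[0-9]*[1-9])"  # decimal point is required
LEXICAL_DECIMAL = r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"     # decimal point is optional


def is_member(pattern: str, lexical_form: str) -> bool:
    """Whole-form match, as RDFL presumes for every pattern."""
    return re.fullmatch(pattern, lexical_form) is not None


# "5" is canonical for xsd:integer and lexically valid for xsd:decimal ...
print(is_member(CANONICAL_INTEGER, "5"), is_member(LEXICAL_DECIMAL, "5"))
# ... but it is not a canonical xsd:decimal form, whereas "5.0" is.
print(is_member(CANONICAL_DECIMAL, "5"), is_member(CANONICAL_DECIMAL, "5.0"))
```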

As with rdfl:LexicalDatatype, the patterns defined for an rdfl:CanonicalDatatype may be dependent on patterns defined for another type, as indicated by the value of the rdfl:patternDependencyOn property. Prior to matching the local patterns for a datatype, all types that it is dependent on must first successfully match. This constraint is enforced by the validation algorithm specified below.

As with rdfl:LexicalDatatypes, subordinate rdfl:CanonicalDatatypes related by rdfl:canonicalSubClassOf may only restrict, not extend, the canonical lexical spaces of superordinate rdfl:CanonicalDatatypes. This constraint is enforced by the lexical subset validation algorithm defined below, in that a lexical form that fails to match the canonical lexical definition of all superordinate rdfl:CanonicalDatatypes will fail to match the local rdfl:CanonicalDatatype.

A.4 Union Datatypes


<rdfs:Class rdf:about="&rdfl;UnionDatatype"/>

<rdf:Property rdf:about="&rdfl;memberType">
   <rdfs:domain rdf:resource="&rdfl;UnionDatatype"/>
   <rdfs:range  rdf:resource="&rdfl;LexicalDatatype"/>
</rdf:Property>

An rdfl:UnionDatatype represents the definition of a superordinate datatype shared by all of the union member datatypes. The rdfl:memberType relation is the inverse of the rdfl:lexicalSubClassOf relation, such that it syndicates the definitions of the subordinate datatypes, rather than constraining the definition of a superordinate datatype.

An rdfl:UnionDatatype may only syndicate, but neither extend nor constrain, the value space or lexical spaces of its members. This is enforced by the validation algorithm below in that a lexical form that fails to match the lexical definition of at least one member datatype is not a valid lexical form for the union datatype.

Because XML Schema defines an explicit order of interpretation for datatype unions, one would expect rdfl:memberType relations to be defined using an rdf:Seq collection. E.g.


    <rdfl:UnionDatatype rdf:about="#myUnionDatatype">
      <rdfl:memberType>
        <rdf:Seq>
          <rdf:li rdf:resource="&xsd;integer"/>
          <rdf:li rdf:resource="&xsd;date"/>
          <rdf:li rdf:resource="&xsd;string"/>
        </rdf:Seq>
      </rdfl:memberType>
    </rdfl:UnionDatatype>

Here, when a specific member datatype is to be deduced for a given literal, the literal is first checked to see if it is a valid lexical form for xsd:integer, and if so, it is interpreted as an xsd:integer. If not, it is checked to see if it is a valid lexical form for xsd:date, and if so, it is interpreted as an xsd:date. Otherwise it is interpreted as an xsd:string.
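
The ordered deduction described for this example can be sketched in Python as follows. The patterns are simplified assumptions standing in for the real lexical definitions of the three member types, and MEMBER_TYPES and deduce_member_type are hypothetical names.

```python
import re

# Ordered member types, mirroring the rdf:Seq in the example above.
# The patterns are simplified assumptions for illustration only.
MEMBER_TYPES = [
    ("xsd:integer", r"[+-]?[0-9]+"),
    ("xsd:date", r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})?"),
    ("xsd:string", r".*"),
]


def deduce_member_type(lexical_form: str):
    """Return the name of the first member type whose pattern matches the whole form."""
    for name, pattern in MEMBER_TYPES:
        # DOTALL lets the catch-all string pattern accept embedded newlines.
        if re.fullmatch(pattern, lexical_form, flags=re.DOTALL):
            return name
    return None  # not a valid lexical form for any member type


for form in ("42", "2001-12-31", "hello"):
    print(form, "->", deduce_member_type(form))
```

Because the list is scanned in order, a form such as "42" is never interpreted as an xsd:string even though the string pattern would also accept it.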

A.5 List Datatypes


<rdfs:Class rdf:about="&rdfl;ListDatatype"/>

<rdf:Property rdf:about="&rdfl;itemType">
   <rdfs:domain rdf:resource="&rdfl;ListDatatype"/>
   <rdfs:range  rdf:resource="&rdfl;LexicalDatatype"/>
</rdf:Property>

<rdf:Property rdf:about="&rdfl;length">
   <rdfs:domain rdf:resource="&rdfl;ListDatatype"/>
</rdf:Property>

<rdf:Property rdf:about="&rdfl;maxLength">
   <rdfs:domain rdf:resource="&rdfl;ListDatatype"/>
</rdf:Property>

<rdf:Property rdf:about="&rdfl;minLength">
   <rdfs:domain rdf:resource="&rdfl;ListDatatype"/>
</rdf:Property>

... discussion ...
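
A minimal Python sketch of how the three length facets might be checked. It assumes that list length is the number of whitespace-delimited items in the lexical form, which is the unit the validation algorithm in A.6 iterates over. The function and parameter names are illustrative and not part of RDFL.

```python
from typing import Optional


def check_length_facets(
    lexical_form: str,
    length: Optional[int] = None,      # rdfl:length, if defined
    min_length: Optional[int] = None,  # rdfl:minLength, if defined
    max_length: Optional[int] = None,  # rdfl:maxLength, if defined
) -> bool:
    # str.split() with no argument splits on runs of whitespace and
    # ignores leading and trailing whitespace.
    count = len(lexical_form.split())
    if length is not None and count != length:
        return False
    if min_length is not None and count < min_length:
        return False
    if max_length is not None and count > max_length:
        return False
    return True


print(check_length_facets("a b  c", length=3))
print(check_length_facets("a b", min_length=3))
```

Each facet is tested only when it is defined, so a list datatype with no length facets accepts a list of any length.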

A.6 Validation of Lexical Forms

Algorithm for the Validation of lexical forms:

(this is *very* informal pseudocode... will refine it into something more formal before done...)

If any MATCH operation fails, the entire test fails.


IF (rdfl:CanonicalDatatype)
{
   MATCH every rdfl:patternDependencyOn type
   IF (any rdfl:pattern defined)
   {
      MATCH any rdfl:pattern
   }
   MATCH no rdfl:patternExclusion
   IF (canonical membership test)
   {
      IF (any rdfl:canonicalPattern defined)
      {
         MATCH any rdfl:canonicalPattern
      }
      MATCH no rdfl:canonicalPatternExclusion
   }
}
ELSE IF (rdfl:LexicalDatatype)
{
   MATCH every rdfl:patternDependencyOn type
   IF (any rdfl:pattern defined)
   {
      MATCH any rdfl:pattern
   }
   MATCH no rdfl:patternExclusion
}
ELSE IF (rdfl:UnionDatatype)
{
   MATCH any rdfl:memberType component type
}
ELSE IF (rdfl:ListDatatype)
{
   IF (rdfl:length is defined AND list length not equal to rdfl:length value)
   {
      FAIL
   }
   IF (rdfl:minLength is defined AND list length less than rdfl:minLength value)
   {
      FAIL
   }
   IF (rdfl:maxLength is defined AND list length greater than rdfl:maxLength value)
   {
      FAIL
   }
   FOREACH (whitespace delimited lexical form)
   {
      MATCH any rdfl:itemType component type
   }
}
ELSE
{
   FAIL
}
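
One possible reading of the pseudocode above, as a Python sketch. The Datatype record and its field names are hypothetical. The sketch makes two assumptions that the pseudocode leaves open: rdfl:patternDependencyOn types are tested against their lexical space only, and the canonical flag is passed on to union members and list items.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Datatype:
    """Hypothetical in-memory record of one RDFL datatype description."""
    kind: str  # "canonical", "lexical", "union" or "list"
    patterns: List[str] = field(default_factory=list)
    pattern_exclusions: List[str] = field(default_factory=list)
    canonical_patterns: List[str] = field(default_factory=list)
    canonical_pattern_exclusions: List[str] = field(default_factory=list)
    depends_on: List["Datatype"] = field(default_factory=list)
    member_types: List["Datatype"] = field(default_factory=list)
    item_types: List["Datatype"] = field(default_factory=list)
    length: Optional[int] = None
    min_length: Optional[int] = None
    max_length: Optional[int] = None


def _passes(includes, excludes, form):
    # MATCH any inclusive pattern (only if some are defined), MATCH no exclusion.
    included = not includes or any(re.fullmatch(p, form) for p in includes)
    return included and not any(re.fullmatch(p, form) for p in excludes)


def validate(dt, form, canonical=False):
    if dt.kind in ("canonical", "lexical"):
        # Assumption: dependencies are tested against their lexical space only.
        if not all(validate(dep, form) for dep in dt.depends_on):
            return False
        if not _passes(dt.patterns, dt.pattern_exclusions, form):
            return False
        if dt.kind == "canonical" and canonical:
            return _passes(dt.canonical_patterns,
                           dt.canonical_pattern_exclusions, form)
        return True
    if dt.kind == "union":
        return any(validate(m, form, canonical) for m in dt.member_types)
    if dt.kind == "list":
        items = form.split()  # whitespace-delimited lexical forms
        count = len(items)
        if dt.length is not None and count != dt.length:
            return False
        if dt.min_length is not None and count < dt.min_length:
            return False
        if dt.max_length is not None and count > dt.max_length:
            return False
        return all(
            any(validate(t, item, canonical) for t in dt.item_types)
            for item in items
        )
    return False  # none of the known kinds: FAIL


# Hypothetical datatypes with simplified patterns, for illustration only.
decimal = Datatype(
    kind="canonical",
    patterns=[r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"],
    canonical_patterns=[r"-?(0|[1-9][0-9]*)\.(0|[0-9]*[1-9])"],
)
decimal_pair = Datatype(kind="list", item_types=[decimal], length=2)

print(validate(decimal, "5"), validate(decimal, "5", canonical=True))
print(validate(decimal_pair, "1.5 2"), validate(decimal_pair, "1.5"))
```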

A.7 Lexical Subset Validation of Lexical Forms

Algorithm for the Validation of lexical forms based on subclass relations in addition to pattern dependency relations, ensuring lexical subset conformance for that lexical form for all superordinate lexical datatypes:

(this is *very* informal pseudocode... will refine it into something more formal before done...)

If any MATCH operation fails, the entire test fails.


IF (rdf:type is rdfl:CanonicalDatatype)
{
   MATCH every rdfl:lexicalSubClassOf superordinate type
   MATCH every rdfl:patternDependencyOn type
   IF (any rdfl:pattern defined)
   {
      MATCH any rdfl:pattern
   }
   MATCH no rdfl:patternExclusion
   IF (canonical membership test)
   {
      MATCH every rdfl:canonicalSubClassOf superordinate type with canonical test
      IF (any rdfl:canonicalPattern defined)
      {
         MATCH any rdfl:canonicalPattern
      }
      MATCH no rdfl:canonicalPatternExclusion
   }
}
ELSE IF (rdf:type is rdfl:LexicalDatatype)
{
   MATCH every rdfl:lexicalSubClassOf superordinate type
   MATCH every rdfl:patternDependencyOn type
   IF (any rdfl:pattern defined)
   {
      MATCH any rdfl:pattern
   }
   MATCH no rdfl:patternExclusion
}
ELSE IF (rdf:type is rdfl:UnionDatatype)
{
   MATCH any rdfl:memberType component type
}
ELSE IF (rdf:type is rdfl:ListDatatype)
{
   IF (rdfl:length is defined AND list length not equal to rdfl:length value)
   {
      FAIL
   }
   IF (rdfl:minLength is defined AND list length less than rdfl:minLength value)
   {
      FAIL
   }
   IF (rdfl:maxLength is defined AND list length greater than rdfl:maxLength value)
   {
      FAIL
   }
   FOREACH (whitespace delimited lexical form)
   {
      MATCH any rdfl:itemType component type
   }
}
ELSE
{
   FAIL
}
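
The additional superordinate checks can be sketched in Python for the two pattern-based branches (the union and list branches are unchanged from A.6 and are omitted here). The Datatype record, its field names and the three example datatypes with their simplified patterns are hypothetical. Because rdfl:canonicalSubClassOf is a subproperty of rdfl:lexicalSubClassOf, the sketch treats canonical superordinate types as lexical superordinate types as well.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Datatype:
    """Hypothetical record for an rdfl:LexicalDatatype or rdfl:CanonicalDatatype."""
    patterns: List[str] = field(default_factory=list)
    pattern_exclusions: List[str] = field(default_factory=list)
    canonical_patterns: List[str] = field(default_factory=list)
    canonical_pattern_exclusions: List[str] = field(default_factory=list)
    lexical_supers: List["Datatype"] = field(default_factory=list)    # rdfl:lexicalSubClassOf
    canonical_supers: List["Datatype"] = field(default_factory=list)  # rdfl:canonicalSubClassOf
    depends_on: List["Datatype"] = field(default_factory=list)        # rdfl:patternDependencyOn


def _passes(includes, excludes, form):
    # MATCH any inclusive pattern (only if some are defined), MATCH no exclusion.
    included = not includes or any(re.fullmatch(p, form) for p in includes)
    return included and not any(re.fullmatch(p, form) for p in excludes)


def subset_validate(dt, form, canonical=False):
    # Every superordinate type must accept the form lexically.
    for sup in dt.lexical_supers + dt.canonical_supers:
        if not subset_validate(sup, form):
            return False
    for dep in dt.depends_on:
        if not subset_validate(dep, form):
            return False
    if not _passes(dt.patterns, dt.pattern_exclusions, form):
        return False
    if canonical:
        # Every canonical superordinate type must accept the form canonically.
        for sup in dt.canonical_supers:
            if not subset_validate(sup, form, canonical=True):
                return False
        return _passes(dt.canonical_patterns,
                       dt.canonical_pattern_exclusions, form)
    return True


# Simplified, assumed patterns for three datatypes related as in Appendix B.
decimal = Datatype(
    patterns=[r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"],
    canonical_patterns=[r"-?(0|[1-9][0-9]*)\.(0|[0-9]*[1-9])"],
)
integer = Datatype(
    patterns=[r"[+-]?[0-9]+"],
    canonical_patterns=[r"0|-?[1-9][0-9]*"],
    lexical_supers=[decimal],  # lexical subclass only, not canonical
)
non_negative = Datatype(
    patterns=[r"\+?[0-9]+"],
    canonical_patterns=[r"0|[1-9][0-9]*"],
    canonical_supers=[integer],
)

print(subset_validate(integer, "5", canonical=True))
print(subset_validate(non_negative, "+5", canonical=True))
```

In this sketch "5" passes the canonical test for the integer type because the decimal type is consulted only lexically, while "+5" fails for the non-negative type because its canonical superordinate type rejects the sign.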

Appendix B: An RDFL Definition of XML Schema Simple Datatypes

The following is an RDF Schema definition of the pre-defined XML Schema simple datatypes [XSD] according to the RDFL ontology defined above, omitting any definition of their lexical spaces.

A more comprehensive schema, presently in draft form, which includes pattern statements defining their lexical and canonical lexical spaces, can be found here. When completed, it will be suitable for validating typed data literals using the validation algorithm defined for RDFL, so that the integrity of datatyping knowledge relating to XML Schema simple datatypes can be determined according to either canonical or non-canonical lexical spaces, independent of a complete XML Schema validator.


<?xml version="1.0"?>

<!DOCTYPE uridef [
  <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY xsd  "http://www.w3.org/2001/XMLSchema#">
  <!ENTITY rdfl "rdfl:">
]>

<rdf:RDF
   xmlns:rdf   ="&rdf;"
   xmlns:rdfs  ="&rdfs;"
   xmlns:xsd   ="&xsd;"
   xmlns:rdfl  ="&rdfl;"
>

<rdfl:LexicalDatatype rdf:about="&xsd;anySimpleType"/>

<rdfl:LexicalDatatype rdf:about="&xsd;string">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:LexicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;boolean">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;decimal">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;float">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;double">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;duration">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;dateTime">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;time">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;date">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;gYearMonth">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;gYear">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;gMonthDay">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;gDay">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;gMonth">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;hexBinary">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:CanonicalDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;base64Binary">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:LexicalDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;anyURI">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:LexicalDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;QName">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:LexicalDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;NOTATION">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;anySimpleType"/>
</rdfl:LexicalDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;normalizedString">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;string"/>
</rdfl:LexicalDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;token">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;normalizedString"/>
</rdfl:LexicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;language">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;token"/>
</rdfl:CanonicalDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;NMTOKEN">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;token"/>
</rdfl:LexicalDatatype>

<rdfl:ListDatatype rdf:about="&xsd;NMTOKENS">
   <rdfl:itemType rdf:resource="&xsd;NMTOKEN"/>
</rdfl:ListDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;Name">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;token"/>
</rdfl:LexicalDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;NCName">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;Name"/>
</rdfl:LexicalDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;ID">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;NCName"/>
</rdfl:LexicalDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;IDREF">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;NCName"/>
</rdfl:LexicalDatatype>

<rdfl:ListDatatype rdf:about="&xsd;IDREFS">
   <rdfl:itemType rdf:resource="&xsd;IDREF"/>
</rdfl:ListDatatype>

<rdfl:LexicalDatatype rdf:about="&xsd;ENTITY">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;NCName"/>
</rdfl:LexicalDatatype>

<rdfl:ListDatatype rdf:about="&xsd;ENTITIES">
   <rdfl:itemType rdf:resource="&xsd;ENTITY"/>
</rdfl:ListDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;integer">
   <rdfl:lexicalSubClassOf rdf:resource="&xsd;decimal"/>
   <!--
      Note that xsd:integer is not an rdfl:canonicalSubClassOf but
      only an rdfl:lexicalSubClassOf xsd:decimal because the canonical
      lexical space for xsd:integer is not a subset of the canonical
      lexical space of xsd:decimal.
   -->
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;nonPositiveInteger">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;integer"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;negativeInteger">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;nonPositiveInteger"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;long">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;integer"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;int">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;long"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;short">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;int"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;byte">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;short"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;nonNegativeInteger">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;integer"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;unsignedLong">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;nonNegativeInteger"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;unsignedInt">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;unsignedLong"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;unsignedShort">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;unsignedInt"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;unsignedByte">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;unsignedShort"/>
</rdfl:CanonicalDatatype>

<rdfl:CanonicalDatatype rdf:about="&xsd;positiveInteger">
   <rdfl:canonicalSubClassOf rdf:resource="&xsd;nonNegativeInteger"/>
</rdfl:CanonicalDatatype>

</rdf:RDF>
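
To show how the relations declared above might be traversed, the following Python sketch copies a small excerpt of the schema into a dictionary and walks the superordinate chain of a datatype. The dictionary layout and the function names are hypothetical. Only the relations themselves come from the schema.

```python
# Excerpt of the relations declared in the schema above:
# datatype -> (kind of subclass relation, superordinate datatype).
SUPERTYPE = {
    "xsd:byte": ("canonical", "xsd:short"),
    "xsd:short": ("canonical", "xsd:int"),
    "xsd:int": ("canonical", "xsd:long"),
    "xsd:long": ("canonical", "xsd:integer"),
    "xsd:integer": ("lexical", "xsd:decimal"),
    "xsd:decimal": ("lexical", "xsd:anySimpleType"),
}


def lexical_ancestors(datatype):
    """Follow both relation kinds, since rdfl:canonicalSubClassOf is a
    subproperty of rdfl:lexicalSubClassOf."""
    chain = []
    while datatype in SUPERTYPE:
        _, datatype = SUPERTYPE[datatype]
        chain.append(datatype)
    return chain


def canonical_ancestors(datatype):
    """Follow rdfl:canonicalSubClassOf links only."""
    chain = []
    while datatype in SUPERTYPE and SUPERTYPE[datatype][0] == "canonical":
        datatype = SUPERTYPE[datatype][1]
        chain.append(datatype)
    return chain


print(lexical_ancestors("xsd:byte"))
print(canonical_ancestors("xsd:byte"))
```

The canonical chain for xsd:byte stops at xsd:integer, reflecting the comment in the schema that xsd:integer is only an rdfl:lexicalSubClassOf xsd:decimal.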