Separating simple and complex types

Simple and complex types have different domains. The former deal with atoms,
units of data without discernible substructure. The latter represent nested
structures. In its current incarnation, XML Schema intermingles the two,
allowing derivation of complex types from simple ones while deriving the
simple urType from the complex one.

To me, this state of affairs is unsatisfactory.

I propose the introduction of a new schema component, <text>, to achieve
complete separation of type systems. This addition will increase the
expressive power of schemata.

<text> is a component hitherto specified implicitly: a text child of an
element. Under this scheme, simple types serve to validate text and
attributes. Complex types serve to validate elements only.

There is no need for a complex ancestor of the simple urType anymore --- the
simple urType becomes a proper root. There is no need for derivation of
complex types from simple types --- text content simply becomes part of the
content model. <simpleContent> and <complexContent> children of a
complexType are no longer necessary, and neither is the mixed attribute ---
content models become fully explicit. Thus, the type hierarchies are
completely separated.

<text> admits a simple type and the usual occurrence bound attributes. To
preserve regular processing, all text children of an element must share the
same simple type, just as like-named elements do. Straightforward rules for
derivation apply: in derivation of complex types by restriction, the text
type must be a restriction of the base type's text type. In derivation by
extension (which may one day cover more than appending), the text type must
be an extension of the base type's, i.e., an ancestor type.
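
As a sketch of the restriction rule, assume a base type whose text is typed
decimal and a derived type that narrows it to integer (integer is derived
from decimal by restriction in XML Schema). The type names amount and
wholeAmount, the currency attribute, and the placement of <restriction>
directly under <complexType> are assumptions made for illustration; the
proposal does not fix a derivation syntax:

  <!-- hypothetical base type: decimal text plus one attribute -->
  <complexType name="amount">
    <text type="decimal"/>
    <attribute name="currency" type="string"/>
  </complexType>

  <!-- hypothetical restriction: text type narrowed from decimal to integer -->
  <complexType name="wholeAmount">
    <restriction base="amount">
      <text type="integer"/>
      <attribute name="currency" type="string"/>
    </restriction>
  </complexType>

Under the rule as stated, derivation by extension would run the other way:
a type extending wholeAmount could widen its text type back to decimal.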

The following examples specify a complex type with integer content and an
integer attribute using the current notation and the proposed one.
Compactness and legibility of the new approach are evident:

    <extension base="integer">
      <attribute name="att" type="integer"/>
    </extension>

  <text type="integer"/>
  <attribute name="att" type="integer"/>

Note that this proposal allows for finer control than previously available.
Text may be typed even if element children are present. Exact sequences of
text fragments and elements may be specified. Still, full generality is
retained: old-style mixed elements may be implemented through Kleene closure
on a choice element surrounding elements and text.
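
These cases might be written as follows. The type names (measurement, para)
and the child element names are hypothetical, and <text> is the proposed
component rather than part of XML Schema as it stands. The first type fixes
an exact sequence of typed text followed by an element; the second emulates
an old-style mixed element:

  <!-- typed text followed by exactly one element child -->
  <complexType name="measurement">
    <sequence>
      <text type="decimal"/>
      <element name="unit" type="string"/>
    </sequence>
  </complexType>

  <!-- mixed content: any interleaving of text and the listed elements -->
  <complexType name="para">
    <choice minOccurs="0" maxOccurs="unbounded">
      <text type="string"/>
      <element name="em" type="string"/>
      <element name="code" type="string"/>
    </choice>
  </complexType>

In both sketches every text child of a given element has the same simple
type, as the proposal requires.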

Markus L. Noga                 IPD Goos                Universität Karlsruhe

Received on Thursday, 22 March 2001 05:24:59 UTC