XML Schema Structure

The structure of XML Schema
---------------------------

A language is easier to analyze if it has a clear structure.
XML Schema Part 1 makes a valiant effort to separate abstract syntax
from concrete syntax, the former being treated in Section 3 and the
latter in Section 4.  Nonetheless, there are places where the chosen
concrete syntax has affected the abstract syntax, in a way that we
believe obscures the structure of XML Schema.  We give two examples
below.

I am also asking the XML Query working group to support changes
along these lines, but in writing this letter I am not acting as
a representative of XML Query.

Yours sincerely,

Philip Wadler, Lucent


A.  Element declarations
------------------------

An `element' element serves three distinct purposes.

1.  Global element declaration.  At the top level of a document, it
can establish a global association between a given element name and a
given type.  It may also specify an equivalence class (superclass).

  at top level:
  <element name="..." type="..." equivClass="..."/>

Instead of a type attribute, the type may appear as the content
of the element.

2.  Local element declaration.  Within a model group, it can specify
an element name and a type.  This specifies that an element with the
given name should appear with the given type.  A multiplicity may also
be specified.

  in model group:
  <element name="..." type="..." minOccurs="..." maxOccurs="..."/>

Again, instead of a type attribute, the type may appear as the content
of the element.

3.  Global element reference.  Within a model group, it can specify a
reference to a global element declaration.  This specifies that an
element with the given name should appear with the associated type
(as specified by the global element declaration).  A multiplicity
may also be specified.

  in model group:
  <element ref="..." minOccurs="..." maxOccurs="..."/>

The element may have no content.  More or less, a global element
reference may always be replaced by a local element declaration whose
name and type are those of the referenced global element
declaration.  (The main exception is that an equivalence class
(subclass) can be declared for a global element but not for a local
element, which is unfortunate.)

Clearly, these represent three distinct (though related)
functionalities, as indicated by the fact that the legal set of
attributes differs for all three.
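
To make the distinction concrete, here is a minimal sketch in Python
that decides which of the three purposes a given `element' serves,
using only the attribute sets shown in the three forms above.  The
function name and the role labels are my own illustration, not terms
from the draft, and any further attributes the draft may permit are
ignored.

```python
def classify_element(attrs, top_level):
    """Return the role of an `element' element.

    attrs     -- the attribute names present on the element
    top_level -- True if the element is at the top level of the schema
    The attribute sets below are copied from the three forms in the
    text; the role labels are illustrative only.
    """
    attrs = set(attrs)
    if top_level:
        role, required = "global declaration", "name"
        allowed = {"name", "type", "equivClass"}
    elif "ref" in attrs:
        role, required = "global reference", "ref"
        allowed = {"ref", "minOccurs", "maxOccurs"}
    else:
        role, required = "local declaration", "name"
        allowed = {"name", "type", "minOccurs", "maxOccurs"}
    if required not in attrs:
        raise ValueError(f"{role} requires the attribute {required!r}")
    extra = attrs - allowed
    if extra:
        raise ValueError(f"{role} does not allow {sorted(extra)}")
    return role
```

Under these assumptions, an `element' carrying both `ref' and `type',
or a local declaration carrying `equivClass', is rejected rather than
silently assigned to one of the roles.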

Suggested fixes:

* The three distinct functionalities should be listed separately in
Sections 3 and 4 of XML Schema Part 1.  A similar attempt to
separate functionality from syntax should be made for all of
XML Schema Part 1.

* It may also be helpful to use different element names to distinguish
the different element functionalities.  In that case, the Schema for
Schemas could enforce the constraints on which attributes may appear
with which functionalities.


B.  Multiplicities
------------------

The grammar of regular expressions in DTDs features three separate
operators, sequence (comma), choice (bar), and repeat (star).  In XML
Schema, the first two of these are denoted by `sequence' and `choice'
elements.  However, the third does not appear separately, and instead
`minOccurs' and `maxOccurs' may appear on every particle.  It would
better reflect the underlying structure of regular expressions to have
a separate `repeat' element, with `min' and `max' attributes.

For example, consider the DTD-style content model

	ab?(c|d)+

In the current XML Schema syntax, this is rendered as follows:

	<sequence>
	  <element ref="a"/>
	  <element ref="b" minOccurs="0" maxOccurs="1"/>
	  <choice minOccurs="1" maxOccurs="*">
	    <element name="c"/>
	    <element name="d"/>
	  </choice>
	</sequence>

It would be better to use a syntax along the following lines:

	<sequence>
	  <element name="a"/>
	  <repeat min="0" max="1">
	    <element name="b"/>
	  </repeat>
	  <repeat min="1" max="*">
	    <choice>
	      <element name="c"/>
	      <element name="d"/>
	    </choice>
	  </repeat>
	</sequence>

One could also define

   <star>...</star>	to abbreviate	<repeat min="0" max="*">...</repeat>
   <plus>...</plus>	to abbreviate	<repeat min="1" max="*">...</repeat>
   <option>...</option>	to abbreviate	<repeat min="0" max="1">...</repeat>

With these abbreviations, the above becomes

	<sequence>
	  <element name="a"/>
	  <option>
	    <element name="b"/>
	  </option>
	  <plus>
	    <choice>
	      <element name="c"/>
	      <element name="d"/>
	    </choice>
	  </plus>
	</sequence>
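
The abbreviations can be expanded mechanically.  The following is a
minimal sketch in Python, using the standard xml.etree.ElementTree
module, that rewrites `star', `plus', and `option' into the
corresponding `repeat' elements.  The function name desugar is
hypothetical, and the sketch does not preserve whitespace between
elements.

```python
import xml.etree.ElementTree as ET

# Each abbreviation maps to the (min, max) pair of the repeat it stands for.
ABBREVIATIONS = {
    "star": ("0", "*"),
    "plus": ("1", "*"),
    "option": ("0", "1"),
}


def desugar(node):
    """Return a copy of node with every abbreviation expanded to repeat."""
    if node.tag in ABBREVIATIONS:
        lo, hi = ABBREVIATIONS[node.tag]
        out = ET.Element("repeat", {"min": lo, "max": hi})
    else:
        # Any other element is copied with its attributes unchanged.
        out = ET.Element(node.tag, dict(node.attrib))
    for child in node:
        out.append(desugar(child))
    return out


short = ET.fromstring(
    '<sequence><element name="a"/>'
    '<option><element name="b"/></option>'
    '<plus><choice><element name="c"/><element name="d"/></choice></plus>'
    '</sequence>'
)
expanded = desugar(short)
```

Because each abbreviation is a separate element, the expansion is a
single recursive pass that never needs to inspect attributes of the
elements it wraps.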

This design is better for the following reasons.

* The structure of the XML corresponds closely to the structure of the
parse tree.  This makes it easier to read, easier to learn, and easier
to build processors for.

* The definitions of other elements are simplified.  One need not
worry about which elements might have minOccurs and maxOccurs
attached.  (Just such a confusion triggered the analysis of `element',
given above.)
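
As a rough illustration of the first point, here is a minimal sketch
in Python of a processor for the proposed syntax.  It turns a model
group into a regular expression with one case per element kind.  The
function name to_regex is hypothetical, and the sketch assumes
single-character element names so that a sequence of child elements
can be written as a plain string.

```python
import re
import xml.etree.ElementTree as ET


def to_regex(node):
    """Translate a model group in the proposed syntax to a regex string."""
    kids = [to_regex(kid) for kid in node]
    if node.tag == "element":
        return re.escape(node.get("name"))
    if node.tag == "sequence":
        return "".join(kids)
    if node.tag == "choice":
        return "(?:" + "|".join(kids) + ")"
    if node.tag == "repeat":
        lo = node.get("min")
        # An unbounded maximum ("*") becomes an open upper bound.
        hi = "" if node.get("max") == "*" else node.get("max")
        return "(?:" + "".join(kids) + "){" + lo + "," + hi + "}"
    raise ValueError("unknown element: " + node.tag)


model = ET.fromstring(
    '<sequence><element name="a"/>'
    '<repeat min="0" max="1"><element name="b"/></repeat>'
    '<repeat min="1" max="*">'
    '<choice><element name="c"/><element name="d"/></choice>'
    '</repeat></sequence>'
)
pattern = to_regex(model)

# "abcd" and "acd" satisfy ab?(c|d)+, while "ab" does not.
accepted = [s for s in ["abcd", "acd", "ab"] if re.fullmatch(pattern, s)]
```

With `minOccurs' and `maxOccurs' attached to every particle instead,
each case in such a processor would also have to handle repetition.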

Received on Friday, 26 May 2000 14:56:05 UTC