- From: Philip Wadler <wadler@research.bell-labs.com>
- Date: Fri, 26 May 2000 14:55:22 -0400
- To: www-xml-schema-comments@w3.org
The structure of XML Schema
---------------------------

A language is easier to analyze if it has a clear structure. XML Schema Part 1 makes a valiant effort to separate abstract syntax from concrete syntax, the former being treated in Section 3 and the latter in Section 4. Nonetheless, there are places where the chosen concrete syntax has affected the abstract syntax, in a way that we believe obscures the structure of XML Schema. We give two examples below.

I am also asking the XML Query working group to support changes along these lines, but in writing this letter I am not acting as a representative of XML Query.

Yours sincerely,
Philip Wadler, Lucent

A. Element declarations
------------------------

An `element' element serves three distinct purposes.

1. Global element declaration. At the top level of a document, it can establish a global association between a given element name and a given type. It may also specify an equivalence class (superclass).

   at top level:
     <element name="..." type="..." equivClass="..."/>

   Instead of a type attribute, the type may appear as the content of the element.

2. Local element declaration. Within a model group, it can specify an element name and a type. This specifies that an element with the given name should appear with the given type. A multiplicity may also be specified.

   in model group:
     <element name="..." type="..." minOccurs="..." maxOccurs="..."/>

   Again, instead of a type attribute, the type may appear as the content of the element.

3. Global element reference. Within a model group, it can specify a reference to a global element declaration. This specifies that an element with the given name should appear with the associated type (as specified by the global element declaration). A multiplicity may also be specified.

   in model group:
     <element ref="..." minOccurs="..." maxOccurs="..."/>

   The element may have no content.
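
To make the distinction concrete, the three purposes listed above could be modelled as three separate types. The following Python sketch is illustrative only: the class names, the `classify` function, and the `at_top_level` flag are hypothetical and not part of XML Schema, and the attribute checks are inferred solely from the three forms shown above.

```python
from dataclasses import dataclass
from typing import Optional, Union
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class GlobalElementDecl:
    """Top level: associates an element name with a type."""
    name: str
    type: Optional[str]          # None if the type is given as content
    equiv_class: Optional[str]


@dataclass(frozen=True)
class LocalElementDecl:
    """In a model group: an element name and type, with a multiplicity."""
    name: str
    type: Optional[str]          # None if the type is given as content
    min_occurs: Optional[str]    # None if the attribute is absent
    max_occurs: Optional[str]


@dataclass(frozen=True)
class GlobalElementRef:
    """In a model group: a reference to a global element declaration."""
    ref: str
    min_occurs: Optional[str]
    max_occurs: Optional[str]


ElementUse = Union[GlobalElementDecl, LocalElementDecl, GlobalElementRef]


def _require(attrs, name):
    # Assumption: `name` is mandatory on both kinds of declaration.
    if name not in attrs:
        raise ValueError(f"missing required attribute: {name}")
    return attrs[name]


def _reject(attrs, *forbidden):
    found = [a for a in forbidden if a in attrs]
    if found:
        raise ValueError(f"attributes not allowed here: {found}")


def classify(xml_text: str, at_top_level: bool) -> ElementUse:
    """Decide which of the three purposes an `element' element serves."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "element":
        raise ValueError(f"expected <element>, got <{elem.tag}>")
    attrs = elem.attrib
    if at_top_level:
        _reject(attrs, "ref", "minOccurs", "maxOccurs")
        return GlobalElementDecl(
            _require(attrs, "name"), attrs.get("type"), attrs.get("equivClass")
        )
    _reject(attrs, "equivClass")
    occurs = (attrs.get("minOccurs"), attrs.get("maxOccurs"))
    if "ref" in attrs:
        _reject(attrs, "name", "type")
        if len(elem) > 0:
            raise ValueError("a global element reference may have no content")
        return GlobalElementRef(attrs["ref"], *occurs)
    return LocalElementDecl(_require(attrs, "name"), attrs.get("type"), *occurs)
```

Note that the same `element' tag yields a different type depending on both its context and its attributes, which is the overloading at issue.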

More or less, a global element reference may always be replaced by a local element declaration, where the name and type are those of the referenced global element declaration. (The main exception is that an equivalence class (subclass) can be declared for a global element but not for a local element, which is unfortunate.)

Clearly, these represent three distinct (though related) functionalities, as indicated by the fact that the legal set of attributes differs for all three.

Suggested fixes:

* The three distinct functionalities should be listed separately in Sections 3 and 4 of XML Schema Part 1. A similar attempt to separate functionality from syntax should be made for all of XML Schema Part 1.

* It may also be helpful to use different element names to distinguish the different element functionalities. In that case, the Schema for Schemas could enforce the constraints on which attributes may appear with which functionalities.

B. Multiplicities
------------------

The grammar of regular expressions in DTDs features three separate operators: sequence (comma), choice (bar), and repeat (star). In XML Schema, the first two of these are denoted by `sequence' and `choice' elements. However, the third does not appear separately; instead, `minOccurs' and `maxOccurs' may appear on every particle. It would better reflect the underlying structure of regular expressions to have a separate `repeat' element, with `min' and `max' attributes.

For example, consider the DTD

  ab?(c|d)+

In the current XML Schema syntax, this is rendered as follows:

  <sequence>
    <element ref="a"/>
    <element ref="b" minOccurs="0" maxOccurs="1"/>
    <choice minOccurs="1" maxOccurs="*">
      <element name="c"/>
      <element name="d"/>
    </choice>
  </sequence>

It would be better to use a syntax along the following lines:

  <sequence>
    <element name="a"/>
    <repeat min="0" max="1">
      <element name="b"/>
    </repeat>
    <repeat min="1" max="*">
      <choice>
        <element name="c"/>
        <element name="d"/>
      </choice>
    </repeat>
  </sequence>

One could also define

  <star>...</star>     to abbreviate <repeat min="0" max="*">...</repeat>
  <plus>...</plus>     to abbreviate <repeat min="1" max="*">...</repeat>
  <option>...</option> to abbreviate <repeat min="0" max="1">...</repeat>

With these abbreviations, the above becomes

  <sequence>
    <element name="a"/>
    <option>
      <element name="b"/>
    </option>
    <plus>
      <choice>
        <element name="c"/>
        <element name="d"/>
      </choice>
    </plus>
  </sequence>

This design is better for the following reasons.

* The structure of the XML corresponds closely to the structure of the parse tree. This makes it easier to read, easier to learn, and easier to build processors.

* The definitions of other elements are simplified. One need not worry about which elements might have minOccurs and maxOccurs attached. (Just such a confusion triggered the analysis of `element', given above.)
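
The translation from occurrence attributes to wrapper elements is mechanical, which a short sketch may make clearer. The Python below is a minimal illustration, not a proposed implementation: the function name `lift_occurs` and its `abbreviate` flag are hypothetical. It makes no assumption about default values: when only one of `minOccurs' and `maxOccurs' is present, only the corresponding attribute is carried over to `repeat'.

```python
import xml.etree.ElementTree as ET

# The three abbreviations suggested in this letter, keyed by (min, max).
ABBREVIATIONS = {("0", "*"): "star", ("1", "*"): "plus", ("0", "1"): "option"}


def lift_occurs(node: ET.Element, abbreviate: bool = False) -> ET.Element:
    """Return a copy of `node` in which every particle carrying minOccurs
    or maxOccurs is wrapped in a separate repeat element instead."""
    attrs = dict(node.attrib)
    min_occurs = attrs.pop("minOccurs", None)
    max_occurs = attrs.pop("maxOccurs", None)

    # Copy the particle itself, without its occurrence attributes,
    # and translate its children recursively.
    copy = ET.Element(node.tag, attrs)
    for child in node:
        copy.append(lift_occurs(child, abbreviate))

    if min_occurs is None and max_occurs is None:
        return copy

    # Use <star>, <plus> or <option> when requested and applicable,
    # otherwise a general <repeat min="..." max="...">.
    short = ABBREVIATIONS.get((min_occurs, max_occurs)) if abbreviate else None
    if short is not None:
        wrapper = ET.Element(short)
    else:
        wrapper = ET.Element("repeat")
        if min_occurs is not None:
            wrapper.set("min", min_occurs)
        if max_occurs is not None:
            wrapper.set("max", max_occurs)
    wrapper.append(copy)
    return wrapper


current = ET.fromstring(
    '<sequence>'
    '<element ref="a"/>'
    '<element ref="b" minOccurs="0" maxOccurs="1"/>'
    '<choice minOccurs="1" maxOccurs="*">'
    '<element name="c"/><element name="d"/>'
    '</choice>'
    '</sequence>'
)
proposed = lift_occurs(current)
abbreviated = lift_occurs(current, abbreviate=True)
print(ET.tostring(abbreviated, encoding="unicode"))
```

Because the multiplicity lives in its own node, the function never needs to know which kinds of particle may carry it.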
Received on Friday, 26 May 2000 14:56:05 UTC