Re: Schema subset efforts

Kohsuke KAWAGUCHI writes:

>> My following article tries to propose a subset of XML Schema:

>> http://www.geocities.com/kohsukekawaguchi/XMLSchemaDOsAndDONTs.html

While everyone will have different opinions as to which features are best 
skipped in a subset (I would put key/keyref high on the list), I think 
that many of your suggestions are reasonable starting points either for 
novices or perhaps for others wishing to use a more restricted language. I 
do feel I should point out one aspect of your proposal that isn't quite 
right:  you suggest that use of complex types be avoided.

If you study the schema design carefully, you'll realize that this 
suggestion means that no elements can have attributes, and no elements can 
have other elements in their content.  Surely that is not what you 
intended.  The following explanation is adapted from a note I wrote 
earlier today on the same subject:

In general, you can think about every element as having a complex type, 
except in the special case where its content happens to be a simple type 
such as integer, with no attributes.  Another way to think about it:  we 
intended complex types as the types you use on elements, simple types as 
the ones you use on attributes.  Start with that assumption and you will 
be thinking right about most of the design.  That said, since all the 
content that is legal on attributes, such as integers, is also legal on 
elements, we faced a choice.  One way would have been to provide a complex 
analog for every simple type.  Had we done that, all elements would 
have complex types, and all attributes simple types.  On balance, we decided it 
would work better to just allow elements to have either simple or complex 
type.  Still, there is a sense in which complex types are the types for 
elements, and the ability to use simple types on elements is just a 
convenience.
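
To illustrate the distinction, here is a minimal sketch.  The element and 
attribute names are made up for illustration, and namespace declarations 
are omitted:

        <!-- simple type: character content only, no attributes -->
        <element name="QUANTITY" type="integer"/>

        <!-- complex type (anonymous): child elements and an attribute -->
        <element name="ORDER">
          <complexType>
            <sequence>
              <element name="ITEM" type="string"/>
              <element name="QUANTITY" type="integer"/>
            </sequence>
            <attribute name="Id" type="string"/>
          </complexType>
        </element>

As soon as an element needs an attribute or a child element, as ORDER 
does here, its type has to be complex.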

So, what you can do if you prefer is not to separately name your complex 
types; you can do them all anonymously as part of element declarations. 
What you lose, if that's the simplification you intended, is the ability 
to model the commonality in data such as:

        <WIDTH  Units="cm">20</WIDTH>
        <HEIGHT Units="cm">40</HEIGHT>

A plausible way to declare this in XML Schema is:

        <complexType name="measurementType">
          <simpleContent>
                <extension base="integer">
                        <!-- following could be enumeration
                             of cm, in, feet, etc. -->
                        <attribute name="Units" type="string"/>
                </extension>
          </simpleContent>
        </complexType>

        <element name="WIDTH" type="measurementType"/>
        <element name="HEIGHT" type="measurementType"/>
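
As the comment in the type suggests, Units could be restricted to an 
enumeration rather than left as a string.  A minimal sketch, assuming a 
hypothetical unitsType (the name and the list of values are illustrative 
only):

        <simpleType name="unitsType">
          <restriction base="string">
            <enumeration value="cm"/>
            <enumeration value="in"/>
            <enumeration value="feet"/>
          </restriction>
        </simpleType>

The attribute declaration would then refer to type="unitsType" instead 
of type="string".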

So using a named complex type in this situation correctly captures what is 
common between the two types of element.  Higher level programs may 
realize that the same Java class or C structure can be used to hold either 
a width or a height.  In general, when you generate the obvious Java 
mappings from Schema, you get one class per complex (or simple) type and one 
member variable per element or attribute.  If the shape is a square, you 
can safely copy the data from width to height.  You can validate the same 
data without a named complex type (i.e., just define width and height 
separately), but you have to keep the definitions in sync, and you don't 
have any formal way to indicate that the structures are indeed common.  If 
you mapped to a language such as Java, you'd probably get two classes 
where one would have been more appropriate.
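
For comparison, a sketch of the same declarations written with anonymous 
complex types only (namespace declarations again omitted):

        <element name="WIDTH">
          <complexType>
            <simpleContent>
              <extension base="integer">
                <attribute name="Units" type="string"/>
              </extension>
            </simpleContent>
          </complexType>
        </element>

        <!-- identical structure, repeated by hand -->
        <element name="HEIGHT">
          <complexType>
            <simpleContent>
              <extension base="integer">
                <attribute name="Units" type="string"/>
              </extension>
            </simpleContent>
          </complexType>
        </element>

This accepts the same documents, but nothing in the schema says that the 
two element types are meant to be the same.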

Whether or not you choose to use explicitly named complex types, such as 
the one in the example, you will definitely need at least anonymous 
complex types for many of your elements.  I hope this helps to clarify the 
design and the terminology.

Regarding the need for explicit markup such as <simpleContent>:  I am 
not that fond of it.  Earlier versions of our design had less of that, but 
they suffered for having lots of optional attributes on various schema 
constructions.  So, for example, there was a base= attribute on types 
regardless of whether or not a derivation was actually being done.  The 
more verbose markup was introduced to eliminate such optionality, and to allow 
schemas themselves to be more rigorously validated and thus more easily 
manipulated.  I do think it is a nuisance when one is writing schemas 
manually.

------------------------------------------------------------------------
Noah Mendelsohn                                    Voice: 1-617-693-4036
Lotus Development Corp.                            Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------

Received on Tuesday, 26 June 2001 21:27:25 UTC