- From: <Noah_Mendelsohn@lotus.com>
- Date: Tue, 26 Jun 2001 21:22:15 -0400
- To: Kohsuke KAWAGUCHI <kohsukekawaguchi@yahoo.com>
- Cc: ochipara@cse.unl.edu, xmlschema-dev@w3.org
Kohsuke KAWAGUCHI writes:
>> My following article tries to propose a subset of XML Schema:
>> http://www.geocities.com/kohsukekawaguchi/XMLSchemaDOsAndDONTs.html
While everyone will have different opinions as to which features are best
skipped in a subset (I would put key/keyref high on the list), I think
that many of your suggestions are reasonable starting points either for
novices or perhaps for others wishing to use a more restricted language. I
do feel I should point out one aspect of your proposal that isn't quite
right: you suggest that use of complex types be avoided.
If you study the schema design carefully, you'll realize that this
suggestion means that no elements can have attributes, and no elements can
have other elements in their content. Surely that is not what you
intended. The following explanation is adapted from a note I wrote
earlier today on the same subject:
In general, you can think about every element as having a complex type,
except in the special case where its content happens to be a simple type
such as integer, with no attributes. Another way to think about it: we
intended complex types as the types you use on elements, simple types as
the ones you use on attributes. Start with that assumption and you will
be thinking right about most of the design. That said, since all the
content that is legal on attributes, such as integers, is also legal on
elements, we faced a choice. One way would have been to provide a complex
analog for every simple type. Had we done that, then all elements would
have complex type, and all attributes simple. On balance, we decided it
would work better to just allow elements to have either simple or complex
type. Still, there is a sense in which complex types are the types for
elements, and the ability to use simple types on elements is just a
convenience.
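The distinction can be checked with the JDK's built-in javax.xml.validation API. The minimal sketch below uses two elements invented for illustration. COUNT is declared with the simple type xs:integer. SIZE has an anonymous complex type whose content is still an integer but which also carries a Units attribute. The schema string and the xs prefix are assumptions, not part of the original message.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SimpleVsComplexDemo {
    // Hypothetical schema: COUNT has a simple type, so it cannot carry attributes.
    // SIZE has an anonymous complex type with integer (simple) content plus an attribute.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='COUNT' type='xs:integer'/>"
      + "<xs:element name='SIZE'><xs:complexType><xs:simpleContent>"
      + "<xs:extension base='xs:integer'>"
      + "<xs:attribute name='Units' type='xs:string'/>"
      + "</xs:extension></xs:simpleContent></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the instance document validates against XSD.
    static boolean isValid(String instance) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<COUNT>5</COUNT>"));            // simple type, no attributes
        System.out.println(isValid("<COUNT Units='cm'>5</COUNT>")); // an attribute needs a complex type
        System.out.println(isValid("<SIZE Units='cm'>5</SIZE>"));   // complex type, simple content
    }
}
```

Only the second instance should fail. Attaching even one attribute to an element moves it from a simple type to a complex type.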
So, if you prefer, you can avoid naming your complex types separately and
instead declare them all anonymously as part of element declarations.
What you lose, if that's the simplification you intended, is the ability
to model the commonality in data such as:
<WIDTH Units="cm">20</WIDTH>
<HEIGHT Units="cm">40</HEIGHT>
A plausible way to declare this in XML schema is:
<complexType name="measurementType">
  <simpleContent>
    <extension base="integer">
      <!-- following could be an enumeration
           of cm, in, feet, etc. -->
      <attribute name="Units" type="string"/>
    </extension>
  </simpleContent>
</complexType>
<element name="WIDTH" type="measurementType"/>
<element name="HEIGHT" type="measurementType"/>
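The following sketch uses the JDK's javax.xml.validation API to check that the declarations above accept the sample WIDTH and HEIGHT elements. The fragment omits its enclosing schema element, so the sketch adds a few assumptions. It wraps the fragment in an xs:schema element with no targetNamespace, which lets type='measurementType' resolve to the local type, and it adds the xs prefix. The "twenty" instance is an illustrative negative case.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class MeasurementSchemaDemo {
    // The measurementType fragment, wrapped in an assumed xs:schema element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:complexType name='measurementType'>"
      + "<xs:simpleContent>"
      + "<xs:extension base='xs:integer'>"
      + "<xs:attribute name='Units' type='xs:string'/>"
      + "</xs:extension>"
      + "</xs:simpleContent>"
      + "</xs:complexType>"
      + "<xs:element name='WIDTH' type='measurementType'/>"
      + "<xs:element name='HEIGHT' type='measurementType'/>"
      + "</xs:schema>";

    // Returns true if the instance document validates against XSD.
    static boolean isValid(String instance) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<WIDTH Units='cm'>20</WIDTH>"));
        System.out.println(isValid("<HEIGHT Units='cm'>40</HEIGHT>"));
        System.out.println(isValid("<WIDTH Units='cm'>twenty</WIDTH>")); // not an integer
    }
}
```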
So using a named complex type in this situation correctly captures what is
common between the two kinds of element. Higher-level programs can
recognize that the same Java class or C structure can hold either a width
or a height. In general, the obvious Java mapping generated from a schema
has one class per complex (or simple) type and one member variable per
element or attribute. If the shape is a square, you can safely copy the
data from width to height. You can validate the same data without a named
complex type (i.e., just define WIDTH and HEIGHT separately), but then you
have to keep the two definitions in sync, and you have no formal way to
indicate that the structures are indeed common. If you mapped to a
language such as Java, you'd probably get two classes where one would have
been more appropriate.
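Here is a hand-written sketch of the kind of mapping described above. It is not the output of any particular binding tool. The class names MeasurementType and Shape are hypothetical. The integer content is held as a BigInteger because xs:integer is unbounded. Because WIDTH and HEIGHT share one class, the square case reduces to a copy.

```java
import java.math.BigInteger;
import java.util.Objects;

// Hypothetical mapping of the named complex type measurementType: one class,
// with one field for its simple (integer) content and one for the Units attribute.
final class MeasurementType {
    private final BigInteger value; // simple content; xs:integer is unbounded
    private final String units;     // the Units attribute

    MeasurementType(BigInteger value, String units) {
        this.value = value;
        this.units = units;
    }

    // Copy constructor: works for any element declared with this type.
    MeasurementType(MeasurementType other) {
        this(other.value, other.units);
    }

    BigInteger getValue() { return value; }
    String getUnits() { return units; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof MeasurementType)) return false;
        MeasurementType m = (MeasurementType) o;
        return Objects.equals(value, m.value) && Objects.equals(units, m.units);
    }

    @Override public int hashCode() { return Objects.hash(value, units); }
}

// Hypothetical container: WIDTH and HEIGHT become two fields of the same class.
final class Shape {
    MeasurementType width;
    MeasurementType height;

    // For a square, the width can be copied straight into the height,
    // because both fields have the same type.
    static Shape square(MeasurementType side) {
        Shape s = new Shape();
        s.width = side;
        s.height = new MeasurementType(side);
        return s;
    }
}
```

With separate anonymous types for WIDTH and HEIGHT, a straightforward generator would likely emit two unrelated classes. The copy in Shape.square would then need field-by-field conversion code.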
Whether or not you choose to use explicitly named complex types, such as
the one in the example, you will definitely need at least anonymous
complex types for many of your elements. I hope this helps to clarify the
design and the terminology.
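A minimal sketch of the anonymous style, again using the JDK's javax.xml.validation API. WIDTH and HEIGHT each get their own anonymous copy of the measurement structure, so nothing in the schema ties the two copies together. The RECTANGLE element is invented for illustration. It has element content, so it needs a complex type as well, here an anonymous one with a sequence. The xs prefix and the absence of a targetNamespace are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class AnonymousTypeDemo {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        // WIDTH and HEIGHT each carry an anonymous copy of the same structure;
        // the schema does not say the two copies must stay identical.
      + "<xs:element name='WIDTH'><xs:complexType><xs:simpleContent>"
      + "<xs:extension base='xs:integer'><xs:attribute name='Units' type='xs:string'/>"
      + "</xs:extension></xs:simpleContent></xs:complexType></xs:element>"
      + "<xs:element name='HEIGHT'><xs:complexType><xs:simpleContent>"
      + "<xs:extension base='xs:integer'><xs:attribute name='Units' type='xs:string'/>"
      + "</xs:extension></xs:simpleContent></xs:complexType></xs:element>"
        // Hypothetical RECTANGLE: element content requires a complex type.
      + "<xs:element name='RECTANGLE'><xs:complexType><xs:sequence>"
      + "<xs:element ref='WIDTH'/><xs:element ref='HEIGHT'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the instance document validates against XSD.
    static boolean isValid(String instance) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<RECTANGLE><WIDTH Units='cm'>20</WIDTH><HEIGHT Units='cm'>40</HEIGHT></RECTANGLE>"));
        System.out.println(isValid(
            "<RECTANGLE><HEIGHT Units='cm'>40</HEIGHT><WIDTH Units='cm'>20</WIDTH></RECTANGLE>"));
    }
}
```

The first instance should validate. The second should not, because the sequence fixes WIDTH before HEIGHT.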
Regarding the need for explicit markup such as <simpleContent>: I am not
that fond of it. Earlier versions of our design had less of it, but they
suffered from having lots of optional attributes on various schema
constructs. For example, there was a base= attribute on types regardless
of whether a derivation was actually being done. The more verbose markup
was meant to eliminate such optionality and to allow schemas themselves to
be validated more rigorously and thus manipulated more easily. I do think
it is a nuisance when one is writing schemas by hand.
------------------------------------------------------------------------
Noah Mendelsohn Voice: 1-617-693-4036
Lotus Development Corp. Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------
Received on Tuesday, 26 June 2001 21:27:25 UTC