Feature requests for XML Schema 1.1

As an editor of the Market Data Definition Language (MDDL,
http://www.mddl.org/), I would like to propose a couple of extensions to W3C XML
Schema based on my experience with MDDL.  As MDDL is for transmitting market
data information (e.g. latest share prices), it may need to be resent
frequently, and so we have developed some shorthands which help keep the size of
the files down.  I believe these would prove useful for other MLs, if W3C XML
Schema were to support them.  However, let me start by giving a couple of short
pseudo-MDDL samples to give you a feel for the nature of the shorthands.

Suppose that an MDDL document contains the high, low, and closing prices for an
instrument (e.g. a share traded on the stock market).  The fragment containing
those prices could be as short as

<trade>
  <high>82.4</high>
  <low>78.8</low>
  <close>79.5</close>
</trade>

if the producer and subscriber both know the currency in advance (perhaps the
service only provides information from one exchange or country).  If you need to
add the currency explicitly, you can do so, but as MDDL has a "no mixed content"
rule, the textual values need to be enclosed in an MDDL datatyping element:

<trade>
  <high>
    <mdDecimal>82.4</mdDecimal>
    <currency>USD</currency>
  </high>
  <low>
    <mdDecimal>78.8</mdDecimal>
    <currency>USD</currency>
  </low>
  <close>
    <mdDecimal>79.5</mdDecimal>
    <currency>USD</currency>
  </close>
</trade>

Alternatively, as MDDL treats most elements as inheritable properties, and
since the currencies are all the same here, the currency can be moved up to the
"trade" element:

<trade>
  <currency>USD</currency>
  <high>82.4</high>
  <low>78.8</low>
  <close>79.5</close>
</trade>

These examples show the shorthands that MDDL allows to reduce the overhead of
inlined data that is repeated on leaf nodes.  The things that MDDL needs from
W3C XML Schema in order to support these better are:

1. Support for "unmixed content".
W3C XML Schema provides mixed="true" to permit text content interspersed among
the element content of a complex type or element.  However, for data-oriented
XML, it is common to want to enforce a "no mixed content" rule, where text-only
content is allowable, or element-only content is allowable, but text
interspersed among elements is prohibited.
If "unmixed content" of this kind is explicitly supported, it should then be
possible to define the allowable type of the text content, either by specifying
a specific content model for the text, or by specifying that a particular
element's open/close tags can be suppressed if it is the only content (e.g.
<close>79.5</close> is the same as
<close><mdDecimal>79.5</mdDecimal></close> for MDDL, although our W3C XML
Schema can only validate the data value if the latter construction is used).
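
To make the rule concrete, here is a minimal Python sketch of what an "unmixed
content" check and the tag-suppression shorthand might look like outside the
schema.  The function names (is_unmixed, expand_shorthand) are hypothetical and
not part of MDDL or any schema processor, and the choice of mdDecimal as the
default wrapper is an assumption taken from the samples above; a real
implementation would pick the datatyping element per property.

```python
import xml.etree.ElementTree as ET


def is_unmixed(elem):
    """True if elem and all its descendants are text-only or element-only."""
    children = list(elem)
    if children:
        # Element content: non-whitespace text before or between children is mixed.
        if (elem.text or "").strip():
            return False
        if any((child.tail or "").strip() for child in children):
            return False
    return all(is_unmixed(child) for child in children)


def expand_shorthand(elem, wrapper="mdDecimal"):
    """Wrap the text of a text-only element in a datatyping element.

    Hypothetical helper: intended for property elements such as <close>,
    not for elements that are already datatyping elements.
    """
    if len(elem) == 0 and (elem.text or "").strip():
        child = ET.SubElement(elem, wrapper)
        child.text = elem.text.strip()
        elem.text = None
    return elem
```

With this sketch, expanding <close>79.5</close> yields the long form that our
schema can already validate, while text placed next to a child element is
reported as mixed.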

2. Support for "inheritance".
MDDL defines certain elements to be "inheritable" (most of its elements,
actually).  This means that if an element is undefined for a particular parent,
but is inheritable, the value from the nearest ancestor with a matching child
element (by name) is to be used.  This is how the 2nd example above was reduced
to the 3rd example.  It is a simple rule, and easy to implement using XSLT, for
example.  That said, it would be necessary for applications to be able to
request that all inherited values be inlined before they receive a SAX stream
or DOM tree (or XML document) for processing, but that is typically a minor
processing task.
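
As an illustration of the lookup rule only (this is a sketch in Python rather
than XSLT, and resolve_inherited is a hypothetical name, not an MDDL or schema
API), the nearest-ancestor search can be written as follows.  It assumes the
caller already knows that the requested element name is inheritable.

```python
import xml.etree.ElementTree as ET


def resolve_inherited(root, elem, name):
    """Return the text of elem's own child `name`, else the nearest ancestor's.

    Returns None if neither elem nor any ancestor has a child called `name`.
    """
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = elem
    while node is not None:
        found = node.find(name)  # direct children only
        if found is not None:
            return found.text
        node = parent_of.get(node)  # None once we step above the root
    return None
```

Applied to the third sample above, looking up "currency" on the "high" element
finds nothing locally and falls back to the value on "trade"; a "currency"
child placed directly inside "high" would take precedence.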

These are the two extensions that MDDL has found very useful, and I would be
very keen to see them supported in XML Schema 1.1 if at all possible.  I believe
they have much wider applicability for data-oriented XML.  I would appreciate
your comments.

	Cheers,
		Tony.
====
Anthony B. Coates, Information & Software Architect
mailto:abcoates@TheOffice.net
MDDL Editor (Market Data Definition Language)
http://www.mddl.org/

Received on Monday, 11 November 2002 07:54:48 UTC