LC-49 Streamlining restriction of content models from C. M. Sperberg-McQueen on 2000-10-06 (www-xml-schema-comments@w3.org from October to December 2000)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Thu, 05 Oct 2000 18:54:16 -0600
To: "Curt Arnold" <carnold@houston.rr.com>, Jane Hunter <jane@dstc.edu.au>
Cc: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
Message-Id: <4.3.2.7.1.20001005185300.0217cdf0@espanola.com>

Dear Curt Arnold and Jane Hunter:

The W3C XML Schema Working Group has spent the last several months
working through the comments received from the public on the last-call
draft of the XML Schema specification.  We thank you for the comments
you made on our specification during our last-call comment period, and
want to make sure you know that all comments received during the
last-call comment period have been recorded in our last-call issues
list (http://www.w3.org/2000/05/12-xmlschema-lcissues).

Among other issues, the two of you independently raised the point
registered as issue LC-49, which suggests that the mechanisms for
restriction of content models be changed or more fully documented.

There is a great deal of sympathy in the WG for the view that it would
be nice to have a simpler mechanism.  Unfortunately, the mechanisms
which do seem simpler also seem, in the view of the WG, to involve
unacceptable tradeoffs.  The solution space in this area appears to
have three main regions:

   - We could provide a method of pointing just at that bit of the base
     type's content model which is to be changed, so that a derived
     type can specify only what changes from the base type.  This
     involves a pointing mechanism which (in all the proposals we
     have seen or considered) is potentially confusing and error prone,
     and destroys the locality of the declaration: there is no chance
     of understanding the derived type without reference to the base
     type; a series of such derivation steps is likely to be extremely
     confusing and error prone.  So on balance it seemed better to
     provide that the declaration for the derived type fully express the
     legal content model for elements of that type.

   - We could specify that the derived type provide a content model
     which is checked in parallel to that of the base type, with the
     provision that instances of the derived type must satisfy both
     content models (i.e. the effective content model of the derived
     type is the intersection of its ostensible content model and the
     content model of its parent); this is similar to the effect of
     multiple derivation steps for simple types, each using the pattern
     facet.  While there was some sympathy for this approach in the WG
     (I voted for it, myself), the lack of locality and the resulting
     need to consult the entire series of ancestor types in order to
     understand the effective content model of a derived type seemed a
     serious flaw to the WG.  Another way to put it is this: when the
     content model for the derived type seems to allow any mixture of p
     and q elements, it is likely to be confusing to the user of a
     schema (and possibly to the schema author) if, owing to some
     ancestor type, no q elements are actually allowed.  There was also
     some concern about the potential implementation cost of actually
     calculating the intersections among the languages generated by the
     various content models (in the worst case, for large content
     models, this can become rather expensive); this was of minor
     importance to some WG members, but of fairly major importance to
     others.

   - We could specify (as we did) that the derived type must provide a
     content model fully expressing the legal set of instances of that
     type.  Here, the majority of the WG was concerned about the
     implications of allowing any legal formulation of that set and
     thus requiring the schema processor to incur a high worst-case
     cost in checking to see that the language generated by the content
     model of the derived type was a subset of the language generated
     by the ancestor type's content model.  The WG was unwilling to
     relax the rule that says the schema processor must check to ensure
     that restricted types are true restrictions of the base type; in
     order to help keep the cost of checking down, we imposed instead
     the rules that say, in effect, that the content model of the
     derived type must be isomorphic to that of the base type.  We are
     aware that the restrictions imposed do not make the easiest of
     reading; we hope, however, that in the light of the design issues
     just outlined you will find them clearer than on first reading.
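Two language-level operations come up above: intersecting content models (the parallel-checking alternative) and checking that the language of one content model is a subset of another's (the full-restatement alternative). A minimal sketch of both, using the classic product construction on deterministic automata, follows. It is a hypothetical illustration, not the procedure in the XML Schema specification, which uses the structural restriction rules instead. It assumes each content model has already been compiled into a DFA over element names, and the names `DFA`, `intersect`, and `is_restriction` are invented here.

```python
from collections import deque


class DFA:
    """Deterministic automaton over element names (hypothetical sketch).

    Missing transitions lead to an implicit dead state, represented as None.
    """

    def __init__(self, start, accepting, delta):
        self.start = start
        self.accepting = frozenset(accepting)
        self.delta = dict(delta)  # (state, element_name) -> next state

    def step(self, state, name):
        if state is None:  # once dead, always dead
            return None
        return self.delta.get((state, name))

    def accepts(self, names):
        state = self.start
        for name in names:
            state = self.step(state, name)
        return state is not None and state in self.accepting


def alphabet(*dfas):
    """All element names that appear on any transition."""
    return {name for dfa in dfas for (_, name) in dfa.delta}


def product_pairs(a, b):
    """Yield every reachable (state_a, state_b) pair, breadth first."""
    sigma = sorted(alphabet(a, b))
    start = (a.start, b.start)
    seen = {start}
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        yield pair
        for name in sigma:
            nxt = (a.step(pair[0], name), b.step(pair[1], name))
            if nxt == (None, None):
                continue  # both automata dead: nothing further is reachable
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


def intersect(a, b):
    """Product construction: a DFA accepting the intersection of L(a) and L(b)."""
    sigma = sorted(alphabet(a, b))
    accepting, delta = set(), {}
    for pair in product_pairs(a, b):
        if pair[0] in a.accepting and pair[1] in b.accepting:
            accepting.add(pair)
        for name in sigma:
            nxt = (a.step(pair[0], name), b.step(pair[1], name))
            if None not in nxt:  # keep only moves that both automata allow
                delta[(pair, name)] = nxt
    return DFA((a.start, b.start), accepting, delta)


def is_restriction(derived, base):
    """True iff L(derived) is a subset of L(base)."""
    return not any(
        d in derived.accepting and b not in base.accepting
        for d, b in product_pairs(derived, base)
    )


# (p | q)*: any mixture of p and q elements.
any_pq = DFA(0, {0}, {(0, "p"): 0, (0, "q"): 0})
# p*: no q elements at all.
only_p = DFA(0, {0}, {(0, "p"): 0})

print(is_restriction(only_p, any_pq))  # True
print(is_restriction(any_pq, only_p))  # False: any_pq accepts "q", only_p does not
# Effective model when the derived type's ostensible model is any_pq
# and an ancestor's model is only_p (the intersection approach).
effective = intersect(any_pq, only_p)
print(effective.accepts(["p", "p"]), effective.accepts(["p", "q"]))  # True False
```

The product automaton has at most |Q_a| × |Q_b| states, so each additional ancestor multiplies the size, and compiling general regular expressions into DFAs in the first place can itself blow up exponentially in the worst case. This gives a feel for the worst-case costs mentioned above.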

It would be helpful to us to know whether you are satisfied with the
decision taken by the WG on this issue, or wish your dissent from the
WG's decision to be recorded for consideration by the Director of
the W3C.

best regards,

-C. M. Sperberg-McQueen
  World Wide Web Consortium
  Co-chair, W3C XML Schema WG
Received on Thursday, 5 October 2000 15:25:28 UTC