Re: LC-16 ( LC-132 ): Allow arbitrary order with occurrence > 1 from C. M. Sperberg-McQueen on 2000-10-11 (www-xml-schema-comments@w3.org from October to December 2000)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Wed, 11 Oct 2000 09:47:04 -0600
To: "Martin J. Duerst" <duerst@w3.org>
Cc: "Martin Gudgin" <marting@develop.com>, "Schema Comments" <www-xml-schema-comments@w3.org>, "Dan Rupe" <Dan_Rupe@go.com>
Message-Id: <4.3.2.7.1.20001011090517.00b50848@espanola.com>
At 2000-10-11 02:40, Martin J. Duerst wrote:
>Hello Martin,
>
>In summary, I have to say that I'm not at all satisfied with the
>decision of the WG, and even less with the justification given below.

I'm sorry to hear that, but thank you for letting us know.

>>1.    complexity for schema processors
>
>It's a simple matter of counting, isn't it? I don't understand why
>this should be difficult. For the current version of the all group,
>a bit vector is needed to check that each element does occur according
>to the occurrence constraints. This has to be bumped up to a vector of
>integers. My guess is that this would take a few minutes in XSV, in fact
>it may be easier to implement from scratch, because there are no special
>restrictions on minOccurs and maxOccurs.

A bit vector is one way (I believe a fairly common one) of implementing
the and-connector; it is, however, not the only way.

Any formalism for defining languages is better at some things than
at others; adding ad hoc rules for what are thought to be special cases
is not usually considered the way to improve a system.  Is there a
reason to think that counting occurrences in the way you suggest will be
an exception to the general rule?  Is there a general rule that suggests
a reason why we ought to expect this to be a common construct?  Could
you give a concrete use case for allowing an arbitrary sequence of
a, b, c, and d elements where (a) the sequence of the elements is
significant, (b) each element must occur some distinct number of times
(a one to four times, b exactly once, c ten to thirty times, and d
exactly three times)?  I have no trouble imagining users who say that
is what they want; I am having trouble imagining a case where they
are right.
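Duerst's "vector of integers" can be sketched in a few lines of Python, applied to the hypothetical a/b/c/d constraints in the question above. This is a minimal illustration, not XSV's code; the name `check_counts` and the `(min, max)` tuple format are assumptions made for the example.

```python
def check_counts(children, limits):
    """Validate an unordered group: each child name must occur between its
    (min, max) bounds, in any order. Hypothetical helper for illustration."""
    # One integer per group member, where the current all group needs one bit.
    counts = {name: 0 for name in limits}
    for name in children:
        if name not in counts:
            return False  # element is not a member of the group
        counts[name] += 1
        if counts[name] > limits[name][1]:
            return False  # maxOccurs exceeded; reject immediately
    # minOccurs can only be checked once the parent element ends.
    return all(counts[name] >= lo for name, (lo, _) in limits.items())

# a one to four times, b exactly once, c ten to thirty times, d exactly three times
LIMITS = {"a": (1, 4), "b": (1, 1), "c": (10, 30), "d": (3, 3)}
```

The checking itself is easy; the question above is whether such a language is ever worth expressing.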

>If there is something I have missed, please tell me.

Only the general principle that ad hoc solutions lead to odd,
hacked-together systems.

>>2.    the fact that the interpretation usually desired is incompatible with
>>that of SGML's ampersand connector
>
>I'm not sure I understand that. The all group is already different from
>SGML '&' anyway. And the interpretation is straightforward. The main
>simplification is provided by the fact that an all group can only
>occur directly in an element, without any child groups.
>I'm not at all suggesting to change that.

Every all group currently legal has a straightforward translation into
an SGML ampersand group which has exactly the same interpretation.
This is not true of the construct you propose.

>>3.    the feeling on the part of some WG members that this is not a pattern
>>of document design to be recommended or supported.
>
>There are definitely many cases where such a pattern is not desired.
>But there are definitely also cases where it's very helpful to have
>them. A typical example is metadata, e.g. the HTML <head> element.
>There, the <title> element can appear only once, the <meta> element
>can appear many times, and so on. The same thing can be expressed
>without this feature, but the resulting content models get clumsy
>and error-prone. For an example, please see
>http://lists.w3.org/Archives/Public/xmlschema-dev/2000Aug/0017.html.

With respect, the correct content model here does not seem to me
clumsy, and once the notion of deterministic content models is
clearly understood it is not hard to write, either:

   <element name='A'>
     <complexType content='elementOnly'>
       <sequence>
         <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
         <sequence minOccurs="0" maxOccurs="1">
           <element ref='test:C' minOccurs='1' maxOccurs="1"/>
           <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
         </sequence>
       </sequence>
     </complexType>
   </element>

Or more compactly:  <!ELEMENT A (B*, (C, B*)?) >
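The compact form can be checked mechanically. A minimal sketch, assuming each child element name is a single letter so that the sequence of children can be joined into a string and matched with Python's `re` module; `accepts` is a hypothetical name.

```python
import re

# (B*, (C, B*)?) as a Python regular expression over one-letter names.
CONTENT_MODEL_A = re.compile(r"B*(?:CB*)?")

def accepts(children):
    """True if the child element names match (B*, (C, B*)?).
    Joining only works because every name here is a single letter."""
    return CONTENT_MODEL_A.fullmatch("".join(children)) is not None

print(accepts(["B", "B", "C", "B"]))  # True
print(accepts(["B", "C", "C"]))       # False: a second C
```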

A language which accepts a sequence of a, b, and c elements, with
at most one a and at most one b, is a bit more complex, but not too
hard to work out.

   (c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?)
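One way to gain confidence that this expression says what is intended is to compare it by brute force with the informal rule over every short sequence. A minimal sketch; the names `MODEL`, `intended`, and `agree_up_to` are hypothetical, and the length bound of 6 is arbitrary.

```python
import itertools
import re

# (c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?) over one-letter names.
MODEL = re.compile(r"c*(?:ac*(?:bc*)?|bc*(?:ac*)?)?")

def intended(seq):
    """The informal rule: any mix of a, b, c with at most one a and one b."""
    return seq.count("a") <= 1 and seq.count("b") <= 1

def agree_up_to(max_len):
    """Return the first sequence on which MODEL and the rule disagree, or None."""
    for n in range(max_len + 1):
        for letters in itertools.product("abc", repeat=n):
            seq = "".join(letters)
            if (MODEL.fullmatch(seq) is not None) != intended(seq):
                return seq
    return None

print(agree_up_to(6))  # None: no disagreement found
```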

The translation into regular expressions becomes tedious if there are
more than two items for which the maximum cardinality is bounded but
larger than, say, three.  If I were aware of lots of cases where such
languages were The Right Thing, I would be working a lot harder to
find good ways to integrate support for them into languages like
XML DTDs and XML Schema.  But so far I don't know any serious examples
and so I am left cold by the argument that writing a regular expression
which counts up to various numbers for various child elements is
too hard.
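A rough way to see why the hand translation grows tedious: a deterministic recognizer for such a language must keep apart every combination of counts for the bounded elements, so it needs one live state per combination, that is, the product of (maxOccurs + 1) over those elements. The sketch below only does that arithmetic; `count_states` is a hypothetical name, and it assumes unbounded optional elements (like c in the a/b/c case) need no counter.

```python
from math import prod

def count_states(max_occurs):
    """Distinct count combinations a deterministic recognizer must track
    for bounded members of an unordered group (maxOccurs per element)."""
    return prod(m + 1 for m in max_occurs.values())

# The hypothetical a/b/c/d example earlier in this reply.
print(count_states({"a": 4, "b": 1, "c": 30, "d": 3}))  # 1240
# The a/b/c case: only a and b are bounded.
print(count_states({"a": 1, "b": 1}))  # 4
```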

>For another example, please see
>http://slow1.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_module_Base:
><!ENTITY % head.content
>     "( %HeadOpts.mix;,
>      ( ( %title.qname;, %HeadOpts.mix;, ( %base.qname;, %HeadOpts.mix; )? )
>      | ( %base.qname;, %HeadOpts.mix;, ( %title.qname;, %HeadOpts.mix; ))))"

A nice example of precisely the pattern shown above.  I don't think this
is hard to understand; do you?

I agree that it would be simpler to write and the result would be easier to
understand if the rules against non-deterministic content models were
eliminated.  But those rules have, in the view of the WG, compensating
advantages (for example, they guarantee that the language defined by
any schema can be written as an LL(1) language, which means that
recursive-descent parsers are easy to write).
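To illustrate the point about recursive descent: because (B*, (C, B*)?) is deterministic, a hand-written recognizer can make every decision by looking only at the next child element. This is a minimal sketch; `parse_a` is a hypothetical name, and since the model has no nested elements the recognizer needs no actual recursion.

```python
def parse_a(children):
    """Recognize (B*, (C, B*)?) using one element of lookahead at each step."""
    pos = 0
    # B*: keep consuming while the lookahead is B.
    while pos < len(children) and children[pos] == "B":
        pos += 1
    # (C, B*)?: the lookahead alone decides whether the optional group starts.
    if pos < len(children) and children[pos] == "C":
        pos += 1
        while pos < len(children) and children[pos] == "B":
            pos += 1
    # Accept only if every child was consumed.
    return pos == len(children)
```

By contrast, (B*, C?, B*) accepts the same sequences, but on seeing a B a one-token parser cannot tell which B* it belongs to; that is the kind of model the determinism rule excludes.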

>It is obvious that such things can be avoided for new designs, but
>it is questionable whether this is always desirable, because it is
>a burden for a user to learn an arbitrary element sequence.

I agree that it is unpleasant for users to have to learn arbitrary
sequences of elements.  But this is necessary only when using tools
which have no support for syntax-directed editing.  Any SGML or
XML editor with schema awareness will remove the necessity for the
user to learn an arbitrary sequence of elements.
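A schema-aware editor can work out which elements may legally come next from the content model's automaton, so the user never has to memorize the order. A minimal sketch for (B*, (C, B*)?), assuming a hand-built transition table; the state names and `next_choices` are hypothetical, and a real editor would derive the table from the schema rather than hard-code it.

```python
# Transition table for (B*, (C, B*)?): state -> {child name: next state}.
TRANSITIONS = {
    "before_c": {"B": "before_c", "C": "after_c"},
    "after_c": {"B": "after_c"},
}

def next_choices(prefix):
    """Child elements an editor could offer after the children in prefix,
    or None if prefix is already invalid."""
    state = "before_c"
    for name in prefix:
        state = TRANSITIONS[state].get(name)
        if state is None:
            return None
    return sorted(TRANSITIONS[state])

print(next_choices(["B", "B"]))  # ['B', 'C']
print(next_choices(["B", "C"]))  # ['B']
```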

>Also, it is not clear to me why the current all group is considered
>a recommended or supportable design, whereas the changes I propose
>are not.

The current all-group closely models the rules for dumping or loading
rows in a relational table; this is one place where arbitrary order
has been most consistently desired by users.
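As a rough illustration of the relational analogy: when a row is dumped as one element per column, the columns may arrive in any order, but each may appear at most once, which is what the all group enforces. A minimal sketch using Python's standard `xml.etree.ElementTree`; the `<row>` element, the column names, and `row_to_dict` are hypothetical.

```python
import xml.etree.ElementTree as ET

def row_to_dict(xml_text, columns):
    """Load one <row> into a dict, accepting its columns in any order
    but each at most once, as an all group would."""
    row = ET.fromstring(xml_text)
    values = {}
    for child in row:
        if child.tag not in columns:
            raise ValueError(f"unexpected column {child.tag!r}")
        if child.tag in values:
            raise ValueError(f"column {child.tag!r} repeated")
        values[child.tag] = child.text
    return values

print(row_to_dict("<row><name>Ada</name><id>7</id></row>", {"id", "name"}))
# {'name': 'Ada', 'id': '7'}
```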

>>It would be helpful to us to know whether you are satisfied with the
>>decision taken by the WG on this issue, or wish your dissent from the
>>WG's decision to be recorded for consideration by the Director of
>>the W3C.
>
>I not only wish the dissent to be recorded, I wish the decision to
>be better explained and if possible reverted.

Your dissent has been recorded.  I hope the paragraphs above have
made the decision clearer.

-Michael Sperberg-McQueen
Received on Wednesday, 11 October 2000 12:13:19 UTC