Re: LC-16 ( LC-132 ): Allow arbitrary order with occurrence > 1 from Ivan Kurmanov on 2000-10-14 (www-xml-schema-comments@w3.org from October to December 2000)

From: Ivan Kurmanov <ivan@tm.minsk.by>
Date: Sat, 14 Oct 2000 14:50:00 +0300
To: www-xml-schema-comments@w3.org
Message-ID: <15618.001014@tm.minsk.by>
Dear Sirs,

With all due respect to the XML Schema working group and to the SGML
tradition, I have several questions about the resolution of Last-Call
issue LC-16.

>>>1.    complexity for schema processors

Validating an all group with occurrence > 1 is much simpler to
implement than validating an element sequence.  I have tried both.

>>It's a simple matter of counting, isn't it? I don't understand why
>>this should be difficult. For the current version of the all group,
>>a bit vector is needed to check that each element does occur according
>>to the occurrence constraints. This has to be bumped up to a vector of
>>integers. My guess is that this would take a few minutes in XSV, in fact
>>it may be easier to implement from scratch, because there are no special
>>restrictions on minOccurs and maxOccurs.
> 
> A bit vector is one way (I believe a fairly common one) of implementing
> the and-connector; it is, however, not the only way.
> 
> Any formalism for defining languages is better at some things than
> at others; adding ad hoc rules for what are thought to be special cases
> is not usually thought to be the way to improve a system.  Is there a
> reason to think that counting occurrences in the way you suggest will be
> an exception to the general rule?  Is there a general rule that suggests
> a reason why we ought to expect this to be a common construct?

As I understand it, you are saying that the proposed change
(maxOccurs > 1) will bring more complexity to the processor than benefit
to the users.  The benefits are difficult to estimate, but the added
complexity is really minimal.  Do you agree?
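
To make the "bit vector bumped up to a vector of integers" point concrete,
here is a minimal sketch in Python.  It is not taken from XSV or any other
processor; the function names and the shape of the `declared` and `bounds`
tables are my own assumptions, chosen only for illustration.

```python
# Sketch only: function names and table layouts are illustrative assumptions.

def validate_all_bits(children, declared):
    """Current all group: every declared name occurs at most once.

    `declared` maps element name -> True if the element is required.
    """
    index = {name: i for i, name in enumerate(declared)}
    seen = 0  # the bit vector: one bit per declared element
    for name in children:
        if name not in index:
            return False  # element not declared in the group
        bit = 1 << index[name]
        if seen & bit:
            return False  # second occurrence is not allowed
        seen |= bit
    # every required element must have its bit set
    return all(seen & (1 << index[n]) for n, req in declared.items() if req)


def validate_all_counts(children, bounds):
    """Proposed extension: one integer counter per declared name.

    `bounds` maps element name -> (min_occurs, max_occurs);
    max_occurs of None stands for 'unbounded'.
    """
    counts = dict.fromkeys(bounds, 0)  # the vector of integers
    for name in children:
        if name not in counts:
            return False  # element not declared in the group
        counts[name] += 1
        upper = bounds[name][1]
        if upper is not None and counts[name] > upper:
            return False  # too many occurrences, reject early
    # every element must have reached its minimum
    return all(counts[n] >= lower for n, (lower, _) in bounds.items())
```

With bounds of `{"a": (1, 4), "b": (1, 1)}` the second function accepts the
children `a b a` in any order and rejects `b b a`.  The only change from the
first function is that a bit becomes a counter compared against two limits.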

> Could you give a concrete use case for allowing an arbitrary sequence
> of a, b, c, and d elements where (a) the sequence of the elements is
> significant, (b) each element must occur some distinct number of times
> (a one to four times, b exactly once, c ten to thirty times, and d
> exactly three times)? I have no trouble imagining users who say that is
> what they want; I am having trouble imagining a case where they are
> right.

I represent a metadata project (www.repec.org) that needs to model
all groups with unbounded maxOccurs.  For example:
<paper>   <!-- a working paper description with two authors,
          consisting of three files -->
   <author />
   <author />
   <title />
   <abstract />
   <file />
   <file />
   <file />
   <note />
   <length />
   <price />
   <classification />
   <published-as />
</paper>

I know of no general rule that would help to establish a fixed order
for such elements.
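
As a rough illustration of what such a model would have to check, the
sketch below parses the paper description above with Python's standard
xml.etree.ElementTree and tallies the children.  The occurrence bounds in
BOUNDS are hypothetical, invented for this example; they are not the actual
RePEc rules, and check_paper is just a name I made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

PAPER = """<paper>
   <author />
   <author />
   <title />
   <abstract />
   <file />
   <file />
   <file />
   <note />
   <length />
   <price />
   <classification />
   <published-as />
</paper>"""

# Hypothetical bounds (min, max); None means unbounded.
BOUNDS = {
    "author": (1, None),
    "title": (1, 1),
    "abstract": (0, 1),
    "file": (0, None),
    "note": (0, None),
    "length": (0, 1),
    "price": (0, 1),
    "classification": (0, None),
    "published-as": (0, None),
}


def check_paper(xml_text, bounds):
    """Return a list of problems; an empty list means the children fit."""
    root = ET.fromstring(xml_text)
    counts = Counter(child.tag for child in root)  # order is ignored
    problems = [f"unexpected <{tag}>" for tag in counts if tag not in bounds]
    for tag, (lower, upper) in bounds.items():
        n = counts[tag]
        if n < lower or (upper is not None and n > upper):
            problems.append(f"<{tag}> occurs {n} times")
    return problems


print(check_paper(PAPER, BOUNDS))
```

Because only the tallies are inspected, shuffling the children of <paper>
does not change the outcome, which is exactly the behaviour we need.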

> 
>>If there is something I have missed, please tell me.
> 
> Only the general principle that ad hoc solutions lead to odd hack
> systems.
>
...


>>>2.    the fact that the interpretation usually desired is incompatible with
>>>that of SGML's ampersand connector
>>
>>I'm not sure I understand that. The all group is already different from
>>SGML '&' anyway. And the interpretation is straightforward. The main
>>simplification is provided by the fact that an all group can only
>>occur directly in an element, without any children groups.
>>I'm not at all suggesting to change that.
> 
> Every all group currently legal has a straightforward translation into
> an SGML ampersand group which has exactly the same interpretation.
> This is not true of the construct you propose.

Was compatibility with SGML one of the XML Schema design goals?

> 
>>>3.    the feeling on the part of some WG members that this is not a pattern
>>>of document design to be recommended or supported.
>>
>>There are definitely many cases where such a pattern is not desired.
>>But there are definitely also cases where it's very helpful to have
>>them. A typical example is metadata, e.g. the HTML <head> element.
>>There, the <title> element can appear only once, the <meta> element
>>can appear many times, and so on. The same thing can be expressed
>>without this feature, but the resulting content models get clumsy
>>and error-prone. For an example, please see
>>http://lists.w3.org/Archives/Public/xmlschema-dev/2000Aug/0017.html.
> 
> With respect, the correct content model here does not seem to me
> clumsy, and once the notion of deterministic content models is
> clearly understood it is not hard to write, either:
> 
>    <element name='A'>
>      <complexType content='elementOnly'>
>        <sequence>
>          <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
>          <sequence minOccurs="0" maxOccurs="1">
>            <element ref='test:C' minOccurs='1' maxOccurs="1"/>
>            <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
>          </sequence>
>        </sequence>
>      </complexType>
>    </element>
> 
> Or more compactly:  <!ELEMENT A (B*, (C, B*)?) >
> 
> A language which accepts a sequence of A, B, and C elements, with
> at most one A and at most one B is a bit more complex, but not too
> hard to work out.
> 
>    (c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?)
> 
> The translation into regular expressions becomes tedious if there are
> more than two items for which the maximum cardinality is bounded but
> larger than, say, three.  If I were aware of lots of cases where such
> languages were The Right Thing, I would be working a lot harder to
> find good ways to integrate support for them into languages like
> XML DTDs and XML Schema.  But so far I don't know any serious examples
> and so I am left cold by the argument that writing a regular expression
> which counts up to various numbers for various child elements is
> too hard.
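
The content model quoted above can be checked mechanically.  The sketch
below is only an illustration: it transcribes
(c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?) into a Python regular
expression over one-letter element names, and compares it by brute force
with the plain counting rule "at most one a and at most one b".  The
function names and the length limit are assumptions of mine.

```python
import re
from itertools import product

# (c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?) with one letter per element
MODEL = re.compile(r"c*(?:ac*(?:bc*)?|bc*(?:ac*)?)?")


def accepts_regex(children):
    """True if the whole child sequence matches the content model."""
    return MODEL.fullmatch(children) is not None


def accepts_count(children):
    """The counting rule: any number of c, at most one a, at most one b."""
    return children.count("a") <= 1 and children.count("b") <= 1


def mismatches(max_len):
    """All sequences over a, b, c up to max_len where the two rules differ."""
    found = []
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            s = "".join(letters)
            if accepts_regex(s) != accepts_count(s):
                found.append(s)
    return found


print(mismatches(6))
```

An empty list from mismatches means the two formulations agree on every
sequence up to that length.  The counting rule stays one line per element
as more bounded elements are added, while the expression has to spell out
each order in which they may appear.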
> 
>>For another example, please see
>>http://slow1.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_module_Base:
>><!ENTITY % head.content
>>     "( %HeadOpts.mix;,
>>      ( ( %title.qname;, %HeadOpts.mix;, ( %base.qname;, %HeadOpts.mix; )? )
>>      | ( %base.qname;, %HeadOpts.mix;, ( %title.qname;, %HeadOpts.mix; ))))"
>> >
> 
> A nice example of precisely the pattern shown above.  I don't think this
> is hard to understand; do you?
> 
> I agree that it would be simpler to write and the result would be easier to
> understand if the rules against non-deterministic content models were
> eliminated.  But those rules have, in the view of the WG, compensating
> advantages (they enable a guarantee that any schema language can be
> written as an LL(1) language, for example, which means that recursive
> descent parsers are easy to write).
> 
>>It is obvious that such things can be avoided for new designs, but
>>it is questionable that this is always desirable, because it is
>>a burden for a user to learn an arbitrary element sequence.
> 
> I agree that it is unpleasant for users to have to learn arbitrary
> sequences of elements.  But this is necessary only when using tools
> which have no support for syntax-directed editing.  Any SGML or
> XML editor with schema awareness will remove the necessity for the
> user to learn an arbitrary sequence of elements.
> 
>>Also, it is not clear to me why the current all group is considered
>>a recommended or supportable design, whereas the changes I propose
>>are not.
> 
> The current all-group closely models the rules for dumping or loading
> rows in a relational table; this is one place where arbitrary order
> has been most consistently desired by users.
> 
>>>It would be helpful to us to know whether you are satisfied with the
>>>decision taken by the WG on this issue, or wish your dissent from the
>>>WG's decision to be recorded for consideration by the Director of
>>>the W3C.
>>
>>I not only wish the dissent to be recorded, I wish the decision to
>>be better explained and if possible reverted.
> 
> Your dissent has been recorded.  I hope the paragraphs above have
> made the decision clearer.
> 
> -Michael Sperberg-McQueen


Ivan Kurmanov <ivan@tm.minsk.by>.
Received on Saturday, 14 October 2000 14:35:58 UTC