- From: Ivan Kurmanov <ivan@tm.minsk.by>
- Date: Sat, 14 Oct 2000 14:50:00 +0300
- To: www-xml-schema-comments@w3.org
Dear Sirs,
With respect to the XML Schema working group and to the SGML traditions,
I have several questions about the Last-Call issue LC-16 resolution.
>>>1. complexity for schema processors
Validating an all group with maxOccurs > 1 is much simpler to implement
than validating element sequences. I have tried both.
>>It's a simple matter of counting, isn't it? I don't understand why
>>this should be difficult. For the current version of the all group,
>>a bit vector is needed to check that each element does occur according
>>to the occurrence constraints. This has to be bumped up to a vector of
>>integers. My guess is that this would take a few minutes in XSV, in fact
>>it may be easier to implement from scratch, because there are no special
>>restrictions on minOccurs and maxOccurs.
>
> A bit vector is one way (I believe a fairly common one) of implementing
> the and-connector; it is, however, not the only way.
>
> Any formalism for defining languages is better at some things than
> at others; adding ad hoc rules for what are thought to be special cases
> is not usually thought to be the way to improve a system. Is there a
> reason to think that counting occurrences in the way you suggest will be
> an exception to the general rule? Is there a general rule that suggests
> a reason why we ought to expect this to be a common construct?
I understand this to mean that the proposed change (maxOccurs > 1) will
bring more complexity to the processor than benefit to the users.
The benefits are difficult to estimate, but the added complexity is
really minimal. Do you agree?
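To illustrate the "simple matter of counting" mentioned above, here is a minimal sketch in Python. It is not taken from XSV or any real schema processor. The function name validate_all_group, the PAPER_CONSTRAINTS table, and the cardinalities in it are assumptions made up for the illustration.

```python
from collections import Counter

# Hypothetical occurrence constraints for the children of <paper>:
# name -> (minOccurs, maxOccurs), where None stands for "unbounded".
PAPER_CONSTRAINTS = {
    "author": (1, None),
    "title": (1, 1),
    "abstract": (0, 1),
    "file": (0, None),
    "note": (0, 1),
    "length": (0, 1),
    "price": (0, 1),
    "classification": (0, None),
    "published-as": (0, 1),
}


def validate_all_group(children, constraints):
    """Check child element names against per-element occurrence bounds.

    The order of the children is ignored; only the counts matter.
    Returns a list of error messages (empty if the content is valid).
    """
    counts = Counter(children)
    errors = []
    # Elements that are not declared in the group at all.
    for name in counts:
        if name not in constraints:
            errors.append(f"unexpected element <{name}>")
    # One integer comparison per declared element.
    for name, (lo, hi) in constraints.items():
        n = counts[name]
        if n < lo:
            errors.append(f"<{name}> occurs {n} times, minimum is {lo}")
        elif hi is not None and n > hi:
            errors.append(f"<{name}> occurs {n} times, maximum is {hi}")
    return errors
```

Under these assumptions, the bit vector of the current all group becomes a table of integers (the Counter), and the check stays linear in the number of children.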
> Could you give a concrete use case for allowing an arbitrary sequence
> of a, b, c, and d elements where (a) the sequence of the elements is
> significant, (b) each element must occur some distinct number of times
> (a one to four times, b exactly once, c ten to thirty times, and d
> exactly three times)? I have no trouble imagining users who say that is
> what they want; I am having trouble imagining a case where they are
> right.
I represent a metadata project (www.repec.org) that needs to model
all groups with unbounded maxOccurs.
<paper> <!-- a working paper description with two authors,
             consisting of three files -->
  <author />
  <author />
  <title />
  <abstract />
  <file />
  <file />
  <file />
  <note />
  <length />
  <price />
  <classification />
  <published-as />
</paper>
I know of no general rule that would help to establish a fixed
order for such elements.
>
>>If there is something I have missed, please tell me.
>
> Only the general principle that ad hoc solutions lead to odd hack
> systems.
>
...
>>>2. the fact that the interpretation usually desired is incompatible with
>>>that of SGML's ampersand connector
>>
>>I'm not sure I understand that. The all group is already different from
>>SGML '&' anyway. And the interpretation is straightforward. The main
>>simplification is provided by the fact that an all group can only
>>occur directly in an element, without any children groups.
>>I'm not at all suggesting to change that.
>
> Every all group currently legal has a straightforward translation into
> an SGML ampersand group which has exactly the same interpretation.
> This is not true of the construct you propose.
Was compatibility with SGML one of the XML Schema design goals?
>
>>>3. the feeling on the part of some WG members that this is not a pattern
>>>of document design to be recommended or supported.
>>
>>There are definitely many cases where such a pattern is not desired.
>>But there are definitely also cases where it's very helpful to have
>>them. A typical example is metadata, e.g. the HTML <head> element.
>>There, the <title> element can appear only once, the <meta> element
>>can appear many times, and so on. The same thing can be expressed
>>without this feature, but the resulting content models get clumsy
>>and error-prone. For an example, please see
>>http://lists.w3.org/Archives/Public/xmlschema-dev/2000Aug/0017.html.
>
> With respect, the correct content model here does not seem to me
> clumsy, and once the notion of deterministic content models is
> clearly understood it is not hard to write, either:
>
> <element name='A'>
>   <complexType content='elementOnly'>
>     <sequence>
>       <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
>       <sequence minOccurs="0" maxOccurs="1">
>         <element ref='test:C' minOccurs='1' maxOccurs="1"/>
>         <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
>       </sequence>
>     </sequence>
>   </complexType>
> </element>
>
> Or more compactly: <!ELEMENT A (B*, (C, B*)?) >
>
> A language which accepts a sequence of A, B, and C elements, with
> at most one A and at most one B is a bit more complex, but not too
> hard to work out.
>
> (c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?)
>
> The translation into regular expressions becomes tedious if there are
> more than two items for which the maximum cardinality is bounded but
> larger than, say, three. If I were aware of lots of cases where such
> languages were The Right Thing, I would be working a lot harder to
> find good ways to integrate support for them into languages like
> XML DTDs and XML Schema. But so far I don't know any serious examples
> and so I am left cold by the argument that writing a regular expression
> which counts up to various numbers for various child elements is
> too hard.
>
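As a quick check of the content model quoted above, (c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?), the following Python sketch compares it with the plain counting rule "at most one a, at most one b, any number of c". The translation into a Python regular expression and the helper names matches_model, matches_counts, and models_agree are assumptions made for this illustration.

```python
import re
from itertools import product

# The quoted content model, with each child element written as one letter.
MODEL = re.compile(r"c*(?:(?:ac*(?:bc*)?)|(?:bc*(?:ac*)?))?")


def matches_model(children):
    """True if the sequence of one-letter element names fits the regex model."""
    return MODEL.fullmatch("".join(children)) is not None


def matches_counts(children):
    """True if the sequence obeys the counting rule, ignoring order."""
    return (
        set(children) <= set("abc")
        and children.count("a") <= 1
        and children.count("b") <= 1
    )


def models_agree(max_len=6):
    """Compare both checks on every sequence over {a, b, c} up to max_len."""
    for n in range(max_len + 1):
        for seq in product("abc", repeat=n):
            if matches_model(seq) != matches_counts(seq):
                return False
    return True
```

For two bounded elements the regular expression needs both orderings spelled out, while the counting check needs one comparison per element. The exhaustive comparison only covers short sequences, so it is a sanity check rather than a proof.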
>>For another example, please see
>>http://slow1.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_module_Base:
>><!ENTITY % head.content
>> "( %HeadOpts.mix;,
>> ( ( %title.qname;, %HeadOpts.mix;, ( %base.qname;, %HeadOpts.mix; )? )
>> | ( %base.qname;, %HeadOpts.mix;, ( %title.qname;, %HeadOpts.mix; ))))"
>> >
>
> A nice example of precisely the pattern shown above. I don't think this
> is hard to understand; do you?
>
> I agree that it would be simpler to write and the result would be easier to
> understand if the rules against non-deterministic content models were
> eliminated. But those rules have, in the view of the WG, compensating
> advantages (they enable a guarantee that any schema language can be
> written as an LL(1) language, for example, which means that recursive
> descent parsers are easy to write).
>
>>It is obvious that such things can be avoided for new designs, but
>>it is questionable that this is always desirable, because it is
>>a burden for a user to learn an arbitrary element sequence.
>
> I agree that it is unpleasant for users to have to learn arbitrary
> sequences of elements. But this is necessary only when using tools
> which have no support for syntax-directed editing. Any SGML or
> XML editor with schema awareness will remove the necessity for the
> user to learn an arbitrary sequence of elements.
>
>>Also, it is not clear to me why the current all group is considered
>>a recommended or supportable design, whereas the changes I propose
>>are not.
>
> The current all-group closely models the rules for dumping or loading
> rows in a relational table; this is one place where arbitrary order
> has been most consistently desired by users.
>
>>>It would be helpful to us to know whether you are satisfied with the
>>>decision taken by the WG on this issue, or wish your dissent from the
>>>WG's decision to be recorded for consideration by the Director of
>>>the W3C.
>>
>>I not only wish the dissent to be recorded, I wish the decision to
>>be better explained and if possible reverted.
>
> Your dissent has been recorded. I hope the paragraphs above have
> made the decision clearer.
>
> -Michael Sperberg-McQueen
Ivan Kurmanov <ivan@tm.minsk.by>.
Received on Saturday, 14 October 2000 14:35:58 UTC