- From: Ivan Kurmanov <ivan@tm.minsk.by>
- Date: Sat, 14 Oct 2000 14:50:00 +0300
- To: www-xml-schema-comments@w3.org
Dear Sirs,

With respect to the XML Schema working group and to the SGML traditions,
I have several questions about the resolution of Last-Call issue LC-16.

>>>1. complexity for schema processors

Implementing validation of an all group with occurrence > 1 is much
simpler than implementing element sequence validation. I tried both.

>>It's a simple matter of counting, isn't it? I don't understand why
>>this should be difficult. For the current version of the all group,
>>a bit vector is needed to check that each element does occur according
>>to the occurrence constraints. This has to be bumped up to a vector of
>>integers. My guess is that this would take a few minutes in XSV, in fact
>>it may be easier to implement from scratch, because there are no special
>>restrictions on minOccurs and maxOccurs.
>
> A bit vector is one way (I believe a fairly common one) of implementing
> the and-connector; it is, however, not the only way.
>
> Any formalism for defining languages is better at some things than
> at others; adding ad hoc rules for what are thought to be special cases
> is not usually thought to be the way to improve a system. Is there a
> reason to think that counting occurrences in the way you suggest will be
> an exception to the general rule? Is there a general rule that suggests
> a reason why we ought to expect this to be a common construct?

I understand this to mean that the proposed change (maxOccurs > 1) would
bring more complexity to the processor than benefit to the users. The
benefits are difficult to estimate, but the complexity introduced is
really minimal. Do you agree?

> Could you give a concrete use case for allowing an arbitrary sequence
> of a, b, c, and d elements where (a) the sequence of the elements is
> significant, (b) each element must occur some distinct number of times
> (a one to four times, b exactly once, c ten to thirty times, and d
> exactly three times)?
> I have no trouble imagining users who say that is
> what they want; I am having trouble imagining a case where they are
> right.

I represent a project with metadata interests (www.repec.org) that needs
to model all groups with unbounded maxOccurs.

<paper>
  <!-- a working paper description with two authors,
       consisting of three files -->
  <author />
  <author />
  <title />
  <abstract />
  <file />
  <file />
  <file />
  <note />
  <length />
  <price />
  <classification />
  <published-as />
</paper>

I know of no general rule that would help establish a fixed order for
such elements.

>
>>If there is something I have missed, please tell me.
>
> Only the general principle that ad hoc solutions lead to odd hack
> systems.
> ...

>>>2. the fact that the interpretation usually desired is incompatible with
>>>that of SGML's ampersand connector
>>
>>I'm not sure I understand that. The all group is already different from
>>SGML '&' anyway. And the interpretation is straightforward. The main
>>simplification is provided by the fact that an all group can only
>>occur directly in an element, without any children groups.
>>I'm not at all suggesting to change that.
>
> Every all group currently legal has a straightforward translation into
> an SGML ampersand group which has exactly the same interpretation.
> This is not true of the construct you propose.

Was compatibility with SGML one of the XML Schema design goals?

>
>>>3. the feeling on the part of some WG members that this is not a pattern
>>>of document design to be recommended or supported.
>>
>>There are definitely many cases where such a pattern is not desired.
>>But there are definitely also cases where it's very helpful to have
>>them. A typical example is metadata, e.g. the HTML <head> element.
>>There, the <title> element can appear only once, the <meta> element
>>can appear many times, and so on.
>>The same thing can be expressed
>>without this feature, but the resulting content models get clumsy
>>and error-prone. For an example, please see
>>http://lists.w3.org/Archives/Public/xmlschema-dev/2000Aug/0017.html.
>
> With respect, the correct content model here does not seem to me
> clumsy, and once the notion of deterministic content models is
> clearly understood it is not hard to write, either:
>
> <element name='A'>
>  <complexType content='elementOnly'>
>   <sequence>
>    <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
>    <sequence minOccurs="0" maxOccurs="1">
>     <element ref='test:C' minOccurs='1' maxOccurs="1"/>
>     <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
>    </sequence>
>   </sequence>
>  </complexType>
> </element>
>
> Or more compactly: <!ELEMENT A (B*, (C, B*)?) >
>
> A language which accepts a sequence of A, B, and C elements, with
> at most one A and at most one B is a bit more complex, but not too
> hard to work out.
>
> (c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?)
>
> The translation into regular expressions becomes tedious if there are
> more than two items for which the maximum cardinality is bounded but
> larger than, say, three. If I were aware of lots of cases where such
> languages were The Right Thing, I would be working a lot harder to
> find good ways to integrate support for them into languages like
> XML DTDs and XML Schema. But so far I don't know any serious examples
> and so I am left cold by the argument that writing a regular expression
> which counts up to various numbers for various child elements is
> too hard.
>
>>For another example, please see
>>http://slow1.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_module_Base:
>><!ENTITY % head.content
>> "( %HeadOpts.mix;,
>> ( ( %title.qname;, %HeadOpts.mix;, ( %base.qname;, %HeadOpts.mix; )? )
>> | ( %base.qname;, %HeadOpts.mix;, ( %title.qname;, %HeadOpts.mix; ))))"
>> >
>
> A nice example of precisely the pattern shown above.
> I don't think this
> is hard to understand; do you?
>
> I agree that it would be simpler to write and the result would be easier to
> understand if the rules against non-deterministic content models were
> eliminated. But those rules have, in the view of the WG, compensating
> advantages (they enable a guarantee that any schema language can be
> written as an LL(1) language, for example, which means that recursive
> descent parsers are easy to write).
>
>>It is obvious that such things can be avoided for new designs, but
>>it is questionable that this is always desirable, because it is
>>a burden for a user to learn an arbitrary element sequence.
>
> I agree that it is unpleasant for users to have to learn arbitrary
> sequences of elements. But this is necessary only when using tools
> which have no support for syntax-directed editing. Any SGML or
> XML editor with schema awareness will remove the necessity for the
> user to learn an arbitrary sequence of elements.
>
>>Also, it is not clear to me why the current all group is considered
>>a recommended or supportable design, whereas the changes I propose
>>are not.
>
> The current all-group closely models the rules for dumping or loading
> rows in a relational table; this is one place where arbitrary order
> has been most consistently desired by users.
>
>>>It would be helpful to us to know whether you are satisfied with the
>>>decision taken by the WG on this issue, or wish your dissent from the
>>>WG's decision to be recorded for consideration by the Director of
>>>the W3C.
>>
>>I not only wish the dissent to be recorded, I wish the decision to
>>be better explained and if possible reverted.
>
> Your dissent has been recorded. I hope the paragraphs above have
> made the decision clearer.
>
> -Michael Sperberg-McQueen

Ivan Kurmanov <ivan@tm.minsk.by>.
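
To make the "vector of integers" idea from point 1 concrete, here is a
minimal sketch in Python of counting-based validation for an all group
whose members may occur more than once. It is an illustration only, not
how XSV or any other schema processor works. The names
validate_all_group and PAPER_BOUNDS are hypothetical, and the
minOccurs/maxOccurs values chosen for the <paper> example are
assumptions made for the sake of the example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of an all group:
# element name -> (minOccurs, maxOccurs); None stands for 'unbounded'.
Bounds = Dict[str, Tuple[int, Optional[int]]]


def validate_all_group(children: List[str], bounds: Bounds) -> List[str]:
    """Check child element names against per-element occurrence bounds.

    Order is ignored; only the counts matter. Returns a list of error
    messages, which is empty when the children are valid.
    """
    errors: List[str] = []
    counts: Counter = Counter()

    # One pass over the children: count each declared element and
    # report anything the group does not declare.
    for name in children:
        if name in bounds:
            counts[name] += 1
        else:
            errors.append(f"unexpected element <{name}>")

    # One pass over the declarations: compare counts with the bounds.
    for name, (min_occurs, max_occurs) in bounds.items():
        n = counts[name]
        if n < min_occurs:
            errors.append(f"<{name}> occurs {n} times, minimum is {min_occurs}")
        elif max_occurs is not None and n > max_occurs:
            errors.append(f"<{name}> occurs {n} times, maximum is {max_occurs}")
    return errors


# Assumed bounds for the <paper> example; the real constraints are not
# stated in this message.
PAPER_BOUNDS: Bounds = {
    "author": (1, None),
    "title": (1, 1),
    "abstract": (0, 1),
    "file": (0, None),
    "note": (0, None),
    "length": (0, 1),
    "price": (0, 1),
    "classification": (0, None),
    "published-as": (0, None),
}
```

Calling validate_all_group with the child names of the <paper> example,
in any order, returns an empty list under these assumed bounds; a second
<title> or a missing <author> produces one error message each.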
Received on Saturday, 14 October 2000 14:35:58 UTC