- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Wed, 11 Oct 2000 09:47:04 -0600
- To: "Martin J. Duerst" <duerst@w3.org>
- Cc: "Martin Gudgin" <marting@develop.com>, "Schema Comments" <www-xml-schema-comments@w3.org>, "Dan Rupe" <Dan_Rupe@go.com>
At 2000-10-11 02:40, Martin J. Duerst wrote:

>Hello Martin,
>
>In summary, I have to say that I'm not at all satisfied with the
>decision of the WG, and even less by the justification given below.

I'm sorry to hear that, but thank you for letting us know.

>>1. complexity for schema processors
>
>It's a simple matter of counting, isn't it? I don't understand why
>this should be difficult. For the current version of the all group,
>a bit vector is needed to check that each element does occur according
>to the occurrence constraints. This has to be bumped up to a vector of
>integers. My guess is that this would take a few minutes in XSV, in fact
>it may be easier to implement from scratch, because there are no special
>restrictions on minOccurs and maxOccurs.

A bit vector is one way (I believe a fairly common one) of implementing
the and-connector; it is, however, not the only way.

Any formalism for defining languages is better at some things than at
others; adding ad hoc rules for what are thought to be special cases is
not usually thought to be the way to improve a system. Is there a
reason to think that counting occurrences in the way you suggest will be
an exception to the general rule? Is there a general rule that suggests
a reason why we ought to expect this to be a common construct?

Could you give a concrete use case for allowing an arbitrary sequence
of a, b, c, and d elements where (a) the sequence of the elements is
significant, (b) each element must occur some distinct number of times
(a one to four times, b exactly once, c ten to thirty times, and d
exactly three times)? I have no trouble imagining users who say that
is what they want; I am having trouble imagining a case where they are
right.

>If there is something I have missed, please tell me.

Only the general principle that ad hoc solutions lead to odd hack
systems.

>>2. the fact that the interpretation usually desired is incompatible with
>>that of SGML's ampersand connector
>
>I'm not sure I understand that. The all group is already different from
>SGML '&' anyway. And the interpretation is straightforward. The main
>simplification is provided by the fact that an all group can only
>occur directly in an element, without any children groups.
>I'm not at all suggesting to change that.

Every all group currently legal has a straightforward translation into
an SGML ampersand group which has exactly the same interpretation.
This is not true of the construct you propose.

>>3. the feeling on the part of some WG members that this is not a pattern
>>of document design to be recommended or supported.
>
>There are definitely many cases where such a pattern is not desired.
>But there are definitely also cases where it's very helpful to have
>them. A typical example is metadata, e.g. the HTML <head> element.
>There, the <title> element can appear only once, the <meta> element
>can appear many times, and so on. The same thing can be expressed
>without this feature, but the resulting content models get clumsy
>and error-prone. For an example, please see
>http://lists.w3.org/Archives/Public/xmlschema-dev/2000Aug/0017.html.

With respect, the correct content model here does not seem to me
clumsy, and once the notion of deterministic content models is clearly
understood it is not hard to write, either:

  <element name='A'>
    <complexType content='elementOnly'>
      <sequence>
        <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
        <sequence minOccurs="0" maxOccurs="1">
          <element ref='test:C' minOccurs='1' maxOccurs="1"/>
          <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
        </sequence>
      </sequence>
    </complexType>
  </element>

Or more compactly:

  <!ELEMENT A (B*, (C, B*)?) >

A language which accepts a sequence of A, B, and C elements, with at
most one A and at most one B is a bit more complex, but not too hard to
work out.

  (c*, ((a, c*, (b, c*)?)
      | (b, c*, (a, c*)?))?)

The translation into regular expressions becomes tedious if there are
more than two items for which the maximum cardinality is bounded but
larger than, say, three. If I were aware of lots of cases where such
languages were The Right Thing, I would be working a lot harder to find
good ways to integrate support for them into languages like XML DTDs
and XML Schema. But so far I don't know any serious examples and so I
am left cold by the argument that writing a regular expression which
counts up to various numbers for various child elements is too hard.

>For another example, please see
>http://slow1.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_module_Base:
><!ENTITY % head.content
>    "( %HeadOpts.mix;,
>     ( ( %title.qname;, %HeadOpts.mix;, ( %base.qname;, %HeadOpts.mix; )? )
>     | ( %base.qname;, %HeadOpts.mix;, ( %title.qname;, %HeadOpts.mix; ))))"
>
>

A nice example of precisely the pattern shown above. I don't think
this is hard to understand; do you?

I agree that it would be simpler to write and the result would be
easier to understand if the rules against non-deterministic content
models were eliminated. But those rules have, in the view of the WG,
compensating advantages (they enable a guarantee that any schema
language can be written as an LL(1) language, for example, which means
that recursive descent parsers are easy to write).

>It is obvious that such things can be avoided for new designs, but
>it is questionable that this is always desirable, because it is
>a burden for an user to learn an arbitrary element sequence.

I agree that it is unpleasant for users to have to learn arbitrary
sequences of elements. But this is necessary only when using tools
which have no support for syntax-directed editing. Any SGML or XML
editor with schema awareness will remove the necessity for the user to
learn an arbitrary sequence of elements.
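
To make the "at most one a and at most one b" content model concrete, here is a minimal sketch in Python. It assumes each child element is written as a single letter, so that the model (c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?) can be transcribed directly as a regular expression. The function names are hypothetical and chosen only for illustration. The sketch compares the regular expression with a plain counting check over all short child sequences.

```python
import re
from itertools import product

# Direct transcription of (c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?)
# with each child element abbreviated to one letter (an assumption of this sketch).
MODEL = re.compile(r"c*(?:ac*(?:bc*)?|bc*(?:ac*)?)?")


def matches_model(children):
    """True if the whole child sequence is accepted by the content model."""
    return MODEL.fullmatch(children) is not None


def matches_by_counting(children):
    """The same constraint stated as counts: at most one a, at most one b."""
    return children.count("a") <= 1 and children.count("b") <= 1


def disagreements(max_len=6):
    """Child sequences over {a, b, c}, up to max_len, where the two checks differ."""
    found = []
    for n in range(max_len + 1):
        for combo in product("abc", repeat=n):
            s = "".join(combo)
            if matches_model(s) != matches_by_counting(s):
                found.append(s)
    return found
```

Calling disagreements() enumerates every sequence of up to six children and collects any on which the two formulations differ, which gives a quick check that the hand-written model says what was intended.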

>Also, it is not clear to me why the current all group is considered
>a recommended or supportable design, whereas the changes I propose
>are not.

The current all-group closely models the rules for dumping or loading
rows in a relational table; this is one place where arbitrary order has
been most consistently desired by users.

>>It would be helpful to us to know whether you are satisfied with the
>>decision taken by the WG on this issue, or wish your dissent from the
>>WG's decision to be recorded for consideration by the Director of
>>the W3C.
>
>I not only wish the dissent to be recorded, I wish the decision to
>be better explained and if possible reverted.

Your dissent has been recorded. I hope the paragraphs above have made
the decision clearer.

-Michael Sperberg-McQueen
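
For reference, the two checking strategies discussed under point 1 can be sketched as follows. This is a minimal illustration in Python, not how XSV or any other processor is actually written. The function names and data layouts are hypothetical, a set stands in for the bit vector, and the element bounds in the usage note are taken from the HTML head example in the quoted text.

```python
def valid_all_group(children, declared):
    """Current all group: each declared child occurs at most once, in any order.

    declared maps an element name to True if it is required, False if optional.
    The set 'seen' plays the role of the bit vector.
    """
    seen = set()
    for name in children:
        if name not in declared or name in seen:
            return False
        seen.add(name)
    return all(name in seen for name, required in declared.items() if required)


def valid_counted_group(children, bounds):
    """Proposed variant: a vector of integers checked against per-child bounds.

    bounds maps an element name to (min_occurs, max_occurs);
    max_occurs is None when unbounded.
    """
    counts = dict.fromkeys(bounds, 0)
    for name in children:
        if name not in counts:
            return False
        counts[name] += 1
        upper = bounds[name][1]
        if upper is not None and counts[name] > upper:
            return False
    return all(counts[name] >= lower for name, (lower, _) in bounds.items())
```

With bounds such as {"title": (1, 1), "base": (0, 1), "meta": (0, None)}, the counted check accepts ["meta", "title", "meta"] and rejects ["meta"], since the required title is missing.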
Received on Wednesday, 11 October 2000 12:13:19 UTC