Re: LC-16 ( LC-132 ): Allow arbitrary order with occurrence > 1 from Martin J. Duerst on 2000-10-15 (www-xml-schema-comments@w3.org from October to December 2000)

From: Martin J. Duerst <duerst@w3.org>
Date: Sun, 15 Oct 2000 18:34:25 +0900
To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
Cc: "Martin Gudgin" <marting@develop.com>, "Schema Comments" <www-xml-schema-comments@w3.org>, "Dan Rupe" <Dan_Rupe@go.com>
Message-Id: <4.2.0.58.J.20001012160859.03903830@sh.w3.mag.keio.ac.jp>
Hello Michael,

Many thanks for your quick and clear answer. Sorry to be a bit
behind.

At 00/10/11 09:47 -0600, C. M. Sperberg-McQueen wrote:
>At 2000-10-11 02:40, Martin J. Duerst wrote:

>>>1.    complexity for schema processors
>>
>>It's a simple matter of counting, isn't it? I don't understand why
>>this should be difficult. For the current version of the all group,
>>a bit vector is needed to check that each element does occur according
>>to the occurrence constraints. This has to be bumped up to a vector of
>>integers. My guess is that this would take a few minutes in XSV, in fact
>>it may be easier to implement from scratch, because there are no special
>>restrictions on minOccurs and maxOccurs.
>
>A bit vector is one way (I believe a fairly common one) of implementing
>the and-connector; it is, however, not the only way.

What are the others? Straightforward finite state machines don't
do the job, as I explained in the message to Henry.


>Any formalism for defining languages is better at some things than
>at others; adding ad hoc rules for what are thought to be special cases
>is not usually thought to be the way to improve a system.  Is there a
>reason to think that counting occurrencs in the way you suggest will be
>an exception to the general rule?

I'm a bit confused here. I think you want to ask whether there is
a reason to think that counting occurrences as I suggest can be
justified as being an exception to the rule that ad hoc rules
for special cases are not desirable.

I have serious problems with answering such a question, because I don't
see the current state, namely the rule that elements in all groups only
can have zero or one occurrence where otherwise elements can have
arbitrary occurrence constraints, as the general case. I rather
see the current state as an ad-hoc rule, and would like to
ask the question back. (you give some of the answers below,
you don't have to repeat these)


>Could
>you give a concrete use case for allowing an arbitrary sequence of
>a, b, c, and d elements where (a) the sequence of the elements is
>significant,

Did you want to write 'insignificant'? That's what both the current
all groups and my proposal are about.


>(b) each element must occur some distinct number of times
>(a one to four times, b exactly once, c ten to thirty times, and d
>exactly three times)?  I have no trouble imagining users who say that
>is what they want; I am having trouble imagining a case where they
>are right.

The very general case is probably extremely rare. But the
'unbounded' case for some of the elements is not that rare.
This is extremely similar to the other places where occurrence
indicators are used: 0, 1, and unbounded are the most frequent
cases, any other actual numbers are quite rare.


>>If there is something I have missed, please tell me.
>
>Only the general principle that ad hoc solutions lead to odd hack
>systems.

As said above, I have difficulties seeing the request to allow
the same occurrence constraints on elements in all groups as
they are allowed for other cases of elements as an odd hack.
It's rather the removal of an odd hack I'm asking for.


>>>2.    the fact that the interpretation usually desired is incompatible with
>>>that of SGML's ampersand connector
>>
>>I'm not sure I understand that. The all group is already different from
>>SGML '&' anyway. And the interpretation is straightforward. The main
>>simplification is provided by the fact that an all group can only
>>occur directly in an element, without any children groups.
>>I'm not at all suggesting to change that.
>
>Every all group currently legal has a straightforward translation into
>an SGML ampersand group which has exactly the same interpretation.
>This is not true of the construct you propose.

I guess there is some point in that, but I guess you agree that the
priority of conversions is probably:

1. XML DTD -> XML Schema
2. XML Schema -> XML DTD
3. SGML DTD -> XML Schema
4. XML Schema -> SGML DTD

In another comment (in the context of character encodings and Unicode),
you have said that conversion back to legacy systems isn't that
important because we want things to move on. Do you see a difference
between non-Unicode systems and non-XML systems in that respect?

With respect to straightforward translation, please see below.


>>>3.    the feeling on the part of some WG members that this is not a pattern
>>>of document design to be recommended or supported.
>>
>>There are definitely many cases where such a pattern is not desired.
>>But there are definitely also cases where it's very helpful to have
>>them. A typical example is metadata, e.g. the HTML <head> element.
>>There, the <title> element can appear only once, the <meta> element
>>can appear many times, and so on. The same thing can be expressed
>>without this feature, but the resulting content models get clumsy
>>and error-prone. For an example, please see
>>http://lists.w3.org/Archives/Public/xmlschema-dev/2000Aug/0017.html.
>
>With respect, the correct content model here does not seem to me
>clumsy, and once the notion of deterministic content models is
>clearly understood it is not hard to write, either:
>
>   <element name='A'>
>     <complexType content='elementOnly'>
>       <sequence>
>         <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
>         <sequence minOccurs="0" maxOccurs="1">
>           <element ref='test:C' minOccurs='1' maxOccur="1"/>
>           <element ref='test:B' minOccurs='0' maxOccurs='unbounded'/>
>         </sequence>
>       </sequence>
>     </complexType>
>   </element>
>
>Or more compactly:  <!ELEMENT A (B*, (C, B*)?) >
>
>A language which accepts a sequence of A, B, and C elements, with
>at most one A and at most one B is a bit more complex, but not too
>hard to work out.
>
>   (c*, ((a, c*, (b, c*)?) | (b, c*, (a, c*)?))?)
>
>The translation into regular expressions becomes tedious if there are
>more than two items for which the maximum cardinality is bounded but
>larger than, say, three.  If I were aware of lots of cases where such
>languages were The Right Thing, I would be working a lot harder to
>find good ways to integrate support for them into languages like
>XML DTDs and XML Schema.  But so far I don't know any serious examples
>and so I am left cold by the argument that writing a regular expression
>which counts up to various numbers for various child elements is
>too hard.

You are arguing here that the increasing difficulty of writing
the regular expressions corresponds to the increasing rarity and
undesirability of the patterns. This is correct, with the very
important exception that for the average schema creator/user,
the point where things become difficult appears much earlier
than for you (and me if I'm up to speed).

But even if we assume that all (potential) schema writers are
fluent in regular expressions and deterministic content models,
I very strongly doubt that any of them (including you and me)
will ever answer with a regular expression if asked to describe
e.g. the HTML <head> content model. We don't think about it in
terms of regular expressions, we think about it in terms of
simple occurrences as they could easily be expressed in an
extended all group. This is just the core of my all group
proposal: allow people to write things down the way they
think about it, and let the machines do the rest of the work;
they are much better at it.


>>For another example, please see
>>http://slow1.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_module_ 
>>Base:
>><!ENTITY % head.content
>>     "( %HeadOpts.mix;,
>>      ( ( %title.qname;, %HeadOpts.mix;, ( %base.qname;, %HeadOpts.mix; )? )
>>      | ( %base.qname;, %HeadOpts.mix;, ( %title.qname;, %HeadOpts.mix; ))))"
>> >
>
>A nice example of precisely the pattern shown above.  I don't think this
>is hard to understand; do you?

I understand that you are good at regular expressions, and I admit
that I also like them and use them when appropriate. But because
I don't use them regularly, it usually takes me some time to get
back into it.

But I also very much understand that regular expressions are not
everybody's speciality, and I think that many people who will want
to use XML Schema won't be experts in regular expressions, and
shouldn't have to try to become experts.


>I agree that it would be simpler to write and the result would be easier to
>understand if the rules against non-deterministic content models were
>eliminated.  But those rules have, in the view of the WG, compensating
>advantages (they enable a guarantee that any schema language can be
>written as an LL(1) language, for example, which means that recursive
>descent parsers are easy to write).

Are you saying that the above regular expressions would be simpler?
I didn't request that these be provided (they are anyway), I requested
that the all groups be changed. I do not see how a changed all group
could make it more difficult to write a recursive descent parser.


>>It is obvious that such things can be avoided for new designs, but
>>it is questionable that this is always desirable, because it is
>>a burden for an user to learn an arbitrary element sequence.
>
>I agree that it is unpleasant for users to have to learn arbitrary
>sequences of elements.  But this is necessary only when using tools
>which have no support for syntax-directed editing.  Any SGML or
>XML editor with schema awareness will remove the necessity for the
>user to learn an arbitrary sequence of elements.

Well, in many cases, it will just say 'element x not allowed here',
which won't really help the user, because it means moving around
the cursor and checking until the right location is found.


>>Also, it is not clear to me why the current all group is considered
>>a recommended or supportable design, whereas the changes I propose
>>are not.
>
>The current all-group closely models the rules for dumping or loading
>rows in a relational table; this is one place where arbitrary order
>has been most consistently desired by users.

The extended all group would cover many cases of metadata, where
again arbitrary order is desired by users (and at least unbounded
cardinality is needed in addition to 0 and 1).

(By the way, I would probably be happy with just these three
cardinalities, except that it would still be an exception from
the general pattern.)


>>>It would be helpful to us to know whether you are satisfied with the
>>>decision taken by the WG on this issue, or wish your dissent from the
>>>WG's decision to be recorded for consideration by the Director of
>>>the W3C.
>>
>>I not only wish the dissent to be recorded, I wish the decision to
>>be better explained and if possible reverted.
>
>Your dissent has been recorded.  I hope the paragraphs above have
>made the decision clearer.

They have cleared up some points in earlier arguments, but they
haven't helped me to understand the decision better.


Regards,   Martin.
Received on Sunday, 15 October 2000 17:51:13 UTC