Re: Attribute declarations after complex type definitions from Martin J. Duerst on 2000-06-21 (www-xml-schema-comments@w3.org from April to June 2000)

From: Martin J. Duerst <duerst@w3.org>
Date: Wed, 21 Jun 2000 17:55:50 +0900
To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, www-xml-schema-comments@w3.org
Message-Id: <4.2.0.58.J.20000621152350.0301d920@sh.w3.mag.keio.ac.jp>

Hello Michael,

There seems to be a misunderstanding. I don't think
the WG understood my arguments, and I'm not convinced
by your arguments. Please see below.

At 00/06/20 21:08 -0600, C. M. Sperberg-McQueen wrote:
>At 22:14 00/05/21 +0900, Martin J. Duerst wrote:
> >This is a last call comment to XML Schema: Structures.
> >
> >Currently, all attribute-related stuff in an element
> >decl. has to come after all content-related stuff.
> >This seems not very well motivated and definitely
> >inconvenient, and should be changed to allow either
> >complete mixture or having these both at the start and
> >at the end (and just add those at the start and those at
> >the end together), or if that's not possible, preferably
> >at the start rather than at the end.
> >
> >It is more natural to have them at the start, because that's
> >how they appear in the instance. DTD syntax made most people
> >have attribute decl. after element content decls., but there
> >is no need to do so for Schemas.

>It would indeed be possible to allow free intermixing of
>attribute declarations and 'content model' declarations, at
>least syntactically.  But, as you indirectly suggest, it
>might be difficult to provide a simple, coherent semantics
>for the interleaved form -- either the interleaving carries
>some meaning (in which case what DOES it mean?) or else it
>carries none.
>
>The experience of DTD designers has been, in general, that
>if there is no particular reason to choose between two
>orders, or between an arbitrary fixed order and a free
>order that has no significance, it is usually better to
>choose a single fixed order arbitrarily than to allow
>free ordering.  By hypothesis, the free ordering carries no
>extra information, but it makes the job of parsers and
>user interface designers harder.  So choosing a fixed order,
>as is done in XML Schema, seems to us the correct choice.

This is a valid point. I wouldn't mind if the WG chose
to have only one order, or to allow attributes only
before and/or after elements, and not intermixed.
Given the current content model of <complexType>:

A)
<!ELEMENT %complexType; ((%annotation;)?,
                           ((%facet;)*|
                            ((%element;| %mgs; | %group; | %any;)*,
                             (%attribute;| %attributeGroup;)*,
                             (%anyAttribute;)?)))>

Having attributes before content models would amount to:

B)
<!ELEMENT %complexType; ((%annotation;)?,
                           ((%facet;)*|
                            ((%attribute;| %attributeGroup;)*,
                             (%element;| %mgs; | %group; | %any;)*,
                             (%anyAttribute;)?)))>
(probably the %anyAttribute; may have to move, too)

Allowing attributes before and after content models would
amount to

C)
<!ELEMENT %complexType; ((%annotation;)?,
                           ((%facet;)*|
                            ((%attribute;| %attributeGroup;)*,
                             (%element;| %mgs; | %group; | %any;)*,
                             (%attribute;| %attributeGroup;)*,
                             (%anyAttribute;)?)))>
(with some additional tweaks to get a deterministic model)

Allowing attributes either before or after content models
(but not both) would look something like

D)
<!ELEMENT %complexType; ((%annotation;)?,
                           ((%facet;)*|
                            ((((%attribute;| %attributeGroup;)+,
                               (%element;| %mgs; | %group; | %any;)*) |
                              ((%element;| %mgs; | %group; | %any;)+,
                               (%attribute;| %attributeGroup;)*))?,
                             (%anyAttribute;)?)))>

and allowing arbitrary sequences would amount to something like

E)
<!ELEMENT %complexType; ((%annotation;)?,
                           ((%facet;)*|
                            ((%element;| %mgs; | %group; | %any; |
                              %attribute;| %attributeGroup;)*,
                             (%anyAttribute;)?)))>

Something like E) may indeed have some problems, as you indicate,
but D) just provides two ways to do things, and C) allows
mixing these two ways, which, by your own argument (see below),
comes close to what is already possible in DTDs. Both D) and C)
would be fine with me (I guess I currently have a slight
preference for C)), and B) would still be better than A) in my
opinion.
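
The five alternatives can be compared mechanically. The Python sketch
below is only an illustration under stated assumptions: it reduces each
child of <complexType> to a one-letter token (e, a, w, all invented
here), ignores the %annotation; and %facet; branches, and writes each
model as a regular expression. C_DETERMINISTIC is one hypothetical way
to do the "additional tweaks" mentioned for C); it is not taken from
the spec or from the WG. The tweak is needed because in C), when no
content particle is present, an attribute could match either the first
or the second attribute group.

```python
import re
from itertools import product

# One-letter tokens (an assumption of this sketch, not schema syntax):
#   e = element | mgs | group | any   (content-model particle)
#   a = attribute | attributeGroup
#   w = anyAttribute
MODELS = {
    "A": r"e*a*w?",            # current draft: content first, then attributes
    "B": r"a*e*w?",            # attributes first
    "C": r"a*e*a*w?",          # attributes before and after content
    "D": r"(?:a+e*|e+a*)?w?",  # attributes before or after, but not both
    "E": r"[ea]*w?",           # free intermixing
}

# Hypothetical rewrite of C in which the first token already decides
# which branch is taken, so no attribute can match two positions.
C_DETERMINISTIC = r"(?:a+(?:e+a*)?|e+a*)?w?"


def accepted_by(children: str) -> list[str]:
    """Return the labels of the models that accept this child sequence."""
    return [label for label, pattern in MODELS.items()
            if re.fullmatch(pattern, children)]


def same_language(p1: str, p2: str, max_len: int = 6) -> bool:
    """Brute-force check that two patterns agree on all strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product("eaw", repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p1, s)) != bool(re.fullmatch(p2, s)):
                return False
    return True


if __name__ == "__main__":
    for seq in ("ea", "ae", "aea", "eae"):
        print(seq, accepted_by(seq))
    print(same_language(MODELS["C"], C_DETERMINISTIC))
```

Here "ea" stands for a content model followed by attributes and "ae"
for the reverse. The brute-force comparison only covers sequences up to
a fixed length, so it is a sanity check rather than a proof that the
rewritten C) accepts the same sequences.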


>Now the question becomes 'which order?', and here there is
>both a factual point to make, and a judgement call.
>
>Factually, we believe you are right that many DTD authors
>follow the pattern of making an ATTLIST declaration immediately
>follow the ELEMENT declaration for the relevant element.
>But it is a mistake to believe that this sequence is required
>by DTD syntax.  DTD syntax makes no such requirement; the
>sequence ELEMENT, ATTLIST is not imposed by syntax, but
>chosen freely by DTD authors.
>
>It appears to us (based partly on introspection by the DTD
>authors in the WG, and partly on our knowledge of other
>DTD designers) that many DTD authors have chosen this
>sequence because it appears to them (though not, admittedly,
>to you) more natural than the alternative.

This misstates my view. I agree with you and the DTD authors in the WG
that for DTDs, it is more natural to write

a)
<!ELEMENT FOO (A, B, C) >
<!ATTLIST FOO x CDATA #IMPLIED
               y CDATA #IMPLIED >

than the other way round, namely

b)
<!ATTLIST FOO  >
<!ELEMENT FOO (A, B, C) >

b) would be strange, because it's quite natural to first say
that something is an element (in which case, due to the DTD
syntax, you have to give the content model on the spot) and
then list the attributes.

However, I think that the above does not at all imply that
XML Schema designers would prefer to write

A)
<element name='foo'>
   <complexType>
     <sequence>
       <element name='A' />
       <element name='B' />
       <element name='C' />
     </sequence>
     <attribute name='x' />
     <attribute name='y' />
   </complexType>
</element>

instead of

B)
<element name='foo'>
   <complexType>
     <attribute name='x' />
     <attribute name='y' />
     <sequence>
       <element name='A' />
       <element name='B' />
       <element name='C' />
     </sequence>
   </complexType>
</element>

(the syntax is probably not correct/complete, but I hope you get the idea).
Quite to the contrary, my guess is that the latter, if allowed, would be
used more frequently. The important difference from the DTD syntax is, of
course, that the statement that 'foo' is an element comes before the list
of attributes.


>So to the extent that there is a reason to choose between
>the current sequence and the inverted sequence you propose,
>the empirical evidence available to us seems to favor
>retaining our current choice.

I disagree. The evidence you have cited (there may be other
evidence, but I'm not aware of any) is that authors, given the
choice of

a)
this-is-an-element-and-it-has-this-content-model
these-are-the-attributes-that-go-with-this-element

and

b)
these-are-the-attributes-that-go-with-this-element
this-is-an-element-and-it-has-this-content-model

prefer a). I claim that this does not give you any evidence
on how authors would choose between

A)
this-is-an-element
and-here-is-its-content-model
and-here-are-its-attributes

and

B)
this-is-an-element
and-here-are-its-attributes
and-here-is-its-content-model

One important piece of evidence that I have that authors prefer B) over
A) is the XML Schema spec, Part 1, itself. In the ad-hoc notation
in Section 4, attributes appear before elements. You may argue
that this is due to the fact that this mimics XML instance
syntax, but this is exactly the point I'm making: B) is much
closer to XML instance syntax than A), and should therefore
be preferred.


>It is worth mentioning, however,
>that some members of the WG express a certain degree of
>skepticism that judgments of 'naturalness' in questions like
>this are at all reliable; such judgements, these WG members
>contend, appear to be mutable, and to be based in the first
>instance on experience with one sequence or the other, and
>to a lesser extent on analogies with natural-language word
>order (which itself pretty well destroys the notion that
>any particular sequence is inherently more natural than
>any other).

Having attributes before child elements in XML syntax is also,
to quite some degree, arbitrary. It would just be nice if this
arbitrariness went the same way round both times, rather than
differing.


>So the upshot is that while the WG is grateful for the
>suggestion, we do not believe it would be an improvement
>to the language.  We hope the paragraphs above provide
>an explanation of that decision sufficient to persuade you
>to agree with us; let us know if you do (or don't).

As explained above, I do not agree; the reason why most
people keep ATTLIST after ELEMENT in DTDs is not a reason
for keeping attributes after content models in XML Schemas.


Regards,  Martin.

Received on Wednesday, 21 June 2000 04:46:45 UTC