Re: Motivations for restricting the "all" group

On 28 May 2009, at 03:06 , Dag Hovland wrote:

> Hi!
>
> Thank you for the clear answer. I realize that finding the "reasons"  
> are
> probably hard. My intention was also to discover what is the
> unrestricted form of xs:all. Is it the "&"-operator from SGML? Or is  
> it
> "interleaving" as described in regular language theory?

The restrictions on the XSD 1.0 all-group have the consequence
that XSD 1.0 all-groups can be interpreted either as a
restricted form of the SGML all-group or as a restricted form
of interleave.  That was not a stated goal of the design -- I'm
not sure I had heard of any academic studies of the interleave
operator at that time, in the context of regular languages or
elsewhere, and if other WG members had, they didn't talk about
it then.  But the fact that XSD 1.0 all-groups satisfy the
rules of both SGML all-groups and the interleave operator
does have the side effect that when XSD 1.1 relaxed some
of the restrictions, the WG was faced with the choice of
making XSD 1.1 all-groups behave like SGML or like the
interleave operator studied by computer scientists.

Historically, the all-group can be viewed (and was viewed by
at least some in the WG that introduced it) as a partial
resurrection of the SGML all-group, which had been eliminated
from XML 1.0 DTDs because it complicated the analysis of
content models and the implementation of validators, without
extending the expressive power of the language.  The request
to revive the all-group was motivated by scenarios like
dumping rows from SQL tables, where the sequence of columns
in the row has no significance for the SQL processor, and
the language offers no guarantee that columns will be produced
in any particular order, unless an explicit order is specified
in the SQL statement.  There was opposition to restoring the
all-group, however, for the same reasons it was omitted from
XML DTDs.  The compromise reached was based on the observation
that a restricted all-group sufficient for dumping the columns
of a row in a relational table would also be too weak to
cause the kind of problems known from SGML.  The pessimists in
the WG warned that having any kind of all-group at all was
just asking for trouble, and talked about slippery slopes.
The persistent requests from inside and outside the WG for just
a tweak here and a tweak there to all-groups have seemed to
confirm the view of the pessimists of ten years ago.

>  Are there any
> special problems in treating the full &-operator? I could not find  
> much
> information about this operator, or why it was/is seen as hard to  
> treat.

The main difficulty in handling SGML all-groups is that
in the general case, the language defined by a content model
with an all-group cannot conveniently be recognized by
constructing a simple finite-state automaton.  An FSA can
always be constructed, but no special effort is needed to
write SGML content models for which the FSA becomes
impractically large.  To see the effect, try an FSA for

   (a? & b? & c? & d? & e? & f? & g? & h)+

sometime.  Now imagine rewriting that + as {21,30}, and
replacing individual members with arbitrarily complex
model groups.

I have been told on good authority that virtually all SGML
implementations eventually found a way to handle all-groups
without combinatorial explosion, but the people I know
with commercial connections have been unable to tell
me how, because their companies regarded it as a trade
secret.  Certainly it's not something one finds treated in
standard texts on automata theory -- at least not the ones
on my shelf.

I hope this helps.


-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************

Received on Saturday, 30 May 2009 19:49:09 UTC