- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Sat, 30 May 2009 13:48:31 -0600
- To: Dag Hovland <dag.hovland@uib.no>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, xmlschema-dev@w3.org
On 28 May 2009, at 03:06 , Dag Hovland wrote: > Hi! > > Thank you for the clear answer. I realize that finding the "reasons" > are > probably hard. My intention was also to discover what is the > unrestricted form of xs:all. Is it the "&"-operator from SGML? Or is > it > "interleaving" as described in regular language theory? The restrictions on the XSD 1.0 all-group have the consequence that XSD 1.0 all-groups can be interpreted either as a restricted form of the SGML all-group or as a restricted form of interleave. That was not a stated goal of the design -- I'm not sure I had heard of any academic studies of the interleave operator at that time, in the context of regular languages or elsewhere, and if other WG members had, they didn't talk about it then. But the fact that XSD 1.0 all-groups satisfy the rules of both SGML all-groups and the interleave operator does have the side effect that when XSD 1.1 relaxed some of the restrictions, the WG was faced with the choice of making XSD 1.1 all-groups behave like SGML or like the interleave operator studied by computer scientists. Historically, the all-group can be viewed (and was viewed by at least some in the WG that introduced it) as a partial resurrection of the SGML all-group, which had been eliminated from XML 1.0 DTDs because it complicated the analysis of content models and the implementation of validators, without extending the expressive power of the language. The request to revive the all-group was motivated by scenarios like dumping rows from SQL tables, where the sequence of columns in the row has no significance for the SQL processor, and the language offers no guarantee that columns will be produced in any particular order, unless an explicit order is specified in the SQL statement. There was opposition to restoring the all-group, however, for the same reasons it was omitted from XML DTDs. The compromise reached was based on the observation that a restricted all-group sufficient for dumping the columns of a row in a relational table would also be too weak to cause the kind of problems known from SGML. The pessimists in the WG warned that having any kind of all-group at all was just asking for trouble, and talked about slippery slopes. The persistent requests from inside and outside the WG for just a tweak here and a tweak there to all-groups have seemed to confirm the view of the pessimists of ten years ago. > Are there any > special problems in treating the full &-operator? I could not find > much > information about this operator, or why it was/is seen as hard to > treat. The main difficulty in handling SGML all-groups is that in the general case, the language defined by a content model with an all-group cannot conveniently be recognized by constructing a simple finite-state automaton. An FSA can always be constructed, but no special effort is needed to write SGML content models for which the FSA becomes impractically large. To see the effect, try an FSA for (a? & b? & c? & d? & e? & f? & g? & h)+ sometime. Now imagine rewriting that + as {21,30}, and replacing individual members with arbitrarily complex model groups. I have been told on good authority that virtually all SGML implementations eventually found a way to handle all-groups without combinatorial explosion, but the people I know with commercial connections have been unable to tell me how, because their companies regarded it as a trade secret. Certainly it's not something one finds treated in standard texts on automata theory -- at least not the ones on my shelf. I hope this helps. -- **************************************************************** * C. M. Sperberg-McQueen, Black Mesa Technologies LLC * http://www.blackmesatech.com * http://cmsmcq.com/mib * http://balisage.net ****************************************************************
Received on Saturday, 30 May 2009 19:49:09 UTC