- From: Paul Prescod <paul@prescod.net>
- Date: Fri, 20 Aug 1999 11:56:38 -0400
- To: w3c-xml-schema-ig@w3.org, www-xml-schema-comments@w3.org, xml-dev <xml-dev@ic.ac.uk>
I had this idea on the plane back from Metastructures: I understand why people want the ampersand operator. I understand why people do not want it. I think that there is a middle ground.

The ampersand operator is most often used to approximate "properties" of objects when you are using XML to transmit objects. XML's underlying data model is linguistic, not property/value ("object oriented"), so this causes problems.

At first it does not seem that it would be difficult to implement the ampersand operator. Instead of using a state machine you would use a bitmap. Bitmaps are no larger than the runtime portion of state machines (a single integer either way). So what's the problem?

The problem is when you mix the modes, as in (a&(b,c)&(d|(e&f))). Now you can't just use a bitmap OR a state machine. You must do both. And in fact you must switch back and forth between them. And you can't optimize your state machine using "the usual algorithms," and so on.

It is demonstrably the case that the problem only comes from mixing: attributes are "object-oriented" but they don't cause a problem because they can't be mixed.

The solution -- the compromise -- is just to make this mixing back and forth illegal. An element can either have ampersands (in which case it is, roughly speaking, "object oriented") or it can have a content model (in which case it is, roughly speaking, a container). As you validate the document you check whether each element falls into the first category or the second and set up a bitmap or a state machine accordingly.

This model can even be integrated (with a little difficulty) with the hedge automata model. Before applying a hedge automaton to a document, you would use the schema to reorder &-defined elements into a consistent order.

This model can, with very little difficulty, be made more powerful by allowing a specific combination of ampersand groups and content models.
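As a rough illustration of the bitmap idea for a pure ampersand group, here is a minimal Python sketch. It assumes every member of the group is required exactly once; the function names and the list-of-strings representation of child elements are hypothetical, chosen only for this example.

```python
def make_and_group(names):
    """Assign each member name one bit; return (bit_of, full_mask)."""
    bit_of = {name: 1 << i for i, name in enumerate(names)}
    full_mask = (1 << len(names)) - 1
    return bit_of, full_mask


def validate_and_group(children, names):
    """Return True if `children` holds each name exactly once, in any order."""
    bit_of, full_mask = make_and_group(names)
    seen = 0  # the whole runtime state is this single integer
    for child in children:
        bit = bit_of.get(child)
        if bit is None or seen & bit:
            return False  # unknown element, or one that already appeared
        seen |= bit
    return seen == full_mask  # every required member was present
```

For example, `validate_and_group(["b", "a", "c"], ["a", "b", "c"])` accepts the children regardless of their order, while a missing or repeated member is rejected.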
As long as the ampersand group is required to precede the content model, all we need to do is associate with each element level a bitmap AND a state machine (two integers instead of one). It is easy to know when we have shifted from the bitmap-flipping part of the content to the state-machine part: when the bitmap is filled you switch to the state machine -- or trigger an error if you get to an "unknown" node without filling the bitmap.

There are a variety of syntactic mechanisms that could be used to enforce this separation in the schema. I'll leave that up to the working group.

Paul Prescod
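The combined scheme from the message (an ampersand group that must precede an ordinary content model) might be sketched as follows. This is only an illustration under stated assumptions: each ampersand member is required exactly once, and the content model has already been compiled to a deterministic transition table. The function name, parameters, and table layout are hypothetical.

```python
def validate_element(children, and_names, transitions, start, accepting):
    """Validate an ampersand prefix followed by a state-machine content model.

    transitions maps state -> {child name -> next state}.
    """
    bit_of = {name: 1 << i for i, name in enumerate(and_names)}
    full_mask = (1 << len(and_names)) - 1
    seen = 0       # integer 1: the bitmap for the ampersand group
    state = start  # integer 2: the current state of the content model
    for child in children:
        if seen != full_mask:
            # Still in the bitmap-flipping part of the content.
            bit = bit_of.get(child)
            if bit is None or seen & bit:
                return False  # unknown node before the bitmap is filled, or a repeat
            seen |= bit
        else:
            # Bitmap is filled: hand every later child to the state machine.
            state = transitions.get(state, {}).get(child)
            if state is None:
                return False
    return seen == full_mask and state in accepting


# Hypothetical element: (a & b) followed by the content model (x, y*).
table = {0: {"x": 1}, 1: {"y": 1}}
ok = validate_element(["b", "a", "x", "y", "y"], ["a", "b"], table, 0, {1})
```

The switch between the two modes needs no extra flag: comparing the bitmap against the full mask is enough to tell which part of the content is being read.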
Received on Friday, 20 August 1999 13:36:24 UTC