& compromise

I had this idea on the plane back from Metastructures:

I understand why people want the ampersand operator. I understand why
people do not want it. I think that there is a middle ground.

The ampersand operator is most often used to approximate "properties" of
objects when you are using XML to transmit objects. XML's underlying
data model is linguistic, not property/value ("object oriented") so this
causes problems.

At first it does not seem that it would be difficult to implement the
ampersand operator. Instead of using a state machine you would use a
bitmap. Bitmaps are no larger than the runtime portion of state machines
(a single integer either way). So what's the problem?
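
To make the bitmap idea concrete, here is a minimal sketch in Python. It
assumes the simplest reading of an ampersand group: every member is
required and may occur exactly once, in any order. The function name and
the list-of-names representation are mine, purely for illustration.

```python
def validate_amp_group(required, children):
    """Return True if `children` holds each name in `required` exactly once,
    in any order."""
    # Assign one bit per declared member of the ampersand group.
    bit = {name: 1 << i for i, name in enumerate(required)}
    full = (1 << len(required)) - 1

    seen = 0  # the whole runtime state: a single integer
    for child in children:
        mask = bit.get(child)
        if mask is None:
            return False  # element is not a member of the group
        if seen & mask:
            return False  # element occurred twice
        seen |= mask
    return seen == full  # every member must have occurred


ok = validate_amp_group(["a", "b", "c"], ["c", "a", "b"])
```

Optional members would need a second mask of the bits that must be set at
the end; I have left that out to keep the sketch short.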

The problem arises when you mix the modes, as in (a&(b,c)&(d|(e&f))).
Now you can't just use a bitmap OR a state machine. You must use both,
and in fact you must switch back and forth between them. Nor can you
optimize your state machine using "the usual algorithms". It is
demonstrably the case that the problem only comes from mixing:
attributes are "object-oriented" but they don't cause a problem because
they can't be mixed.

The solution -- the compromise -- is just to make this mixing back and
forth illegal. An element can either have ampersands (in which case it
is, roughly speaking, "object oriented") or it can have a content model
(in which case it is, roughly speaking, a container). As you validate
the document you check whether it falls into the first category or the
second and set up a bitmap or a state machine.
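
Roughly, the per-element dispatch could look like the sketch below. The
declaration format (a dict with a "kind" key, and a state machine given
as a transition table) is an assumption of mine, not something any
schema proposal defines.

```python
def validate_element(decl, children):
    """Validate `children` against a declaration that is EITHER an
    ampersand group OR a content model, never a mixture."""
    if decl["kind"] == "amp":
        # Object-like element: flip bits in a bitmap.
        bit = {name: 1 << i for i, name in enumerate(decl["names"])}
        seen = 0
        for child in children:
            mask = bit.get(child, 0)
            if mask == 0 or seen & mask:
                return False  # unknown or repeated member
            seen |= mask
        return seen == (1 << len(bit)) - 1
    if decl["kind"] == "model":
        # Container-like element: step an ordinary state machine.
        state = decl["start"]
        for child in children:
            state = decl["delta"].get((state, child))
            if state is None:
                return False  # no transition for this child
        return state in decl["accept"]
    raise ValueError("unknown declaration kind: %r" % decl["kind"])


# Hypothetical declarations: person is (name & email), items is (item)+
person = {"kind": "amp", "names": ["name", "email"]}
items = {"kind": "model", "start": 0, "accept": {1},
         "delta": {(0, "item"): 1, (1, "item"): 1}}
```

Either way the runtime state per open element is one integer: the bitmap
in the first branch, the current state in the second.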

This model can even be integrated (with a little difficulty) with the
hedge automata model. Before applying a hedge automaton to a document,
you would use the schema to reorder &-defined elements into a consistent
order.
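
As a sketch of that reordering step, assuming a document is represented
as nested (name, children) tuples and the schema supplies a declared
order for each &-defined element (both representations are my own
invention for the example):

```python
def reorder(node, amp_orders):
    """Return a copy of `node` in which the children of every &-defined
    element are sorted into the order the schema declares."""
    name, children = node
    # Normalize the subtrees first, then this element's own children.
    children = [reorder(child, amp_orders) for child in children]
    order = amp_orders.get(name)
    if order is not None:
        rank = {n: i for i, n in enumerate(order)}
        # sorted() is stable; undeclared names sink to the end so that
        # the automaton can reject them later.
        children = sorted(children, key=lambda c: rank.get(c[0], len(rank)))
    return (name, children)


amp_orders = {"person": ["name", "email"]}  # hypothetical schema data
doc = ("person", [("email", []), ("name", [])])
normalized = reorder(doc, amp_orders)
```

After this pass the automaton only ever sees one ordering of each
ampersand group, so it can treat the group as a plain sequence.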

This model can, with very little difficulty, be made more powerful by
allowing a specific combination of ampersand groups and content models.
As long as the ampersand group is required to precede the content model
then all we need to do is associate with each element-level a bitmap AND
a state machine (two integers instead of one).

It is easy to know when we have shifted from the bitmap-flipping part of
the content to the state-machine part: when the bitmap is filled you
switch to the state machine -- or trigger an error if you get to an
"unknown" node without filling the bitmap.
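
A minimal sketch of the combined scheme, under the assumption that all
members of the ampersand group are required and that the content model
is given as a transition table (the function and parameter names are
hypothetical):

```python
def validate_prefixed(amp_names, delta, start, accept, children):
    """Validate an ampersand group followed by a content model, keeping
    two integers of state: a bitmap and a state-machine state."""
    bit = {name: 1 << i for i, name in enumerate(amp_names)}
    full = (1 << len(amp_names)) - 1

    seen = 0       # integer 1: bitmap for the ampersand group
    state = start  # integer 2: current state of the content model
    for child in children:
        if seen != full:
            # Still in the bitmap-flipping part of the content.
            mask = bit.get(child)
            if mask is None:
                return False  # "unknown" node before the bitmap is filled
            if seen & mask:
                return False  # member repeated
            seen |= mask
        else:
            # Bitmap is filled: everything else goes to the state machine.
            state = delta.get((state, child))
            if state is None:
                return False
    return seen == full and state in accept


# (title & author) followed by (para)*
delta = {(0, "para"): 0}
ok = validate_prefixed(["title", "author"], delta, 0, {0},
                       ["author", "title", "para", "para"])
```

The switch needs no lookahead: the test `seen != full` decides which
half of the machinery handles each child.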

There are a variety of syntactic mechanisms that could be used to
enforce this separation in the schema. I'll leave that up to the working
group.

 Paul Prescod

Received on Friday, 20 August 1999 13:36:24 UTC