Re: Non-deterministic content model

At 00/08/24 11:44 -0500, Dan Connolly wrote:
>Yuichi Koike wrote:

> > Let me show the real example, which is a part of P3P schema file.
> > Is there any way to write a `right` schema, which is
> > compatible with the following example?
>
>In short, no, not exactly, but you can probably get
>close enough.

Well, for the example, actually yes.


> >  <element name='DISPUTES'>
> >   <complexType content='elementOnly'>
> >    <element ref='p3p:EXTENSION' minOccurs='0' maxOccurs='unbounded'/>
> >    <element ref='p3p:LONG-DESCRIPTION' minOccurs='0' maxOccurs='1'/>
> >    <element ref='p3p:IMG' minOccurs='0' maxOccurs='1'/>
> >    <element ref='p3p:REMEDIES' minOccurs='0' maxOccurs='1'/>
> >    <element ref='p3p:EXTENSION' minOccurs='0' maxOccurs='unbounded'/>
> >   </complexType>
> >  </element>

The trick here is to split this up into various variants that are
all deterministic. The current thing is:

(extension*, long?, img?, remedies?, extension*)

In a first step, change this to

( (extension+, long?, img?, remedies?, extension*)
  |(long?, img?, remedies?, extension*)
)

In the next step, we change it to:

( (extension+,( (long, img?, remedies?, extension*)
                |(img?, remedies?, extension*)
               )
   )
  |(long, img?, remedies?, extension*)
  |(img?, remedies?, extension*)
)

Then we go to:

( (extension+,( (long, img?, remedies?, extension*)
                |(img, remedies?, extension*)
                |(remedies?, extension*)
               )
   )
  |(long, img?, remedies?, extension*)
  |(img?, remedies?, extension*)
)

And finally, we end up with:

( (extension+,( (long, img?, remedies?, extension*)
                |(img, remedies?, extension*)
                |(remedies, extension*)
               )
   )
  |(long, img?, remedies?, extension*)
  |(img, remedies?, extension*)
  |(remedies, extension*)
)

which is clearly deterministic. It is possible to do similar
things for many if not most practical cases, but there are
cases where it doesn't work. Here is an example:

((a | b)*, a, (a | b))

[from some paper of A. Bruggemann-Klein, via Makoto Murata]

What it is saying is: a sequence of 'a' and 'b', at least two of
them, and the second-to-last has to be an 'a'. For a finite-size
deterministic automaton, there is no way to know it's at the
second-to-last.


Regards,  Martin.

Received on Friday, 25 August 2000 01:34:45 UTC