Re: Ambiguous content models -- allowed or disallowed by XSDL? from Tom Moog on 2002-06-13 (xmlschema-dev@w3.org from June 2002)

From: Tom Moog <tmoog@sarvega.com>
Date: Thu, 13 Jun 2002 09:32:00 -0500
CC: xmlschema-dev@w3.org
Message-ID: <3D08ACE0.7349E400@sarvega.com>

Having a parser that recognizes that a sequence is valid is not
enough to indicate how it should be processed.  In many
circumstances one also needs to know the parse tree, because
there may be semantic actions associated with particular
elements and sequences.

For example, 1+2*3 could be either (1+2)*3 or 1+(2*3) depending
on the way it is parsed.
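
A minimal sketch of that point, in Python.  The tuple encoding of the
trees and the names evaluate, left_grouped and right_grouped are made up
for illustration; the same three tokens give different results depending
on which tree the parser builds.

```python
import operator

# Hypothetical encoding: a tree is either an int leaf or (left, op, right).
OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a nested (left, op, right) tuple; leaves are ints."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))

left_grouped = ((1, "+", 2), "*", 3)   # (1+2)*3
right_grouped = (1, "+", (2, "*", 3))  # 1+(2*3)

print(evaluate(left_grouped), evaluate(right_grouped))
```

Both trees are "valid" readings of the same token sequence, so validity
alone does not say which semantic action to run.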

When a DFA is built to recognize the content, the answer to
whether it is valid is, in a sense, not known until the
very end, when it reaches an accept or reject state.  This
means that no semantic actions could be invoked until
the entire sequence is accepted.
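
To make this concrete, here is a rough Python sketch that walks the
transition table quoted further down in this message (the XSV machine).
The table entries are copied from that quote; the function name run and
the dictionary layout are my own assumptions, not anything XSV exposes.

```python
# Transition table transcribed from the quoted XSV machine.
TRANSITIONS = {
    1: {"a": 2},
    2: {"a": 7, "b": 3},
    3: {"a": 4},
    4: {"a": 6, "b": 5},
    5: {},
    6: {"b": 5},
    7: {"a": 9, "b": 8},
    8: {"a": 4},
    9: {"a": 6, "b": 5},
}
FINAL = {4, 5, 6, 7, 8, 9}  # the starred states

def run(children):
    """Return (accepted, trace of states visited) for a list of element names."""
    state = 1
    trace = [state]
    for name in children:
        nxt = TRANSITIONS[state].get(name)
        if nxt is None:
            # No transition for this input: reject where we got stuck.
            return False, trace
        state = nxt
        trace.append(state)
    # Only now, with the input exhausted, is the verdict known.
    return state in FINAL, trace

print(run(["a", "a", "b"]))
```

The trace only records which states were visited.  It does not say
which a belongs to which iteration of the sequence, which is the
information a semantic action would need.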

Would requiring a unique parse make it an LL(1) grammar?
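
As a rough way to see whether a parse is unique for the model quoted
below (two iterations of one or two a's followed by an optional b), this
brute-force Python sketch lists every way the children can be split into
the two iterations.  The function name groupings and the list of shapes
are assumptions of mine for illustration only.

```python
from itertools import product

def groupings(children):
    """Return every way to split children into two (a{1,2}, b?) groups."""
    # Each iteration of the sequence is one or two a's, optionally followed by b.
    shapes = [["a"], ["a", "a"], ["a", "b"], ["a", "a", "b"]]
    return [
        (g1, g2)
        for g1, g2 in product(shapes, repeat=2)
        if g1 + g2 == list(children)
    ]

for parse in groupings("aaa"):
    print(parse)
```

For three a's this finds two groupings, (a)(aa) and (aa)(a), so a
recognizer that accepts the input still has not chosen a parse.  For
a, a, b it finds only (a)(ab), but that is known only once the whole
input has been seen.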

> >   <xs:sequence minOccurs="2" maxOccurs="2">
> >     <xs:element name="a" minOccurs="1" maxOccurs="2" />
> >     <xs:element name="b" minOccurs="0" />
> >   </xs:sequence>
> >
> > which fulfils the unique attribution constraint since there is only
> > one particle for each of the two elements a and b, but is ambiguous
> > because if you have:
> >
> >   <a /><a /><b />
>
> First note that <!ELEMENT foo ((a+,b?)*)> is valid XML/SGML.
>
> Here's the finite state machine XSV produces to get all and only the
> correct parses:
>
>        inputs
> state  a    b
>
>   1    2
>   2    7    3
>   3    4
>   4*   6    5
>   5*
>   6*        5
>   7*   9    8
>   8*   4
>   9*   6    5
>
> *s are final states
>
> What you can see is that the loop has been unfolded.  That's why
> numeric exponents are a pain at compilation time!
>
> ht

Received on Thursday, 13 June 2002 10:33:21 UTC