Re: Ambiguous content models -- allowed or disallowed by XSDL?

Jeni Tennison <jeni@jenitennison.com> writes:

> Hi Henry,
> 
> > Summary: Ambiguity and unique attribution are different -- the *ML
> > family have never ruled out the former, always required the latter.
> 
> Thanks for that clear summary. The example that's been buzzing around
> in the back of my head is:
> 
>   <xs:sequence minOccurs="2" maxOccurs="2">
>     <xs:element name="a" minOccurs="1" maxOccurs="2" />
>     <xs:element name="b" minOccurs="0" />
>   </xs:sequence>
> 
> which fulfils the unique attribution constraint since there is only
> one particle for each of the two elements a and b, but is ambiguous
> because if you have:
> 
>   <a /><a /><b />
> 
> then you don't know whether you've got to the end of the content model
> (the first a comes from the first occurrence of the sequence, the
> remainder from the second occurrence of that sequence) or if you're
> still within the first occurrence of the sequence.
> 
> For my education, could you explain (or point me to something that
> explains) how parsers manage to accept the sequence a, a, b without
> backtracking?

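First, note that <!ELEMENT foo ((a+,b?)*)> is valid XML/SGML, even though
it is ambiguous in the same sense: a second a can either continue the a+
or start a new iteration of the group.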

Here's the finite state machine XSV produces to get all and only the
correct parses:

       inputs
state  a    b

  1    2
  2    7    3
  3    4
  4*   6    5
  5*
  6*        5
  7*   9    8
  8*   4
  9*   6    5

States marked * are final; a blank entry means there is no transition
(the input is rejected).
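
To try the table by hand, here is a minimal sketch in Python that simply
transcribes it into a dictionary and walks it one child at a time. The
names TRANSITIONS, FINAL_STATES, START_STATE and accepts are made up for
this illustration and are not taken from XSV.

```python
# Transition table transcribed from the message above.
# A missing key means "no transition": the input is rejected.
TRANSITIONS = {
    1: {"a": 2},
    2: {"a": 7, "b": 3},
    3: {"a": 4},
    4: {"a": 6, "b": 5},
    5: {},
    6: {"b": 5},
    7: {"a": 9, "b": 8},
    8: {"a": 4},
    9: {"a": 6, "b": 5},
}
FINAL_STATES = {4, 5, 6, 7, 8, 9}
START_STATE = 1


def accepts(children):
    """Return True if the sequence of child element names is accepted.

    Each child is looked at exactly once, so there is no backtracking.
    """
    state = START_STATE
    for name in children:
        state = TRANSITIONS[state].get(name)
        if state is None:
            return False
    return state in FINAL_STATES


if __name__ == "__main__":
    print(accepts(["a", "a", "b"]))  # path 1 -> 2 -> 7 -> 8, and 8 is final
```

For a, a, b the machine ends in state 8, which is final but still allows
a further a: it never has to decide which occurrence of the sequence the
second a belonged to.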

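What you can see is that the loop has been unfolded: the two occurrences
of the sequence are expanded into separate states rather than shared
through a counter.  That's why numeric exponents are a pain at
compilation time!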
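
The unfolding itself can be sketched as follows. This is not XSV's
actual algorithm, only an illustration under simplifying assumptions:
the outer sequence has minOccurs equal to maxOccurs, and every inner
particle has a finite maxOccurs. The helper names (unfold, closure,
accepts, STEPS) are hypothetical. The content model is flattened into
a, a?, b?, a, a?, b? and the matcher tracks the set of positions it could
be at; that set plays the role of a single state, so no backtracking is
needed.

```python
import re
from itertools import product


def unfold(particles, occurs):
    """Flatten `occurs` copies of a sequence into (name, required) steps.

    Each particle is (name, min_occurs, max_occurs) with a finite maximum.
    """
    steps = []
    for _ in range(occurs):
        for name, lo, hi in particles:
            steps += [(name, True)] * lo + [(name, False)] * (hi - lo)
    return steps


def closure(steps, positions):
    """Add every position reachable by skipping optional steps."""
    seen, todo = set(), list(positions)
    while todo:
        p = todo.pop()
        if p not in seen:
            seen.add(p)
            if p < len(steps) and not steps[p][1]:
                todo.append(p + 1)
    return seen


def accepts(steps, children):
    """Match child names against the unfolded steps in a single pass."""
    live = closure(steps, {0})
    for name in children:
        moved = {p + 1 for p in live if p < len(steps) and steps[p][0] == name}
        live = closure(steps, moved)
        if not live:
            return False
    return len(steps) in live


# Jeni's example: (a{1,2}, b?) occurring exactly twice.
STEPS = unfold([("a", 1, 2), ("b", 0, 1)], occurs=2)

# Cross-check against Python's backtracking regex engine on all short inputs.
PATTERN = re.compile(r"(a{1,2}b?){2}")
for n in range(8):
    for word in product("ab", repeat=n):
        assert accepts(STEPS, word) == bool(PATTERN.fullmatch("".join(word)))
```

The list of steps grows with the product of the exponents, which is
where the compilation cost comes from.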

ht
-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2002, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]

Received on Thursday, 13 June 2002 08:00:07 UTC