- From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
- Date: 13 Jun 2002 13:00:04 +0100
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: xmlschema-dev@w3.org, Ian Stokes-Rees <ijs@decisionsoft.com>
Jeni Tennison <jeni@jenitennison.com> writes:
> Hi Henry,
>
> > Summary: Ambiguity and unique attribution are different -- the *ML
> > family have never ruled out the former, always required the latter.
>
> Thanks for that clear summary. The example that's been buzzing around
> in the back of my head is:
>
> <xs:sequence minOccurs="2" maxOccurs="2">
>   <xs:element name="a" minOccurs="1" maxOccurs="2" />
>   <xs:element name="b" minOccurs="0" />
> </xs:sequence>
>
> which fulfils the unique attribution constraint since there is only
> one particle for each of the two elements a and b, but is ambiguous
> because if you have:
>
> <a /><a /><b />
>
> then you don't know whether you've got to the end of the content model
> (the first a comes from the first occurrence of the sequence, the
> remainder from the second occurrence of that sequence) or if you're
> still within the first occurrence of the sequence.
>
> For my education, could you explain (or point me to something that
> explains) how parsers manage to accept the sequence a, a, b without
> backtracking?
First note that <!ELEMENT foo ((a+,b?)*)> is valid XML/SGML.
Here's the finite state machine XSV produces to get all and only the
correct parses:
          inputs
state     a     b
  1       2
  2       7     3
  3       4
  4*      6     5
  5*
  6*            5
  7*      9     8
  8*      4
  9*      6     5

*s are final states
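
For anyone who wants to step through the machine, here is a minimal Python sketch of a table-driven recogniser. It is only an illustration, not XSV code: the names TRANSITIONS, FINAL_STATES and accepts are made up for this example, and the placement of the single-entry rows (states 1, 3 and 8 move on a, state 6 moves on b) is a reading of the table worked out from the content model.

```python
# Transition table transcribed from the state table above.
# A missing entry means that input is not allowed in that state.
TRANSITIONS = {
    1: {"a": 2},
    2: {"a": 7, "b": 3},
    3: {"a": 4},
    4: {"a": 6, "b": 5},
    5: {},
    6: {"b": 5},
    7: {"a": 9, "b": 8},
    8: {"a": 4},
    9: {"a": 6, "b": 5},
}
FINAL_STATES = {4, 5, 6, 7, 8, 9}


def accepts(children):
    """Return True if the sequence of child element names is accepted."""
    state = 1
    for name in children:
        state = TRANSITIONS[state].get(name)
        if state is None:
            return False  # no transition: reject without backtracking
    return state in FINAL_STATES
```

With this table, accepts(["a", "a", "b"]) returns True by the path 1 -> 2 -> 7 -> 8. State 8 is final because a, a, b can be read as (a)(a, b), and it still allows a further a in case the input was really (a, a, b) followed by a second occurrence of the sequence. The machine never has to commit to one reading, so no backtracking is needed.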
What you can see is that the loop has been unfolded. That's why
numeric exponents are a pain at compilation time!
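
To make the unfolding concrete, here is a rough Python sketch. It is not a description of how XSV does it; it uses the textbook position-automaton plus subset construction, and the names SYMBOL, FIRST, FOLLOW, LAST and determinize are invented for the example. The content model is written out twice as a1 a2? b3? a4 a5? b6?, with the follow sets worked out by hand.

```python
# Positions in the unfolded model  a1 a2? b3? a4 a5? b6?
# i.e. (a{1,2}, b?) written out once per required occurrence.
SYMBOL = {1: "a", 2: "a", 3: "b", 4: "a", 5: "a", 6: "b"}
FIRST = {1}                      # positions that can start a sequence
FOLLOW = {                       # positions that may come next (hand-derived)
    1: {2, 3, 4},
    2: {3, 4},
    3: {4},
    4: {5, 6},
    5: {6},
    6: set(),
}
LAST = {4, 5, 6}                 # positions that can end a valid sequence


def determinize():
    """Subset construction: each DFA state is a set of positions."""
    start = frozenset()          # stands for "nothing read yet"
    table = {}
    pending = [start]
    while pending:
        current = pending.pop()
        if current in table:
            continue
        if current == start:
            candidates = FIRST
        else:
            candidates = set().union(*(FOLLOW[p] for p in current))
        row = {}
        for symbol in ("a", "b"):
            target = frozenset(p for p in candidates if SYMBOL[p] == symbol)
            if target:
                row[symbol] = target
                pending.append(target)
        table[current] = row
    return table
```

Under these assumptions determinize() yields nine states, six of which contain a position from LAST, the same counts as in the table above. The state reached after a, a is {2, 4}: it records both readings at once, and since positions 2 and 4 both come from the single a particle in the schema, attribution stays unique even though the parse is ambiguous. With larger exponents the number of positions grows with them.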
ht
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
W3C Fellow 1999--2002, part-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Thursday, 13 June 2002 08:00:07 UTC