- From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
- Date: 13 Jun 2002 13:00:04 +0100
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: xmlschema-dev@w3.org, Ian Stokes-Rees <ijs@decisionsoft.com>
Jeni Tennison <jeni@jenitennison.com> writes:

> Hi Henry,
>
> > Summary: Ambiguity and unique attribution are different -- the *ML
> > family have never ruled out the former, always required the latter.
>
> Thanks for that clear summary. The example that's been buzzing around
> in the back of my head is:
>
> <xs:sequence minOccurs="2" maxOccurs="2">
>   <xs:element name="a" minOccurs="1" maxOccurs="2" />
>   <xs:element name="b" minOccurs="0" />
> </xs:sequence>
>
> which fulfils the unique attribution constraint since there is only
> one particle for each of the two elements a and b, but is ambiguous
> because if you have:
>
> <a /><a /><b />
>
> then you don't know whether you've got to the end of the content model
> (the first a comes from the first occurrence of the sequence, the
> remainder from the second occurrence of that sequence) or if you're
> still within the first occurrence of the sequence.
>
> For my education, could you explain (or point me to something that
> explains) how parsers manage to accept the sequence a, a, b without
> backtracking?

First note that <!ELEMENT foo ((a+,b?)*)> is valid XML/SGML.

Here's the finite state machine XSV produces to get all and only the
correct parses:

          inputs
  state   a    b
    1     2
    2     7    3
    3     4
    4*    6    5
    5*
    6*         5
    7*    9    8
    8*    4
    9*    6    5

  *s are final states

What you can see is that the loop has been unfolded. That's why numeric
exponents are a pain at compilation time!

ht
--
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     W3C Fellow 1999--2002, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]
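The state table in the reply can be run directly to see why no backtracking is needed: each child element moves the machine to exactly one next state, and the ambiguity Jeni describes is absorbed into states such as 7 and 8, which are final because one of the readings they stand for has already completed the content model. The following is a minimal Python sketch, not XSV's own code. It transcribes the table as given, treating a blank cell as "reject". The names `TRANSITIONS`, `accepts` and `agrees_with_regex` are hypothetical. As a sanity check, it compares the machine against the quoted content model written as the regular expression `(a{1,2}b?){2}`; that regex is an equivalent restatement of the schema fragment, not something from the original message.

```python
import itertools
import re

# Transition table transcribed from the state machine in the message.
# A missing entry means that input is rejected in that state.
TRANSITIONS = {
    1: {"a": 2},
    2: {"a": 7, "b": 3},
    3: {"a": 4},
    4: {"a": 6, "b": 5},
    5: {},
    6: {"b": 5},
    7: {"a": 9, "b": 8},
    8: {"a": 4},
    9: {"a": 6, "b": 5},
}
FINAL = {4, 5, 6, 7, 8, 9}  # the starred states
START = 1


def accepts(children):
    """Run the DFA over a sequence of child element names.

    Exactly one step per child and no backtracking: each (state, input)
    pair has at most one successor.
    """
    state = START
    for name in children:
        state = TRANSITIONS[state].get(name)
        if state is None:
            return False
    return state in FINAL


# The quoted schema fragment, sequence{2}(a{1,2}, b?), restated as a regex
# (an assumption used only for cross-checking).
CONTENT_MODEL = re.compile(r"(?:a{1,2}b?){2}")


def agrees_with_regex(max_len=8):
    """Compare the DFA with the regex on every string over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product("ab", repeat=n):
            word = "".join(letters)
            if accepts(word) != bool(CONTENT_MODEL.fullmatch(word)):
                return False
    return True


print(accepts(["a", "a", "b"]))  # True: 1 -> 2 -> 7 -> 8, and 8 is final
print(accepts(["a", "b"]))       # False: stops in non-final state 3
print(agrees_with_regex())       # True
```

For Jeni's input a, a, b the machine passes through states 2, 7 and 8. State 8 covers both "end of the first occurrence" (a a b, not yet complete) and "end of the second occurrence" (a, then a b, complete). So it is final, yet it still allows a further a, which leads to state 4. The nine states are the two occurrences of the sequence unfolded, which is the compile-time cost the message attributes to numeric exponents.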
Received on Thursday, 13 June 2002 08:00:07 UTC