From: Lehmann, Steen <slehmann@silverstream.com>

Date: Thu, 13 Jun 2002 12:36:18 +0200

Message-ID: <FD8A740186525243807FBCDB7E739D0585F3A4@EXBE1.silverstream.be>

To: "'Jeni Tennison'" <jeni@jenitennison.com>

Cc: xmlschema-dev@w3.org

Date: Thu, 13 Jun 2002 12:36:18 +0200

Message-ID: <FD8A740186525243807FBCDB7E739D0585F3A4@EXBE1.silverstream.be>

To: "'Jeni Tennison'" <jeni@jenitennison.com>

Cc: xmlschema-dev@w3.org

> <xs:sequence minOccurs="2" maxOccurs="2"> > <xs:element name="a" minOccurs="1" maxOccurs="2" /> > <xs:element name="b" minOccurs="0" /> > </xs:sequence> > > which fulfils the unique attribution constraint since there is only > one particle for each of the two elements a and b, but is ambiguous > because if you have: > > <a /><a /><b /> > > then you don't know whether you've got to the end of the content model > (the first a comes from the first occurrence of the sequence, the > remainder from the second occurrence of that sequence) or if you're > still within the first occurrence of the sequence. > > For my education, could you explain (or point me to something that > explains) how parsers manage to accept the sequence a, a, b without > backtracking? A parser I've created translates each complexType into a non-deterministic graph representation (an "NFA"), which is then turned into a deterministic version (a "DFA") by a standard, well-known algorithm. I suspect this may be overkill because of the UPA, but was kind of fun to do (I don't get out much :). The NFA created from the innards of the <sequence> above would have five states with transitions between them, each transition would be labelled a, b or (empty), meaning that when the parser is in a state it will change to another state if there is a transition between the two states matching the current symbol it is trying to parse (or if there is a transition for (empty) between the two states). It may look like this, numbering the states and labelling empty transitions E. Computation starts in state 0 and is successful if it ends in state 4. 0 --a--> 1 1 --a--> 2 1 --b--> 3 1 --E--> 3 2 --b--> 3 2 --E--> 4 3 --b--> 3 3 --E--> 4 To also include the containing <sequence> simply repeat the whole graph twice, copying every transition out of state 0 to state 4, resulting in states 5 to 8 (copies of 1 to 4) Now, to turn this NFA into a corresponding DFA we "collapse" the empty transitions from each state and copy transitions from states we can reach by following empty transitions back to the original state (known as evaluating lambda closure). The resulting DFA has no empty transitions, so at each state we know immediately if a parsed symbol is legal (there is a transition somewhere for that symbol) without a need for backtracking. Here it is: 0 --a--> 1 1 --a--> 3 1 --b--> 2 2 --a--> 4 2 --b--> 2 3 --a--> 4 3 --b--> 2 4 --a--> 6 4 --b--> 5 5 --b--> 5 6 --b--> 5 The parsed input is correct if evaluation terminates in state 5 or 6. Hope this was educational. I'm sure there are more efficient ways of doing it, but the theorietical basis for this approach is well-understood and fairly easy to apply. Cheers, -- Steen /** * Steen Lehmann - <mailto:slehmann@silverstream.com> * Senior Software Engineer (R&D), SilverStream Software * <http://www.silverstream.com> */Received on Thursday, 13 June 2002 06:36:52 UTC

*
This archive was generated by hypermail 2.3.1
: Wednesday, 7 January 2015 14:55:57 UTC
*