From: Lehmann, Steen <slehmann@silverstream.com>

Date: Thu, 13 Jun 2002 14:00:16 +0200

Message-ID: <FD8A740186525243807FBCDB7E739D0585F3A6@EXBE1.silverstream.be>

To: "'Ian Stokes-Rees'" <ijs@decisionsoft.com>

Cc: xmlschema-dev@w3.org

Date: Thu, 13 Jun 2002 14:00:16 +0200

Message-ID: <FD8A740186525243807FBCDB7E739D0585F3A6@EXBE1.silverstream.be>

To: "'Ian Stokes-Rees'" <ijs@decisionsoft.com>

Cc: xmlschema-dev@w3.org

>> Now, to turn this NFA into a corresponding DFA we "collapse" the empty >> transitions from each state and copy transitions from states we can reach by >> following empty transitions back to the original state (known as evaluating >> lambda closure). The resulting DFA has no empty transitions, so at each >> state we know immediately if a parsed symbol is legal (there is a transition >> somewhere for that symbol) without a need for backtracking. > Without working it out in detail, my guess is that you get exponential > growth in the number of required states if you have multiple nested > ambiguous content models. In practical terms, while it would be > theoretically possible, it is my guess that either the > implementation or > the operation (execution) of a parser which allowed this would be > difficult. Is that a fair assessment? The worst case scenario for a complex expression is indeed an exponential number of states, the text book example being the expression (a|b)* a (a|b)(a|b)(a|b) ... (a|b) where (a|b) is repeated N times - this requires 2^N states. Fortunately there are ways around this, lazily constructing the DFA being one of them. A good reference to parsing algorithms and optimization is: "Compilers: Principles, Techniques and Tools" by Aho, Sethi and Ullman (referred to as the "dragon" book). Whether or not this is difficult is a matter of taste, I suppose. If you have previous experience with parsing you may find it easy because it uses well understood algorithms, on the other hand I would not want to use this particular approach if I had to write a validator using a scripting language. -- SteenReceived on Thursday, 13 June 2002 08:00:49 GMT

*
This archive was generated by hypermail 2.2.0+W3C-0.50
: Tuesday, 11 January 2011 00:14:31 GMT
*