- From: John Cowan <cowan@mercury.ccil.org>
- Date: Mon, 12 Dec 2011 11:16:59 -0500
- To: "Grosso, Paul" <pgrosso@ptc.com>
- Cc: public-xml-core-wg@w3.org
bacchi raffaele scripsit:

> I think that rule [20] (and other similar) are wrong:
>
>     CData ::= (Char* - (Char* ']]>' Char*))
>
> The purpose of the rule is to match (reduce) any Char sequence not
> containing ']]>'. But this result is not achieved since the Char
> definition includes ']' and '>' so the exception part of the rule:
>
>     -(Char* ']]>' Char*)
>
> is ambiguous. Most parsers solve the ambiguity by applying the rule
> "reduce as soon, as much as possible" thus the rule will always
> mismatch because the first Char* reduces also the sequence ']]>'
> and the next terminal ']]>' will never match.

There is no ambiguity here. A - B matches if A matches, provided B
does not also match what A matches. The regular expression (in
conventional notation) /^.*]]>.*$/ matches any string that contains at
least one ']]>'. It is ambiguous in the sense that if there are
multiple tokens of ']]>' in the string, different matchers will match
']]>' in the pattern against the first or the last. But that makes no
difference to the meaning of the pattern.

Specifically, a leftmost-longest matcher will first match the first
Char* against the whole string, then attempt to match ']' and fail. It
will then reduce the Char* by one character and try again to match
']'. Iff there is a ']]>' in the string, it will eventually be matched
as a result of the shortening of the first Char*; the second Char*
will then match whatever is left. If there is more than one, the
rightmost will be the one that matches. Per contra, a DFA matcher will
match the leftmost occurrence of ']]>'. But as stated, exactly which
']]>' is matched is simply irrelevant.

> I think the rule (and other similar) should be written:
>
>     Cdata ::= ( Char - ']]>' )*

This, of course, is nonsense, since it says to match a single
character which is not a three-character sequence. No single character
can be three characters, so it will match every character.
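The argument can be checked with a small Python sketch. It is illustrative only: '.' under re.DOTALL stands in for Char (the real production excludes some code points), and the names GREEDY, LAZY, marker_offset, is_cdata and matches_proposed are hypothetical helpers, not anything from the XML Recommendation.

```python
import re

# Stand-ins for Char* ']]>' Char*. '.' with DOTALL approximates Char
# (an assumption; the real Char production is narrower).
GREEDY = re.compile(r"(.*)\]\]>.*", re.DOTALL)   # first Char* is greedy
LAZY = re.compile(r"(.*?)\]\]>.*", re.DOTALL)    # first Char* is lazy


def marker_offset(pattern, text):
    """Offset at which the literal ']]>' was matched, or None if no match."""
    m = pattern.fullmatch(text)
    return None if m is None else m.end(1)


def is_cdata(text):
    """A - B: text matches Char* and does not match Char* ']]>' Char*."""
    return GREEDY.fullmatch(text) is None


def matches_proposed(text):
    """(Char - ']]>')*: each single character must differ from ']]>'."""
    return all(ch != "]]>" for ch in text)


sample = "a]]>b]]>c"
# The two strategies pick different occurrences of ']]>' ...
print(marker_offset(GREEDY, sample), marker_offset(LAZY, sample))
# ... but agree on whether the string contains one at all.
print(is_cdata(sample), is_cdata("a]]b>"))
# The proposed rule accepts everything, including the forbidden string.
print(matches_proposed(sample))
```

The greedy and lazy patterns bind ']]>' to different positions, yet both give the same yes/no answer, which is all the difference operator depends on. The per-character rule never rejects anything.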
-- 
John Cowan    cowan@ccil.org    http://www.ccil.org/~cowan
Man has no body distinct from his soul, for that called body is
a portion of the soul discerned by the five senses, the chief
inlets of the soul in this age.  --William Blake
Received on Monday, 12 December 2011 16:17:28 UTC