- From: Grosso, Paul <pgrosso@ptc.com>
- Date: Wed, 21 Dec 2011 11:44:13 -0500
- To: "bacchi raffaele" <bacchi_raffaele@lycos.com>, <xml-editor@w3.org>
> -----Original Message-----
> From: bacchi raffaele [mailto:bacchi_raffaele@lycos.com]
> Sent: Monday, 2011 December 12 3:45
> To: xml-editor@w3.org
> Subject: XML grammar error?
>
> Hi,
> I think that rule [20] (and other similar) are wrong:
> CData ::= (Char* - (Char* ']]>' Char*))
> The purpose of the rule is to match (reduce) any Char sequence not
> containing ']]>'.
> But this result is not achieved since the Char definition includes ']'
> and '>' so the exception part of the rule:
> -(Char* ']]>' Char*)
> is ambiguous. Most parsers solve the ambiguity by applying the rule
> "reduce as soon, as much as possible"
> thus the rule will always mismatch because the first Char* reduces also
> the sequence ']]>' and the next terminal ']]>' will never match.

There is no ambiguity here. A - B matches a string if A matches it,
provided B does not also match that string.

The regular expression (in conventional notation) /^.*]]>.*$/ matches
any string that contains at least one ']]>'. It is ambiguous only in
the sense that, if the string contains more than one occurrence of
']]>', different matchers may match the ']]>' in the pattern against
the first occurrence or the last. That makes no difference to the
meaning of the pattern.

Specifically, a leftmost-longest matcher will first match the first
Char* against the whole string, then attempt to match ']' and fail. It
will then shorten that Char* by one character and try again. If and
only if there is a ']]>' in the string, it will eventually be matched
as a result of this shortening; the second Char* will then match
whatever is left. If there is more than one occurrence, the rightmost
is the one that matches.

By way of contrast, a DFA matcher, scanning left to right, will match
the leftmost occurrence of ']]>'. But as stated, exactly which ']]>'
is matched is irrelevant: (Char* ']]>' Char*) matches exactly the
strings that contain at least one ']]>', so CData matches exactly the
strings that contain none.

> I think the rule (and other similar) should be written:
> Cdata ::= ( Char - ']]>' )*

This will not work: each repetition says to match a single character
that is not the three-character sequence ']]>'. No single character
can be three characters, so the exception never applies and the rule
matches every sequence of characters, including those that contain
']]>'.

Paul Grosso
for the XML Core WG
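The points above can be checked with Python's `re` module, which is a backtracking matcher whose `.*` is greedy (longest first). This is only an illustrative sketch: it assumes Char can be approximated by "any character" (the real XML Char production excludes some code points), and the function names are hypothetical. It shows that the A - B reading of CData rejects exactly the strings containing ']]>', that a backtracking match and a left-to-right scan pick different occurrences while agreeing on whether one exists, and that the proposed ( Char - ']]>' )* accepts everything.

```python
import re

# Assumption: Char is approximated by "any character" (hence re.DOTALL).
# This is the B part of A - B: (Char* ']]>' Char*).
CONTAINS_CDATA_END = re.compile(r"(.*)\]\]>(.*)", re.DOTALL)


def is_cdata(s: str) -> bool:
    """CData ::= (Char* - (Char* ']]>' Char*)).

    A (Char*) matches any string, so the difference matches s exactly
    when B does not match s, i.e. when s contains no ']]>'.
    """
    return CONTAINS_CDATA_END.fullmatch(s) is None


def backtracking_position(s: str) -> int:
    """Index of the ']]>' chosen by a greedy backtracking match, or -1.

    The first (.*) grabs the whole string and gives back one character
    at a time, so the rightmost occurrence is the one matched.
    """
    m = CONTAINS_CDATA_END.fullmatch(s)
    return m.end(1) if m else -1


def leftmost_position(s: str) -> int:
    """Index of the first ']]>' found by a left-to-right scan, or -1."""
    return s.find("]]>")


def proposed_rule(s: str) -> bool:
    """Hypothetical literal reading of ( Char - ']]>' )*.

    Each repetition consumes one character c and excludes it only if
    c equals the three-character string ']]>', which a single character
    never does, so every string is accepted.
    """
    return all(c != "]]>" for c in s)


if __name__ == "__main__":
    for text in ["plain text", "a]]>b", "a]]>b]]>c", "]]", "]>"]:
        print(repr(text), is_cdata(text), proposed_rule(text),
              backtracking_position(text), leftmost_position(text))
```

For "a]]>b]]>c" the backtracking match uses the ']]>' at index 5 and the scan finds the one at index 1, yet both report that an occurrence exists, which is all the difference operator depends on.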
Received on Wednesday, 21 December 2011 16:44:34 UTC