- From: Daniel van Vugt <vanvugt@gmail.com>
- Date: Thu, 03 Nov 2011 17:14:13 +0800
- To: xml-editor@w3.org
And here is another ambiguity in the standard grammar related to S:

[3]  S ::= (#x20 | #x9 | #xD | #xA)+
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[27] Misc ::= Comment | PI | S

If the prolog contains multiple consecutive space characters (S), then it's
ambiguous how that should match some number of S, or some number of Misc
containing S.

The fix for this ambiguity is similar to the previous one...

[27] Misc ::= Comment | PI | S1
[3a] S1 ::= #x20 | #x9 | #xD | #xA
[3b] S ::= S1+

Regards,
Daniel van Vugt

On 21/10/11 14:14, Daniel van Vugt wrote:
> Unrelated to my previous email about an ambiguity, I have found another
> one in section 2.8 this time...
>
> [28a] DeclSep ::= PEReference | S
> [28b] intSubset ::= (markupdecl | DeclSep)*
> [3] S ::= (#x20 | #x9 | #xD | #xA)+
>
> intSubset is ambiguous because it allows repetitions (*) of DeclSep. And
> DeclSep can match the S rule which also allows repetitions (+).
>
> Therefore the offset and length of each S used to make up DeclSep in
> intSubset is ambiguous. There are many different solutions if intSubset
> is given a string of multiple whitespace characters.
>
> I suggest the following correction, which appears to eliminate the
> ambiguity:
>
> [3a] S1 ::= #x20 | #x9 | #xD | #xA
> [3b] S ::= S1+
> [28a] DeclSep ::= PEReference | S1
> [28b] intSubset ::= (markupdecl | DeclSep)*
>
> I have confirmed this fix resolves the ambiguity using my own parser.
>
> Regards,
>
> Daniel van Vugt
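
The counting argument behind both reports can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is taken to be a run of n whitespace characters and nothing else, and the names count_parses, matches_s and matches_s1 are hypothetical, not taken from any XML parser. The sketch counts the ways such a run can be split into consecutive items of a starred rule such as Misc* or DeclSep*, first when each item may match S (one or more characters) and then when each item must match S1 (exactly one character).

```python
from functools import lru_cache


def count_parses(n, item_matches):
    """Count the ways to split a run of n whitespace characters into
    consecutive items, where item_matches(k) says whether a single item
    may cover exactly k characters."""

    @lru_cache(maxsize=None)
    def ways(remaining):
        if remaining == 0:
            return 1  # the whole run has been consumed: one complete parse
        # Try every allowed length for the next item, then parse the rest.
        return sum(
            ways(remaining - k)
            for k in range(1, remaining + 1)
            if item_matches(k)
        )

    return ways(n)


def matches_s(k):
    """Original grammar: an item is S, i.e. one or more whitespace characters."""
    return k >= 1


def matches_s1(k):
    """Proposed grammar: an item is S1, i.e. exactly one whitespace character."""
    return k == 1


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_parses(n, matches_s), count_parses(n, matches_s1))
```

With items matching S, a run of n characters has 2^(n-1) distinct splits (the compositions of n), so any run of two or more whitespace characters is ambiguous. With items matching S1 there is exactly one split for every n, which is consistent with the proposed correction.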
Received on Thursday, 3 November 2011 09:17:39 UTC