Re: Ambiguity in section 2.8 of XML 1.0 Fifth Edition from Daniel van Vugt on 2011-11-03 (xml-editor@w3.org from October to December 2011)

From: Daniel van Vugt <vanvugt@gmail.com>
Date: Thu, 03 Nov 2011 17:14:13 +0800
To: xml-editor@w3.org
Message-ID: <4EB25B65.1040404@gmail.com>

And here is another ambiguity in the standard grammar related to S:

[3] S ::= (#x20 | #x9 | #xD | #xA)+
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[27] Misc ::= Comment | PI | S

If the prolog contains a run of consecutive whitespace characters, the
grammar does not determine how that run is divided: it can match a
single Misc containing one long S, or several Misc each containing a
shorter S.

The fix for this ambiguity is similar to the previous one: let Misc
match exactly one whitespace character (S1) and define S in terms of S1.

[27] Misc ::= Comment | PI | S1
[3a] S1 ::= #x20 | #x9 | #xD | #xA
[3b] S ::= S1+
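
To illustrate the size of the problem, here is a minimal sketch (not
part of the specification, and not the parser mentioned in this thread)
that counts how many ways a run of n whitespace characters can be
divided into consecutive Misc items. The function name count_parses and
its max_piece parameter are hypothetical. It assumes the run contains
only whitespace, so every Misc in it is an S (original grammar) or an
S1 (proposed grammar).

```python
from functools import lru_cache
from typing import Optional


def count_parses(n: int, max_piece: Optional[int] = None) -> int:
    """Count the ways to split a run of n whitespace characters into
    consecutive pieces, each piece being one Misc.

    max_piece=None models the original rule, where Misc ::= S and
    S ::= ws+ may consume any positive number of characters.
    max_piece=1 models the proposed rule, where Misc ::= S1 consumes
    exactly one character.
    """
    limit = n if max_piece is None else max_piece

    @lru_cache(maxsize=None)
    def ways(remaining: int) -> int:
        if remaining == 0:
            return 1  # nothing left to split: exactly one way
        # Choose the length k of the next piece, then split the rest.
        return sum(
            ways(remaining - k)
            for k in range(1, min(limit, remaining) + 1)
        )

    return ways(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_parses(n), count_parses(n, max_piece=1))
```

Under these assumptions the original rules give 2^(n-1) parses for a
run of n characters (for example, 4 parses for 3 characters), while the
S1 version always gives exactly one.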

Regards,

Daniel van Vugt


On 21/10/11 14:14, Daniel van Vugt wrote:
> Unrelated to my previous email about an ambiguity, I have found another
> one in section 2.8 this time...
>
> [28a] DeclSep ::= PEReference | S
> [28b] intSubset ::= (markupdecl | DeclSep)*
> [3] S ::= (#x20 | #x9 | #xD | #xA)+
>
> intSubset is ambiguous because it allows repetitions (*) of DeclSep. And
> DeclSep can match the S rule which also allows repetitions (+).
>
> Therefore the offset and length of each S used to make up DeclSep in
> intSubset are ambiguous. There are many different valid parses if
> intSubset is given a string of multiple whitespace characters.
>
> I suggest the following correction, which appears to eliminate the
> ambiguity:
>
> [3a] S1 ::= #x20 | #x9 | #xD | #xA
> [3b] S ::= S1+
> [28a] DeclSep ::= PEReference | S1
> [28b] intSubset ::= (markupdecl | DeclSep)*
>
> I have confirmed this fix resolves the ambiguity using my own parser.
>
> Regards,
>
> Daniel van Vugt

Received on Thursday, 3 November 2011 09:17:39 UTC