FW: Ambiguity in section 2.8 of XML 1.0 Fifth Edition

Another comment on (parsing) ambiguity.

Did we ever say or imply that the productions in the spec
were non-ambiguous?  Is the right response to these issues
simply that we never said the productions were non-ambiguous,
and if a parser writer wants or needs to translate them into
equivalent non-ambiguous versions, that's fine, but there is
nothing wrong with the productions in the spec?

paul

-----Original Message-----
From: xml-editor-request@w3.org [mailto:xml-editor-request@w3.org] On
Behalf Of Daniel van Vugt
Sent: Friday, 2011 October 21 1:15
To: xml-editor@w3.org
Subject: Ambiguity in section 2.8 of XML 1.0 Fifth Edition

Unrelated to my previous email about an ambiguity, I have found another 
one in section 2.8 this time...

[28a] DeclSep ::= PEReference | S
[28b] intSubset ::= (markupdecl | DeclSep)*
[3] S ::=  (#x20 | #x9 | #xD | #xA)+

intSubset is ambiguous because it allows repetitions (*) of DeclSep, and
DeclSep can match the S rule, which also allows repetitions (+).

Therefore the offset and length of each S used to make up a DeclSep in 
intSubset are ambiguous. There are many different parses when intSubset 
is given a string of multiple whitespace characters.
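
To make the size of the problem concrete, here is a minimal Python sketch 
that counts the distinct parses of a run of n whitespace characters under 
the current productions. It models only the whitespace case (no 
PEReference or markupdecl), and the function name count_whitespace_parses 
is made up for illustration; it is not taken from the spec or from any 
parser.

```python
from functools import lru_cache


def count_whitespace_parses(n: int) -> int:
    """Count parses of n whitespace chars as (DeclSep)*, DeclSep ::= S, S ::= ws+."""

    @lru_cache(maxsize=None)
    def ways(remaining: int) -> int:
        if remaining == 0:
            return 1  # nothing left to consume: the repetition simply stops
        # The next DeclSep (one S) may consume any k = 1..remaining characters.
        return sum(ways(remaining - k) for k in range(1, remaining + 1))

    return ways(n)


for n in range(1, 6):
    print(n, count_whitespace_parses(n))
```

Under these assumptions the count doubles with each extra character 
(2**(n-1) parses for n >= 1), so three spaces can already be split four 
ways: 3, 1+2, 2+1 and 1+1+1.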

I suggest the following correction, which appears to eliminate the 
ambiguity:

[3a] S1 ::=  #x20 | #x9 | #xD | #xA
[3b] S ::=  S1+
[28a] DeclSep ::= PEReference | S1
[28b] intSubset ::= (markupdecl | DeclSep)*

I have confirmed this fix resolves the ambiguity using my own parser.
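
The effect of the suggested change can be checked independently of any 
particular parser with a small counting sketch in Python. It assumes the 
input is a pure run of n whitespace characters and ignores PEReference 
and markupdecl. The name count_parses and its max_piece parameter are 
hypothetical: max_piece=None models DeclSep ::= S (one or more 
characters per DeclSep), and max_piece=1 models DeclSep ::= S1 (exactly 
one character per DeclSep).

```python
from typing import Optional


def count_parses(n: int, max_piece: Optional[int] = None) -> int:
    """Count ways to cover n whitespace chars with consecutive DeclSep matches.

    max_piece limits how many characters one DeclSep may consume:
    None for the original grammar (S), 1 for the proposed grammar (S1).
    """
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty run has exactly one parse: zero repetitions
    for total in range(1, n + 1):
        limit = total if max_piece is None else min(max_piece, total)
        # The last DeclSep consumes k characters; the rest is parsed before it.
        ways[total] = sum(ways[total - k] for k in range(1, limit + 1))
    return ways[n]


print(count_parses(4))               # original productions
print(count_parses(4, max_piece=1))  # proposed productions
```

For four whitespace characters this gives 8 parses under the original 
productions and exactly 1 under the proposed ones, which is consistent 
with the claim that the change removes this particular ambiguity.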

Regards,

Daniel van Vugt

Received on Friday, 21 October 2011 15:03:25 UTC